Quicksort vs heapsort

Solution 1

This paper has some analysis.

Also, from Wikipedia:

The most direct competitor of quicksort is heapsort. Heapsort is typically somewhat slower than quicksort, but the worst-case running time is always Θ(n log n). Quicksort is usually faster, though there remains the chance of worst-case performance except in the introsort variant, which switches to heapsort when a bad case is detected. If it is known in advance that heapsort is going to be necessary, using it directly will be faster than waiting for introsort to switch to it.
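
To make the "switches to heapsort when a bad case is detected" idea concrete, here is a minimal sketch of introsort, assuming a simple Lomuto partition and a depth limit of about 2·log2(n); the names and structure are illustrative, not any library's actual implementation, and the insertion-sort pass that production introsorts use for tiny ranges is omitted for brevity:

using System;

static class IntroSortSketch
{
    public static void Sort(int[] a)
    {
        if (a.Length < 2) return;
        // depth limit ~ 2*log2(n): exceeding it signals a bad pivot sequence
        int depthLimit = 2 * (int)Math.Log(a.Length, 2);
        Sort(a, 0, a.Length - 1, depthLimit);
    }

    static void Sort(int[] a, int lo, int hi, int depth)
    {
        if (hi <= lo) return;
        if (depth == 0) { HeapSortRange(a, lo, hi); return; } // bad case detected: fall back
        int p = Partition(a, lo, hi);
        Sort(a, lo, p - 1, depth - 1);
        Sort(a, p + 1, hi, depth - 1);
    }

    static int Partition(int[] a, int lo, int hi)
    {
        // Lomuto partition with the last element as pivot (kept deliberately simple)
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++)
            if (a[j] < pivot) { Swap(a, i, j); i++; }
        Swap(a, i, hi);
        return i;
    }

    static void HeapSortRange(int[] a, int lo, int hi)
    {
        int n = hi - lo + 1;
        for (int i = n / 2 - 1; i >= 0; i--) SiftDown(a, lo, i, n); // build a max-heap
        for (int end = n - 1; end > 0; end--)
        {
            Swap(a, lo, lo + end);   // move the current maximum to its final slot
            SiftDown(a, lo, 0, end); // restore the heap in the remaining prefix
        }
    }

    static void SiftDown(int[] a, int lo, int i, int n)
    {
        while (true)
        {
            int l = 2 * i + 1, m = i;
            if (l < n && a[lo + l] > a[lo + m]) m = l;
            if (l + 1 < n && a[lo + l + 1] > a[lo + m]) m = l + 1;
            if (m == i) return;
            Swap(a, lo + i, lo + m);
            i = m;
        }
    }

    static void Swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}

The depth limit guarantees the O(n log n) worst case: quicksort can only recurse about 2·log2(n) levels before the remaining range is handed to heapsort.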

Solution 2

Heapsort is O(N log N) guaranteed, which is much better than the worst case of Quicksort. Heapsort also doesn't need extra memory for another array to put the ordered data into, as Mergesort does. So why do commercial applications stick with Quicksort? What does Quicksort have that is so special over the other implementations?

I've tested the algorithms myself, and I've seen that Quicksort does have something special indeed. It runs fast, much faster than the Heapsort and Mergesort algorithms.

The secret of Quicksort is that it performs almost no unnecessary element swaps. Swaps are time consuming.

With Heapsort, even if all of your data is already ordered, you are going to swap 100% of the elements to order the array.

With Mergesort, it's even worse. You are going to write 100% of the elements into another array and write them back into the original one, even if the data is already ordered.

With Quicksort you don't swap what is already ordered. If your data is completely ordered, you swap almost nothing! Although there is a lot of fussing about the worst case, a little improvement in the choice of pivot (anything other than taking the first or last element of the array) can avoid it. Taking as pivot the median of the first, middle and last elements is sufficient to avoid the worst case.

What is superior in Quicksort is not the worst case, but the best case! In the best case you do the same number of comparisons, but you swap almost nothing. In the average case you swap some of the elements, but not all of them, as Heapsort and Mergesort do. That is what gives Quicksort the best time. Less swapping, more speed.
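
As a rough check of this swap argument, here is a minimal sketch of mine (the class name HeapSortSwapCount is made up) that instruments a textbook heapsort with a swap counter and feeds it an already-sorted array; even then it performs on the order of n·log n swaps, because building the max-heap and repeatedly extracting the maximum move nearly every element:

using System;

class HeapSortSwapCount
{
    static long swaps;

    static void Swap(int[] a, int i, int j)
    {
        int t = a[i]; a[i] = a[j]; a[j] = t;
        swaps++;
    }

    static void SiftDown(int[] a, int i, int n)
    {
        while (true)
        {
            int l = 2 * i + 1, largest = i;
            if (l < n && a[l] > a[largest]) largest = l;
            if (l + 1 < n && a[l + 1] > a[largest]) largest = l + 1;
            if (largest == i) return;
            Swap(a, i, largest);
            i = largest;
        }
    }

    static void HeapSort(int[] a)
    {
        for (int i = a.Length / 2 - 1; i >= 0; i--) SiftDown(a, i, a.Length); // build a max-heap
        for (int end = a.Length - 1; end > 0; end--)
        {
            Swap(a, 0, end);     // move the maximum to its final position
            SiftDown(a, 0, end); // restore the heap in the remaining prefix
        }
    }

    static void Main()
    {
        var a = new int[1000000];
        for (int i = 0; i < a.Length; i++) a[i] = i; // already sorted input
        HeapSort(a);
        Console.WriteLine("Heapsort swaps on sorted input: " + swaps); // on the order of n log n, far from zero
    }
}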

The implementation below in C#, running on my computer in release mode, beats Array.Sort by 3 seconds with the middle pivot and by 2 seconds with the improved pivot (yes, there is an overhead to getting a good pivot).

static void Main(string[] args)
{
    int[] arrToSort = new int[100000000];
    var r = new Random();
    // fill the array with random values
    for (int i = 0; i < arrToSort.Length; i++) arrToSort[i] = r.Next(1, arrToSort.Length);

    Console.WriteLine("Press q to quick sort, s to Array.Sort");
    while (true)
    {
        var k = Console.ReadKey(true);
        if (k.KeyChar == 'q')
        {
            // quick sort
            Console.WriteLine("Beg quick sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
            QuickSort(arrToSort, 0, arrToSort.Length - 1);
            Console.WriteLine("End quick sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
            // refill with fresh random data for the next run
            for (int i = 0; i < arrToSort.Length; i++) arrToSort[i] = r.Next(1, arrToSort.Length);
        }
        else if (k.KeyChar == 's')
        {
            Console.WriteLine("Beg Array.Sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
            Array.Sort(arrToSort);
            Console.WriteLine("End Array.Sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
            // refill with fresh random data for the next run
            for (int i = 0; i < arrToSort.Length; i++) arrToSort[i] = r.Next(1, arrToSort.Length);
        }
    }
}

public static void QuickSort(int[] arr, int left, int right)
{
    int begin = left, end = right;

    // simple option: take the middle element as pivot
    //int pivot = arr[(left + right) / 2];

    // improved pivot: median of the first, middle and last elements
    int middle = (left + right) / 2;
    int LM = arr[left].CompareTo(arr[middle]);
    int MR = arr[middle].CompareTo(arr[right]);
    int LR = arr[left].CompareTo(arr[right]);

    int pivot;
    if (-1 * LM == LR)
        pivot = arr[left];      // arr[left] lies between the other two
    else if (MR == -1 * LR)
        pivot = arr[right];     // arr[right] lies between the other two
    else
        pivot = arr[middle];

    // Hoare-style partition around the pivot value
    do
    {
        while (arr[left] < pivot) left++;
        while (arr[right] > pivot) right--;

        if (left <= right)
        {
            int temp = arr[right];
            arr[right] = arr[left];
            arr[left] = temp;

            left++;
            right--;
        }
    } while (left <= right);

    // sort the two partitions
    if (left < end) QuickSort(arr, left, end);
    if (begin < right) QuickSort(arr, begin, right);
}

Solution 3

For most situations, having quick vs. a little quicker is irrelevant... you simply never want it to occasionally get waayyy slow. Although you can tweak QuickSort to avoid the way slow situations, you lose the elegance of the basic QuickSort. So, for most things, I actually prefer HeapSort... you can implement it in its full simple elegance, and never get a slow sort.

For situations where you DO want max speed in most cases, QuickSort may be preferred over HeapSort, but neither may be the right answer. For speed-critical situations, it is worth examining closely the details of the situation. For example, in some of my speed-critical code, it is very common that the data is already sorted or near-sorted (it is indexing multiple related fields that often either move up and down together OR move up and down opposite each other, so once you sort by one, the others are either sorted or reverse-sorted or close... either of which can kill QuickSort). For that case, I implemented neither... instead, I implemented Dijkstra's SmoothSort... a HeapSort variant that is O(N) when already sorted or near-sorted... it is not so elegant, not too easy to understand, but fast... read http://www.cs.utexas.edu/users/EWD/ewd07xx/EWD796a.PDF if you want something a bit more challenging to code.

Solution 4

Quicksort-Heapsort in-place hybrids are really interesting too, since most of them only need n log n comparisons in the worst case (they are optimal with respect to the first term of the asymptotics, so they avoid the worst-case scenarios of Quicksort) and O(log n) extra space, and they preserve at least "half" of the good behaviour of Quicksort on already-ordered data. An extremely interesting algorithm is presented by Diekert and Weiss in http://arxiv.org/pdf/1209.4214v1.pdf:

  • Select a pivot p as the median of a random sample of sqrt(n) elements (this can be done in at most 24 sqrt(n) comparisons using the algorithm of Tarjan & co., or 5 sqrt(n) comparisons using the much more convoluted spider-factory algorithm of Schönhage); a simplified sketch of this step appears after the list;
  • Partition your array in two parts as in the first step of Quicksort;
  • Heapify the smaller part and use O(log n) extra bits to encode a heap in which every left child has a value greater than its sibling;
  • Repeatedly extract the root of the heap, sift down the hole left by the root until it reaches a leaf of the heap, then fill the hole with an appropriate element taken from the other part of the array;
  • Recurse on the remaining unsorted part of the array (if p is chosen as the exact median, there is no recursion at all).
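
Here is a minimal sketch of the pivot-selection step only, assuming Array.Sort on the sample as a simple O(k log k) stand-in for the linear-time selection algorithms cited above (SamplePivot and MedianOfSample are illustrative names, not from the paper):

using System;

static class SamplePivot
{
    // Pick a pivot as the median of a random sample of about sqrt(n) elements.
    public static int MedianOfSample(int[] a, Random rng)
    {
        int k = Math.Max(1, (int)Math.Sqrt(a.Length));
        var sample = new int[k];
        for (int i = 0; i < k; i++)
            sample[i] = a[rng.Next(a.Length)];
        Array.Sort(sample); // stand-in for a linear-time median-selection algorithm
        return sample[k / 2];
    }
}

With a sample of sqrt(n) elements, the sample median lands close to the true median with high probability, so the partition in the next step is nearly balanced.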

Solution 5

Heapsort has the benefit of a worst-case running time of O(n log n), so in cases where quicksort is likely to perform poorly (mostly sorted data sets, generally), heapsort is much preferred.

Comments

  • avd
    avd over 3 years

    Both quicksort and heapsort do in-place sorting. Which is better? What are the applications and cases in which either is preferred?

  • Justin Peel
    Justin Peel about 14 years
    Quicksort only performs poorly on a mostly sorted data set if a poor pivot choosing method is chosen. Namely, the bad pivot choosing method would be to always choose the first or last element as the pivot. If a random pivot is chosen each time and a good method of handling repeated elements is used, the chance of a worst-case quicksort is very small.
  • zellio
    zellio about 14 years
    @Justin - That is very true, I was speaking on a naive implementation.
  • David Thornley
    David Thornley about 14 years
    @Justin: True, but the chance of a major slowdown is always there, however slight. For some applications, I might want to ensure O(n log n) behavior, even if it's slower.
  • Ami
    Ami about 10 years
    Never use bubble sort. If you reasonably think that your data will be sorted, then you can use insertion sort, or even test the data to see if they are sorted. Don't use bubblesort.
  • Kobor42
    Kobor42 about 10 years
    If you have a very large RANDOM data set, your best bet is quicksort. If it is partially ordered, then not, but if you start working with huge datasets you should know at least this much about them.
  • Femi
    Femi about 10 years
    Although merge sort can be implemented in place, the implementation is complex. AFAIK, most merge sort implementations are not in-place, but they are stable.
  • Femi
    Femi about 10 years
    It might be important to note that in typical implementations, neither quicksort nor heapsort are stable sorts.
  • Higgs
    Higgs almost 9 years
    +1 for the considerations on the number of swap and read/write operations required by different sorting algorithms
  • MikeC
    MikeC about 8 years
    Quick Sort is not a stable sort. Beyond that, questions of this nature encourage opinion-based responses and could lead to edit wars and arguments. Questions calling for opinion-based responses are explicitly discouraged by the SO guidelines. Answerers should avoid the temptation to answer them even if they have significant experience and wisdom in the area. Either flag them for closing or wait for someone with enough reputation to flag and close them. This comment is not a reflection on your knowledge or the validity of your answer.
  • Antimony
    Antimony almost 8 years
    For any deterministic, constant time pivot selection strategy, you can find an array that produces the O(n^2) worst case. It's not enough to eliminate just the minimum. You have to reliably choose pivots that are within a certain percentile band.
  • Antimony
    Antimony almost 8 years
    You only need O(log n) memory on average. The recursion overhead is trivial, assuming you don't get unlucky with the pivots, in which case you have bigger problems to worry about.
  • Antimony
    Antimony almost 8 years
    Quick sort isn't stable either.
  • DU Jiaen
    DU Jiaen over 6 years
    @DVK, According to your link cs.auckland.ac.nz/~jmor159/PLDS210/qsort3.html, the heap sort takes 2,842 comparisons for n=100, but 53,113 comparisons for n=500. That implies a ratio of about 18 between n=500 and n=100, which does NOT match a heap sort algorithm with O(N log N) complexity. I guess it's pretty likely that their heap sort implementation has some kind of bug inside.
  • DVK
    DVK over 6 years
    @DUJiaen - remember that O() is about asymptotic behavior at large N and has a possible multiplier
  • DU Jiaen
    DU Jiaen over 6 years
    This is NOT related to the multiplier. If an algorithm has a complexity of O(N log N), it should follow a trend of Time(N) = C1 * N * log(N). And if you take Time(500)/Time(100), C1 disappears and the result should be close to (500 log 500) / (100 log 100) = 6.7. But from your link it is 18, which is far out of scale.
  • Chris
    Chris over 6 years
    I'm curious if this is the exact code you ran for your simulations between your hand-coded quick sort and C# built-in Array.sort? I tested this code and in all my tests, at best the hand-coded quick sort was the same as Array.sort. One thing I controlled for in my testing of this was to make two identical copies of the random array. After all, a given randomization could potentially be more favorable (lean towards best case) than another randomization. So I ran the identical sets through each one. Array.sort tied or beat every time (release build btw).
  • Marquinho Peli
    Marquinho Peli over 6 years
    Ok, after almost 3 years we can expect the framework has improved, and a simple implementation like mine can only be equal. Also, nowadays we have a further improved quicksort, and maybe the .NET framework has adopted it, as the Java framework did: the dual-pivot Quicksort. But the differences are really small, and Quicksort can be considered, for its simplicity, beauty and efficiency, one of the best sorting algorithms.
  • ddekany
    ddekany over 6 years
    Merge sort doesn't have to copy 100% of the elements, unless it's some very naive implementation from a textbook. It's simple to implement it so that you only need to copy 50% of them (the left side of the two merged arrays). It's also trivial to postpone copying until you actually have to "swap" two elements, so with already sorted data you won't have any memory overhead. So even the 50% is actually the worst case, and you can have anything between that and 0%.
  • PlsWork
    PlsWork about 6 years
    The link is dead
  • Marquinho Peli
    Marquinho Peli about 6 years
    Hi @ddekany, please point out some examples of the naive vs expert implementation of merge sort so people can confirm your assertions. From what I saw in college books, merge sort was doing this back and forth copy and these Swap/IO operations slow down the algorithm. Also from what I see here: stackoverflow.com/questions/7576521/… a lot of copying between one array and another is going on.
  • ddekany
    ddekany about 6 years
    @MarquinhoPeli I meant to say that you only need 50% more available memory compared to the size of the sorted list, not 100%, which seems to be a common misconception. So I was talking about the peak memory usage. I can't give a link, but it's easy to see if you try to merge the two already-sorted halves of an array in place (only the left half has the problem where you overwrite elements that you haven't yet consumed). How much memory copying you have to do during the whole sorting process is another question, but obviously the worst case can't be below 100% for any sorting algorithm.
  • Nimrod
    Nimrod over 5 years
    @Antimony In practice, nobody cares about worst case as even randomized pivot selection will take care of it. When your list is large (which is when it matters) the selection of a bad pivot is unlikely, though admittedly not exponentially unlikely.
  • Antimony
    Antimony over 5 years
    @Nimrod the selection of a bad pivot is 100% if someone is trying to DOS your server.
  • Nic Szerman
    Nic Szerman about 4 years
    @DUJiaen n=500 is not large enough to conclude that the statistics don't agree with predicted asymptotic behavior
  • gniourf_gniourf
    gniourf_gniourf over 3 years
    Quicksort is not necessarily recursive. In fact, it's always possible to transform a recursive algorithm into a non recursive one. Sure, the classical presentations of QS all involve recursion, but it ain't necessarily so in practice.
  • GDI512
    GDI512 over 2 years
    @JustinPeel On a very large amount of random numbers in a small range (8 or 16 bit unsigned integers, for example) quicksort will always hit its worst case performance regardless of your choice of pivot