Why should Insertion Sort be used after threshold crossover in Merge Sort

algorithm sorting quicksort mergesort divide-and-conquer

13,125

Solution 1

Insertion sort is faster in practice, at least faster than bubble sort. Their asymptotic running time is the same, but insertion sort has better constants (fewer/cheaper operations per iteration). Most notably, it requires only a linear number of swaps of pairs of elements, and in each inner loop it performs comparisons between each of n/2 elements and a "fixed" element that can be stored in a register (while bubble sort has to read values from memory). I.e. insertion sort does less work in its inner loop than bubble sort.
The linked answer claims that 10000 n lg n > 10 n² for "reasonable" n. This is true up to about 14000 elements.
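
The crossover point of those two cost models can be checked with a few lines of Java. This is only a sketch: the constants 10000 and 10 come from the claim above and are illustrative rather than measured, and the class and method names are made up for this example.

```java
class CrossoverDemo {
    // Hypothetical cost model for an O(n lg n) algorithm with a large constant.
    static double costNLogN(long n) {
        return 10000.0 * n * (Math.log(n) / Math.log(2));
    }

    // Hypothetical cost model for an O(n^2) algorithm with a small constant.
    static double costQuadratic(long n) {
        return 10.0 * n * n;
    }

    // Smallest n >= 2 at which the quadratic model is no longer cheaper.
    static long crossover() {
        long n = 2;
        while (costQuadratic(n) < costNLogN(n)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("quadratic model stops winning at n = " + crossover());
    }
}
```

Below that n the quadratic model is cheaper, which is all the claim says. Real thresholds such as 30 come from measurement, not from constants like these.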

Solution 2

If you bail out of each branch of your divide-and-conquer Quicksort when it hits the threshold, your data looks like this:

[the least 30-ish elements, not in order] [the next 30-ish ] ... [last 30-ish]

Insertion sort has the rather pleasing property that you can call it just once on that whole array, and it performs essentially the same as it does if you call it once for each block of 30. So instead of calling it in your loop, you have the option to call it last. This might not be faster, especially since it pulls the whole data through cache an extra time, but depending how the code is structured it might be convenient.
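
A minimal sketch of that "call it last" arrangement, assuming a cutoff of 30 and a simple Lomuto partition with a middle pivot (both are choices made for this example, not part of the original advice):

```java
import java.util.Random;

class DelayedInsertionQuicksort {
    // Assumed threshold; the text above uses "30-ish".
    static final int CUTOFF = 30;

    static void sort(int[] a) {
        quicksortCoarse(a, 0, a.length - 1);
        insertionSort(a); // one final pass over the whole array
    }

    // Partitions recursively but leaves ranges of at most CUTOFF elements unsorted.
    static void quicksortCoarse(int[] a, int lo, int hi) {
        if (hi - lo + 1 <= CUTOFF) {
            return;
        }
        int p = partition(a, lo, hi);
        quicksortCoarse(a, lo, p - 1);
        quicksortCoarse(a, p + 1, hi);
    }

    // Lomuto partition using the middle element as pivot; returns the pivot's final index.
    static int partition(int[] a, int lo, int hi) {
        swap(a, lo + (hi - lo) / 2, hi);
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) {
                swap(a, i, store++);
            }
        }
        swap(a, store, hi);
        return store;
    }

    // Each element moves left only within its own small block, because
    // everything in earlier blocks is already no larger than it.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i;
            while (j > 0 && a[j - 1] > key) {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = key;
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        int[] data = new Random(1).ints(200, 0, 1000).toArray();
        sort(data);
        System.out.println(java.util.Arrays.toString(data));
    }
}
```

Because the final pass sorts whatever it is given, the result is correct even if `partition` is wrong, which is exactly the bug-hiding drawback of this arrangement.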

Neither bubble sort nor selection sort has this property, so I think the answer might quite simply be "convenience". If someone suspects selection sort might be better then the burden of proof lies on them to "prove" that it's faster.

Note that this use of insertion sort also has a drawback -- if you do it this way and there's a bug in your partition code then provided it doesn't lose any elements, just partition them incorrectly, you'll never notice.

Edit: apparently this modification is by Sedgewick, who wrote his PhD on QuickSort in 1975. It was analyzed more recently by Musser (the inventor of Introsort). Reference https://en.wikipedia.org/wiki/Introsort

Musser also considered the effect on caches of Sedgewick's delayed small sorting, where small ranges are sorted at the end in a single pass of insertion sort. He reported that it could double the number of cache misses, but that its performance with double-ended queues was significantly better and should be retained for template libraries, in part because the gain in other cases from doing the sorts immediately was not great.

In any case, I don't think the general advice is "whatever you do, don't use selection sort". The advice is, "insertion sort beats Quicksort for inputs up to a surprisingly non-tiny size", and this is pretty easy to prove to yourself when you're implementing a Quicksort. If you come up with another sort that demonstrably beats insertion sort on the same small arrays, none of those academic sources is telling you not to use it. I suppose the surprise is that the advice is consistently towards insertion sort, rather than each source choosing its own favorite (introductory teachers have a frankly astonishing fondness for bubble sort -- I wouldn't mind if I never hear of it again). Insertion sort is generally thought of as "the right answer" for small data. The issue isn't whether it "should be" fast, it's whether it actually is or not, and I've never particularly noticed any benchmarks dispelling this idea.

One place to look for such data would be in the development and adoption of Timsort. I'm pretty sure Tim Peters chose insertion for a reason: he wasn't offering general advice, he was optimizing a library for real use.

Solution 3

I am surprised no-one's mentioned the simple fact that insertion sort is simply much faster for "almost" sorted data. That's the reason it's used.
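
One way to see this is to count comparisons. The sketch below (class and method names are made up for illustration) uses a shift-based insertion sort that returns the number of element comparisons it made: already sorted input of n elements costs n - 1 comparisons, while reversed input costs n(n-1)/2.

```java
class NearlySortedDemo {
    // Shift-based insertion sort; returns the number of element comparisons.
    static long insertionSortCountingComparisons(int[] a) {
        long comparisons = 0;
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i;
            while (j > 0) {
                comparisons++;
                if (a[j - 1] <= key) {
                    break; // key is already in place relative to a[j-1]
                }
                a[j] = a[j - 1];
                j--;
            }
            a[j] = key;
        }
        return comparisons;
    }

    // Sorted array 0..n-1 with every 100th adjacent pair swapped.
    static int[] nearlySorted(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
        }
        for (int i = 0; i + 1 < n; i += 100) {
            int t = a[i];
            a[i] = a[i + 1];
            a[i + 1] = t;
        }
        return a;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] reversed = new int[n];
        for (int i = 0; i < n; i++) {
            reversed[i] = n - i;
        }
        System.out.println("nearly sorted: " + insertionSortCountingComparisons(nearlySorted(n)));
        System.out.println("reversed:      " + insertionSortCountingComparisons(reversed));
    }
}
```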

Solution 4

Here is an empirical proof that insertion sort is faster than bubble sort (for 30 elements, on my machine, with the attached implementation, using Java...).

I ran the attached code and found that bubble sort took 6338.515 ns on average, while insertion sort took 3601.0 ns.

I used the Wilcoxon signed-rank test to check the probability that this is a mistake and they should actually be the same, but the result is below the range of numerical error (effectively P_VALUE ~= 0).

import java.util.Arrays;
import java.util.Random;

// The class name is arbitrary; it only exists so the code compiles as one file.
public class SortBenchmark {

    // Exchanges arr[i] and arr[j].
    private static void swap(int[] arr, int i, int j) {
        int temp = arr[i];
        arr[i] = arr[j];
        arr[j] = temp;
    }

    // Swap-based insertion sort: moves arr[i] left until it is in place.
    public static void insertionSort(int[] arr) {
        for (int i = 1; i < arr.length; i++) {
            int j = i;
            while (j > 0 && arr[j-1] > arr[j]) {
                swap(arr, j, j-1);
                j--;
            }
        }
    }

    // Bubble sort with a shrinking inner bound and early exit when a pass makes no swap.
    public static void bubbleSort(int[] arr) {
        for (int i = 0; i < arr.length; i++) {
            boolean swapped = false;
            // The last i elements are already in their final positions.
            for (int j = 0; j < arr.length - 1 - i; j++) {
                if (arr[j] > arr[j+1]) {
                    swapped = true;
                    swap(arr, j, j+1);
                }
            }
            if (!swapped) break;
        }
    }

    public static void main(String... args) throws Exception {
        Random r = new Random(1);
        int SIZE = 30;
        int N = 1000;
        int[] arr = new int[SIZE];
        // Timings are in nanoseconds (System.nanoTime), one entry per trial.
        int[] nanosBubble = new int[N];
        int[] nanosInsertion = new int[N];
        System.out.println("start");
        // Warm-up so the JIT compiles insertionSort before timing starts.
        // Note: only insertionSort is warmed up, and on an all-zero array.
        for (int t = 0; t < 100; t++) {
            insertionSort(arr);
        }
        for (int t = 0; t < N; t++) {
            arr = generateRandom(r, SIZE);
            // Both sorts get their own copy of the same random input.
            int[] tempArr = Arrays.copyOf(arr, arr.length);

            long start = System.nanoTime();
            insertionSort(tempArr);
            nanosInsertion[t] = (int)(System.nanoTime()-start);

            tempArr = Arrays.copyOf(arr, arr.length);

            start = System.nanoTime();
            bubbleSort(tempArr);
            nanosBubble[t] = (int)(System.nanoTime()-start);
        }
        long sum1 = 0;
        for (int x : nanosBubble) {
            System.out.println(x);
            sum1 += x;
        }
        System.out.println("end of bubble. AVG = " + ((double)sum1)/nanosBubble.length);
        long sum2 = 0;
        for (int x : nanosInsertion) {
            System.out.println(x);
            sum2 += x;
        }
        System.out.println("end of insertion. AVG = " + ((double)sum2)/nanosInsertion.length);
        System.out.println("bubble took " + ((double)sum1)/nanosBubble.length + " while insertion took " + ((double)sum2)/nanosInsertion.length);
    }

    // Returns an array of `size` random values in [0, size).
    private static int[] generateRandom(Random r, int size) {
        int[] arr = new int[size];
        for (int i = 0; i < size; i++)
            arr[i] = r.nextInt(size);
        return arr;
    }
}

EDIT:
(1) Optimizing the bubble sort (updated above) reduced the average time for bubble sort to 6043.806, which is not enough to make a significant change. The Wilcoxon test is still conclusive: insertion sort is faster.

(2) I also added a selection sort test (code attached) and compared it against insertion sort. The results are: selection took 4748.35 while insertion took 3540.114.
The P_VALUE for the Wilcoxon test is still below the range of numerical error (effectively ~= 0).

code for selection sort used:

public static void selectionSort(int[] arr) {
    for (int i = 0; i < arr.length ; i++) { 
        int min = arr[i];
        int minElm = i;
        for (int j = i+1; j < arr.length ; j++) { 
            if (arr[j] < min) { 
                min = arr[j];
                minElm = j;
            }
        }
        swap(arr,i,minElm);
    }
}

Solution 5

The easier one first: why insertion sort over selection sort? Because insertion sort runs in O(n) for optimal input sequences, i.e. if the sequence is already sorted. Selection sort always takes quadratic time, whatever the input.

Why insertion sort over bubble sort? Both need only a single pass for already sorted input sequences, but insertion sort degrades better. To be more specific, insertion sort usually performs better than bubble sort when the input has a small number of inversions (source). This can be explained because bubble sort always iterates over N-i elements in pass i, while insertion sort works more like "find" and only needs to iterate over (N-i)/2 elements on average (in pass N-i-1) to find the insertion position. So insertion sort is expected to be about two times faster than bubble sort on average.
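
The comparison-count argument can be checked empirically. The sketch below counts comparisons for both algorithms on the same random inputs; the class name, the trial count, the array size of 30 and the seed are all assumptions made for this example, and the bubble sort used is the variant with a shrinking bound and early exit.

```java
import java.util.Random;

class ComparisonCountDemo {
    // Shift-based insertion sort; returns the number of element comparisons.
    static long insertionComparisons(int[] a) {
        long count = 0;
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i;
            while (j > 0) {
                count++;
                if (a[j - 1] <= key) {
                    break;
                }
                a[j] = a[j - 1];
                j--;
            }
            a[j] = key;
        }
        return count;
    }

    // Bubble sort with shrinking bound and early exit; returns the number of comparisons.
    static long bubbleComparisons(int[] a) {
        long count = 0;
        for (int pass = 0; pass < a.length - 1; pass++) {
            boolean swapped = false;
            for (int j = 0; j < a.length - 1 - pass; j++) {
                count++;
                if (a[j] > a[j + 1]) {
                    int t = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = t;
                    swapped = true;
                }
            }
            if (!swapped) {
                break;
            }
        }
        return count;
    }

    // Returns {total insertion comparisons, total bubble comparisons} over all trials.
    static long[] totals(int trials, int size, long seed) {
        Random r = new Random(seed);
        long ins = 0;
        long bub = 0;
        for (int t = 0; t < trials; t++) {
            int[] data = new int[size];
            for (int i = 0; i < size; i++) {
                data[i] = r.nextInt();
            }
            ins += insertionComparisons(data.clone());
            bub += bubbleComparisons(data.clone());
        }
        return new long[] {ins, bub};
    }

    public static void main(String[] args) {
        long[] t = totals(1000, 30, 1L);
        System.out.println("insertion: " + t[0] + " comparisons, bubble: " + t[1] + " comparisons");
    }
}
```

Comparison counts are only a proxy for running time, so the measured ratio need not match the wall-clock ratio.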

SexyBeast

Stackoverflow is. Therefore all programmers are.

Updated on June 04, 2022

Comments

SexyBeast almost 2 years

I have read everywhere that for divide and conquer sorting algorithms like Merge-Sort and Quicksort, instead of recursing until only a single element is left, it is better to shift to Insertion-Sort when a certain threshold, say 30 elements, is reached. That is fine, but why only Insertion-Sort? Why not Bubble-Sort or Selection-Sort, both of which have similar O(N^2) performance? Insertion-Sort should come in handy only when many elements are pre-sorted (although that advantage should also come with Bubble-Sort), but otherwise, why should it be more efficient than the other two?

And secondly, at this link, in the 2nd answer and its accompanying comments, it says that O(N log N) performs poorly compared to O(N^2) up to a certain N. How come? N^2 should always perform worse than N log N, since N > log N for all N >= 2, right?
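
For reference, the threshold switch described in the question might look like this for merge sort. This is a sketch only: the class name is made up, and the threshold of 30 is the example value from the question, which in practice would be tuned by measurement.

```java
class HybridMergeSort {
    // Assumed threshold, taken from the "say 30 elements" in the question.
    static final int THRESHOLD = 30;

    static void sort(int[] a) {
        int[] buf = new int[a.length];
        sort(a, buf, 0, a.length);
    }

    // Sorts the half-open range a[lo, hi).
    private static void sort(int[] a, int[] buf, int lo, int hi) {
        if (hi - lo <= THRESHOLD) {
            insertionSort(a, lo, hi); // small range: stop recursing
            return;
        }
        int mid = lo + (hi - lo) / 2;
        sort(a, buf, lo, mid);
        sort(a, buf, mid, hi);
        merge(a, buf, lo, mid, hi);
    }

    // Shift-based insertion sort on a[lo, hi).
    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i < hi; i++) {
            int key = a[i];
            int j = i;
            while (j > lo && a[j - 1] > key) {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = key;
        }
    }

    // Merges the sorted runs a[lo, mid) and a[mid, hi) using buf as scratch space.
    private static void merge(int[] a, int[] buf, int lo, int mid, int hi) {
        System.arraycopy(a, lo, buf, lo, hi - lo);
        int i = lo;
        int j = mid;
        for (int k = lo; k < hi; k++) {
            if (i < mid && (j >= hi || buf[i] <= buf[j])) {
                a[k] = buf[i++];
            } else {
                a[k] = buf[j++];
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 9, 1, 4, 8, 2, 7, 6, 0};
        sort(data);
        System.out.println(java.util.Arrays.toString(data));
    }
}
```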
- amit over 11 years
  
  Insertion sort is considered a fast algorithm for few elements; the reason is cache efficiency, if I remember correctly.
- Haile over 11 years
  
  Also, note that Big O notation gives information about the asymptotic behaviour of functions. It's not true that an O(n^2) algorithm always performs worse than an O(n log n) one. For example, if f(n) = n^2 and g(n) = 9999999 n log n, then for small n an algorithm with complexity f(n) will be faster than one with complexity g(n). Asymptotic notation only guarantees that there exists a number n such that for all m > n we have f(m) > g(m).
- Kwariz over 11 years
  
  Take care of the hidden constants when using big-oh notation, compare f(n)=10^-6.n^2 which is in O(n^2) and g(n)=10^10^10.n.log(n) which is in O(n log n)
- amit over 11 years
  
  @Cupidvogel: After thinking carefully, I believe the issue is NOT cache. Modern machines have ~32KB of cache, while 30 elements usually occupy much less than that. Thus, for almost any sort algorithm on 30 elements, all of them are expected to be read from memory into cache once and stay there for the entire duration of the sort (no element is expected to be thrown out while sorting these 30 elements).
IVlad over 11 years

Citation or proof needed for 1.
Fred Foo over 11 years

@IVlad: e.g., algorithmist.com/index.php/Insertion_sort -- the number of swaps performed, i.e. the number of reads/writes to memory, is linear in insertion sort but quadratic in bubble sort. Those are likely the most expensive operations in these algorithms as most of the rest can be done in registers.
IVlad over 11 years

First, swaps and writes are not the same thing: insertion sort has linear swaps, but quadratic writes. Second, selection sort is the one with the least writes actually. So not a good explanation at all.
IVlad over 11 years

Oh come on, optimize that bubble sort, even wikipedia has it optimized :). Besides, selection sort would be the interesting one I think.
Fred Foo over 11 years

@IVlad: hmm, yes, overlooked that. It would seem the comparisons are faster in insertion sort because one of the operands can be cached in a register.
SexyBeast over 11 years

what is there to optimize in Bubble Sort?
IVlad over 11 years

And optimize insertion sort too... you don't need to call swap in the inner loop. @Cupidvogel - see its wiki entry.
SexyBeast over 11 years

Well, the first pass of the bubble sort puts the highest element in the last position, the 2nd pass places the 2nd-highest element in the 2nd last position. So after the nth pass, when the nth largest element has been placed, naturally the inner loop shouldn't cover 1 to n, but rather n-i. This is the standard practice, right?
IVlad over 11 years

@Cupidvogel - you can also end the algorithm early if no swaps were performed in an inner loop iteration. This can help a lot for random test data.
amit over 11 years

@IVlad: I am going to a meeting in a minute, but I promise I'll add selection sort later today. Regarding the optimizations: sorry, haven't programmed any of the naive sorting algorithms in ~4 years. I'll add them as well later on.
amit over 11 years

P.S. empirical proofs are fun.
Ankit Roy over 11 years

+1 for the observation about selection sort always requiring O(n^2) time. But the 2nd para is unsatisfying: the Wikipedia page just repeats what you're saying, without really explaining why.
amit over 11 years

@IVlad: I optimized bubble sort a bit, and rechecked. Also tested selection sort. Wilcoxon is conclusive - insertion is the best among these.
IVlad over 11 years

Nice, I upvoted shortly after you posted anyway :). Was very cool seeing so many high-rep users here at once.