Understanding the Recursion of mergesort

algorithm recursion merge mergesort

44,432

Solution 1

I think the "sort" function name in MergeSort is a bit of a misnomer, it should really be called "divide".

Here is a visualization of the algorithm in process.

enter image description here

Each time the function recurses, it's working on a smaller and smaller subdivision of the input array, starting with the left half of it. Each time the function returns from recursion, it will continue on and either start working on the right half, or recurse up again and work on a larger half.

Like this

[************************]mergesort
[************]mergesort(lo,mid)
[******]mergesort(lo,mid)
[***]mergesort(lo,mid)
[**]mergesort(lo,mid)
 [**]mergesort(mid+1,hi)
[***]merge
   [***]mergesort(mid+1,hi)
   [**]mergesort*(lo,mid)
    [**]mergesort(mid+1,hi)
   [***]merge
[******]merge
      [******]mergesort(mid+1,hi)
      [***]mergesort(lo,mid)
      [**]mergesort(lo,mid)
       [**]mergesort(mid+1,hi)
      [***]merge
         [***]mergesort(mid+1,hi)
         [**]mergesort(lo,mid)
           [**]mergesort(mid+1,hi)
         [***]merge
      [******]merge
[************]merge
            [************]mergesort(mid+1,hi)
            [******]mergesort(lo,mid)
            [***]mergesort(lo,mid)
            [**]mergesort(lo,mid)
             [**]mergesort(mid+1,hi)
            [***]merge
               [***]mergesort(mid+1,hi)
               [**]mergesort(lo,mid)
                 [**]mergesort(mid+1,hi)
               [***]merge
            [******]merge
                  [******]mergesort(mid+1,hi)
                  [***]mergesort(lo,mid)
                  [**]mergesort*(lo,mid)
                    [**]mergesort(mid+1,hi)
                  [***]merge
                     [***]mergesort(mid+1,hi)    
                     [**]mergesort(lo,mid)
                      [**]mergesort(mid+1,hi)
                     [***]merge
                  [******]merge
            [************]merge
[************************]merge

Solution 2

MERGE SORT:

1) Split the array in half
2) Sort the left half
3) Sort the right half
4) Merge the two halves together

Solution 3

An obvious thing to do would be to try this merge sort on a small array, say size 8 (power of 2 is convenient here), on paper. Pretend you are a computer executing the code, and see if it starts to become a bit clearer.

Your question is a bit ambiguous because you don't explain what you find confusing, but it sounds like you are trying to unroll the recursive calls in your head. Which may or may not be a good thing, but I think it can easily lead to having too much in your head at once. Instead of trying to trace the code from start to end, see if you can understand the concept abstractly. Merge sort:

Splits the array in half
Sorts the left half
Sorts the right half
Merges the two halves together

(1) should be fairly obvious and intuitive to you. For step (2) the key insight is this, the left half of an array... is an array. Assuming your merge sort works, it should be able to sort the left half of the array. Right? Step (4) is actually a pretty intuitive part of the algorithm. An example should make it trivial:

at the start
left: [1, 3, 5], right: [2, 4, 6, 7], out: []

after step 1
left: [3, 5], right: [2, 4, 6, 7], out: [1]

after step 2
left: [3, 5], right: [4, 6, 7], out: [1, 2]

after step 3
left: [5], right: [4, 6, 7], out: [1, 2, 3]

after step 4
left: [5], right: [6, 7], out: [1, 2, 3, 4]

after step 5
left: [], right: [6, 7], out: [1, 2, 3, 4, 5]

after step 6
left: [], right: [7], out: [1, 2, 3, 4, 5, 6]

at the end
left: [], right: [], out: [1, 2, 3, 4, 5, 6, 7]

So assuming that you understand (1) and (4), another way to think of merge sort would be this. Imagine someone else wrote mergesort() and you're confident that it works. Then you could use that implementation of mergesort() to write:

sort(myArray)
{
    leftHalf = myArray.subArray(0, myArray.Length/2);
    rightHalf = myArray.subArray(myArray.Length/2 + 1, myArray.Length - 1);

    sortedLeftHalf = mergesort(leftHalf);
    sortedRightHalf = mergesort(rightHalf);

    sortedArray = merge(sortedLeftHalf, sortedRightHalf);
}

Note that sort doesn't use recursion. It just says "sort both halves and then merge them". If you understood the merge example above then hopefully you see intuitively that this sort function seems to do what it says... sort.

Now, if you look at it more carefully... sort() looks pretty much exactly like mergesort()! That's because it is mergesort() (except it doesn't have base cases because it's not recursive!).

But that's how I like thinking of recursive functions--assume that the function works when you call it. Treat it as a black box that does what you need it to. When you make that assumption, figuring out how to fill in that black box is often easy. For a given input, can you break it down into smaller inputs to feed to your black box? After you solve that, the only thing that's left is handling the base cases at the start of your function (which are the cases where you don't need to make any recursive calls. For example, mergesort([]) just returns an empty array; it doesn't make a recursive call to mergesort()).

Finally, this is a bit abstract, but a good way to understand recursion is actually to write mathematical proofs using induction. The same strategy used to write an proof by induction is used to write a recursive function:

Math proof:

Show the claim is true for the base cases
Assume it is true for inputs smaller than some n
Use that assumption to show that it is still true for an input of size n

Recursive function:

Handle the base cases
Assume that your recursive function works on inputs smaller than some n
Use that assumption to handle an input of size n

Solution 4

Concerning the recursion part of the merge sort, I've found this page to be very very helpful. You can follow the code as it's being executed. It shows you what gets executed first, and what follows next.

Tom

Solution 5

the mergesort() simply divides the array in two halves until the if condition fails that is low < high. As you are calling mergesort() twice : one with low to pivot and second with pivot+1 to high, this will divide the sub arrays even more further.

Lets take an example :

a[] = {9,7,2,5,6,3,4}
pivot = 0+6/2 (which will be 3)
=> first mergesort will recurse with array {9,7,2} : Left Array
=> second will pass the array {5,6,3,4} : Right Array

It will repeat until you have 1 element in each left as well as right array. In the end you'll have something similar to this :

L : {9} {7} {2} R : {5} {6} {3} {4} (each L and R will have further sub L and R)
=> which on call to merge will become

L(L{7,9} R{2}) : R(L{5,6} R{3,4})
As you can see that each sub array are getting sorted in the merge function.

=> on next call to merge the next L and R sub arrays will get in order
L{2,7,9} : R{3,4,5,6}

Now both L and R sub array are sorted within
On last call to merge they'll be merged in order

Final Array would be sorted => {2,3,4,5,6,7,9}

See the merging steps in answer given by @roliu

View more solutions

44,432

Author by

2c2c

Updated on July 17, 2022

Comments

2c2c almost 2 years

Most of the mergesort implementations I see are similar to this. intro to algorithms book along with online implentations I search for. My recursion chops don't go much further than messing with Fibonacci generation (which was simple enough) so maybe it's the multiple recursions blowing my mind, but I can't even step through the code and understand whats going on even before I even hit the merge function.

How is it stepping through this? Is there some strategy or reading I should undergo to better understand the process here?

void mergesort(int *a, int*b, int low, int high)
{
    int pivot;
    if(low<high)
    {
        pivot=(low+high)/2;
        mergesort(a,b,low,pivot);
        mergesort(a,b,pivot+1,high);
        merge(a,b,low,pivot,high);
    }
}

and the merge(although frankly I'm mentally stuck before I even get to this part)

void merge(int *a, int *b, int low, int pivot, int high)
{
    int h,i,j,k;
    h=low;
    i=low;
    j=pivot+1;

    while((h<=pivot)&&(j<=high))
    {
        if(a[h]<=a[j])
        {
            b[i]=a[h];
            h++;
        }
        else
        {
            b[i]=a[j];
            j++;
        }
        i++;
    }
    if(h>pivot)
    {
        for(k=j; k<=high; k++)
        {
            b[i]=a[k];
            i++;
        }
    }
    else
    {
        for(k=h; k<=pivot; k++)
        {
            b[i]=a[k];
            i++;
        }
    }
    for(k=low; k<=high; k++) a[k]=b[k];
}