Explanation of the Median of Medians algorithm

22,207

Think of the following set of numbers:

5 2 6 3 1

The median of these numbers is 3. Now if you have a number n, if n > 3, then it is bigger than at least half of the numbers above. If n < 3, then it is smaller than at least half of the numbers above.

So that is the idea. That is, for each set of 5 numbers, you get their median. Now you have n / 5 numbers. This is obvious.

Now if you get the median of those numbers (call it m), it is bigger than half of them and smaller than the other half (by definition of median!). In other words, m is bigger than n / 10 numbers (which themselves were medians of small 5 element groups) and bigger than another n / 10 numbers (which again were medians of small 5 element groups).

In the example above, we saw that if the median is k and you have m > k, then m is also bigger than 2 other numbers (that were themselves smaller than k). This means that for each of those smaller 5 element groups where m was bigger than its median, m is also bigger than two other numbers. This makes it at least 3 numbers (2 numbers + the median itself) in each of those n / 10 small 5 element groups, that are smaller than m. Hence, m is at least bigger than 3n/10 numbers.

Similar logic for the number of elements m is bigger than.

Share:
22,207
SexyBeast
Author by

SexyBeast

Stackoverflow is. Therefore all programmers are.

Updated on April 22, 2020

Comments

  • SexyBeast
    SexyBeast about 4 years

    The Median of medians approach is very popular in quicksort type partitioning algorithms to yield a fairly good pivot, such that it partitions the array uniformly. Its logic is given in Wikipedia as:

    The chosen pivot is both less than and greater than half of the elements in the list of medians, which is around n/10 elements (1/2 * (n/5)) for each half. Each of these elements is a median of 5, making it less than 2 other elements and greater than 2 other elements outside the block. Hence, the pivot is less than 3(n/10) elements outside the block, and greater than another 3(n/10) elements outside the block. Thus the chosen median splits the elements somewhere between 30%/70% and 70%/30%, which assures worst-case linear behavior of the algorithm.

    Can somebody explain it a bit lucidly for me. I am finding it difficult to understand the logic.

  • SexyBeast
    SexyBeast over 11 years
    Just another question, how does this method guarantee that this number will be the median? The median is a number which partitions the array into the upper and lower half. So what does this 30-30-70 figure signify?
  • Shahbaz
    Shahbaz over 11 years
    Well, the median is in the middle, but m (in the text above) is not the median of all numbers. It is the median of only 1/5 of the numbers (which are themselves medians of smaller 5 element groups). Try reading the last paragraph with more attention. In the end, where it is concluded that m is bigger than at least 3n/10 of the numbers, well that translates to m is bigger than at least 30% of the numbers. So in the end, it goes like m is bigger than at least 30% and smaller than at least another 30%. There is 40% left which we are not sure of.
  • SexyBeast
    SexyBeast over 11 years
    Then how come it gives a 50-50 partition on average? The 50-50 partition is given by the normal median, right?
  • Shahbaz
    Shahbaz over 11 years
    It doesn't give 50-50 partition on average. It always gives somewhere between 30-70 and 70-30 (maybe you could call that 50-50 on average?), but that's not the point. For quicksort to give O(nlogn) time complexity, it needs to be able to divide the array into partitions proportional to the size of the array. That's what gives the logn recursion depth. 30-70 division already gives a 3n/10 and 7n/10 division which is proportional to n. So even though it's not a perfect 50-50, it will still yield logarithmic recursion depth (except it's not log in base 2, but base 10/7)