How to implement a Median-heap

62,870

Solution 1

You need two heaps: one min-heap and one max-heap. Each heap contains about one half of the data. Every element in the min-heap is greater or equal to the median, and every element in the max-heap is less or equal to the median.

When the min-heap contains one more element than the max-heap, the median is in the top of the min-heap. And when the max-heap contains one more element than the min-heap, the median is in the top of the max-heap.

When both heaps contain the same number of elements, the total number of elements is even. In this case you have to choose according your definition of median: a) the mean of the two middle elements; b) the greater of the two; c) the lesser; d) choose at random any of the two...

Every time you insert, compare the new element with those at the top of the heaps in order to decide where to insert it. If the new element is greater than the current median, it goes to the min-heap. If it is less than the current median, it goes to the max heap. Then you might need to rebalance. If the sizes of the heaps differ by more than one element, extract the min/max from the heap with more elements and insert it into the other heap.

In order to construct the median heap for a list of elements, we should first use a linear time algorithm and find the median. Once the median is known, we can simply add elements to the Min-heap and Max-heap based on the median value. Balancing the heaps isn't required because the median will split the input list of elements into equal halves.

If you extract an element you might need to compensate the size change by moving one element from one heap to another. This way you ensure that, at all times, both heaps have the same size or differ by just one element.

Solution 2

Here is a java implementaion of a MedianHeap, developed with the help of above comocomocomocomo 's explanation .

import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Scanner;

/**
 *
 * @author BatmanLost
 */
public class MedianHeap {

    //stores all the numbers less than the current median in a maxheap, i.e median is the maximum, at the root
    private PriorityQueue<Integer> maxheap;
    //stores all the numbers greater than the current median in a minheap, i.e median is the minimum, at the root
    private PriorityQueue<Integer> minheap;

    //comparators for PriorityQueue
    private static final maxHeapComparator myMaxHeapComparator = new maxHeapComparator();
    private static final minHeapComparator myMinHeapComparator = new minHeapComparator();

    /**
     * Comparator for the minHeap, smallest number has the highest priority, natural ordering
     */
    private static class minHeapComparator implements Comparator<Integer>{
        @Override
        public int compare(Integer i, Integer j) {
            return i>j ? 1 : i==j ? 0 : -1 ;
        }
    }

    /**
     * Comparator for the maxHeap, largest number has the highest priority
     */
    private static  class maxHeapComparator implements Comparator<Integer>{
        // opposite to minHeapComparator, invert the return values
        @Override
        public int compare(Integer i, Integer j) {
            return i>j ? -1 : i==j ? 0 : 1 ;
        }
    }

    /**
     * Constructor for a MedianHeap, to dynamically generate median.
     */
    public MedianHeap(){
        // initialize maxheap and minheap with appropriate comparators
        maxheap = new PriorityQueue<Integer>(11,myMaxHeapComparator);
        minheap = new PriorityQueue<Integer>(11,myMinHeapComparator);
    }

    /**
     * Returns empty if no median i.e, no input
     * @return
     */
    private boolean isEmpty(){
        return maxheap.size() == 0 && minheap.size() == 0 ;
    }

    /**
     * Inserts into MedianHeap to update the median accordingly
     * @param n
     */
    public void insert(int n){
        // initialize if empty
        if(isEmpty()){ minheap.add(n);}
        else{
            //add to the appropriate heap
            // if n is less than or equal to current median, add to maxheap
            if(Double.compare(n, median()) <= 0){maxheap.add(n);}
            // if n is greater than current median, add to min heap
            else{minheap.add(n);}
        }
        // fix the chaos, if any imbalance occurs in the heap sizes
        //i.e, absolute difference of sizes is greater than one.
        fixChaos();
    }

    /**
     * Re-balances the heap sizes
     */
    private void fixChaos(){
        //if sizes of heaps differ by 2, then it's a chaos, since median must be the middle element
        if( Math.abs( maxheap.size() - minheap.size()) > 1){
            //check which one is the culprit and take action by kicking out the root from culprit into victim
            if(maxheap.size() > minheap.size()){
                minheap.add(maxheap.poll());
            }
            else{ maxheap.add(minheap.poll());}
        }
    }
    /**
     * returns the median of the numbers encountered so far
     * @return
     */
    public double median(){
        //if total size(no. of elements entered) is even, then median iss the average of the 2 middle elements
        //i.e, average of the root's of the heaps.
        if( maxheap.size() == minheap.size()) {
            return ((double)maxheap.peek() + (double)minheap.peek())/2 ;
        }
        //else median is middle element, i.e, root of the heap with one element more
        else if (maxheap.size() > minheap.size()){ return (double)maxheap.peek();}
        else{ return (double)minheap.peek();}

    }
    /**
     * String representation of the numbers and median
     * @return 
     */
    public String toString(){
        StringBuilder sb = new StringBuilder();
        sb.append("\n Median for the numbers : " );
        for(int i: maxheap){sb.append(" "+i); }
        for(int i: minheap){sb.append(" "+i); }
        sb.append(" is " + median()+"\n");
        return sb.toString();
    }

    /**
     * Adds all the array elements and returns the median.
     * @param array
     * @return
     */
    public double addArray(int[] array){
        for(int i=0; i<array.length ;i++){
            insert(array[i]);
        }
        return median();
    }

    /**
     * Just a test
     * @param N
     */
    public void test(int N){
        int[] array = InputGenerator.randomArray(N);
        System.out.println("Input array: \n"+Arrays.toString(array));
        addArray(array);
        System.out.println("Computed Median is :" + median());
        Arrays.sort(array);
        System.out.println("Sorted array: \n"+Arrays.toString(array));
        if(N%2==0){ System.out.println("Calculated Median is :" + (array[N/2] + array[(N/2)-1])/2.0);}
        else{System.out.println("Calculated Median is :" + array[N/2] +"\n");}
    }

    /**
     * Another testing utility
     */
    public void printInternal(){
        System.out.println("Less than median, max heap:" + maxheap);
        System.out.println("Greater than median, min heap:" + minheap);
    }

    //Inner class to generate input for basic testing
    private static class InputGenerator {

        public static int[] orderedArray(int N){
            int[] array = new int[N];
            for(int i=0; i<N; i++){
                array[i] = i;
            }
            return array;
        }

        public static int[] randomArray(int N){
            int[] array = new int[N];
            for(int i=0; i<N; i++){
                array[i] = (int)(Math.random()*N*N);
            }
            return array;
        }

        public static int readInt(String s){
            System.out.println(s);
            Scanner sc = new Scanner(System.in);
            return sc.nextInt();
        }
    }

    public static void main(String[] args){
        System.out.println("You got to stop the program MANUALLY!!");        
        while(true){
            MedianHeap testObj = new MedianHeap();
            testObj.test(InputGenerator.readInt("Enter size of the array:"));
            System.out.println(testObj);
        }
    }
}

Solution 3

Here my code based on the answer provided by comocomocomocomo :

import java.util.PriorityQueue;

public class Median {
private  PriorityQueue<Integer> minHeap = 
    new PriorityQueue<Integer>();
private  PriorityQueue<Integer> maxHeap = 
    new PriorityQueue<Integer>((o1,o2)-> o2-o1);

public float median() {
    int minSize = minHeap.size();
    int maxSize = maxHeap.size();
    if (minSize == 0 && maxSize == 0) {
        return 0;
    }
    if (minSize > maxSize) {
        return minHeap.peek();
    }if (minSize < maxSize) {
        return maxHeap.peek();
    }
    return (minHeap.peek()+maxHeap.peek())/2F;
}

public void insert(int element) {
    float median = median();
    if (element > median) {
        minHeap.offer(element);
    } else {
        maxHeap.offer(element);
    }
    balanceHeap();
}

private void balanceHeap() {
    int minSize = minHeap.size();
    int maxSize = maxHeap.size();
    int tmp = 0;
    if (minSize > maxSize + 1) {
        tmp = minHeap.poll();
        maxHeap.offer(tmp);
    }
    if (maxSize > minSize + 1) {
        tmp = maxHeap.poll();
        minHeap.offer(tmp);
    }
  }
}

Solution 4

Isn't a perfectly balanced binary search tree (BST) a median heap? It is true that even red-black BSTs aren't always perfectly balanced, but it might be close enough for your purposes. And log(n) performance is guaranteed!

AVL trees are more tighly balanced than red-black BSTs so they come even closer to being a true median heap.

Solution 5

Another way to do it without using a max-heap and a min-heap would be to use a median-heap right away.

In a max-heap, the parent is greater than the children. We can have a new type of heap where the parent is in the 'middle' of the children - the left child is smaller than the parent and the right child is greater than the parent. All even entries are left children and all odd entries are right children.

The same swim and sink operations which can be performed in a max-heap, can also be performed in this median-heap - with slight modifications. In a typical swim operation in a max-heap, the inserted entry swims up till it is smaller than a parent entry, here in a median-heap, it will swim up till it is lesser than a parent (if it is an odd entry) or greater than a parent (if it is an even entry).

Here's my implementation for this median-heap. I have used an array of Integers for simplicity.

package priorityQueues;
import edu.princeton.cs.algs4.StdOut;

public class MedianInsertDelete {

    private Integer[] a;
    private int N;

    public MedianInsertDelete(int capacity){

        // accounts for '0' not being used
        this.a = new Integer[capacity+1]; 
        this.N = 0;
    }

    public void insert(int k){

        a[++N] = k;
        swim(N);
    }

    public int delMedian(){

        int median = findMedian();
        exch(1, N--);
        sink(1);
        a[N+1] = null;
        return median;

    }

    public int findMedian(){

        return a[1];


    }

    // entry swims up so that its left child is smaller and right is greater

    private void swim(int k){


        while(even(k) && k>1 && less(k/2,k)){

            exch(k, k/2);

            if ((N > k) && less (k+1, k/2)) exch(k+1, k/2);
            k = k/2;
        }

        while(!even(k) && (k>1 && !less(k/2,k))){

            exch(k, k/2);
            if (!less (k-1, k/2)) exch(k-1, k/2);
            k = k/2;
        }

    }

// if the left child is larger or if the right child is smaller, the entry sinks down
    private void sink (int k){

        while(2*k <= N){
            int j = 2*k;
            if (j < N && less (j, k)) j++;
            if (less(k,j)) break;
            exch(k, j);
            k = j;
        }

    }

    private boolean even(int i){

        if ((i%2) == 0) return true;
        else return false;
    }

    private void exch(int i, int j){

        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    private boolean less(int i, int j){

        if (a[i] <= a[j]) return true;
        else return false;
    }


    public static void main(String[] args) {

        MedianInsertDelete medianInsertDelete = new MedianInsertDelete(10);

        for(int i = 1; i <=10; i++){

            medianInsertDelete.insert(i);
        }

        StdOut.println("The median is: " + medianInsertDelete.findMedian());

        medianInsertDelete.delMedian();


        StdOut.println("Original median deleted. The new median is " + medianInsertDelete.findMedian());




    }
}


Share:
62,870
Bruce
Author by

Bruce

Do it right the first time

Updated on July 09, 2022

Comments

  • Bruce
    Bruce almost 2 years

    Like a Max-heap and Min-heap, I want to implement a Median-heap to keep track of the median of a given set of integers. The API should have the following three functions:

    insert(int)  // should take O(logN)
    int median() // will be the topmost element of the heap. O(1)
    int delmedian() // should take O(logN)
    

    I want to use an array (a) implementation to implement the heap where the children of array index k are stored in array indices 2*k and 2*k + 1. For convenience, the array starts populating elements from index 1. This is what I have so far: The Median-heap will have two integers to keep track of number of integers inserted so far that are > current median (gcm) and < current median (lcm).

    if abs(gcm-lcm) >= 2 and gcm > lcm we need to swap a[1] with one of its children. 
    The child chosen should be greater than a[1]. If both are greater, 
    choose the smaller of two.
    

    Similarly for the other case. I can't come up with an algorithm for how to sink and swim elements. I think it should take into consideration how close the number is to the median, so something like:

    private void swim(int k) {
        while (k > 1 && absless(k, k/2)) {   
            exch(k, k/2);
            k = k/2;
        }
    }
    

    I can't come up with the entire solution though.

    • greybeard
      greybeard over 7 years
      This will get difficult without a limit to the multiplicity of any given value.
  • Bruce
    Bruce about 11 years
    what if both heaps have same number of elements?
  • comocomocomocomo
    comocomocomocomo about 11 years
    Then the total number of elements is even. Act according to your definition of median for this case: a) Choose always the lower; b) choose always the higher; c) choose at random; d) the median is the mean of these two middle elements...
  • phoeagon
    phoeagon about 11 years
    Then you need to maintain a median value each time you manipulate the set. Since it takes O(logN) to retrieve an element of arbitrary rank in a BST. Still it would suffice...I know..
  • Bruce
    Bruce about 11 years
    I meant while inserting an element what if both heaps have same size?
  • Bruce
    Bruce about 11 years
    Yes, but a median heap will give the median in constant time.
  • angelatlarge
    angelatlarge about 11 years
    @Bruce: That is true only in that sense that it is true for BSTs: once you build the structure, getting at the median number (without removing it) is O(0), however, if you do remove it, then you have to rebuild the heap/tree, which takes O(logn) for both.
  • angelatlarge
    angelatlarge about 11 years
    When you think about it, the d) median solution is a bit weird, because when you call remove() on a heap you expect it to give you the element that was actually deleted. If you compute median by averaging, then remove() will return one number (your computed median) but will actually delete a different number. So if you add n elements to this kind of MedianHeap, and then removeMedian and put it into another data structure for every element, the elements in the second data structure will not be the same as those that went into the MedianHeap.
  • comocomocomocomo
    comocomocomocomo about 11 years
    @angelatlarge: You mean option a), don't you? But yes, I agree. The mean of the two middle elements is more accurate in some sense, but it is not one of the elements ---unless the two middle elements are equal, of course---.
  • isubuz
    isubuz about 10 years
    For sake of completeness we should also add that: In order to construct the median heap for a list of elements, we should first use a linear time algorithm and find the median. Once the median is known, we can simply add elements to the Min-heap and Max-heap based on the median value. Balancing the heaps isn't required because the median will split the input list of elements into equal halves.
  • Erdem
    Erdem about 9 years
    a linear time algorithm and find the median: en.wikipedia.org/wiki/Median_of_medians
  • Alma Alma
    Alma Alma almost 8 years
    @angelatlarge I like your idea. But depending on how you define median, it might be more expensive than O(0). If there are even number of elements and you define median as the mean of two middle elements, then you have to find one more element beside the root.
  • Enrico Giurin
    Enrico Giurin about 7 years
    You don,t need to pass the Comparator to the minHeap since it's already ordered by integer natural ordering ascending.
  • cmal
    cmal almost 6 years
    "When the min-heap contains one more element than the max-heap, the median is in the top of the min-heap." should the top of the min-heap the smallest number?
  • comocomocomocomo
    comocomocomocomo almost 6 years
    @cmal, the top of the min-heap is the smallest number… only among those in the min-heap, which just contains half plus one elements of the full set
  • salt-die
    salt-die almost 3 years
    Unfortunately, the median doesn't always find its way to the top of the heap this way. I think you can only guarantee that the top of the heap is within log2(n) hops of the median in the sorted array. Consider the following heap that maintains this invariant, but doesn't have the median at the top: ``` 70 17 87 11 93 25 92 2 67 85 95 8 52 0 94 1 6 39 78 3 ```