O(klogk) time algorithm to find kth smallest element from a binary heap

algorithm search data-structures heap time-complexity

25,065

Solution 1

Well, your intuition was right that we need extra data structure to achieve O(klogk) because if we simply perform operations on the original heap, the term logn will remain in the resulting complexity.

Guessing from the targeted complexity O(klogk), I feel like creating and maintaining a heap of size k to help me achieve the goal. As you may be aware, building a heap of size k in top-down fashion takes O(klogk), which really reminds me of our goal.

The following is my try (not necessarily elegant or efficient) in an attempt to attain O(klogk):

We create a new min heap, initializing its root to be the root of the original heap.
We update the new min heap by deleting the current root and inserting the two children of the current root in the original heap. We repeat this process k times.
The resulting heap will consist of k nodes, the root of which is the kth smallest element in the original heap.

Notes: Nodes in the new heap should store indexes of their corresponding nodes in the original heap, rather than the node values themselves. In each iteration of step 2, we really add a net of one more node into the new heap (one deleted, two inserted), k iterations of which will result in our new heap of size k. During the ith iteration, the node to be deleted is the ith smallest element in the original heap.

Time Complexity: in each iteration, it takes O(3logk) time to delete one element from and insert two into the new heap. After k iterations, it is O(3klogk) = O(klogk).

Hope this solution inspires you a bit.

Solution 2

Assuming that we're using a minheap, so that a root node is always smaller than its children nodes.

Create a sorted list toVisit, which contains the nodes which we will traverse next. This is initially just the root node.
Create an array smallestNodes. Initially this is empty.
While length of smallestNodes < k:
    Remove the smallest Node from toVisit
    add that node to smallestNodes
    add that node's children to toVisit

When you're done, the kth smallest node is in smallestNodes[k-1].

Depending on the implementation of toVisit, you can get insertion in log(k) time and removal in constant time (since you're only removing the topmost node). That makes O(k*log(k)) total.

25,065

Author by

Admin

Updated on February 03, 2020

Comments

Admin over 4 years

We have an n-node binary heap which contains n distinct items (smallest item at the root). For a k<=n, find a O(klogk) time algorithm to select kth smallest element from the heap.

O(klogn) is obvious, but couldn't figure out a O(klogk) one. Maybe we can use a second heap, not sure.
Jim Mischel over 12 years

Insertion isn't log(k), but rather log(n), where n is the number of nodes already in the heap. Inserting k nodes will be k*log(n).
amit over 12 years

@JimMischel: no, in toVisit there are no more then 2k nodes at any point [since we add 2 elements for each element we remove, and we do it k times], so the insertion and deletion from toVisit is O(log2k) = O(logk). for each operation on the original list, we just extract the direct children of a specific node, which is O(1). we overall do k times O(logk) ops, which is indeed O(klogk).
amit over 12 years

though a sorted list is not a good data structure for toVisit, since insertion is O(k) in this list. You will need a heap to actually obtain O(klogk) [skip list/ balanced BST/B+ tree are also valid options, though harder to implement, heap will be enough here].
Jim Mischel over 12 years

@amit: Thank you. I misunderstood the description of the algorithm.
amit over 12 years

This is basically @Kevin 's solution
Vikas Mangal about 9 years

@Terry Li: Instead of creating a new min heap, if we create a new max heap of size k and keep on inserting the elements from original min heap to new max heap. When the max heap is full, its root will have the kth smallest element and that heap will have the smallest k elements. Is my thinking right ?
madCode almost 9 years

Instead of a sorted list, can't you just use a Queue and add to the queue smallest-largest children nodes to visit?
jiangok over 8 years

Isn't deleting current root O(lgn) instead of O(lgk) because heapifying the original min heap is required after the deletion. So the overall complexity becomes kO(lgn) instead of kO(lgk)?
sharmakeshav about 7 years

@VikasMangal No, that wouldn't work in klogk. That would again be a klogn algorithm because you will have to heapify the original min heap k times.
sharmakeshav about 7 years

@jiangok There's no need to modify the original heap. You just read elements from the original heap and then modify your newly created heap. The new heap will be of max size k, thus it'll take O(logk) to heapify it.
user5054 over 6 years

@TerryLi So, the new heap will be composed of pointers to the original heap nodes rather than the actual nodes? So, heapifying code for the new heap will be a little different?
Chaos almost 3 years

@madCode Consider the min-heap [1, 2, 100, 3, 4, 101, 102]. Find 4th smallest value (which will be the value 4). Using a queue and the smallest - largest children approach, we get [1, 2, 100, 3, 4]. This doesn't give us the right 4th smallest value.
Jet C. over 2 years

To find kth smallest element, isn't it should be a new max heap instead of a min heap?
jacob12 almost 2 years

After taking the current node's children to put in the auxiliary heap, this means that one of those two nodes must be the next minimum. But note that you have more nodes in the same level that we overpass and might even be more minimal. So we need to search the minimal node in the i-th level. So why we look only at the two children of a node when we have to look at nodes at the whole level and pick the minimal from there?