Heap vs Binary Search Tree (BST)


Solution 1

Summary

          Type      BST (*)   Heap
Insert    average   log(n)    1
Insert    worst     log(n)    log(n) or n (***)
Find any  worst     log(n)    n
Find max  worst     1 (**)    1
Create    worst     n log(n)  n
Delete    worst     log(n)    log(n)

All average times on this table are the same as their worst times except for Insert.

  • *: everywhere in this answer, BST == Balanced BST, since unbalanced sucks asymptotically
  • **: using a trivial modification explained in this answer
  • ***: log(n) for pointer tree heap, n for dynamic array heap

Advantages of binary heap over a BST

  • average time insertion into a binary heap is O(1), for BST is O(log(n)). This is the killer feature of heaps.

    There are also other heaps which reach O(1) amortized (stronger) like the Fibonacci Heap, and even worst case, like the Brodal queue, although they may not be practical because of non-asymptotic performance: Are Fibonacci heaps or Brodal queues used in practice anywhere?

  • binary heaps can be efficiently implemented on top of either dynamic arrays or pointer-based trees, BST only pointer-based trees. So for the heap we can choose the more space efficient array implementation, if we can afford occasional resize latencies.

  • binary heap creation is O(n) worst case, O(n log(n)) for BST.

Advantage of BST over binary heap

  • search for arbitrary elements is O(log(n)). This is the killer feature of BSTs.

    For heap, it is O(n) in general, except for the largest element which is O(1).

"False" advantage of heap over BST

  • heap is O(1) to find max, BST O(log(n)).

    This is a common misconception: it is trivial to modify a BST to keep track of the largest element and update it whenever that element could change: on insertion of a larger one, swap; on removal, find the second largest. See Can we use binary search tree to simulate heap operation? (mentioned by Yeo).

    Actually, this is a limitation of heaps compared to BSTs: the only efficient search is that for the largest element.

Average binary heap insert is O(1)

Sources:

Intuitive argument:

  • bottom tree levels have exponentially more elements than top levels, so new elements are almost certain to go at the bottom
  • heap insertion starts from the bottom, BST must start from the top

In a binary heap, increasing the value at a given index is also O(1) for the same reason. But if you want to do that, you will likely want to keep an extra index up-to-date on heap operations (see: How to implement O(logn) decrease-key operation for min-heap based Priority Queue?), e.g. for Dijkstra. This is possible at no extra time cost.

GCC C++ standard library insert benchmark on real hardware

I benchmarked the C++ std::set (Red-black tree BST) and std::priority_queue (dynamic array heap) insert to see if I was right about the insert times, and this is what I got:

[benchmark plot: std::set vs std::priority_queue insert times on real hardware]

  • benchmark code
  • plot script
  • plot data
  • tested on Ubuntu 19.04, GCC 8.3.0 in a Lenovo ThinkPad P51 laptop with CPU: Intel Core i7-7820HQ CPU (4 cores / 8 threads, 2.90 GHz base, 8 MB cache), RAM: 2x Samsung M471A2K43BB1-CRC (2x 16GiB, 2400 Mbps), SSD: Samsung MZVLB512HAJQ-000L7 (512GB, 3,000 MB/s)

So clearly:

  • heap insert time is basically constant.

    We can clearly see dynamic array resize points. Since we are averaging every 10k inserts to be able to see anything at all above system noise, those peaks are in fact about 10k times larger than shown!

    The zoomed graph excludes essentially only the array resize points, and shows that almost all inserts fall under 25 nanoseconds.

  • BST is logarithmic. All inserts are much slower than the average heap insert.

  • BST vs hashmap detailed analysis at: What data structure is inside std::map in C++?

GCC C++ standard library insert benchmark on gem5

gem5 is a full system simulator, and therefore provides an infinitely accurate clock with m5 dumpstats. So I tried to use it to estimate timings for individual inserts.

[benchmark plot: individual insert timings measured on gem5]

Interpretation:

  • heap is still constant, but now we see in more detail that there are a few lines, and each higher line is more sparse.

    This presumably corresponds to higher and higher memory access latencies for later inserts.

  • TODO I can't fully interpret the BST one, as it does not look so logarithmic and seems somewhat more constant.

    With this greater detail, however, we can also see a few distinct lines, but I'm not sure what they represent: I would expect the bottom line to be thinner, since we insert top to bottom?

Benchmarked with this Buildroot setup on an aarch64 HPI CPU.

BST cannot be efficiently implemented on an array

Heap operations only need to bubble up or down a single tree branch, so O(log(n)) worst case swaps, O(1) average.

Keeping a BST balanced requires tree rotations, which can change the top element for another one, and would require moving the entire array around (O(n)).

Heaps can be efficiently implemented on an array

Parent and children indexes can be computed from the current index as shown here.

There are no balancing operations like BST.

Delete min is the most worrying operation as it has to be top down. But it can always be done by "percolating down" a single branch of the heap as explained here. This leads to an O(log(n)) worst case, since the heap is always well balanced.

If you are inserting a single node for every one you remove, then you lose the advantage of the asymptotic O(1) average insert that heaps provide as the delete would dominate, and you might as well use a BST. Dijkstra however updates nodes several times for each removal, so we are fine.

Dynamic array heaps vs pointer tree heaps

Heaps can also be efficiently implemented on top of pointer-based trees: Is it possible to make efficient pointer-based binary heap implementations?

The dynamic array implementation is more space efficient. Suppose that each heap element contains just a pointer to a struct:

  • the tree implementation must store three pointers for each element: parent, left child and right child. So the memory usage is always 4n (3 tree pointers + 1 struct pointer).

    Tree BSTs would also need further balancing information, e.g. black-red-ness.

  • the dynamic array implementation can be of size 2n just after a doubling. So on average it is going to be 1.5n.

On the other hand, the tree heap has better worst case insert, because copying the backing dynamic array to double its size takes O(n) worst case, while the tree heap just does new small allocations for each node.

Still, the backing array doubling is O(1) amortized, so it comes down to a maximum latency consideration. Mentioned here.

Philosophy

  • BSTs maintain a global property between a parent and all descendants (left smaller, right bigger).

    The top node of a BST is the middle element, which requires global knowledge to maintain (knowing how many smaller and larger elements there are).

    This global property is more expensive to maintain (log n insert), but gives more powerful searches (log n search).

  • Heaps maintain a local property between parent and direct children (parent > children).

    The top node of a heap is the biggest element, which only requires local knowledge to maintain (knowing your parent).

Comparing BST vs Heap vs Hashmap:

  • BST: can be a reasonable:

    • unordered set (a structure that determines if an element was previously inserted or not). But a hashmap tends to be better due to O(1) amortized insert.
    • sorting machine. But a heap is generally better at that, which is why heapsort is much more widely known than tree sort.
  • heap: is just a sorting machine. Cannot be an efficient unordered set, because you can only check for the smallest/largest element fast.

  • hash map: can only be an unordered set, not an efficient sorting machine, because the hashing mixes up any ordering.

Doubly-linked list

A doubly linked list can be seen as a subset of the heap where the first item has the greatest priority, so let's compare them here as well:

  • insertion:
    • position:
      • doubly linked list: the inserted item must be either the first or last, as we only have pointers to those elements (unless we have a pointer to the position of interest e.g. during iteration)
      • binary heap: the inserted item can end up in any position. Less restrictive than linked list.
    • time:
      • doubly linked list: O(1) worst case since we have pointers to the items, and the update is really simple
      • binary heap: O(1) average, thus worse than linked list. Tradeoff for having more general insertion position.
  • search: O(n) for both

A use case for this is when the key of the heap is the current timestamp: in that case, new entries will always go to the beginning of the list. So we can even forget the exact timestamp altogether, and just keep the position in the list as the priority.

This can be used to implement an LRU cache. Just like for heap applications like Dijkstra, you will want to keep an additional hashmap from the key to the corresponding node of the list, to find which node to update quickly.

Comparison of different Balanced BST

Although the asymptotic insert and find times for all data structures that are commonly classified as "Balanced BSTs" that I've seen so far is the same, different BBSTs do have different trade-offs. I haven't fully studied this yet, but it would be good to summarize these trade-offs here:

  • Red-black tree. Appears to be the most commonly used BBST as of 2019, e.g. it is the one used by the GCC 8.3.0 C++ implementation
  • AVL tree. Appears to be a bit more balanced than the red-black tree, so it could be better for find latency, at the cost of slightly more expensive inserts. Wiki summarizes: "AVL trees are often compared with red–black trees because both support the same set of operations and take [the same] time for the basic operations. For lookup-intensive applications, AVL trees are faster than red–black trees because they are more strictly balanced. Similar to red–black trees, AVL trees are height-balanced. Both are, in general, neither weight-balanced nor mu-balanced for any mu < 1/2; that is, sibling nodes can have hugely differing numbers of descendants."
  • WAVL. The original paper mentions advantages of that version in terms of bounds on rebalancing and rotation operations.

See also

Similar question on CS: https://cs.stackexchange.com/questions/27860/whats-the-difference-between-a-binary-search-tree-and-a-binary-heap

Solution 2

Heap just guarantees that elements on higher levels are greater (for max-heap) or smaller (for min-heap) than elements on lower levels, whereas BST guarantees order (from "left" to "right"). If you want sorted elements, go with BST.

Solution 3

When to use a heap and when to use a BST

Heap is better at findMin/findMax (O(1)), while BST is good at all finds (O(logN)). Insert is O(logN) for both structures. If you only care about findMin/findMax (e.g. priority-related), go with heap. If you want everything sorted, go with BST.

First few slides from here explain things very clearly.

Solution 4

As mentioned by others, Heap can do findMin or findMax in O(1) but not both in the same data structure. However I disagree that Heap is better in findMin/findMax. In fact, with a slight modification, the BST can do both findMin and findMax in O(1).

In this modified BST, you keep track of the min node and the max node every time you do an operation that can potentially modify the data structure. For example, on insert you can check whether the min value is larger than the newly inserted value, and if so assign the min to the newly added node. The same technique can be applied to the max value. Hence, this BST contains this information, which you can retrieve in O(1), the same as a binary heap.

In this (balanced) BST, when you pop the min or the max, the next min value to be assigned is the successor of the min node, whereas the next max value to be assigned is the predecessor of the max node. Thus these are found in O(1). However, we need to re-balance the tree, so it will still run in O(log n), the same as a binary heap.

I would be interested to hear your thoughts in the comments below. Thanks :)

Update

Cross reference to similar question Can we use binary search tree to simulate heap operation? for more discussion on simulating Heap using BST.

Solution 5

A binary search tree uses the definition that, for every node, the node to its left has a smaller value (key) and the node to its right has a greater value (key).

Whereas the heap, being an implementation of a binary tree, uses the following definition:

If A and B are nodes, where B is a child node of A, then the value (key) of A must be larger than or equal to the value (key) of B. That is, key(A) ≥ key(B).

http://wiki.answers.com/Q/Difference_between_binary_search_tree_and_heap_tree

I ran into the same question today for my exam and I got it right. :)


Author: kc3. Updated on May 02, 2022.

Comments

  • kc3
    kc3 about 2 years

    What is the difference between a heap and BST?

    When to use a heap and when to use a BST?

    If you want to get the elements in a sorted fashion, is BST better over heap?

    • Flow
      Flow over 10 years
      This question appears to be off-topic because it is about computer science and should be asked on cs.stackexchange.com
    • Ciro Santilli OurBigBook.com
      Ciro Santilli OurBigBook.com about 9 years
      @Flow it has been asked there at: cs.stackexchange.com/questions/27860/…
    • azizbro
      azizbro about 5 years
      I feel like it relates to both stack exchange and stack overflow. So having it here is fine
  • johncip
    johncip about 10 years
    While insert is logarithmic for both in the worst case, the average heap insert takes constant time. (Since most of the existing elements are on the bottom, in most cases a new element will only have to bubble up one or two levels, if at all.)
  • Yeo
    Yeo over 9 years
    @xysun I think BST is better in findMin & findMax stackoverflow.com/a/27074221/764592
  • Yeo
    Yeo over 9 years
    Why do you disagree? would you mind share to your thought below?
  • user2752467
    user2752467 over 9 years
    You could certainly store the maximum and/or minimum value of a BST, but then what happens if you want to pop it? You have to search the tree to remove it, then search again for the new max/min, both of which are O(log n) operations. That's the same order as insertions and removals in a priority heap, with a worse constant.
  • Yeo
    Yeo over 9 years
    @JustinLardinois Sorry, I forget to highlight this in my answer. In BST, when you do pop min, the next min value to be assigned is the successor of the min node. and if you pop the max, the next max value to be assigned is the predecessor of the max node. Thus it still perform in O(1).
  • Yeo
    Yeo over 9 years
    Correction: for popMin or popMax it is not O(1), but O(log n), because it has to be a Balanced BST, which needs to be rebalanced on every delete operation. Hence it is the same as binary heap popMin or popMax, which run in O(log n)
  • Chaos
    Chaos about 9 years
    You can get the first min/max, but getting kth min/max would go back to normal BST complexity.
  • CODError
    CODError about 9 years
    Yes. It is from unsorted to sorted sequence. O(n) time for inorder traversal of a BST, which gives sorted sequence. While in Heaps, you extract min element and then re-heapify in O(log n) time. SO, it will take O(n logn) to extract n elements. And it will leave you with a sorted sequence.
  • greybeard
    greybeard about 9 years
    from unsorted to sorted sequence. O(n) time for inorder traversal of a BST, which gives sorted sequence. Well, from unsorted sequence to BST I don't know a method based on key comparison with less than O(n logn) time, which dominates the BST to sequence part. (Whereas there is O(n) heap construction.). I'd consider it fair (if pointless) to state heaps being close to unsortedness and BSTs sorted.
  • CODError
    CODError about 9 years
    What I am trying to explain here is that if you have a BST and also a Heap of n elements => then all elements could be printed in sorted order from both data structures and BST can do it in O(n) time (Inorder traversal), while Heap would take O(n logn) time. I don't understand what you are trying to say here. How do you say BST will give you sorted sequence in O(n logn).
  • CODError
    CODError about 9 years
    I think you are also considering time taken to build a BST and a Heap. But I assume you already have it, that you have build it over the time and now you want to get the sorted result. I am not getting your point?
  • CODError
    CODError about 9 years
    Edited... I hope you are satisfied now ;p and give a +1 if its correct.
  • Mooing Duck
    Mooing Duck about 9 years
    @Yeo: Heap is better for findMin xor findMax. If you need both, then BST is better.
  • foamroll
    foamroll about 9 years
    "heap, being an implementation of binary tree" - just pointing out that a heap is a kind of binary tree, not a kind of BST
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com almost 9 years
    I think this is just a common misconception. A binary tree can be easily modified to find min and max as pointed by Yeo. This is actually a restriction of the heap: the only efficient find is min or max. The true advantage of the heap is O(1) average insert as I explain: stackoverflow.com/a/29548834/895245
  • Daniel Andersson
    Daniel Andersson over 8 years
    "Heap just guarantees that elements on higher levels are greater (for max-heap) or smaller (for min-heap) than elements on lower levels, …" – the heap does not enforce this per level, but only in parent–child-chains. [1, 5, 9, 7, 15, 10, 11] represents a valid min-heap, but the 7 on level 3 is smaller than 9 on level 2. For a visualization, see e.g. the 25 and 19 elements in the sample Wikipedia image for heaps. (Also note that the inequality relations between elements are not strict, since elements are not necessarily unique.)
  • goonerify
    goonerify over 8 years
    Use a heap if you anticipate lots of deletions because this can unbalance the height of the tree to become √N instead of ln N, Also, if your data is not randomly ordered
  • Vic Seedoubleyew
    Vic Seedoubleyew almost 8 years
    Ciro Santilli's answer is far better : stackoverflow.com/a/29548834/2873507
  • Ankit Roy
    Ankit Roy over 7 years
    I +1ed, but the "paper" justifying average O(1) binary heap insertion is now a dead link, and the "slides" just state the claim without proof. Also I think it would help to clarify that "average case" here means the average assuming that inserted values come from some particular distribution, so I'm not sure how "killer" this feature really is.
  • Krishna
    Krishna over 6 years
    Sorry for late entry but I just want to get clarity. If the Binary Heap is sorted, then worst case for search would be log n right. So in that case, are sorted Binary Heaps better than Binary Search Trees(Red-Black BST). Thank you
  • gkalpak
    gkalpak over 6 years
    BST and balanced BST seem to be used interchangably. It should be made clear that the answer refers to balanced BST to avoid confusion.
  • flow2k
    flow2k almost 6 years
    What about max-extraction? In a priority queue, we'd want to extract the max as well. Wouldn't that be log(n)?
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com almost 6 years
    @flow2k for both max and min, it is trivial on BST with the modification mentioned, and for heap I think you just have to keep two heaps sorted differently for the O(1).
  • flow2k
    flow2k almost 6 years
    I'll think more about the BST case - thanks. But for the heap, I'm not understanding "keeping two heaps sorted". Isn't there just one heap? And the heap is not sorted?
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com almost 6 years
    @flow2k if you want both max and min I mean. By sorted I mean the inverting the insert comparison function.
  • Bulat
    Bulat almost 6 years
    @flow2k Heap isn't sorted, and this is its main difference to BST! Instead, you know only that each parent is higher (or lower) than children. You can consider that as family where each parent is older than children but there are no any other restrictions. So its guaranteed that root is oldest in the family, but nothing else. So, if you need to find youngest human in the family, you need to scan the entire heap. Or, alternatively, maintain the second heap with opposite comparison operator, i.e. one heap where each parent > children, and another heap where each parent < children.
  • flow2k
    flow2k almost 6 years
    @Bulat I feel we're digressing a little, but if we want both max and min at the same time, we might run into issues with maintaining two heaps if we're not careful - stackoverflow.com/a/1098454/7154924. It's probably better to use a max-min heap (due to Atkinson et al.), which is specifically designed for this purpose.
  • Varun Garg
    Varun Garg almost 6 years
    What happens when have to delete the min / max element? In heap, we can find it in O(1) but then we will have to balance it in O(n). Which one is better here?
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com over 5 years
    @VarunGarg no, heap delete is O(log(n)). I have edited the answer, review the delete algorithm. But this would be an issue if you are doing one delete per insert, you might as well use BST in that case.
  • Ankit Roy
    Ankit Roy over 5 years
    Thanks for updating the link about average-case complexity. From quickly skimming it, it seems they treat 2 different kinds: a uniformly random distribution on heaps, and the distribution on heaps resulting from heap-inserting the elements from a uniformly randomly distributed permutation of 1, ..., n. Interestingly, these turn out to be quite different distributions!
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com over 5 years
    @j_random_hacker if you do go into more details, possibly edit directly into "Average binary heap insert is" or add new answer, e.g. we can also quote something more precise on article or adding full proof. We can also extend the experiment graph to show the differences in behavior in practice.
  • Ricola
    Ricola over 5 years
    @CiroSantilli新疆改造中心六四事件法轮功 : I don't understand why the delete operation of a binary heap is O(log n). This only works if you have a pointer to the element in the heap, but in most use cases, you have the key and you need to find the element first which takes O(n).
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com over 5 years
    @Ricola yes, delete on the Heap refers only to root deletion, which is slightly misleading since it is on the same row as that of BST. On the other hand, one can consider that deleting something implies finding it in the first place, and that the cost of finding is calculated separately from the deletion cost.
  • Bobo
    Bobo over 5 years
    heap insertion is log(n) not o(1)
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com over 5 years
    @Bobo It is O(1) average according to my research and experiments
  • Bobo
    Bobo over 5 years
    @CiroSantilli新疆改造中心六四事件法轮功 interesting. can you share your experiments and its results?
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com over 5 years
    @Bobo see the graphs in the answer at section "GCC C++ standard library insert benchmark on real hardware". Let me know if you find anything wrong with my methods :-)
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com over 4 years
    @MROB thanks for the comment. Do you have a reference for it? I've seen references talking about average bounds, but not insertion itself.
  • MROB
    MROB over 4 years
    @Ciro Santilli 新疆改造中心法轮功六四事件 you can see the proof in the original paper: Haeupler, Bernhard; Sen, Siddhartha; Tarjan, Robert E. (2015), "Rank-balanced trees" (PDF), ACM Transactions on Algorithms, 11 (4): Art. 30, 26, doi:10.1145/2689412, MR 336121
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com over 4 years
    @MROB thanks a lot. I did a quick search on the article citeseerx.ist.psu.edu/viewdoc/… but I could only find the author mentioning rebalancing and rotation bounds after inserts. Can you provide a precise quote on the amortized insert itself from the article? Or some reference / argument that explains that this bound implies the insert bound? In particular, in my naive intuition, the average insert has to go through a log(N) depth tree of choices. Sorry if this is obvious, but I don't have much time to research it.
  • MROB
    MROB over 4 years
    @Ciro Santilli 新疆改造中心法轮功六四事件 you are right! I was referring the rebalancing process. I have deleted my original comment. The insertion is the same as in AVL which takes O(logn)
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com over 4 years
    @MROB thanks for confirming! In any case, this made be learn a little bit about WAVL, so I added a new stub header to the question about it.
  • Hanhan Li
    Hanhan Li about 4 years
    I think the 'find' operation can be O(1) instead of O(n) for heap if we build an auxiliary dictionary that maps the values to the positions in the heap and update the dictionary correspondingly for any operation.
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com about 4 years
    @user3320467 yes that should work. It has the downside that inserts will have their speed bound by the hashmap (also O(1) but slower than heap as shown on benchmark).
  • user1735921
    user1735921 over 3 years
    by FAAAAR this answer is the best
  • Yordan Grigorov
    Yordan Grigorov about 3 years
    At one point you say that doubly-linked-lists limit your inserting possibilities to either the head or the tail, but that's not strictly so. You could insert anywhere in linear time.
  • Ciro Santilli OurBigBook.com
    Ciro Santilli OurBigBook.com about 3 years
    @YordanGrigorov yes, I meant as an isolated operation when you don't have the intermediate pointer, clarified further now
  • Vortex
    Vortex over 2 years
    The link to lecture CSE 373 is absolutely amazing. Thank you for mentioning it