Binary search vs binary search tree


Solution 1

Your analysis is wrong: both insertion and deletion are O(n) for a sorted array, because you have to physically move the data to make room for the inserted element, or compact it to cover up the deleted one.

Oh, and the worst case for a completely unbalanced binary search tree is O(n), not O(log n).

Solution 2

There's not much of a benefit in querying either one.

But constructing a sorted tree is a lot faster than constructing a sorted array, when you're adding elements one at a time. So there's no point in converting it to an array when you're done.

Solution 3

Note also that there are standard algorithms for maintaining balanced binary search trees. They get rid of the deficiencies in binary trees and maintain all of the other strengths. They are complicated, though, so you should learn about binary trees first.

Beyond that, the big-O may be the same, but the constants aren't always. With binary trees, if you store the data correctly, you can get very good use of caching at multiple levels. The result is that if you are doing a lot of querying, most of your work stays inside the CPU cache, which greatly speeds things up. This is particularly true if you are careful in how you structure your tree. See http://blogs.msdn.com/b/devdev/archive/2007/06/12/cache-oblivious-data-structures.aspx for an example of how clever layout of the tree can greatly improve performance. A plain array that you binary search over does not permit any such tricks.

Author: john
Updated on June 02, 2022

Comments

  • john
    john almost 2 years

What is the benefit of a binary search tree over a sorted array with binary search? From mathematical analysis alone I do not see a difference, so I assume there must be a difference in the low-level implementation overhead. Analysis of average-case run time is shown below.

    Sorted array with binary search
    search: O(log(n))
    insertion: O(log(n)) (we run binary search to find where to insert the element)
    deletion: O(log(n)) (we run binary search to find the element to delete)

    Binary search tree
    search: O(log(n))
    insertion: O(log(n))
    deletion: O(log(n))

    Binary search trees have a worst case of O(n) for operations listed above (if tree is not balanced), so this seems like it would actually be worse than sorted array with binary search.

    Also, I am not assuming that we have to sort the array beforehand (which would cost O(n log(n))); we would insert elements one by one into the array, just as we would do for the binary tree. The only benefit of a BST I can see is that it supports other types of traversals: inorder, preorder, postorder.

  • Rob Neuhaus
    Rob Neuhaus about 13 years
    If you don't have to support insertion or deletion (e.g., you build up a data set once up front), a sorted array is going to be faster by a pretty significant constant factor than a binary search tree. You don't really have any space overhead with the array, and your caches work a lot better when your data is compact and you don't have to chase pointers around.
  • davidmw
    davidmw almost 9 years
    Except that's not what you said, Mehrdad. You said there would be no benefit querying either one and further, that there would be no point converting to an array while Rob Neuhaus said the exact opposite: that a sorted array would have a large constant-factor benefit over the tree. Rob is correct.
  • user541686
    user541686 almost 9 years
    @davidmw: His comment literally started with "if you don't have to support insertion"; my answer made the opposite conclusion because it was literally about the opposite case, "when you're adding elements one at a time"...
  • davidmw
    davidmw almost 9 years
    You're again missing the point. Both Rob Neuhaus and I weren't commenting on the second part of your claim, only the first, namely "There's not much of a benefit in querying either one." There is a constant-factor benefit to querying a sorted array. Your statement to the contrary is wrong. I agree with your second claim but wanted to correct your first claim. Do you disagree that there is a constant-factor benefit to querying a sorted array?
  • user541686
    user541686 almost 9 years
    @davidmw: Did you notice the entire question was about time complexity? When you're worried about time complexity, constant factors are negligible. That's the entire point. A constant factor benefit is therefore quite literally the kind of thing I would categorize under "not much benefit" when someone asks me about time complexity...
  • davidmw
    davidmw almost 9 years
    Of course I saw the entire question. The question states an interest in "low level implementation overhead." I agree that for querying this can only be a constant factor. The question comes in whether "a pretty significant constant factor" as Rob says is the same assertion as "not much difference." I'd disagree that a "pretty significant constant factor" should be categorized under "not much benefit" when we've already agreed that the big-O is the same. A "pretty significant constant factor" is quite literally a significant benefit in real programs. They are two different assertions.
  • Victor Polevoy
    Victor Polevoy about 8 years
    Can you please elaborate on how binary tree nodes can stay inside a cache? A tree uses pointers to non-sequential memory, which means there is no spatial locality, and hence data-cache misses. Also, from your link: "we have a static list of records, and we wish to find the record with a particular key. Traditionally, this problem is solved with either an array and binary search, or a binary search tree. Both of these approaches exhibit dismal cache behavior," which suggests you were wrong. Please specify what you mean by "if you store the data correctly."
  • yyny
    yyny over 5 years
    Be careful about drawing conclusions from this, though, since moving contiguous chunks of memory around in RAM is one of the most optimized operations on modern computers.
  • Blindy
    Blindy over 5 years
    Doesn't matter, access is still linear.
  • yyny
    yyny over 5 years
    It kinda does for small enough arrays that fit in cache.
  • KRoy
    KRoy over 3 years
    Insertion into a sorted array takes O(log n) processor operations (comparisons) but O(n) memory operations. While the O(n) memory move is carried out (for example using std::rotate), the processor is free to do hyperthreading or pipelining. Also, @yyny noted the benefits of cache-friendly array access over pointer access.
  • KRoy
    KRoy over 3 years
    A binary tree adds up to three pointers per element, which costs memory and exhausts the cache. A more complex balanced tree also means more instructions, more processor time, and more branch misprediction.
  • btilly
    btilly over 3 years
    @VictorPolevoy I don't know how I missed that comment. The article on cache oblivious data structures has the elaboration that you seek.
  • btilly
    btilly over 3 years
    @shuva All data structures have tradeoffs. But the case against trees is not as open and shut as you indicate. If it was, the C++ STL would not default to red-black trees for Map and Set data structures. And most database indexes would not be built on B+tree data structures. That said, there are plenty of use cases that they aren't good for.