Advantage of B+ trees over BSTs?

database data-structures tree binary-search-tree b-tree

13,017

Solution 1

The major advantage of the B+ tree (and B-trees in general) over binary search trees is that they play well with caches. If you have a binary search tree whose nodes are stored in more or less random order in memory, then each time you follow a pointer, the machine will have to pull in a new block of memory into the processor cache, which is dramatically slower than accessing memory already in cache.

The B+-tree and the B-tree work by having each node store a huge number of keys or values and have a large number of children. They are typically packed together in a way that makes it possible for a single node to fit nicely into cache (or, if stored on disk, to be pulled from the disk in a single read operation). You then have to do more work to find a key within the node or determine which child to read next, but because all memory accesses done on a single node can be done without going back to disk, the access times are very small. This means that even though in principle a BST might be better in terms of number of memory accesses, the B+-tree and the B-tree can performed better in terms of the runtime of those memory accesses.

The typical use case for a B+-tree or B-tree is in a database, where there is a huge amount of information and the data are so numerous that they can't all fit into main memory. Accordingly, the data can then be stored in a B+-tree or B-tree on a hard disk somewhere. This minimizes the number of disk reads necessary to pull in the data during lookups. Some filesystems (like ext4, I believe) use B-trees as well for the same reason - they minimize the number of disk lookups necessary, which is the real bottleneck.

Hope this helps!

Solution 2

The real life storage of data(e.g., in DB) requires a lot of data to be stored. Since data retrieval is the basic operation of concern, it is quite time-consuming to read data from disk than RAM.

Now, this is the catch...

BST stores lesser data in a node as compared to B+ Trees. This results in the increased height of BST than B+ trees. So they are stored on the disk rather than RAM.

Every time data has to be retrieved from the tree, the data from the disk has to be loaded into the main memory(which is, of course, a time-consuming process) while, in case of B+ trees, the data is already there in RAM and the required node is fetched directly and data is retrieved from that node which may contain many children(but the overall time for data retrieval is lesser in case of B+ trees because there is no need of loading data from disk to RAM).

13,017

riggspc

Updated on September 14, 2022

Comments

riggspc over 1 year

I'm learning about B+ trees in a class about databases and I was wondering what concrete advantages B+ trees would give over Binary Search Trees?

It seems like they both have O(logN) average complexity for most operations of note but B+ trees also have an additional (negligible?) search time at each child node where BSTs obviously only take O(1) time to figure out which child node to advance to.

What real-world advantages make B+ trees more popular in databases than BSTs?
Zephyr almost 7 years

I am not able to understand the statement " the B tree can perform better in terms of runtime of those memory accesses". Can you please explain it?
templatetypedef almost 7 years

@Xylene23 Not all memory accesses take the same amount of time to complete due to caching effects. A BST touches fewer memory locations on lookups than B trees, but the cost of those accesses is high because each access likely costs a cache miss. A B tree touches more total memory locations, but the costs of those accesses are lower because there will be fewer cache misses.
Zephyr almost 7 years

Thanks for explaining.