How sets, multisets, maps and multimaps work internally

11,886

These 4 containers are typically all implemented using "nodes". A node is an object that stores one element. In the [multi]set case, the element is just the value; in the [multi]map case each node stores one key and its associated value. A node also stores multiple pointers to other nodes. Unlike a list, the nodes in sets and maps form a tree. You'd typically arrange it such that branches on the "left" of a certain node have values less than that node, while branches on the "right" of a certain node have values higher than that node.

Operations like finding a map key/set value are now quite fast. Start at the root node of the tree. If that matches, you're done. If the root is larger, search in the left branch. If the root is smaller than the value you're looking for, follow the pointer to the right branch. Repeat until you find a value or an empty branch.

Inserting an element is done by creating a new node, finding the location in the tree where it should be placed, and then inserting the node there by adjusting the pointers around it. Finally, there is a "rebalancing" operation to prevent your tree from ending up all out of balance. Ideally each right and left branch is about the same size. Rebalancing works by shifting some nodes from the left to the right or vice versa. E.g. if you have values {1 2 3} and your root node would be 1, you'd have 2 and 3 on the left branch and an empty right branch:

1
 \
  2
   \
    3

This is rebalanced by picking 2 as the new root node:

  2
 / \
1   3

The STL containers use a smarter, faster rebalancing technique but that level of detail should not matter. It's not even specified in the standard which better technique should be used so implementations can differ.

Share:
11,886
faya
Author by

faya

Updated on June 09, 2022

Comments

  • faya
    faya almost 2 years

    How do multisets work? If a set can't have a value mapped to a key, does it only hold keys?

    Also, how do associative containers work? I mean vector and deque in the memory is located sequentially it means that deleting/removing (except beginning [deque] and end [vector, deque]) are slow if they are large.

    And list is a set of pointers which are not sequentially located in the memory which causes longer search but faster delete/remove.

    How are sets, maps, multisets and multimaps stored and how do they work?

  • SirGuy
    SirGuy over 5 years
    Your code snippet doesn't actually show any implementation, just the data structure. The implementation is where all the work is done, and where the usefulness of RB trees comes from.
  • ajaysinghnegi
    ajaysinghnegi almost 5 years
    So, are all these containers implemented using AVL trees?
  • MSalters
    MSalters almost 5 years
    @Jos: "It's not even specified in the standard which technique should be used so implementations can differ.".