What is the left-child, right-sibling representation of a tree? Why would you use it?

22,764

The left-child, right-sibling representation (LCRS) is a way of encoding a multi-way tree (a tree structure in which each node can have any number of children) using a binary tree (a tree structure in which each node can have at most two children).

Motivation

To motivate how this representation works, let's begin by considering a simple multi-way tree, like this one here:

                   A
                 //|\ \
               / / | \  \
              B C  D  E  F
              |   /|\   / \
              G  H I J K   L

(Apologies for the low-quality ASCII artwork!)

In this tree structure, we can navigate downward from any node in the tree to any of its children. For example, we can migrate from A to B, A to C, A to D, etc.

If we wanted to represent a node in a tree like this one, we would normally use some sort of node structure / node class like this one here (written in C++):

struct Node {
    DataType value;
    std::vector<Node*> children;
};

In the LCRS representation, we represent the multi-way tree in a way where each node requires at most two pointers. To do this, we will slightly reshape the tree. Rather than having each node in the tree store pointers to all of its children, we will structure the tree in a slightly different way by having each node store a pointer to just one of its children (in LCRS, the leftmost child). We will then have each node store a pointer to its right sibling, the next node in the tree that is the child of the same parent node. In the case of the above tree, we might represent the tree in the following way:

            A
           /
          /
         /
        B -> C -> D -> E -> F
       /         /         /
      G         H->I->J   K->L

Notice that in this new structure it is still possible to navigate from a parent node to its kth child (zero-indexed). The procedure for doing so is the following:

  • Descend into the left child of the current node. (This is the first node in the list of its children).
  • Follow that child's right sibling pointer k times. (This takes you to the kth child of the node).
  • Return that node.

For example, to find the third (zero-indexed child) of the root node A, we would descend to the leftmost child (B), then follow three right links (visiting B, C, D, and finally E). We then arrive at the node for E, which holds the third child of node A.

The main reason for representing the tree this way is that even though any node may have any number of children, the representation requires at most two pointers for each node: one node to store the leftmost child, and one pointer to store the right sibling. As a result, we can store the multiway tree using a much simpler node structure:

struct Node {
    DataType data;
    Node* leftChild;
    Node* rightSibling;
};

This node structure has exactly the same shape of a node in a binary tree (data plus two pointers). As a result, the LCRS representation makes it possible to represent an arbitrary multiway tree using just two pointers per node.

Analysis

Let us now look at the time and space complexity of the two different representations of the multiway tree and some basic operations on it.

Let's begin by looking at the total space usage required for the initial "dynamic array of children" representation. How much total memory usage will there be for an n-node tree? Well, we know the following:

  1. Each node, regardless of its number of children, contributes space for the raw data stored (sizeof(data)) and the space overhead of the dynamic array. The dynamic array (typically) has one pointer stored (which points at the allocated array), one machine word for the total number of elements in the dynamic array, and (optionally) one machine word for the allocated capacity of the array. This means that each node takes up sizeof(Data) + sizeof(Node *) + 2 * sizeof(machine word) space.

  2. Across all of the dynamic arrays in the tree, there will be n - 1 pointers to children, since of the n nodes in the tree n - 1 of them have parents. That adds in an extra (n - 1) * sizeof(Node *) factor.

Therefore, the total space usage is

n · (sizeof(Data) + sizeof(Node*) + 2 * sizeof(machine word)) + (n - 1) * sizeof(Node *)

= n * sizeof(Data) + (2n - 1) * sizeof(Node*) + 2n * sizeof(machine word)

Now, let's contrast that with an LCRS tree. Each node in an LCRS tree stores two pointers (2 * sizeof(Node*)) and one data element (sizeof(Data)), so its total space is

n * sizeof(Data) + 2n * sizeof(Node *)

And here we see the win: notice that we are not storing 2n * sizeof(machine word) extra memory to keep track of allocated array sizes. This means that the LCRS representation uses considerably less memory than a regular multiway tree.

However, basic operations on the LCRS tree structure tend to take longer than their corresponding operations on the normal multi-way tree. Specifically, in a multi-way tree represented in the standard form (each node stores an array of child pointers), the time required to navigate from one node to its kth child is given by the time required to follow a single pointers. On the other hand, in the LCRS representation, the time required to do this is given by the time required to follow k + 1 pointers (one left child pointer, then k right child pointers). As a result, if the tree has a large branching factor, it can be much slower to do lookups in an LCRS tree structure than in the corresponding multiway tree structure.

We can therefore think of the LCRS representation as offering a time-space tradeoff between data structure storage space and access times. The LCRS representation has less memory overhead than the original multiway tree, while the multiway tree gives constant-time lookups of each of its children.

Use Cases

Because of the time-space tradeoff involved in the LCRS representation, the LCRS representation is typically not used unless one of two criteria hold:

  1. Memory is extremely scarce, or
  2. Random access of a node's children is not required.

Case (1) might arise if you needed to store a staggeringly huge multiway tree in main memory. For example, if you needed to store a phylogenetic tree containing many different species subject to frequent updates, then the LCRS representation might be appropriate.

Case (2) arises in specialized data structures in which the tree structure is being used in very specific ways. For example, many types of heap data structures that use multiway trees can be space optimized by using the LCRS representation. The main reason for this is that in heap data structures, the most common operations tend to be

  1. Remove the root of a tree and process each of its children, or
  2. Join two trees together by making one tree a child of the other.

Operation (1) can be done very efficiently in the LCRS representation. In an LCRS representation, the root of the tree never has a right child (since it has no siblings), and therefore removing the root simply means peeling off the root node and descending into its left subtree. From there, processing each child can be done by simply walking down the right spine of the remaining tree and processing each node in turn.

Operation (2) can be done very efficiently as well. Recall from above that in an LCRS representation, the root of a tree never has a right child. Therefore, it is very easy to join together two trees in LCRS representation as follows. Beginning with two trees like this:

    R1             R2
   /               /
 (children 1)    (children 2)

We can fuse the trees together in this way:

             R1
            /
           R2
          /  \
(children 2) (children 1)

This can be done in O(1) time, and is quite simple. The fact that the LCRS representation has this property means that many types of heap priority queues, such as the binomial heap or rank-pairing heap are typically represented as LCRS trees.

Hope this helps!

Share:
22,764
templatetypedef
Author by

templatetypedef

I love teaching, learning, and that really great feeling you get when you finally understand something.

Updated on December 13, 2020

Comments

  • templatetypedef
    templatetypedef over 3 years

    Many data structures store multi-way trees as binary trees using a representation called the "left-child, right-sibling" representation. What does this mean? Why would you use it?

  • stackErr
    stackErr over 11 years
    This is a great answer! It helped me understand LCRS for my data structures course. Thanks!
  • Vibhuti
    Vibhuti almost 11 years
    Wow! awesome explanations of LCRS! Thanks a lot!
  • Mahesha999
    Mahesha999 about 8 years
    can you please tell me in which reference books this topic is discussed. Not finding it in any...
  • templatetypedef
    templatetypedef about 8 years
    @Mahesha999 I'm not sure of any reference books this would directly appear in. It's commonly discussed in passing when talking about data structures like binomial heaps, but I can't imagine anyone writing an in-depth book chapter on it.
  • Mahesha999
    Mahesha999 about 8 years
    I seriously didnt get "If we were to store ... most schemes)". 1. 'two machine words (allocated size and logical size)'...why? those pointers in an array not pointing to actual children can be kept null 2. 'one extra pointer (to the dynamically-allocated elements)'...didnt get it at all. Which dynamically-allocated elements? may be am missing some implementation details. (Am assuming a node contains array pointers to its children) 3. Also how 'totaling at most twice the total number of pointers, in most schemes'?
  • templatetypedef
    templatetypedef about 8 years
    @Mahesha999 If you have a static tree, then you can just store the array of pointers itself. However, if you have a dynamic tree where you're anticipating inserting and deleting child nodes, chances are you'd use a dynamic array to store those pointers, since otherwise every addition or deletion of a child would necessitate a newly-allocated array.
  • overexchange
    overexchange over 7 years
    @stackErr Great answer? space complexity is .. + 2.sizeof(Node *)? There are 11 pointers in your 12 node LCRS tree. Did not get the asymptotic analysis, from the answer. In LCRS tree, I guess it is again n.sizeof(data) + (n-1)sizeof(Node *). It is just that, child pointers are replaced by sibling pointers
  • templatetypedef
    templatetypedef over 7 years
    @overexchange You're correct - my initial analysis was incorrect (there was a missing factor of n on the sizeof(Node*) term). I've redone the analysis and tried to make clearer how the analysis proceeds, which ends up arriving at essentially the same conclusion (the LCRS representation is definitely more space efficient). Can you double-check my work?
  • overexchange
    overexchange over 7 years
    @templatetypedef Wherever you edited, can you please mention Edit, it helps me read your answer again. Anyways, I took complete day today to avoid the confusion after reading your answer. Am learning tree for the first time today. I am yet to learn asymptotic analysis process.
  • templatetypedef
    templatetypedef over 7 years
    @overexchange I updated the section starting with "analysis."
  • overexchange
    overexchange over 7 years
    @templatetypedef I think, you are right LCRS tree takes less space compared to multi walk tree. Mainly the difference matters in maintaining space for dynamic array in multi walk tree representation.
  • overexchange
    overexchange over 7 years
    @templatetypedef Apart from LCRS & Multi walk, Do you know any other representation? seriously used in the software market? If yes, Please share that information
  • Zephyr
    Zephyr over 6 years
    @templatetypedef How will you represent a tree which has only two nodes where root has only one right node ? Will you make that right child in original tree as left child in this representation?
  • templatetypedef
    templatetypedef over 6 years
    @Zephyr Let’s extend this a bit. Imagine I have a multiway tree where I have a root node with one child, and I want that child to be the 137th child of the root, with the first 136 children missing. We’d similarly have trouble representing that tree structure. The issue here is that the LCRS representation is designed to encode multiway trees that have the children stored in order, but not with the idea that we can assign arbitrary numbers to the positions of those children. That is, it’s assumed the children exist in sequential order with no gaps, so you couldn’t have your missing left child.
  • templatetypedef
    templatetypedef over 6 years
    @Zephyr It’s not too hard to think of a way to represent such a tree, though. You could add in dummy nodes marked as “I’m not actually a node” to mark missing nodes, or you could tag each node with an index to mark which logical position that node belongs to.
  • dexter2406
    dexter2406 over 3 years
    A stupid question: if a node doesn't have LC/RS, then the corresponding pointer is set to NULL, right?