Is it possible to make efficient pointer-based binary heap implementations?

13,415

Solution 1

Solution 1: Maintain a pointer to the last node

In this approach a pointer to the last node is maintained, and parent pointers are required.

  • When inserting, starting at the last node navigate to the node below which a new last node will be inserted. Insert the new node and remember it as the last node. Move it up the heap as needed.

  • When removing, starting at the last node navigate to the second-to-last node. Remove the original last node and remember the the new last node just found. Move the original last node into the place of the deleted node and then move it up or down the heap as needed.

It is possible to navigate to the mentioned nodes in O(log(n)) time and O(1) space. Here is a description of the algorithms but the code is available below:

  • For insert: If the last node is a left child, proceed with inserting the new node as the right child of the parent. Otherwise... Start at the last node. Move up as long as the current node is a right child. If the root was not reached, move to the sibling node at the right (which necessarily exists). Then (whether or not the root was reached), move down to the left as long as possible. Proceed by inserting the new node as the left child of the current node.

  • For remove: If the last node is the root, proceed by removing the root. Otherwise... Start at the last node. Move up as long as the current node is a left child. If the root was not reached, move to the sibling left node (which necessarily exists). Then (whether or not the root was reached), move down to the right as long as possible. We have arrived at the second-to-last node.

However, there are some things to be careful about:

  • When removing, there are two special cases: when the last node is being removed (unlink the node and change the last node pointer), and when the second-to-last node is being removed (not really special but the possibility must be considered when replacing the deleted node with the last node).

  • When moving nodes up or down the heap, if the move affects the last node, the last-node pointer must be corrected.

Long ago I have made an implementation of this. In case it helps someone, here is the code. Algorithmically it should be correct (has also been subjected to stress testing with verification), but there is no warranty of course.

Solution 2: Reach the last node from the root

This solution requires maintaining the node count (but not parent pointers or the last node). The last (or second-to-last) node is found by navigating from the root towards it.

Assume the nodes are numbered starting from 1, as per the typical notation for binary heaps. Pick any valid node number and represent it in binary. Ignore the first (most significant) 1 bit. The remaining bits define the path from the root to that node; zero means left and one means right.

For example, to reach node 11 (=1011b), start at the root then go left (0), right (1), right (1).

This algorithm can be used in insert to find where to place the new node (follow the path for node node_count+1), and in remove to find the second-to-last-node (follow the path for node node_count-1).

This approach is used in libuv for timer management; see their implementation of the binary heap.

Usefulness of Pointer-based Binary Heaps

Many answers here and even literature say that an array-based implementation of a binary heap is strictly superior. However I contest that because there are situations where the use of an array is undesirable, typically because the upper size of the array is not known in advance and on-demand reallocations of an array are not deemed acceptable, for example due to latency or possibility of allocation failure.

The fact that libuv (a widely used event loop library) uses a binary heap with pointers only further speaks for this.

It is worth noting that the Linux kernel uses (pointer-based) red-black trees as a priority queue in a few cases, for example for CPU scheduling and timer management (for the same purpose as in libuv). I find it likely that changing these to use a pointer-based binary heap will improve performance.

Hybrid Approach

It is possible to combine Solution 1 and Solution 2 into a hybrid approach which dynamically picks either of the algorithms (for finding the last or second-to-last node), the one with a lower cost, measured in the number of edges that need to be traversed. Assume we want to navigate to node number N, and highest_bit(X) means the 0-based index of the highest-order bit in N (0 means the LSB).

  • The cost of navigating from the root (Solution 2) is highest_bit(N).

  • The cost of navigating from the previous node which is on the same level (Solution 1) is: 2 * (1 + highest_bit((N-1) xor N)).

Note that in the case of a level change the second equation will yield a wrong (too large) result, but in that case traversal from the root is more efficient anyway (for which the estimate is correct) and will be chosen, so there is no need for special handling.

Some CPUs have an instruction for highest_bit allowing very efficient implementation of these estimates. An alternative approach is to maintain the highest bit as a bit mask and do these calculation with bit masks instead of bit indices. Consider for example that 1 followed by N zeroes squared is equal to 1 followed by 2N zeroes).

In my testing it has turned out that Solution 1 is on average faster than Solution 2, and the hybrid approach appeared to have about the same average performance as Solution 2. Therefore the hybrid approach is only useful if one needs to minimize the worst-case time, which is (twice) better in Solution 2; since Solution 1 will in the worst case traverse the entire height of the tree up and then down.

Code for Solution 1

Note that the traversal code in insert is slightly different from the algorithm described above but still correct.

struct Node {
    Node *parent;
    Node *link[2];
};

struct Heap {
    Node *root;
    Node *last;
};

void init (Heap *h)
{
    h->root = NULL;
    h->last = NULL;
}

void insert (Heap *h, Node *node)
{
    // If the heap is empty, insert root node.
    if (h->root == NULL) {
        h->root = node;
        h->last = node;
        node->parent = NULL;
        node->link[0] = NULL;
        node->link[1] = NULL;
        return;
    }

    // We will be finding the node to insert below.

    // Start with the current last node and move up as long as the
    // parent exists and the current node is its right child.
    Node *cur = h->last;
    while (cur->parent != NULL && cur == cur->parent->link[1]) {
        cur = cur->parent;
    }

    if (cur->parent != NULL) {
        if (cur->parent->link[1] != NULL) {
            // The parent has a right child. Attach the new node to
            // the leftmost node of the parent's right subtree.
            cur = cur->parent->link[1];
            while (cur->link[0] != NULL) {
                cur = cur->link[0];
            }
        } else {
            // The parent has no right child. This can only happen when
            // the last node is a right child. The new node can become
            // the right child.
            cur = cur->parent;
        }
    } else {
        // We have reached the root. The new node will be at a new level,
        // the left child of the current leftmost node.
        while (cur->link[0] != NULL) {
            cur = cur->link[0];
        }
    }

    // This is the node below which we will insert. It has either no
    // children or only a left child.
    assert(cur->link[1] == NULL);

    // Insert the new node, which becomes the new last node.
    h->last = node;
    cur->link[cur->link[0] != NULL] = node;
    node->parent = cur;
    node->link[0] = NULL;
    node->link[1] = NULL;

    // Restore the heap property.
    while (node->parent != NULL && value(node->parent) > value(node)) {
        move_one_up(h, node);
    }
}

void remove (Heap *h, Node *node)
{
    // If this is the only node left, remove it.
    if (node->parent == NULL && node->link[0] == NULL && node->link[1] == NULL) {
        h->root = NULL;
        h->last = NULL;
        return;
    }

    // Locate the node before the last node.
    Node *cur = h->last;
    while (cur->parent != NULL && cur == cur->parent->link[0]) {
        cur = cur->parent;
    }
    if (cur->parent != NULL) {
        assert(cur->parent->link[0] != NULL);
        cur = cur->parent->link[0];
    }
    while (cur->link[1] != NULL) {
        cur = cur->link[1];
    }

    // Disconnect the last node.
    assert(h->last->parent != NULL);
    h->last->parent->link[h->last == h->last->parent->link[1]] = NULL;

    if (node == h->last) {
        // Deleting last, set new last.
        h->last = cur;
    } else {
        // Not deleting last, move last to node's place.
        Node *srcnode = h->last;
        replace_node(h, node, srcnode);
        // Set new last unless node=cur; in this case it stays the same.
        if (node != cur) {
            h->last = cur;
        }

        // Restore the heap property.
        if (srcnode->parent != NULL && value(srcnode) < value(srcnode->parent)) {
            do {
                move_one_up(h, srcnode);
            } while (srcnode->parent != NULL && value(srcnode) < value(srcnode->parent));
        } else {
            while (srcnode->link[0] != NULL || srcnode->link[1] != NULL) {
                bool side = srcnode->link[1] != NULL && value(srcnode->link[0]) >= value(srcnode->link[1]);
                if (value(srcnode) > value(srcnode->link[side])) {
                    move_one_up(h, srcnode->link[side]);
                } else {
                    break;
                }
            }
        }
    }
}

Two other functions are used: move_one_up moves a node one step up in the heap, and replace_node replaces moves an existing node (srcnode) into the place held by the node being deleted. Both work only by adjusting the links to and from the other nodes, there is no actual moving of data involved. These functions should not be hard to implement, and the mentioned link includes my implementations.

Solution 2

The pointer based implementation of the binary heap is incredibly difficult when compared to the array based implementation. But it is fun to code it. The basic idea is that of a binary tree. But the biggest challenge you will have is to keep it left-filled. You will have difficulty in finding the exact location as to where you must insert a node.

For that, you must know binary traversal. What we do is. Suppose our heap size is 6. We will take the number + 1, and convert it to bits. The binary representation of 7 is, "111". Now, remember to always omit the first bit. So, now we are left with "11". Read from left-to-right. The bit is '1', so, go to the right child of the root node. Then the string left is "1", the first bit is '1'. As you have only 1 bit left, this single bit tells you where to insert the new node. As it is '1' the new node must be the right child of the current node. So, the raw working of the process is that, convert the size of the heap into bits. Omit the first bit. According to the leftmost bit, go to the right child of the current node if it is '1', and to the left child of the current node if it is '0'.

After inserting the new node, you will bubble it up the heap. This tells you that you will be needing the parent pointer. So, you go once down the tree and once up the tree. So, your insertion operation will take O(log N).

As for the deletion, it is still a challenge to find the last node. I hope you are familiar with deletion in a heap where we swap it with the last node and do a heapify. But for that you need the last node, for that too, we use the same technique as we did for finding the location to insert the new node, but with a little twist. If you want to find the location of the last node, you must use the binary representation of the value HeapSize itself, not HeapSize + 1. This will take you to the last node. So, the deletion will also cost you O(log N).

I'm having trouble in posting the source code here, but you can refer to my blog for the source code. In the code, there is Heap Sort too. It is very simple. We just keep deleting the root node. Refer to my blog for explanation with figures. But I guess this explanation would do.

I hope my answer has helped you. If it did, let me know...! ☺

Solution 3

For those saying this is a useless exercise, there are a couple of (admittedly rare) use cases where a pointer-based solution is better. If the max size of the heap is unknown, then an array implementation will need to stop-and-copy into fresh storage when the array fills. In a system (e.g. embedded) where there are fixed response time constraints and/or where free memory exists, but not a big enough contiguous block, this may be not be acceptable. The pointer tree lets you allocate incrementally in small, fixed-size chunks, so it doesn't have these problems.

To answer the OP's question, parent pointers and/or elaborate tracking aren't necessary to determine where to insert the next node or find the current last one. You only need the bits in the binary rep of the heap's size to determine the left and right child pointers to follow.

Edit Just saw Vamsi Sangam@'s explanation of this algorithm. Nonetheless, here's a demo in code:

#include <stdio.h>
#include <stdlib.h>

typedef struct node_s {
  struct node_s *lft, *rgt;
  int data;
} NODE;

typedef struct heap_s {
  NODE *root;
  size_t size;
} HEAP;

// Add a new node at the last position of a complete binary tree.
void add(HEAP *heap, NODE *node) {
  size_t mask = 0;
  size_t size = ++heap->size;
  // Initialize the mask to the high-order 1 of the size.
  for (size_t x = size; x; x &= x - 1) mask = x;
  NODE **pp = &heap->root;
  // Advance pp right or left depending on size bits.
  while (mask >>= 1) pp = (size & mask) ? &(*pp)->rgt : &(*pp)->lft;
  *pp = node;
}   

void print(NODE *p, int indent) {
  if (!p) return;
  for (int i = 0; i < indent; i++) printf(" ");
  printf("%d\n", p->data);
  print(p->lft, indent + 1);
  print(p->rgt, indent + 1);
}   

int main(void) {
  HEAP h[1] = { NULL, 0 };
  for (int i = 0; i < 16; i++) {
    NODE *p = malloc(sizeof *p);
    p->lft = p->rgt = NULL;
    p->data = i;
    add(h, p);
  }   
  print(h->root, 0);
}   

As you'd hope, it prints:

0
 1
  3
   7
    15
   8
  4
   9
   10
 2
  5
   11
   12
  6
   13
   14

Sift-down can use the same kind of iteration. It's also possible to implement the sift-up without parent pointers using either recursion or an explicit stack to "save" the nodes in the path from root to the node to be sifted.

Solution 4

A binary heap is a complete binary tree obeying the heap property. That's all. The fact that it can be stored using an array, is just nice and convenient. But sure, you can implement it using a linked structure. It's a fun exercise! As such, it is mostly useful as an exercise or in more advanced datastructures( meldable, addressable priority queues for example ), as it is quite a bit more involved than doing the array version. For example, think about siftup/siftdown procedures, and all the edge cutting/sewing you'll need to get right. Anyways, it's not too hard, and once again, good fun!

Share:
13,415
Derek Chiang
Author by

Derek Chiang

Updated on July 18, 2022

Comments

  • Derek Chiang
    Derek Chiang almost 2 years

    Is it even possible to implement a binary heap using pointers rather than an array? I have searched around the internet (including SO) and no answer can be found.

    The main problem here is that, how do you keep track of the last pointer? When you insert X into the heap, you place X at the last pointer and then bubble it up. Now, where does the last pointer point to?

    And also, what happens when you want to remove the root? You exchange the root with the last element, and then bubble the new root down. Now, how do you know what's the new "last element" that you need when you remove root again?

  • WhozCraig
    WhozCraig over 10 years
    I see no reason why you cannot implement a Heap in a binary tree. A heap is an ordering, the container in which it resides is an implementation. The node indexing you refer to is a way of implementing that ordering in a linear sequence, but that doesn't mean it has to be that way to be a heap. Why you would want to do this in a pointer-based node tree is another (potentially psychiatric) issue altogether.
  • rliu
    rliu over 10 years
    Actually a heap is a tree-based implementation of an abstract data type called the "priority queue". It isn't the array based implementation of a priority queue nor is it an "ordering". It's a tree which happens to satisfy the heap property; it doesn't need to be binary even. edit Sorry I see now that he refers to binary heaps specifically in the question
  • Jim Mischel
    Jim Mischel over 10 years
  • Jim Mischel
    Jim Mischel over 10 years
    The point is that a "tree" is an abstract data type, regardless of its implementation. A heap implemented in an array is still a tree. I do agree, though, that it's inefficient in the extreme to implement a binary heap in a linked tree structure, especially when doing so in an array is much easier. All that said, your expanded answer caused me to remove my downvote.
  • Misguided
    Misguided over 9 years
    There is a reason to implement a heap in a binary tree: observe that insertion in a vector-based heap is O(n) in the worst case (that is, a grow operation), you could want to avoid the complexity overhead.
  • Hogan
    Hogan over 9 years
    @Julián - Worst case is not a data structure's performance. If you expect the worst case results all the time then you need to re-work your requirements based on expected input data.
  • Ambroz Bizjak
    Ambroz Bizjak over 7 years
    This is wrong. In the fourth case for insert, it is needed to move up the tree arbitrarily high until the node is a right child, then move to its left left sibling, and move down to left children as far as possible. The remove must have a similar issue.
  • Stargateur
    Stargateur over 7 years
    I saw that you cast to uint8_t * and your offset_t is of type int. An offset should be of type size_t. And char * is the correct type if you move your pointer of x bytes because char is guarantee to represent one byte. uint8_t is guarantee to be an octet not a byte. So you should cast to char *. Maybe I'm wrong I didn't read all your code.
  • Gene
    Gene over 7 years
    Never say never ;-) . There is a rare but real use case where the pointer tree has an advantage. See my answer.
  • Ambroz Bizjak
    Ambroz Bizjak over 7 years
    @Stargateur I appreciate you took a look at the code :) But, I know all that. The linked code is pretty old and I linked to a specific commit because the code was deleted long ago. The point of this answer is to show a working algorithm.