The most efficient way to test two binary trees for equality

41,566

Solution 1

Well for one thing you're always checking the branches, even if you spot that the roots are unequal. Your code would be simpler (IMO) and more efficient if you just returned false as soon as you spotted an inequality.

Another option to simplify things is to allow your equals method to accept null values and compare two nulls as being equal. That way you can avoid all those nullity checks in the different branches. This won't make it more efficient, but it'll be simpler:

public boolean equals(Node<T> root1, Node<T> root2) {
    // Shortcut for reference equality; also handles equals(null, null)
    if (root1 == root2) {
        return true;
    }
    if (root1 == null || root2 == null) {
        return false;
    }
    return root1.getData().equals(root2.getData()) &&
           equals(root1.getLeft(), root2.getLeft()) &&
           equals(root1.getRight(), root2.getRight());
} 

Note that currently this will fail if root1.getData() returns null. (That may or may not be possible with the way you're adding nodes.)

EDIT: As discussed in comments, you could use hash codes to make a very quick "early out" - but it would add complexity.

Either you need to make your trees immutable (which is a whole other discussion) or you need each node to know about its parent, so that when the node is changed (e.g. by adding a leaf or changing the value) it needs to update its hash code and ask its parent to update too.

Solution 2

Out of curiosity, do you consider the two trees to be equal if their contents are identical, even if their structure is not? For example, are these equal?

  B         C        C      A
 / \       / \      / \      \
A   D     B   D    A   D      B
   /     /          \          \
  C     A            B          C
                                 \
                                  D

These trees have the same contents in the same order, but because the structures are different, by your tests would not be equal.

If you want to test this equality, personally I'd just build an iterator for the tree using in-order traversal and iterate through the trees comparing them element by element.

Solution 3

First of all I'm making a few general assumptions. These are assumptions that are valid for most tree-based collection classes but it's always worth checking:

  1. You consider two trees to be equal if and only if they are equal both in terms of tree structure and in terms of data values at each node (as defined by data.equals(...))
  2. null data values are allowed at tree nodes (this could be either because you allow null explicitly or because your data structure only stores non-null values at leaf nodes)
  3. There aren't any particular unusual facts you know about the distribution of data values that you can take advantage of (for example, if you knew that the only possible data vales were null or the String "foo", then you don't need to compare two non-null String values)
  4. The trees will typically be of moderate size and reasonably well balanced. In particular, this ensures that the trees will never be so deep that you run the risk of StackOverflowExceptions caused by deep recursion.

Assuming these assumptions are correct, then the approach I would suggest is:

  • Do root reference equality check first. this quickly eliminates the case of either two nulls or the same tree being passed in for comparison with itself. Both are very common cases, and the reference equality check is extremely cheap.
  • Check the nulls next. Non-null is obviously not equal to null, which enables you to bail out early plus it establishes a non-null guarantee for later code! A very smart compiler could also theoretically use this guarantee to optimise away null pointer checks later (not sure if the JVM currently does this)
  • Check data reference equality and nulls next. This avoids descending all the way down the tree branches which you would do even in the case of unequal data if you went down the tree branches first.
  • Check data.equals() next. Again you want to check data equality before tree branches. You do this after checking for nulls since data.equals() is potentially more expensive and you want to guarantee you won't get a NullPointerException
  • Check the equality of branches recursively as the last step. It doesn't matter if you do left or right first unless there is a greater likelihood of one side being unequal, in which case you should check that side first. This might be the case if e.g. most changes were being appended to the right branch of the tree....
  • Make the comparison a static method. This is because you want to use it recursively in a way that will accept nulls as either of the two parameters (hence it isn't suitable for an instance method as this cannot be null). In addition, the JVM is very good at optimising static methods.

My implementation would therefore be something like:

public static boolean treeEquals(Node a, Node b) {
    // check for reference equality and nulls
    if (a == b) return true; // note this picks up case of two nulls
    if (a == null) return false;
    if (b == null) return false;

    // check for data inequality
    if (a.data != b.data) {
        if ((a.data == null) || (b.data == null)) return false;
        if (!(a.data.equals(b.data))) return false;
    }

    // recursively check branches
    if (!treeEquals(a.left, b.left)) return false;
    if (!treeEquals(a.right, b.right)) return false;

    // we've eliminated all possibilities for non-equality, so trees must be equal
    return true;
}

Solution 4

For any tree the most efficient way to represent it so that you can easily check for equality is the parent list - hold an array in which for each vertex you remember the index of its parent(actually hold a pair - the index of the father and the data value). Then you simply should do a compare of two continuous blocks of memory.

This will only work if the tree is static(i.e does not change over time). Also it will only consider the trees to be equal if the vertices indexes are the same in the two trees.

I believe in the common case when the two statements above are not true, your implementation should be about as fast as you can get.

EDIT: in fact your implementation can be improved if you follow the advises in Jon Skeet's answer(at least return false as soon as you know the trees are not equal)

Share:
41,566
aviad
Author by

aviad

I make things work... and then I make them smaller and faster

Updated on July 05, 2022

Comments

  • aviad
    aviad almost 2 years

    How would you implement in Java the binary tree node class and the binary tree class to support the most efficient (from run-time perspective) equal check method (also has to be implemented):

        boolean equal(Node<T> root1, Node<T> root2) {}
    

    or

        boolean equal(Tree t1, Tree t2) {}
    

    First, I created the Node class as follows:

        public class Node<T> {
            private Node<T> left;
            private Node<T> right;
            private T data;
    
            // standard getters and setters
        }
    

    and then the equals method that takes 2 root nodes as an arguments and runs the standard recursive comparison:

        public boolean equals(Node<T> root1, Node<T> root2) {
            boolean rootEqual = false;
            boolean lEqual = false;
            boolean rEqual = false;    
    
            if (root1 != null && root2 != null) {
                rootEqual = root1.getData().equals(root2.getData());
    
                if (root1.getLeft()!=null && root2.getLeft() != null) {
                    // compare the left
                    lEqual = equals(root1.getLeft(), root2.getLeft());
                }
                else if (root1.getLeft() == null && root2.getLeft() == null) {
                    lEqual = true;
                }
                if (root1.getRight() != null && root2.getRight() != null) {
                    // compare the right
                    rEqual = equals(root1.getRight(), root2.getRight());
                }
                else if (root1.getRight() == null && root2.getRight() == null) {
                    rEqual = true;
                }
    
                return (rootEqual && lEqual && rEqual);
            }
            return false;
        } 
    

    My second attempt was to implement the trees using arrays and indexes for traversing. Then the comparison could be done using the bitwise operations (AND) on two arrays - read chunk from 2 arrays and mask one by another using logical AND. I failed to get my code working so I do not post it here (I'd appreciate your implementation of the second idea as well as your improvements).

    Any thoughts how to do equality test for binary trees most efficiently?

    EDIT

    The question assumes structural equality. (Not the semantic equality)

    However, code that tests the semantic equality e.g. "Should we consider the two trees to be equal if their contents are identical, even if their structure is not?" Would be just iterating over the tree in-order and it should be straightforward.

  • aviad
    aviad about 12 years
    Thanks Jon, agreed... How about pasting the code snippet from my question with your improvements in your answer so I could upvote? (I want to give other people channce to share their thoughts before accepting :))
  • aviad
    aviad about 12 years
    Thanks, this idea is (partially) mentioned in the question. I upvote your answer and I'd appreciate seeing the code that does that.
  • stryba
    stryba about 12 years
    Also you could employ hashcode before using equals. You are still facing a O(n) runtime.
  • aviad
    aviad about 12 years
    good point! Upvote from me. I guess the structure is important but this clarification should be made... I'll add this to the question. Lets see what others say
  • aviad
    aviad about 12 years
    @JonSkeet, what do you think about Hounshell's comment (see the first sentence of the answer)?
  • Jon Skeet
    Jon Skeet about 12 years
    @stryba: How would hashCode() help? It would have to iterate over the tree, which would be an O(n) operation... and even then you'd still need to iterate over everything if the two hash codes were equal.
  • Jon Skeet
    Jon Skeet about 12 years
    @aviad: I'd assumed you wanted structural equality as otherwise you'd have written the code differently :) We can't really tell you what your requirements are.
  • Hounshell
    Hounshell about 12 years
    Ultimately the decision as to whether the trees are the same lies in the problem statement and how tolerant you are of false negatives. If it's a textbook problem, they probably mean the structure is identical. In the real world this is usually useless because it's common to build trees in a non-deterministic order. Then you're essentially checking that they're the same object; a reference check would suffice. Also, keep in mind that some trees mutate their structure on reads, not just writes, like Treaps and Splay Trees
  • aviad
    aviad about 12 years
    @Jon Skeet, this was my assumption. However, since the question was asked, I wonder how the solution would change if both content and structure should be tested.
  • Jon Skeet
    Jon Skeet about 12 years
    @aviad: If you don't care about the structure, then as Hounshell said, just iterating over the tree in-order should be straightforward.
  • stryba
    stryba about 12 years
    @JonSkeet, that entirely depends on the hashCode implementation doesn't it? 1. you could use it for the data object only 2. if you also want to use it for the nodes you might use an incremental hash which is only updated when nodes changes which will lead to an O(1) operation in cases the tree doesn't change. You could also not hash the entire structure but just some features such as the data objects hash plus the hashcodes of one level below, it doesn't have to be O(n), there are endless options here. I am just saying it could help as another early exit strategy.
  • Jon Skeet
    Jon Skeet about 12 years
    @stryba: For the incremental hash, you'd need to add another field to the node so that it knows about its parent - every change needs to propagate all the way up the tree. Yes, there are ways of making it simpler (although only the hash code part would be O(1) - you'd still need to check for actual equality in an O(n) way if the hashes matched). But your comment of just "You could employ hashCode before using equals" didn't mention any of this. I'll edit the answer to add it as an option.
  • Erik P.
    Erik P. about 12 years
    @JonSkeet: You don't need to add a 'parent' field if any modification happens through methods that always start at the root of your tree (and that update the stored hash code wherever necessary, of course). One can probably enforce this only if access to the (not publicly visible) Node<T> objects is mediated by a (publicly visible) Tree<T> object of some sort.
  • Jon Skeet
    Jon Skeet about 12 years
    @ErikP.: Yes, that's the case if you don't need to be able to treat it as an arbitrary tree from the outside, navigating to individual nodes... or you can provide a separate data structure just for the navigation/adding, of course.