Find whether a tree is a subtree of other

12,870

Solution 1

Traverse T1. If the current node is equal to the root node of T2, traverse both of the trees (T2 and the current subtree of T1) at the same time. Compare the current node. If they are always equal, T2 is a subtree of T1.

Solution 2

I suggest an algorithm, that uses preprocessing:

1) Pre-process the T1 tree, keeping the list of all possible subtree roots (the cache list will have a millions of entries);

2) Sort the cached list of roots by the descending order of data, kept in root. You may choose more elegant sorting function, for example, parse a character tree into string.

3) Use binary search to locate the necessary sub-tree. You may quickly reject subtrees, with the number of nodes, not equal to the T2 number of nodes (or with the different depth).

Note that steps 1) and 2) must be done only once, then you can test many sub-tree candidates, using the same preprocessed result.

Solution 3

If you are memory/storage bound (i.e. can't pre-manipulate and store the trees in alternate forms), you probably won't be able to do anything better than the brute force search suggested by some other answers (traverse P1 looking for matching root of P2, traverse both to determine whether the node is in-fact the root of a matching sub-tree, continue with original traversal if it isn't a match). This search operates in O(n * m) where n is the size of P1 and m is the size of P2. With depth checks and other potential optimizations depending on the tree data you have available, this man be optimized a bit, but it's still generally O(n * m). Depending on your specific circumstances, this may be the only reasonable approach.

If you're not memory/storage bound and don't mind a bit of complexity, I believe this could be improved to O(n + m) by reducing to a longest common substring problem with the help of a generalized suffix tree. Some discussion on this for a similar problem can be found here. Maybe when I have more time, I'll come back and edit with more specifics on an implementation.

Solution 4

If given the root of both trees, and given that the nodes are of the same type, why is then just ascertaining that the root of T2 is in T1 not sufficient?

I am assuming that "given a tree T" means given a pointer to the root of T and the data type of the node.

Regards.

Solution 5

If your trees are not sorted in any way, I don't see any other way than to do a brute-force search: walk through tree T1 and check to see if you reach a node which matches the first node of tree T2. If not, continue traversing T1. If so, check if the next nodes match, until you find the end of T2, in which case you have a hit: your tree T2 is indeed a subtree of T1.

If you know the depth of every single node of T1 (from leaf to node), you could skip any nodes which are not as deep as the subtree you are looking for. This could help you eliminate a lot of needless comparisons. Say that T1 and T2 are well balanced, then tree T1 will have a total depth of 20 (2**20 > 1,000,000) and tree T2 will have a depth of 7 (2**7 > 100). You'll just have to walk the 13 first layers of T1 (8192 nodes -- or is that 14 layers and 16384 nodes?) and will be able to skip about 90% of T1...

However, if by subtree you mean that leaf nodes of T2 are also leaf nodes of T1, then you could do a first traversal of T1 and compute the depth of every node (distance from leaf to node) and then only check the subtrees which have the same depth as T2.

Share:
12,870
sud03r
Author by

sud03r

Graduate Student at University of Waterloo

Updated on June 09, 2022

Comments

  • sud03r
    sud03r almost 2 years

    There are two binary trees T1 and T2 which store character data, duplicates allowed.
    How can I find whether T2 is a subtree of T1 ? .
    T1 has millions of nodes and T2 has hundreds of nodes.

  • Dijar
    Dijar almost 15 years
    Also take the depth into account. You only have to search depth's 0 to (T1_depth - T2_depth) assuming 0 is the root depth and T1_depth >= T2_depth.
  • Aditya
    Aditya over 11 years
    Sorry for digging up such an old question, but the question says that duplicates in data are allowed. In case there are in fact duplicates, this algorithm might not work out correctly. Am I missing something?
  • John Dvorak
    John Dvorak over 11 years
    @Aditya If you continue the search after each failed partial match then this algorithm is correct.
  • Michal
    Michal over 10 years
    @Jan Dvorak It is indeed correct but what is the complexity? O(n^2)? You'll visit the same node multiple times.
  • John Dvorak
    John Dvorak over 10 years
    @MichalWegorek only the root of T2 is compared against every node in T1. Only if there's a match, the subtrees are compared. This is only inefficient if the root node of T2 is present many times in T1. Do you have a better algorithm in that particular case?
  • Cristian Diaconescu
    Cristian Diaconescu over 8 years
    This is wrong for at least two reasons. 1. The definition of 'subtree' means, at least on Wikipedia, "A subtree of a tree T is a tree consisting of a node in T and all of its descendants in T."
  • Cristian Diaconescu
    Cristian Diaconescu over 8 years
    and 2, if OP needs the more lax definition of "print each tree on a transparent sheet; stack sheets; if small tree perfectly overlaps part of big tree, it's a subtree": traversing two trees in parallel, using a particular traversal algorithm, even if the nodes' values match at each point, doesn't guarantee that the two trees are identical. E.g. using pre-order (top-left-right) traversal, the two trees 1 (L: 2, R: nil) and 1 (L: nil, R: 2) would appear to be identical (each tree yields values [1,2] ) [to be cont'd]
  • Cristian Diaconescu
    Cristian Diaconescu over 8 years
    [cont'd] If you want to make sure the trees are indeed identical, you need to traverse them twice, once in-order (left-top-right) and once either pre- or post-order
  • KRoy
    KRoy about 8 years
    I will serialize both tree and find if the second is sub-string of the first. We have all kinds of optimized sub-string finding algorithms available in popular languages.