Calculate minimal operations to make two tree structures identical
Solution 1
There is not only a Wikipedia article on graph isomorphism (as Space_C0wb0y points out) but also a dedicated article on the graph isomorphism problem. It has a section Solved special cases
for which polynomial-time solutions are known. Trees is one of them and it cites the following two references:
- P.J. Kelly, "A congruence theorem for trees" Pacific J. Math., 7 (1957) pp. 961–968
- Aho, Alfred V.; Hopcroft, John; Ullman, Jeffrey D. (1974), The Design and Analysis of Computer Algorithms, Reading, MA: Addison–Wesley .
Solution 2
You weren't clear if you were comparing abstract syntax trees for source code, XML documents interpreted as trees, or some other type of tree.
There's a number of papers that discuss comparing syntax trees and computing minimum distances by various means. The ideas should be relevant.
A good paper is Change Distilling, which tries to compare the source code for two abstract syntax trees and report a minimal difference. The paper talks about a specific method, and also briefly mentions (and provides references) to a variety of similar techniques.
Few of these algorithms are actually realized in available tools for comparing computer program source text. Our Smart Differencer is one of them; it is driven by an explicit language grammar for many languages.
Solution 3
Although this question is old, i'll add a couple more references and algorithms below:
- X-Diff: An Effective Change Detection Algorithm for XML Documents , Yuan Wang, David J. DeWitt, Jin-Yi Cai
- KF-Diff+: Highly Efficient Change Detection Algorithm for XML Documents
- diffX: An Algorithm to Detect Changes in Multi-Version XML Documents
- Change Detection in XML Trees: a Survey, Luuk Peters
- Similarity in Tree Data Structures
Furthermore, there are libraries and frameworks on GitHub (in javascript) which implement diffing of Tree-like structures for example applications dealing with JSON data or XML Trees (e.g for client-side MVC/MVVM):
Solution 4
In case folks find this question and need something implemented for Node.js or the browser, I'm providing a link and code example for an implementation I've written that you can find on github here: (https://github.com/hoonto/jqgram.git) based on the existing PyGram Python code (https://github.com/Sycondaman/PyGram).
This is a tree edit distance approximation algorithm, but it is much, much faster than trying to find the true edit distance. The approximation performs in O(n log n) time and O(n) space whereas true edit distance is often O(n^3) or O(n^2) using known algorithms for true edit distance. See the academic paper from which the PQ-Gram algorithm comes: (http://www.vldb2005.org/program/paper/wed/p301-augsten.pdf)
So using jqgram:
Example:
var jq = require("jqgram").jqgram;
var root1 = {
"thelabel": "a",
"thekids": [
{ "thelabel": "b",
"thekids": [
{ "thelabel": "c" },
{ "thelabel": "d" }
]},
{ "thelabel": "e" },
{ "thelabel": "f" }
]
}
var root2 = {
"name": "a",
"kiddos": [
{ "name": "b",
"kiddos": [
{ "name": "c" },
{ "name": "d" },
{ "name": "y" }
]},
{ "name": "e" },
{ "name": "x" }
]
}
jq.distance({
root: root1,
lfn: function(node){ return node.thelabel; },
cfn: function(node){ return node.thekids; }
},{
root: root2,
lfn: function(node){ return node.name; },
cfn: function(node){ return node.kiddos; }
},{ p:2, q:3 },
function(result) {
console.log(result.distance);
});
And that gives you a number between 0 and 1. The closer to zero, the more closely related the two trees look to jqgram. One approach could be to use jqgram to narrow in on several closely related trees from among many trees given its speed, then utilize true edit distance on the few trees remaining that you need to take a closer inspection of, and for that you can find python implementations for reference or port of the Zhang & Shasha algorithm for example.
Note that the lfn and cfn parameters specify how each tree should determine the node label names and the children array for each tree root independently so that you can do funky things like comparing an object to a browser DOM for example. All you need to do is provide those functions along with each root and jqgram will do the rest, calling your lfn and cfn provided functions to build out the trees. So in that sense it is (in my opinion anyway) much easier to use than PyGram. Plus, its Javascript, so use it client or server-side!
ALSO, to answer with respect to cycle detection, check out the clone method inside of jqgram, there is cycle detection there, but credit for that goes to the author of node-clone from which that piece was slightly modified and included.
Solution 5
This is called the tree to tree correction problem or the tree to tree editing problem. Most of the literature dealing with this explicitly relates to comparing XML trees for some reason, so searching for "XML diffing algorithm" yields a lot of results. In addition to Nikos's list of links, I found these:
- Fine-grained Change Detection in Structured Text Documents (2014)
- Change Detection by Level (CDL): An Efficient Algorithm to Detect Change on XML Documents (2010)
-
Comparing XML Documents as Reference-aware Labeled Ordered Trees (2011)
The code for this - VTracker still exists!Edit: actually the interesting bit of code is not included. This pointed me to... - UMLDiff: An Algorithm for Object-Oriented Design Differencing (2005).
- Revisiting the tree edit distance and its backtracing: A tutorial (2018) - looks like a good tutorial for the Zhang-Shasha algorithm, which seems to be the "classic" solution, but has terrible time complexity because it compares every sub-tree with every other sub-tree.
I also strongly recommend reading Change Detection in XML Trees: a Survey but it is from 2005 so barely any of the tools it mentions exist anymore. Comparing XML Documents as Reference-aware Labeled Ordered Trees has the best intuitive description of some of the algorithms that I have found so far (start at section 2.1.2).
Unfortunately there doesn't seem to be much open source code available that does this and isn't ancient. Just a lot of overly-complex papers. :-/
Tomas Vana
Developer living in a lovely city of Vienna in Austria, Europe. At work doing mostly .NET and web development, intranet web applications and data warehousing solutions. For fun developing iPhone Apps, with the first one released recently at AppStore.
Updated on May 02, 2021Comments
-
Tomas Vana about 3 years
This is more of a CS question, but an interesting one :
Let's say we have 2 tree structures with more or less the same nodes reorganized. How would you find
- any
- in some sense minimal
sequence of operations
MOVE(A, B)
- moves node A under node B (with the whole subtree)INSERT(N, B)
- inserts a new node N under node BDELETE (A)
- deletes the node A (with the whole subtree)
that transforms one tree to the other.
There might obviously be cases where such transformation is not possible, trivial being root A with child B to root B with child A etc.). In such cases, the algorithm would simply deliver an result "not possible".
Even more spectacular version is a generalization for networks, i.e. when we assume that a node can occur multiple times in the tree (effectively having multiple "parents"), while cycles are forbidden.
Disclaimer : This is not a homework, actually it comes from a real business problem and I found it quite interesting wondering if somebody might know a solution.
-
Tomas Vana almost 13 yearsActually, in our case it's not source code, these are really trees. There is some semantic in those trees, but all in all not that important - they are directly manipulated by the users as a tree
-
coder14 over 7 yearsBroken link: I just spent 20 minutes looking for the "Change Distilling" paper. Here's the updated link: merlin.uzh.ch/publication/show/2531 The software project itself has moved to bitbucket.org/sealuzh/tools-changedistiller/wiki/Home (which is how I got the correct link to the PDF)
-
john ktejik almost 6 yearsdoes this allow multiple lfn ? I want to match more than the label, ie. also the stored value. node.value.
-
Timmmm over 4 yearsHighly recommend reading the
Change Detection in XML Trees: a Survey
paper - it lists dozens of algorithms for XML diffing (which is just tree diffing). -
Mengo almost 4 yearsI'm not be able to see this paper though, is the pdf link broken?
Change Detection in XML Trees: a Survey
-
Timmmm almost 4 yearsWorks for me. Did you click the
Download full-test PDF
button? Maybe try Sci-hub if it's blocked for some reason.