How to convert Directed Acyclic Graph (DAG) to Tree

15,225

Solution 1

There's the graph theoretical answer and the programmer's answer to this. I assume you can handle the programmers part yourself. For the graph theorethical answer:

  • A DAG is a set of modules where it never happens that A needs B, and at the same time, B (or one of the modules B needs) needs A, in modules-speak: no circular dependency. I've seen circular dependencies happen (search the Gentoo forums for examples), so you can't even be 100% sure you have a DAG, but let's assume you have. It it not very hard to do a check on circular dependencies, so I'd recommend you do so somewhere in your module loader.
  • In a tree, something that never can happen is that A depends on B and C and that both B and C depend on D (a diamond), but this can happen in a DAG.
  • Also, a tree has exactly one root node, but a DAG can have multiple "root" nodes (i.e. modules that nothing depends on). For example a program like the GIMP, the GIMP program will be the root node of the set of modules, but for GENTOO, almost any program with a GUI is a "root" node, while the libraries etc are dependencies of them. (I.E. both Konqueror and Kmail depend on Qtlib, but nothing depends on Konqueror, and nothing depends on Kmail)

The Graph theorethical answer to your question, as others pointed out, is that a DAG can't be converted to a tree, while every tree is a DAG.

However, (high-level programmers answer) if you want the tree for graphical representations, you're only interested in the dependencies of a specific module, not what's depending on that module. Let me give an example:

A depends on B and C
B depends on D and E
C depends on D and F

I can't show this as an ASCII-art tree, for the simple reason that this can't be converted into a tree. However, if you want to show what A depends on, you can show this:

A
+--B
|  +--D
|  +--E
+--C
   +--D
   +--F

As you see, you get double entries in your tree - in this case "only" D but if you do an "expand all" on the Gentoo tree, I guarantee you that your tree will have at least 1000 times the amount of nodes as there are modules. (there are at least 100 packages that depend on Qt, so everything Qt depends on will be present at least 100 times in the tree).

If you have a "large" or "complex" tree, It might be best to expand the tree dynamically, not in advance, otherwise you might have a very memory-intensive process.

The disadvantage of the tree above is that if you click open B, then D, you see that A and B depend on D, but not that also C depends on D. However, depending on your situation, this might not be important at all - if you maintain a list of loaded modules, on loading C you see that you have loaded D already, and it doesn't matter it wasn't loaded for C, but for B. It is loaded, that's all that matters. If you dynamically maintain what directly depends on a certain module, you can handle the opposite problem (unloading) as well.

However, what you can't do with a tree is what's in your final sentence: preserve topological order, i.e. if B gets loaded in the same container as C, you'll never get to load C in the same container as well. Or you might have to be put up with putting everything in one container (not that I fully understand what you mean with "loading into the same container")

Good luck!

Solution 2

A DAG and a tree are not the same thing mathematically. Thus, any conversion introduces ambiguity. A tree by definition has no cycles, period.

Solution 3

What you're looking for, in order to find the order to load your modules in, is the Topological sort of your DAG. If the edges go from a module to the modules it depends on (which I think is the most likely), you'll have to load the modules in the reverse order of the topological sort because a module will appear -before- all the modules on which it depends.

If you represent the DAG such that the edges go from the depended on modules to the modules that depend on them (you can get this by reversing all the edges in the graph above), you can just load the modules in the order of the topological sort.

Solution 4

It depends a lot on how you are representing your DAG. For example, it could be an adjacency matrix (A[i,j] = 1 if there's an edge from node i to node j, else 0), or as a system of pointers, or as an array of nodes and an array of edges....

Further, it's not clear what transformation you're trying to apply. A connected DAG is a tree, so I'm afraid you need to clarify your question a bit.

Solution 5

The answer is that you want to obtain a spanning tree. This is even defined for undirected graphs, so even if you have cycles you could ignore the direction of the edges, obtain an undirected graph and get the spannign tree of the latter. Which spanning tree you need is up to you as there are many possibilities, e.g. minimum spanning trees.

What I was looking for is an algorithm to obtain a 'redundant' spanning tree that keeps all edges but 'copies' nodes. Unfortunately I have not yet found what is the name for this, but I think the algorithm is straightforward if you go top-down and there are no cycles to get lost in. Still it would be good to have a name for this problem so as to look for ready-made fast implementations.

The generalization would be to have a forest of spanning trees.

Share:
15,225
Rohan West
Author by

Rohan West

Updated on June 09, 2022

Comments

  • Rohan West
    Rohan West almost 2 years

    I have been looking for C# examples to transform a DAG into a Tree.

    Does anyone have an examples or pointers in the right direction?

    Clarification Update

    I have a graph that contains a list of modules that my application is required to load. Each module has a list of modules it depends on. For example here are my modules, A, B C, D and E

    • A has no dependencies
    • B depends on A, C and E
    • C depends on A
    • D depends on A
    • E depends on C and A

    I want resolve dependencies and generate a tree that looks like this...

    --A

    --+--B

    -----+--C

    ---------+--D

    --+--E

    Topological Sort

    Thanks for the information, if I perform a Topological sort and reverse the output i will have the following order

    • A
    • B
    • C
    • D
    • E

    I want to maintain the hierarchical structure so that my modules are loaded into the correct context, for example... module E should be in the same container as B

    Thanks

    Rohan

    • ShuggyCoUk
      ShuggyCoUk about 15 years
      how do you wish to deal with diamonds... A -> B, A -> C, B & C -> D
    • Rohan West
      Rohan West about 15 years
      Thats a good question, I see a problem but dont know how to solve it, what would you do? My experiance with graph theory is very limited.
    • ShuggyCoUk
      ShuggyCoUk about 15 years
      You either 1) pick the first, 2) pick the last 3) duplicate the node. whichever is best is entirely application dependent, 3 is easiest followed by 1 followed by 2... it's hard to say what you want the tree for based on the question. diamond dependencies are a bitch in general
    • ShuggyCoUk
      ShuggyCoUk about 15 years
      Both BFS and DFS generate a tree from a DAG. They allow you to spot diamonds (they must do so to prevent traversing nodes more than once) the naive use of either would simply only include a node when it is first seen.
  • ypnos
    ypnos about 15 years
    You could always start with a fake root node.
  • Mitch Wheat
    Mitch Wheat about 15 years
    To be a pedant: Any connected graph with no cycles is a tree.
  • Rohan West
    Rohan West about 15 years
    I have updated the description regarding the Topological Sort, thanks
  • MarkusQ
    MarkusQ about 15 years
    I see your pedantry and raise you: the set of all connected directed acyclic graphs is a proper subset of the set of all connected acyclic graphs, therefore your statement ("Any connected graph with no cycles is a tree") formally implies mine ("A connected DAG is a tree").
  • mekanik
    mekanik about 15 years
    Neither does a DAG (hence the Acyclic part of the name). A DAG does allow diamonds and similarly topologies where two divergent branches re-converge. (But in no case can you get back to a parent). A tree does not have any nodes with more than one parent.
  • mekanik
    mekanik about 15 years
    Shouldn't it be "Any directed connected graph with no cycles CONTAINS a tree". If A depends on B and C; B depends on D and C depends on D, you don't have a tree, you have a graph that contains two trees.
  • dsimcha
    dsimcha about 15 years
    Exactly: A DAG has no directed cycles. A tree has no cycles at all.
  • Dave Rapin
    Dave Rapin over 13 years
    This detailed answer helped me think through a completely unrelated problem I'm working on ;)
  • supercat
    supercat over 10 years
    A DAG with a single root may be convertible to a tree if nodes have content but no identity (if multiple nodes share a child node, the references to that node would be replaced with references to distinct copies of it).
  • Andre Chenier
    Andre Chenier about 5 years
    hi @supercat, what you wrote is the answer to my unreplied question on SO. you gave me the idea of mirroring. please add this comment as an answer to my q for me to accept. regards. stackoverflow.com/questions/54940332/…