Is there a `connect by` alternative in MySQL?

10,013

Solution 1

As said in comments, there isn't a short way with mysql.

BUT!

If you have the possibility to change the database structure, you can deploy a better design to handle tree-like hierarchies.

If you follow THIS TUTORIAL from Bill Karwin (HERE is the original answer which references that slideshow tutorial), you can find 4 methods used to model an hierarchical structure:

  1. Adiacency List
  2. Path Enumeration
  3. Nested sets
  4. Closure table

Now, the best model possible is the 4th one (I leave descriptions of the other 3 models to the reader), which basically needs 2 tables: one for the elements and one for the paths. In the paths table (the closure table itself), you'll store every path from each node to every descendant (not just the direct childs!).

It's suggested to save also the path length for each row, because it makes easier to query for immediate childrens in the tree.

Even if this solution requires more space, it has the best overall performance and it's really easy to use: it doesn't rely on recursive queries at all AND it will grants referential integrity for the whole dataset!

For example, to get every child of the node #4:

select a.*
from nodes a
join paths b
on a.node_id = b.descendant
where b.ancestor = 4

Another example: get all the ancestors of the node #11

select a.*
from nodes a
join paths b
on a.node_id = b.ancestor
where b.descendant = 11

need to delete the subtree of the node #6

delete from paths where descendant in
(select descendant from paths where ancestor = 6)

Solution 2

In my application, I rarely need to ask for the entire subtree. So, to get away from the big issue of o2, I use 3-deep association / closure table lookups - populating the table with ancestor - descendent only for child, parent, and grandparents. Just recognize that you get what you put in - i.e. don't query for the entire tree without a stored procedure.

Share:
10,013
Adam Arold
Author by

Adam Arold

I do programming for a living. I like tinkering with stuff from bare metal to designing architectures. I also contribute to the Open Source community. Currently I'm based on the JVM (Kotlin/Clojure/Java) but I'm still looking for an Elixir (pun intended). Currently I'm hacking simulation-based games and world/flora/fauna/civilization generation (Dwarf Fortress style). If you think we can work together (programming/pixel art) drop me a mail. Check out my Hexagonal Grid library if interested: Hexameter If you need a multiplatform messaging bus: Riptide or a multiplatform terminal emulator for games: Zircon You can find my LinkedIn profile here I also created the unicorns tag on meta and #SOreadytohelp.

Updated on June 17, 2022

Comments

  • Adam Arold
    Adam Arold almost 2 years

    If I use Oracle there is the connect by keyword which can be used to create hierarchical queries. Currently I'm using MySQL on a project and I would like to know whether there is an alternative for connect by in MySQL?

    I tried google but for no avail so far. What I'm trying to achieve is to fetch a tree from the database with one query. There are two tables involved:

    areas and area_to_parent_join. The latter contains two ids one is area_id and the other is parent_id. So it is basically a self-join and I can create graphs using that model. In fact it is currently only used to create trees but this might change in the future. But in either case what I would like to have is just a spanning tree.

    Edit: areas might have more than 1.000.000 records which makes most of the space-intensive options unfeasible.

  • Adam Arold
    Adam Arold over 10 years
    I really like the 4th idea but the problem is that I might have 1.000.000 records. Since the closure table has O(n^2) worst case space complexity it means that storing all the paths for 1.000.000 records will take up 8381 terabytes of space. This is obviously not acceptable. Closure tables might work for small datasets but it does not scale well.
  • STT LCU
    STT LCU over 10 years
    @AdamArold I agree that the worst case is O(n^2) but it will never happen: a full closure table means that every node is connected to every other node. Is this your case? I don't think so! Since you're modeling a tree structure, O(n^2) won't ever happen. It's much, much closer to O(n) in reality. Give it a try (it even says so on the slides)
  • STT LCU
    STT LCU over 10 years
    the "weight" of the closure table depends much more on the tree topography than on the number of nodes itself: with 10 nodes, a tree with 1 root and 9 leaves has 17 paths in the closure table, one with 1 root, 7 intermediate nodes and 2 leaves has 34 paths and one with 1 root, 8 intermediate nodes and 1 leaf has 56 paths! (note that even the worst case for a tree is very far from the teorethical worst case for the closure table: 56 vs 100 while the average is even less).
  • Adam Arold
    Adam Arold over 10 years
    I think there is an error in your queries: node does not have a descendant or ancestor column.
  • STT LCU
    STT LCU over 10 years
    a full binary tree of 15 nodes has less than 50 entries in the closure table (against the theoretical worst case of 225)
  • STT LCU
    STT LCU over 10 years
    @AdamArold sure, sorry it was a typo
  • Adam Arold
    Adam Arold over 10 years
    Anyway +1 for the detailed answer. I've never heard of closure tables before. I'll wait a bit for other answers then I can accept yours.