How to represent a tree like structure in a db

database data-structures tree hierarchical-data

31,860

Solution 1

I showed a solution similar to your nodes & edges tables, in my answer to the StackOverflow question: What is the most efficient/elegant way to parse a flat table into a tree? I call this solution "Closure Table".

I did a presentation on different methods of storing and using trees in SQL, Models for Hierarchical Data with SQL and PHP. I demonstrated that with the right indexes (depending on the queries you need to run), the Closure Table design can have very good performance, even over large collections of edges (about 500K edges in my demo).

I also covered the design in my book, SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

Solution 2

Be sure to use some sort of low level-coding for the entity being treed to prevent looping. The entity might be a part, subject, folder, etc.

With an Entity file and and Entity-Xref file you can loop through one of say two relationships between the two files, a parent and a child relation.

A level is the level an entity found in a tree. A low-level-code for the entity is the lowest level an entity is found in any tree anywhere. Check to make sure the low level code of the entity you want to make a child is less than or equal to prevent a loop. after adding an entity as a child it will become at least one level lower.

31,860

Author by

Guy

Updated on April 02, 2020

Comments

Guy about 4 years

I'm starting a project and I'm in the designing phase: I.e., I haven't decided yet on which db framework I'm going to use. I'm going to have code that creates a "forest" like structure. That is, many trees, where each tree is a standard: nodes and edges. After the code creates these trees I want to save them in the db. (and then pull them out eventually)

The naive approach to representing the data in the db is a relational db with two tables: nodes and edges. That is, the nodes table will have a node id, node data, etc.. And the edges table will be a mapping of node id to node id.

Is there a better approach? Or given the (limited) assumptions I'm giving this is the best approach? How about if we add an assumption that the trees are relatively small - is it better to save the whole tree as a blob in the db? Which type of db should I use in that case? Please comment on speed/scalability.

Thanks
Guy almost 13 years

Lets say I decide to keep a datastructure that represents the tree in memory (since I have many trees with keys pointing to them. i.e., key-->tree). For example, I can make a JSON out of my tree and save it as string. Am I better off using a non-relational db in this case? and (since I've never used a non-relational db) which non-relational db can you recommend?
Bill Karwin almost 13 years

You can sure stuff a JSON string into a non-relational db, but you won't be able to do anything with it until you pull it out of the db. If you would prefer to fetch the whole tree, deserialize it to application objects, and write recursive functions to walk the tree, that's fine. If you want a query language with which you can fetch subtrees or paths through the tree, use relational modeling.
Mohammed H over 11 years

@Bill Karwin See stackoverflow.com/questions/13302187/sql-query-optimize related to your slide.
user151496 about 9 years

i do not understand how does the closure table keep the structure data of the selected sub-tree. you just get "a bunch of nodes having the same parent", no information on the structure itself with selects
Bill Karwin about 9 years

@user151496, if you query a subtree starting at node C, you get all the set of all nodes below C. You're right that alone is not enough to show the hierarchy. But you can join those descendent nodes to path segments connecting to their ancestors or descendants. I show examples in my presentation that I linked to.
user151496 about 9 years

i probably didn't undestand the query properly then :( do you mean the query at page 49? because it seems like it does not query the structure. nor the latter querry that selects immediate parent/child
Bill Karwin about 9 years

@user151496, right, the query on slide 49 queries the set of descendants of node #4, but it doesn't show how those nodes relate to each other. The query for immediate parent returns only one node, so you know which one it is. The query for immediate children return several, but you know they're all siblings of each other.
Gus about 8 years

@BillKarwin And if it is saved in an XML structure?
Bill Karwin about 8 years

@Gus, If you have an XML document, then store it in a TEXT column. If you want to be able to query for individual elements, you might be able to use builtin XML functions, for example for MySQL: dev.mysql.com/doc/refman/5.7/en/xml-functions.html. But that won't be indexed. If you want high-performance queries, break up the XML and store the elements in discrete rows.