How to represent a data tree in SQL?

56,107

Solution 1

I've bookmarked this slidshare about SQL-Antipatterns, which discusses several alternatives: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back?src=embed

The recommendation from there is to use a Closure Table (it's explained in the slides).

Here is the summary (slide 77):

                  | Query Child | Query Subtree | Modify Tree | Ref. Integrity
Adjacency List    |    Easy     |     Hard      |    Easy     |      Yes
Path Enumeration  |    Easy     |     Easy      |    Hard     |      No
Nested Sets       |    Hard     |     Easy      |    Hard     |      No
Closure Table     |    Easy     |     Easy      |    Easy     |      Yes

Solution 2

The easiest implementation is adjacency list structure:

id  parent_id  data

However, some databases, particularly MySQL, have some issues in handling this model, because it requires an ability to run recursive queries which MySQL lacks.

Another model is nested sets:

id lft rgt data

where lft and rgt are arbitrary values that define the hierarchy (any child's lft, rgt should be within any parent's lft, rgt)

This does not require recursive queries, but it slower and harder to maintain.

However, in MySQL this can be improved using SPATIAL abitilies.

See these articles in my blog:

for more detailed explanations.

Solution 3

I'm suprised that nobody mentioned the materialized path solution, which is probably the fastest way of working with trees in standard SQL.

In this approach, every node in the tree has a column path, where the full path from the root to the node is stored. This involves very simple and fast queries.

Have a look at the example table node:

+---------+-------+
| node_id | path  |
+---------+-------+
| 0       |       |
| 1       | 1     |
| 2       | 2     |
| 3       | 3     |
| 4       | 1.4   |
| 5       | 2.5   |
| 6       | 2.6   |
| 7       | 2.6.7 |
| 8       | 2.6.8 |
| 9       | 2.6.9 |
+---------+-------+

In order to get the children of node x, you can write the following query:

SELECT * FROM node WHERE path LIKE CONCAT((SELECT path FROM node WHERE node_id = x), '.%')

Keep in mind, that the column path should be indexed, in order to perform fast with the LIKE clause.

Solution 4

If you are using PostgreSQL you can use ltree, a package in the contrib extension (comes by default) which implements the tree data structure.

From the docs:

CREATE TABLE test (path ltree);
INSERT INTO test VALUES ('Top');
INSERT INTO test VALUES ('Top.Science');
INSERT INTO test VALUES ('Top.Science.Astronomy');
INSERT INTO test VALUES ('Top.Science.Astronomy.Astrophysics');
INSERT INTO test VALUES ('Top.Science.Astronomy.Cosmology');
INSERT INTO test VALUES ('Top.Hobbies');
INSERT INTO test VALUES ('Top.Hobbies.Amateurs_Astronomy');
INSERT INTO test VALUES ('Top.Collections');
INSERT INTO test VALUES ('Top.Collections.Pictures');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Stars');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Galaxies');
INSERT INTO test VALUES ('Top.Collections.Pictures.Astronomy.Astronauts');
CREATE INDEX path_gist_idx ON test USING GIST (path);
CREATE INDEX path_idx ON test USING BTREE (path);

You can do queries like:

ltreetest=> SELECT path FROM test WHERE path <@ 'Top.Science';
                path
------------------------------------
 Top.Science
 Top.Science.Astronomy
 Top.Science.Astronomy.Astrophysics
 Top.Science.Astronomy.Cosmology
(4 rows)

Solution 5

It depends on how you will be querying and updating the data. If you store all the data in one row, it's basically a single unit that you can't query into or partially update without rewriting all the data.

If you want to store each element as a row, you should first read Managing Hierarchical Data in MySQL (MySQL specific, but the advice holds for many other databases too).

If you're only ever accessing an entire tree, the adjacency list model makes it difficult to retrieve all nodes under the root without using a recursive query. If you add an extra column that links back to the head then you can do SELECT * WHERE head_id = @id and get the whole tree in one non-recursive query, but it denormalizes the database.

Some databases have custom extensions that make storing and retrieving heirarchical data easier, for example Oracle has CONNECT BY.

Share:
56,107
Avi Harush
Author by

Avi Harush

Updated on July 08, 2022

Comments

  • Avi Harush
    Avi Harush almost 2 years

    I'm writing a data tree structure that is combined from a Tree and a TreeNode. Tree will contain the root and the top level actions on the data. I'm using a UI library to present the tree in a windows form where I can bind the tree to the TreeView.

    I will need to save this tree and nodes in the DB. What will be the best way to save the tree and to get the following features:

    1. Intuitive implementation.
    2. Easy binding. Will be easy to move from the tree to the DB structure and back (if any)

    I had 2 ideas. The first is to serialize the data into a one liner in a table. The second is to save in tables but then, when moving to data entities I will loose the row states on the table on changed nodes.

    Any ideas?