How to convert JSON data into a tree image?


Solution 1

For a tree like this there's no need to use a library: you can generate the Graphviz DOT language statements directly. The only tricky part is extracting the tree edges from the JSON data. To do that, we first convert the JSON string back into a Python dict, and then parse that dict recursively.

If a name in the tree dict has no children, it's a simple string; otherwise, it's a dict and we need to scan the items in its "children" list. Each (parent, child) pair we find gets appended to a global list named edges.

This somewhat cryptic line:

name = next(iter(treedict.keys()))

gets a single key from treedict. This gives us the person's name, since that's the only key in treedict. In Python 2 we could do

name = treedict.keys()[0]

but the previous code works in both Python 2 and Python 3.

from __future__ import print_function
import json
import sys

# Tree in JSON format
s = '{"Harry": {"children": ["Bill", {"Jane": {"children": [{"Diane": {"children": ["Mary"]}}, "Mark"]}}]}}'

# Convert JSON tree to a Python dict
data = json.loads(s)

# Convert back to JSON & print to stderr so we can verify that the tree is correct.
print(json.dumps(data, indent=4), file=sys.stderr)

# Extract tree edges from the dict
edges = []

def get_edges(treedict, parent=None):
    name = next(iter(treedict.keys()))
    if parent is not None:
        edges.append((parent, name))
    for item in treedict[name]["children"]:
        if isinstance(item, dict):
            get_edges(item, parent=name)
        else:
            edges.append((name, item))

get_edges(data)

# Dump edge list in Graphviz DOT format
print('strict digraph tree {')
for row in edges:
    print('    {0} -> {1};'.format(*row))
print('}')

stderr output

{
    "Harry": {
        "children": [
            "Bill",
            {
                "Jane": {
                    "children": [
                        {
                            "Diane": {
                                "children": [
                                    "Mary"
                                ]
                            }
                        },
                        "Mark"
                    ]
                }
            }
        ]
    }
}

stdout output

strict digraph tree {
    Harry -> Bill;
    Harry -> Jane;
    Jane -> Diane;
    Diane -> Mary;
    Jane -> Mark;
}

The code above runs on both Python 2 and Python 3. It prints the JSON data to stderr so we can verify that it's correct. It then prints the Graphviz data to stdout so we can capture it to a file or pipe it directly to a Graphviz program. E.g., if the script is named "tree_to_graph.py", you can run this on the command line to save the graph as a PNG file named "tree.png":

python tree_to_graph.py | dot -Tpng -otree.png

And here's the PNG output:

[Image: Tree made by Graphviz]
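The get_edges function above assumes every dict node has a "children" list, which holds for the output of to_json(with_data=False). As a rough sketch only, the variant below also accepts the with_data=True layout from the question, where leaves are dicts without a "children" key. The names iter_edges and to_dot are hypothetical helpers, not part of the original answer, and the sketch assumes node names contain no double quotes, since it wraps each name in quotes without escaping.

```python
import json


def iter_edges(node, parent=None):
    """Yield (parent, child) pairs from a treelib-style JSON tree.

    A node is either a plain value (a leaf) or a one-key dict
    {name: {"children": [...], "data": ...}}; "children" is optional.
    """
    if isinstance(node, dict):
        name = next(iter(node))
        body = node[name]
        children = body.get("children", []) if isinstance(body, dict) else []
    else:
        name, children = node, []
    if parent is not None:
        yield (parent, name)
    for child in children:
        for edge in iter_edges(child, parent=name):
            yield edge


def to_dot(edges):
    """Format an edge list as DOT text, quoting every node name."""
    lines = ['strict digraph tree {']
    for parent, child in edges:
        lines.append('    "{0}" -> "{1}";'.format(parent, child))
    lines.append('}')
    return '\n'.join(lines)


# The with_data=True sample from the question
with_data = json.loads(
    '{"Harry": {"data": null, "children": [{"Bill": {"data": null}}, '
    '{"Jane": {"data": null, "children": [{"Diane": {"data": null}}, '
    '{"Mark": {"data": null}}]}}, {"Mary": {"data": null}}]}}'
)
edges = list(iter_edges(with_data))
dot_text = to_dot(edges)
```

Using .get("children", []) instead of indexing directly is what lets childless dict nodes pass through without a KeyError.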

Solution 2

Based on PM 2Ring's answer, I created a script that can be used from the command line:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Convert a JSON to a graph."""

from __future__ import print_function
import json
import sys


def tree2graph(data, verbose=True):
    """
    Convert a tree dict (parsed from JSON) to a list of edges.

    Run `dot -Tpng -otree.png`

    Parameters
    ----------
    data : dict
        Tree of the form {name: [child, ...]}, where each child is either
        a leaf name or another dict of the same form
    verbose : bool
        Currently unused

    Returns
    -------
    list of (parent, child) tuples

    Examples
    --------
    >>> s = {"Harry": [ "Bill", \
                       {"Jane": [{"Diane": ["Mary", "Mark"]}]}]}
    >>> tree2graph(s)
    [('Harry', 'Bill'), ('Harry', 'Jane'), ('Jane', 'Diane'), ('Diane', 'Mary'), ('Diane', 'Mark')]
    """
    # Extract tree edges from the dict
    edges = []

    def get_edges(treedict, parent=None):
        # Each dict has exactly one key: the node's name
        name = next(iter(treedict.keys()))
        if parent is not None:
            edges.append((parent, name))
        for item in treedict[name]:
            if isinstance(item, dict):
                get_edges(item, parent=name)
            elif isinstance(item, list):
                # A nested list: treat its elements as children of `name`
                for el in item:
                    if isinstance(el, dict):
                        get_edges(el, parent=name)
                    else:
                        edges.append((name, el))
            else:
                edges.append((name, item))
    get_edges(data)
    return edges


def main(json_filepath, out_dot_path, lr=False, verbose=True):
    """IO."""
    # Read JSON
    with open(json_filepath) as data_file:
        data = json.load(data_file)

    if verbose:
        # Convert back to JSON & print to stderr so we can verify that the tree
        # is correct.
        print(json.dumps(data, indent=4), file=sys.stderr)

    # Get edges
    edges = tree2graph(data, verbose)

    # Dump edge list in Graphviz DOT format
    with open(out_dot_path, 'w') as f:
        f.write('strict digraph tree {\n')
        if lr:
            f.write('rankdir="LR";\n')
        for row in edges:
            f.write('    "{0}" -> "{1}";\n'.format(*row))
        f.write('}\n')


def get_parser():
    """Get parser object for tree2graph.py."""
    from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter
    parser = ArgumentParser(description=__doc__,
                            formatter_class=ArgumentDefaultsHelpFormatter)
    parser.add_argument("-i", "--input",
                        dest="json_filepath",
                        help="JSON FILE to read",
                        metavar="FILE",
                        required=True)
    parser.add_argument("-o", "--output",
                        dest="out_dot_path",
                        help="DOT FILE to write",
                        metavar="FILE",
                        required=True)
    return parser


if __name__ == "__main__":
    import doctest
    doctest.testmod()
    args = get_parser().parse_args()
    main(args.json_filepath, args.out_dot_path, verbose=False)
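The script above only writes a DOT file; turning it into an image still requires running Graphviz. As a hedged sketch, assuming the Graphviz dot executable is installed and on the PATH, the rendering step could also be started from Python. The function names dot_command and render_png are illustrative and not part of the script above.

```python
import subprocess


def dot_command(dot_path, png_path):
    """Build the argument list for rendering a DOT file to PNG."""
    return ["dot", "-Tpng", dot_path, "-o", png_path]


def render_png(dot_path, png_path):
    """Run Graphviz on dot_path and write the image to png_path.

    Raises subprocess.CalledProcessError if dot exits with an error,
    and OSError if the dot executable cannot be found.
    """
    subprocess.check_call(dot_command(dot_path, png_path))
```

For example, after the script has written "tree.dot", calling render_png("tree.dot", "tree.png") would produce the PNG. Passing the arguments as a list avoids shell quoting problems with unusual file names.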
Author by

Grimlock

Updated on July 09, 2022

Comments

  • Grimlock
    Grimlock almost 2 years

    I'm using treelib to generate trees. Now I need an easy-to-read version of them, so I want to convert them into images. For example: enter image description here

    The sample JSON data, for the following tree:

    enter image description here

    With data:

    >>> print(tree.to_json(with_data=True))
    {"Harry": {"data": null, "children": [{"Bill": {"data": null}}, {"Jane": {"data": null, "children": [{"Diane": {"data": null}}, {"Mark": {"data": null}}]}}, {"Mary": {"data": null}}]}}
    

    Without data:

    >>> print(tree.to_json(with_data=False))
    {"Harry": {"children": ["Bill", {"Jane": {"children": [{"Diane": {"children": ["Mary"]}}, "Mark"]}}]}}
    

    Is there any way to use Graphviz, d3.js, or some Python library to generate a tree image from this JSON data?

  • Grimlock
    Grimlock over 7 years
    This is exactly what I need, but I keep getting error: Warning: <stdin>: syntax error in line 1 near. Any idea what is the cause of this error?
  • Grimlock
    Grimlock over 7 years
    Maybe it's because my tree is made of integers instead of strings. So whenever I execute that command in console, I get error: Warning: <stdin>: syntax error in line 1 near 114 , where 114 is the root node.
  • Grimlock
    Grimlock over 7 years
    Tested everything from your code, it works. The problem is, I'm fetching data as s = tree.to_json(with_data=False), which is causing the above mentioned error, any idea, how can I solve it? @PM2Ring
  • PM 2Ring
    PM 2Ring over 7 years
    @Grimlock My code can handle integers. However, all keys in JSON must be strings; perhaps treelib is producing invalid JSON that causes json.loads to raise an error. Please add an example of your actual data onto the end of your question and I'll take a look at it.
  • Grimlock
    Grimlock over 7 years
    I solved the Warning: <stdin>: syntax error in line 1 near error by myself, but the thing is I'm using integers, and the integers may repeat at times. What should I do in such cases? For example, there can be a child of "Diane" that is also named "Diane". How do I solve it?
  • PM 2Ring
    PM 2Ring over 7 years
    @Grimlock: You can't have two distinct nodes that have the same name. OTOH, Graphviz can handle loops, including nodes that point to themselves. If your treelib code assigns the same name to two or more distinct nodes you need to fix that code.
  • Grimlock
    Grimlock over 7 years
    Yes, Graphviz is handling the loops, and that's a problem for me. I need to have nodes with the same values. So, where should I make the change?
  • PM 2Ring
    PM 2Ring over 7 years
    @Grimlock It sounds to me like you need to re-think your data structure. As I said earlier, two nodes in a graph can't have the same name. OTOH, in Graphviz you can attach a label to a node which is different from its name. You can even give nodes tooltips, which is handy if the Graphviz output is an SVG.
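The label suggestion in the last comment can be sketched as follows. This is only one possible approach, assuming the with_data=False JSON layout from the question: every node gets a generated unique name (n0, n1, ... is an arbitrary scheme chosen here) and the original value is attached as its label, so repeated values become separate nodes instead of loops. The function name labelled_dot is hypothetical, and labels are assumed to contain no double quotes.

```python
import itertools
import json


def labelled_dot(tree):
    """Return DOT text where each node has a unique ID and a label."""
    counter = itertools.count()
    lines = ["digraph tree {"]

    def visit(node, parent_id=None):
        if isinstance(node, dict):
            label = next(iter(node))
            body = node[label]
            children = body.get("children", []) if isinstance(body, dict) else []
        else:
            label, children = node, []
        # The generated ID is the node's name; the value is only a label
        node_id = "n{0}".format(next(counter))
        lines.append('    {0} [label="{1}"];'.format(node_id, label))
        if parent_id is not None:
            lines.append("    {0} -> {1};".format(parent_id, node_id))
        for child in children:
            visit(child, node_id)

    visit(tree)
    lines.append("}")
    return "\n".join(lines)


# "Diane" has a child that is also called "Diane"
tree = json.loads('{"Diane": {"children": ["Diane", "Mary"]}}')
dot_text = labelled_dot(tree)
```

Because edges refer to the generated IDs rather than the values, the two "Diane" nodes stay distinct in the rendered graph while still displaying the same text.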