Walking/iterating over a nested dictionary of arbitrary depth (the dictionary represents a directory tree)

11,836

Solution 1

This is a preliminary code. Go through it and tell me where you face problems.

Parents={-1:"Root"}
def add_dir(level, parent, index, k):
    print "Directory"
    print "Level=%d, Parent=%s, Index=%d, value=%s" % (level, Parents[parent], index, k)
def add_file(parent, index, k):
    print "File"
    print "Parent=%s, Index=%d, value=%s" %  (Parents[parent], index, k)
def f(level=0, parent=-1, index=0, di={}):
    for k in di:
        index +=1
        if di[k]:
            Parents[index]=k
            add_dir(level, parent, index, k)
            f(level+1, index, index, di[k])
        else:
            add_file(parent, index, k)

a={
    'SomeRootDirectory': {
        'foo.txt': None,
        'bar.txt': None,
        'Stories': {
            'Horror': {
                'rickscott.txt' : None,
                'Trash' : {
                    'notscary.txt' : None,
                    },
                },
            'Cyberpunk' : None
            },
        'Poems' : {
            'doyoureadme.txt' : None
        }
    }
}

f(di=a)

Solution 2

Here is a function that prints all your file names. It goes through all the keys in the dictionary, and if they map to things that are not dictionaries (in your case, the filename), we print out the name. Otherwise, we call the function on the dictionary that is mapped to.

def print_all_files(directory):

    for filename in directory.keys():
        if not isinstance(directory[filename], dict):
            print filename
        else:
            print_all_files(directory[filename])

So this code can be modified to do whatever you want, but it's just an example of how you can avoid fixing the depth through use of recursion.

The key thing to understand is that each time print_all_files gets called, it has no knowledge of how deep it is in the tree. It just looks at the files that are right there, and prints the names. If there are directores, it just runs itself on them.

Solution 3

I realize this is an old question, but I was just looking for a simple, clean way to walk nested dicts and this is the closest thing my limited searching has come up with. oadams' answer isn't useful enough if you want more than just filenames and spicavigo's answer looks complicated.

I ended up just rolling my own that acts similar to how os.walk treats directorys, except that it returns all key/value information.

It returns an iterator and for each directory in the "tree" of nested dicts, the iterator returns (path, sub-dicts, values) where:

  • path is the path to the dict
  • sub-dicts is a tuple of (key,dict) pairs for each sub-dict in this dict
  • values is a tuple of (key,value) pairs for each (non-dict) item in this dict


def walk(d):
    '''
    Walk a tree (nested dicts).

    For each 'path', or dict, in the tree, returns a 3-tuple containing:
    (path, sub-dicts, values)

    where:
    * path is the path to the dict
    * sub-dicts is a tuple of (key,dict) pairs for each sub-dict in this dict
    * values is a tuple of (key,value) pairs for each (non-dict) item in this dict
    '''
    # nested dict keys
    nested_keys = tuple(k for k in d.keys() if isinstance(d[k],dict))
    # key/value pairs for non-dicts
    items = tuple((k,d[k]) for k in d.keys() if k not in nested_keys)

    # return path, key/sub-dict pairs, and key/value pairs
    yield ('/', [(k,d[k]) for k in nested_keys], items)

    # recurse each subdict
    for k in nested_keys:
        for res in walk(d[k]):
            # for each result, stick key in path and pass on
            res = ('/%s' % k + res[0], res[1], res[2])
            yield res

Here is the code I used to test it, though it has a a couple other unrelated (but neat) stuff in it:

import simplejson as json
from collections import defaultdict

# see https://gist.github.com/2012250
tree = lambda: defaultdict(tree)

def walk(d):
    '''
    Walk a tree (nested dicts).

    For each 'path', or dict, in the tree, returns a 3-tuple containing:
    (path, sub-dicts, values)

    where:
    * path is the path to the dict
    * sub-dicts is a tuple of (key,dict) pairs for each sub-dict in this dict
    * values is a tuple of (key,value) pairs for each (non-dict) item in this dict
    '''
    # nested dict keys
    nested_keys = tuple(k for k in d.keys() if isinstance(d[k],dict))
    # key/value pairs for non-dicts
    items = tuple((k,d[k]) for k in d.keys() if k not in nested_keys)

    # return path, key/sub-dict pairs, and key/value pairs
    yield ('/', [(k,d[k]) for k in nested_keys], items)

    # recurse each subdict
    for k in nested_keys:
        for res in walk(d[k]):
            # for each result, stick key in path and pass on
            res = ('/%s' % k + res[0], res[1], res[2])
            yield res

# use fancy tree to store arbitrary nested paths/values
mem = tree()

root = mem['SomeRootDirectory']
root['foo.txt'] = None
root['bar.txt'] = None
root['Stories']['Horror']['rickscott.txt'] = None
root['Stories']['Horror']['Trash']['notscary.txt'] = None
root['Stories']['Cyberpunk']
root['Poems']['doyoureadme.txt'] = None

# convert to json string
s = json.dumps(mem, indent=2)

#print mem
print s
print

# json.loads converts to nested dicts, need to walk them
for (path, dicts, items) in walk(json.loads(s)):
    # this will print every path
    print '[%s]' % path
    for key,val in items:
        # this will print every key,value pair (skips empty paths)
        print '%s = %s' % (path+key,val)
    print

The output looks like:

{
  "SomeRootDirectory": {
    "foo.txt": null,
    "Stories": {
      "Horror": {
        "rickscott.txt": null,
        "Trash": {
          "notscary.txt": null
        }
      },
      "Cyberpunk": {}
    },
    "Poems": {
      "doyoureadme.txt": null
    },
    "bar.txt": null
  }
}

[/]

[/SomeRootDirectory/]
/SomeRootDirectory/foo.txt = None
/SomeRootDirectory/bar.txt = None

[/SomeRootDirectory/Stories/]

[/SomeRootDirectory/Stories/Horror/]
/SomeRootDirectory/Stories/Horror/rickscott.txt = None

[/SomeRootDirectory/Stories/Horror/Trash/]
/SomeRootDirectory/Stories/Horror/Trash/notscary.txt = None

[/SomeRootDirectory/Stories/Cyberpunk/]

[/SomeRootDirectory/Poems/]
/SomeRootDirectory/Poems/doyoureadme.txt = None
Share:
11,836
floer_m
Author by

floer_m

Updated on June 22, 2022

Comments

  • floer_m
    floer_m almost 2 years

    I am almost certain there is a simple solution to this, but I have spent hours now reading and rereading the same set of related results that don't quite answer my problem.

    Context of this question (included for completion but feel free to skip this)

    This came up because I want a user to be able to select a group of files from within a directory (and also any subdirectory), and unfortunately Tkinter's default ability for selecting multiple files in a file dialog is broken on Windows 7 (http://bugs.python.org/issue8010).

    Thus I am attempting to represent a directory structure by an alternative method (still using Tkinter): constructing a facsimile of the directory structure, made of labeled and indented checkboxes (organized in a tree). Thus a directory like this:

    \SomeRootDirectory
        \foo.txt
        \bar.txt
        \Stories
            \Horror
                \scary.txt
                \Trash
                    \notscary.txt
            \Cyberpunk
        \Poems
            \doyoureadme.txt
    

    will look something like this (where # represents a checkbutton):

    SomeRootDirectory
        # foo.txt
        # bar.txt
        Stories
            Horror
                # scary.txt
                Trash
                    # notscary.txt
            Cyberpunk
        Poems
            # doyoureadme.txt
    

    Building the original dictionary from the directory structure is easy using a certain recipe I found at ActiveState (see below), but I hit a wall when I try to iterate over the nicely nested dictionary I am left with. And I think I need to iterate over it in order to populate a Tkinter frame with a pretty gridded representation of the tree. Then I hope to load in the various text files selected by the user, by interpreting which checkboxes were true or false. Everything seems fairly easy except iterating over the dictionary without fixing the depth.

    In more abstract terms

    To make these nested dictionaries I am using an ActiveState recipe -- http://code.activestate.com/recipes/577879/. It implements os.walk to make dictionaries such as this:

    a={
        'SomeRootDirectory': {
            'foo.txt': None,
            'bar.txt': None,
            'Stories': {
                'Horror': {
                    'horror.txt' : None,
                    'Trash' : {
                        'notscary.txt' : None,
                        },
                    },
                'Cyberpunk' : None
                },
            'Poems' : {
                'doyoureadme.txt' : None
            }
        }
    }
    

    After which point I am stumped. I am a Python newbie at time of writing

    Solution adapted from spicavigo's response

    #distinguish between directory and file
    dirtab = "/==="
    filetab = "|---"
    
    Parents={-1:"Root"}
    def add_dir(level, parent, index, k):
        print (dirtab*level)+k
    def add_file(level, parent, index, k):
        #Currently an empty folder gets passed to add_file; here's a quick treatment.
        if "." in k:
            print (filetab*level)+k
        else:
            print (dirtab*level)+k
    def addemup(level=0, parent=-1, index=0, di={}):
        for k in di:
            index +=1
            if di[k]:
                Parents[index]=k
                add_dir(level, parent, index, k)
                addemup(level+1, index, index, di[k])
            else:
                add_file(level, parent, index, k)
    
    addemup(di=a) #dictionary from above
    

    This produces something that I think will be very easy to revise into a Tkinter representation:

    SomeRootDirectory
    /===Poems
    |---|---doyoureadme.txt
    /===Stories
    /===/===Horror
    |---|---|---rickscott.txt
    /===/===/===Trash
    |---|---|---|---notscary.txt
    /===/===Cyberpunk
    |---foo.txt
    |---bar.txt
    

    Thanks, this community is incredible.

  • floer_m
    floer_m over 12 years
    Wow, that works wonderfully. I updated my question to include my solution (a slight adaptation of yours that is basically Tkinter-ready, I think).
  • gsamaras
    gsamaras almost 9 years
    An explanation would be nice!