Python - Flatten the list of dictionaries

Solution 1

You can do the following, using itertools.chain:

>>> from itertools import chain
# timeit: ~3.40
>>> [dict(chain(*map(dict.items, d.values()))) for d in data]
[{'l': 'Apple', 
  'b': 'Milk', 
  'd': 'Meatball', 
  'favourite': 'coke', 
  'dislike': 'juice'}, 
 {'l': 'Apple1', 
  'b': 'Milk1', 
  'dislike': 'juice3', 
  'favourite': 'coke2', 
  'd': 'Meatball2'}]

The combination of chain, map, and * makes this expression shorthand for the following doubly nested comprehension, which actually performs better on my system (Python 3.5.2) and isn't much longer:

# timeit: ~2.04
[{k: v for x in d.values() for k, v in x.items()} for d in data]
# Or iterate over the keys and look each value up, instead of calling items()
# timeit: ~1.67
[{k: x[k] for x in d.values() for k in x} for d in data]
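
To check that the three one-liners above really produce the same result, here is a minimal self-contained sketch. The sample data is copied from the question; the names via_chain, via_items and via_lookup are only illustrative.

```python
from itertools import chain

# Sample data copied from the question.
data = [
    {'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'},
     'b': {'favourite': 'coke', 'dislike': 'juice'}},
    {'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'},
     'b': {'favourite': 'coke2', 'dislike': 'juice3'}},
]

# chain(*...) concatenates the (key, value) pairs of every inner dict.
via_chain = [dict(chain(*map(dict.items, d.values()))) for d in data]

# The same thing written as a doubly nested dict comprehension.
via_items = [{k: v for x in d.values() for k, v in x.items()} for d in data]

# Iterating over keys and looking each value up instead of calling items().
via_lookup = [{k: x[k] for x in d.values() for k in x} for d in data]

# Dict equality ignores key order, so this holds on any Python 3 version.
assert via_chain == via_items == via_lookup
print(via_chain[0])
```

Which one to prefer is mostly a matter of readability and of the timings on your own interpreter.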

Note:

RoadRunner's loop-and-update approach (Solution 2 below) outperforms both of these one-liners, at timeit: ~1.37

Solution 2

You can do this with two nested loops, using dict.update() to merge each inner dictionary into a temporary dictionary, which is then appended to the result list at the end of each outer iteration:

L = []
for d in data:
    temp = {}
    for key in d:
        temp.update(d[key])

    L.append(temp)

# timeit ~1.4
print(L)

Which outputs:

[{'l': 'Apple', 'b': 'Milk', 'd': 'Meatball', 'favourite': 'coke', 'dislike': 'juice'}, {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2', 'favourite': 'coke2', 'dislike': 'juice3'}]
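
One thing the loop above does not show is what happens when two inner dictionaries share a key. A small sketch, assuming Python 3.7+ (where dicts keep insertion order): flatten_records is a hypothetical helper name, and the clash data is made up for illustration.

```python
def flatten_records(records):
    """Merge the inner dicts of each record into one flat dict (hypothetical helper)."""
    flat = []
    for record in records:
        merged = {}
        for inner in record.values():
            merged.update(inner)  # later inner dicts overwrite earlier keys
        flat.append(merged)
    return flat

# 'x' appears in both inner dicts; the value from the later one ('b') is kept.
clash = [{'a': {'x': 1, 'y': 2}, 'b': {'x': 3}}]
result = flatten_records(clash)
print(result)  # [{'x': 3, 'y': 2}]
```

Because update writes into a fresh dictionary each time, the input list is left untouched.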

Solution 3

You can use functools.reduce along with a simple list comprehension to flatten the list of dicts:

>>> from functools import reduce 

>>> data = [{'b': {'dislike': 'juice', 'favourite': 'coke'}, 'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'}}, {'b': {'dislike': 'juice3', 'favourite': 'coke2'}, 'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'}}]
>>> [reduce(lambda x, y: {**x, **y}, d.values()) for d in data]
[{'dislike': 'juice', 'l': 'Apple', 'd': 'Meatball', 'b': 'Milk', 'favourite': 'coke'}, {'dislike': 'juice3', 'l': 'Apple1', 'd': 'Meatball2', 'b': 'Milk1', 'favourite': 'coke2'}]
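
If some outer dictionaries may be empty, reduce without an initial value raises TypeError. A possible variant, sketched under the assumption of Python 3.7+ (merge_inner and records are illustrative names, not part of the answer above), passes {} as the initial value:

```python
from functools import reduce

def merge_inner(d):
    # The {} initial value makes reduce return an empty dict for an empty d
    # instead of raising TypeError. (merge_inner is a hypothetical name.)
    return reduce(lambda x, y: {**x, **y}, d.values(), {})

# The second record is empty on purpose.
records = [{'a': {'l': 'Apple'}, 'b': {'favourite': 'coke'}}, {}]
flat = [merge_inner(d) for d in records]
print(flat)  # [{'l': 'Apple', 'favourite': 'coke'}, {}]
```

The {**x, **y} syntax itself needs Python 3.5 or newer.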

The timing benchmark is as follows:

>>> import timeit
>>> setup = """
... from functools import reduce
... data = [{'b': {'dislike': 'juice', 'favourite': 'coke'}, 'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'}}, {'b': {'dislike': 'juice3', 'favourite': 'coke2'}, 'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'}}]
... """
>>> min(timeit.Timer("[reduce(lambda x,y: {**x,**y},d.values()) for d in data]",setup=setup).repeat(3,1000000))
1.525032774952706

Timing benchmarks of the other answers on my machine:

>>> setup = """
... data = [{'b': {'dislike': 'juice', 'favourite': 'coke'}, 'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'}}, {'b': {'dislike': 'juice3', 'favourite': 'coke2'}, 'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'}}]
... """
>>> min(timeit.Timer("[{k: v for x in d.values() for k, v in x.items()} for d in data]",setup=setup).repeat(3,1000000))
2.2488374650129117

>>> min(timeit.Timer("[{k: x[k] for x in d.values() for k in x} for d in data]",setup=setup).repeat(3,1000000))
1.8990078769857064

>>> code = """
... L = []
... for d in data:
...     temp = {}
...     for key in d:
...         temp.update(d[key])
...     L.append(temp)
... """
>>> min(timeit.Timer(code,setup=setup).repeat(3,1000000))
1.4258553800173104

>>> setup = """
... from itertools import chain
... data = [{'b': {'dislike': 'juice', 'favourite': 'coke'}, 'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'}}, {'b': {'dislike': 'juice3', 'favourite': 'coke2'}, 'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'}}]
... """
>>> min(timeit.Timer("[dict(chain(*map(dict.items, d.values()))) for d in data]",setup=setup).repeat(3,1000000))
3.774383604992181
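
To rerun the comparison in one go, the benchmarks above could be collected into a single script along these lines. This is only a sketch: it relies on the globals argument of timeit.repeat (Python 3.5+), NUMBER is deliberately much smaller than the 1000000 used above so it finishes quickly, and wrapping the loop in a function (loop_update, a hypothetical name) adds a little call overhead that the inline version does not have.

```python
import timeit
from functools import reduce
from itertools import chain

# Sample data copied from the answers above.
data = [{'b': {'dislike': 'juice', 'favourite': 'coke'},
         'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'}},
        {'b': {'dislike': 'juice3', 'favourite': 'coke2'},
         'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'}}]

def loop_update(records):
    # Loop-and-update approach, wrapped in a function for timing.
    out = []
    for d in records:
        temp = {}
        for key in d:
            temp.update(d[key])
        out.append(temp)
    return out

# Statement strings are evaluated against this module's globals.
candidates = {
    'chain': "[dict(chain(*map(dict.items, d.values()))) for d in data]",
    'items comprehension': "[{k: v for x in d.values() for k, v in x.items()} for d in data]",
    'key lookup comprehension': "[{k: x[k] for x in d.values() for k in x} for d in data]",
    'reduce': "[reduce(lambda x, y: {**x, **y}, d.values()) for d in data]",
    'loop and update': "loop_update(data)",
}

NUMBER = 10000  # assumption: reduced iteration count for a quick run

timings = {}
for name, stmt in candidates.items():
    # Best of three runs, as in the benchmarks above.
    timings[name] = min(timeit.repeat(stmt, globals=globals(), repeat=3, number=NUMBER))

# Print fastest first.
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print('{:<26} {:.4f}s'.format(name, seconds))
```

Absolute numbers will differ from the figures quoted above, and the ranking may change between Python versions.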

Solution 4

If the nested dictionaries only ever have the keys 'a' and 'b', I suggest the following solution, which I find fast and very easy to read. Note that update modifies the 'a' dictionaries inside data in place:

L = [x['a'] for x in data]
b = [x['b'] for x in data]

for i in range(len(L)):
    L[i].update(b[i])

# timeit ~1.4

print(L)
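
The loop above calls update on the 'a' dictionaries themselves, so data is changed as a side effect. If that matters, a non-mutating variant could look like the sketch below. It assumes Python 3.5+ for the ** unpacking, and flat is just an illustrative name.

```python
# Sample data copied from the question.
data = [
    {'a': {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'},
     'b': {'favourite': 'coke', 'dislike': 'juice'}},
    {'a': {'l': 'Apple1', 'b': 'Milk1', 'd': 'Meatball2'},
     'b': {'favourite': 'coke2', 'dislike': 'juice3'}},
]

# Build a new dict per record; keys from 'b' win if both dicts share a key.
flat = [{**x['a'], **x['b']} for x in data]

# The original inner dicts are unchanged.
assert data[0]['a'] == {'l': 'Apple', 'b': 'Milk', 'd': 'Meatball'}
print(flat)
```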

Author: jezrael

Updated on June 14, 2022

Comments

  • jezrael, almost 2 years ago

    I have a list of dictionaries:

    data = [{
             'a': {'l': 'Apple',
                   'b': 'Milk',
                   'd': 'Meatball'},
             'b': {'favourite': 'coke',
                   'dislike': 'juice'}
            },
            {
             'a': {'l': 'Apple1',
                   'b': 'Milk1',
                   'd': 'Meatball2'},
             'b': {'favourite': 'coke2',
                   'dislike': 'juice3'}
            }, ...
    ]
    

    I need to merge all the nested dictionaries to get the expected output:

     [{'d': 'Meatball', 'b': 'Milk', 'l': 'Apple', 'dislike': 'juice', 'favourite': 'coke'},
      {'d': 'Meatball2', 'b': 'Milk1', 'l': 'Apple1', 'dislike': 'juice3', 'favourite': 'coke2'}]
    

    I tried a nested list comprehension, but it does not merge the dicts together:

    L = [y for x in data for y in x.values()]
    print (L)
    
    [{'d': 'Meatball', 'b': 'Milk', 'l': 'Apple'},
     {'dislike': 'juice', 'favourite': 'coke'},
     {'d': 'Meatball2', 'b': 'Milk1', 'l': 'Apple1'},
     {'dislike': 'juice3', 'favourite': 'coke2'}]
    

    I am looking for the fastest solution.