Group By & Aggregate List of Dictionaries in Python

11,220

Solution 1

Yes, use pandas. It's great. You can use the groupby functionality and aggregate by sums, then convert the output to a list of dicts if that is exactly what you want.

import pandas as pd

data = [{"startDate": 123, "endDate": 456, "campaignName": 'abc',
         "campaignCfid": 789, "budgetImpressions": 10},
        {"startDate": 123, "endDate": 456, "campaignName": 'abc',
         "campaignCfid": 789, "budgetImpressions": 50},
        {"startDate": 456, "endDate": 789, "campaignName": 'def',
         "campaignCfid": 123, "budgetImpressions": 80}]

df = pd.DataFrame(data)

grouped = df.groupby(['startDate', 'endDate', 'campaignCfid',
                      'campaignName']).agg(sum)

print grouped.reset_index().to_dict('records')

This prints:

[{'startDate': 123L, 'campaignCfid': 789L, 'endDate': 456L, 'budgetImpressions': 60L, 'campaignName': 'abc'}, {'startDate': 456L, 'campaignCfid': 123L, 'endDate': 789L, 'budgetImpressions': 80L, 'campaignName': 'def'}]

Solution 2

Just to demonstrate that sometimes python is perfectly fine to do this kind of stuff in:

In [11]: from collections import Counter
         from itertools import groupby

In [12]: data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10}, {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50}, {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]

In [13]: g = groupby(data, lambda x: x.pop('campaignName'))

In [14]: d = {}
         for campaign, campaign_data in g:
             c = Counter()
             for row in campaign_data: c.update(row)
             d[campaign] = c  # if you want a dict rather than Counter, return dict(c) here

In [15]: d
Out[15]:
{'abc': Counter({'campaignCfid': 1578, 'endDate': 912, 'startDate': 246, 'budgetImpressions': 60}),
 'def': Counter({'endDate': 789, 'startDate': 456, 'campaignCfid': 123, 'budgetImpressions': 80})}

If you already have this collection of lists/dicts, it doesn't really make sense to promote this to a DataFrame, it's often cheaper to stay in pure python.

Share:
11,220
user40721
Author by

user40721

Updated on June 14, 2022

Comments

  • user40721
    user40721 almost 2 years

    I have a list of dictionaries which I need to aggregate in Python:

    data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 10}, 
    {"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 50}, 
    {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]
    

    and I'm looking to aggregate based on budgetImpressions.

    So the final result should be:

    data = [{"startDate": 123, "endDate": 456, "campaignName": "abc", "campaignCfid": 789, "budgetImpressions": 60}, 
    {"startDate": 456, "endDate": 789, "campaignName": "def", "campaignCfid": 123, "budgetImpressions": 80}]
    

    Note every entry with a certain campaignName will always have the same corresponding campaignCfid, startDate and endDate.

    Can this be done in Python? I've tried using itertools without much success. Would it be a better approach to use Pandas?

  • user40721
    user40721 almost 10 years
    Why is there an 'L' appended to all the values?
  • bananafish
    bananafish almost 10 years
    They are long integers.
  • Andy Hayden
    Andy Hayden almost 10 years
    @bananafish it's kind-of strange they are long in integers, I can't actually replicate that. What version of python, numpy, ... are you using?
  • bananafish
    bananafish almost 10 years
    @AndyHayden pandas 0.13.1, numpy 1.8.1 on Windows, using numpy-MKL from here
  • Andy Hayden
    Andy Hayden almost 10 years
    @bananafish I guess this is a Windows thing rather than MKL.
  • ChrisArmstrong
    ChrisArmstrong over 8 years
    This solution incorrectly counts all keys instead of just budgetImpressions
  • Sankalp
    Sankalp almost 6 years
    What is x.pop('campaignName') doing here?
  • Andy Hayden
    Andy Hayden almost 6 years
    @Sankalp pop looks up a value from the dict AND removes that key. Here this means that campaignName doesn't appear in the result Counters.