How to groupby based on two columns in pandas?

32,483

Solution 1

You need groupby with parameter as_index=False for return DataFrame and aggregating mean:

df = df.groupby(['id','product'], as_index=False)['quantity'].sum()
print (df)
   id product  quantity
0   1       A         5
1   1       B         2
2   2       A         1
3   2       B         1
4   3       B         3

Or add reset_index:

df = df.groupby(['id','product'])['quantity'].sum().reset_index()
print (df)
   id product  quantity
0   1       A         5
1   1       B         2
2   2       A         1
3   2       B         1
4   3       B         3

Solution 2

You can use pivot_table with aggfunc='sum'

df.pivot_table('quantity', ['id', 'product'], aggfunc='sum').reset_index()

   id product  quantity
0   1       A         5
1   1       B         2
2   2       A         1
3   2       B         1
4   3       B         3

Solution 3

You can use groupby and aggregate function

import pandas as pd
df = pd.DataFrame({
    'id': [1,1,1,2,2,3,3],
    'product': ['A','A','B','A','B','B','B'],
    'quantity': [2,3,2,1,1,2,1]
})

print df

   id  product  quantity
0   1     A      2
1   1     A      3
2   1     B      2
3   2     A      1
4   2     B      1
5   3     B      2
6   3     B      1


df = df.groupby(['id','product']).agg({'quantity':'sum'}).reset_index()
print df

    id  product  quantity
0   1     A       5
1   1     B       2
2   2     A       1
3   2     B       1
4   3     B       3
Share:
32,483
ARASH
Author by

ARASH

Updated on November 12, 2020

Comments

  • ARASH
    ARASH over 3 years

    A similar question might have been asked before, but I couldn't find the exact one fitting to my problem. I want to group by a dataframe based on two columns. For exmaple to make this

    id product quantity
    1  A       2
    1  A       3
    1  B       2
    2  A       1
    2  B       1
    3  B       2
    3  B       1
    

    Into this:

    id product quantity
    1  A       5
    1  B       2
    2  A       1
    2  B       1
    3  B       3
    

    Meaning that summation on "quantity" column for same "id" and same "product".