How do I use Pandas group-by to get the sum?

Solution 1

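Use GroupBy.sum. On pandas 2.0 and later, pass numeric_only=True so that the string Date column is not summed along with Number:

df.groupby(['Fruit','Name']).sum(numeric_only=True)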

Out[31]: 
               Number
Fruit   Name         
Apples  Bob        16
        Mike        9
        Steve      10
Grapes  Bob        35
        Tom        87
        Tony       15
Oranges Bob        67
        Mike       57
        Tom        15
        Tony        1
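
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the data frame from the question and applies the same grouping. The variable name totals is only illustrative, and numeric_only=True assumes a pandas version whose GroupBy.sum accepts that argument.

```python
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Date": ["10/6/2016"] * 3 + ["10/7/2016"] * 3
            + ["10/6/2016"] * 3 + ["10/7/2016"] * 6,
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Sum every numeric column within each (Fruit, Name) group.
# numeric_only=True keeps the string Date column out of the result.
totals = df.groupby(["Fruit", "Name"]).sum(numeric_only=True)

# Look up one group through the (Fruit, Name) MultiIndex.
print(totals.loc[("Apples", "Bob"), "Number"])
```

The result is a DataFrame indexed by (Fruit, Name) with a single Number column.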

Solution 2

You can also use the agg function:

df.groupby(['Name', 'Fruit'])['Number'].agg('sum')
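
A small sketch of what this call returns, using the question's data with the Date column left out for brevity. Selecting 'Number' with single brackets gives a Series; the double-bracket form suggested in the comments gives a DataFrame. The names as_series and as_frame are only illustrative.

```python
import pandas as pd

# Question data without the Date column (omitted for brevity).
df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

grouped = df.groupby(["Name", "Fruit"])

# Single brackets select one column, so agg returns a Series.
as_series = grouped["Number"].agg("sum")

# Double brackets select a list of columns, so agg returns a DataFrame.
as_frame = grouped[["Number"]].agg("sum")

print(type(as_series).__name__, type(as_frame).__name__)
```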

Solution 3

If you want to keep the original columns Fruit and Name, use reset_index(). Otherwise Fruit and Name will become part of the index.

df.groupby(['Fruit','Name'])['Number'].sum().reset_index()

Fruit   Name       Number
Apples  Bob        16
Apples  Mike        9
Apples  Steve      10
Grapes  Bob        35
Grapes  Tom        87
Grapes  Tony       15
Oranges Bob        67
Oranges Mike       57
Oranges Tom        15
Oranges Tony        1

Without reset_index(), as in the other answers, Fruit and Name stay in the index and the result is a Series:

df.groupby(['Fruit','Name'])['Number'].sum()

Fruit    Name 
Apples   Bob      16
         Mike      9
         Steve    10
Grapes   Bob      35
         Tom      87
         Tony     15
Oranges  Bob      67
         Mike     57
         Tom      15
         Tony      1
Name: Number, dtype: int64
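
As a sketch of an alternative, passing as_index=False to groupby should give the same flat table as reset_index() in one step. This uses the question's data without the Date column; flat_a and flat_b are illustrative names.

```python
import pandas as pd

# Question data without the Date column (omitted for brevity).
df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Group, sum, then move the index levels back into ordinary columns.
flat_a = df.groupby(["Fruit", "Name"])["Number"].sum().reset_index()

# Ask groupby not to build the index in the first place.
flat_b = df.groupby(["Fruit", "Name"], as_index=False)["Number"].sum()

print(flat_a.head())
```

Either way you end up with ordinary Fruit, Name and Number columns, which is usually easier to merge or export.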

Solution 4

Both the other answers accomplish what you want.

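You can use the pivot functionality to arrange the data in a nice table. Recent pandas versions require keyword arguments for pivot, and selecting 'Number' before summing keeps the Date column out of the result:

df.groupby(['Fruit','Name'], as_index=False)['Number'].sum().pivot(index='Fruit', columns='Name', values='Number').fillna(0)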



Name    Bob     Mike    Steve   Tom    Tony
Fruit                   
Apples  16.0    9.0     10.0    0.0     0.0
Grapes  35.0    0.0     0.0     87.0    15.0
Oranges 67.0    57.0    0.0     15.0    1.0
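
A possible shortcut, not what this answer uses: DataFrame.pivot_table can group, sum and reshape in a single call. The sketch below uses the question's data without the Date column; table is an illustrative name.

```python
import pandas as pd

# Question data without the Date column (omitted for brevity).
df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# One row per Fruit, one column per Name, each cell the summed Number.
# fill_value=0 replaces the missing Fruit/Name combinations.
table = df.pivot_table(index="Fruit", columns="Name", values="Number",
                       aggfunc="sum", fill_value=0)

print(table)
```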

Solution 5

df.groupby(['Fruit','Name'])['Number'].sum()

Index the groupby object with the column (or a list of columns) you want to sum before calling sum().
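
To illustrate with more than one column, here is a sketch that adds a second numeric column and sums both. The Weight column is hypothetical (it is not in the question's data) and exists only to show the list-of-columns form.

```python
import pandas as pd

# Question data without the Date column (omitted for brevity).
df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Hypothetical extra column, made up for this example only.
df["Weight"] = df["Number"] * 0.5

# A list of column names selects several columns to sum at once.
sums = df.groupby(["Fruit", "Name"])[["Number", "Weight"]].sum()

print(sums)
```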

Author by

Trying_hard

Updated on July 16, 2022

Comments

  • Trying_hard
    Trying_hard almost 2 years

    I am using this data frame:

    Fruit   Date      Name  Number
    Apples  10/6/2016 Bob    7
    Apples  10/6/2016 Bob    8
    Apples  10/6/2016 Mike   9
    Apples  10/7/2016 Steve 10
    Apples  10/7/2016 Bob    1
    Oranges 10/7/2016 Bob    2
    Oranges 10/6/2016 Tom   15
    Oranges 10/6/2016 Mike  57
    Oranges 10/6/2016 Bob   65
    Oranges 10/7/2016 Tony   1
    Grapes  10/7/2016 Bob    1
    Grapes  10/7/2016 Tom   87
    Grapes  10/7/2016 Bob   22
    Grapes  10/7/2016 Bob   12
    Grapes  10/7/2016 Tony  15
    

    I want to aggregate this by Name and then by fruit to get a total number of Fruit per Name. For example:

    Bob,Apples,16
    

    I tried grouping by Name and Fruit but how do I get the total number of Fruit?

  • Kingname
    Kingname over 6 years
    How does pandas know that I want to sum the column named Number?
  • Steven G
    Steven G over 6 years
    @Kingname It's the only column left to sum once you take out Name and Fruit. If two columns were left, it would sum both of them.
  • Wassadamo
    Wassadamo over 5 years
    Date is not summed because it has dtype = string yes?
  • Gaurang Tandon
    Gaurang Tandon almost 5 years
    This differs from the accepted answer in that this returns a Series whereas the other returns a GroupBy object.
  • Jakub Kukul
    Jakub Kukul over 4 years
    @GaurangTandon to get DataFrame object instead (like in the accepted answer), use double square brackets around 'Number', i.e.: df.groupby(['Name', 'Fruit'])[['Number']].agg('sum')
  • SearchTools-Avi
    SearchTools-Avi over 4 years
    Very helpful in cleaning up badly-encoded query report.
  • Thomas Gak-Deluen
    Thomas Gak-Deluen over 4 years
    How to specify which column to sum?
  • Steven G
    Steven G over 4 years
    @tgdn df.groupby(['Name', 'Fruit'])['Number'].sum()
  • skdhfgeq2134
    skdhfgeq2134 over 4 years
    @StevenG For the answer provided to sum up a specific column, the output comes out as a Pandas series instead of Dataframe. From the comment by Jakub Kukul (in below answer), we can use double square brackets around 'Number' to get a Dataframe.
  • Viktor Avdulov
    Viktor Avdulov about 3 years
    @tgdn .sum('col_name')
  • DGKang
    DGKang over 2 years
    This was exactly what I was looking for!! I was lost how to add the sum value from groupby()[].sum() to the "unique" dataframe! Thanks!
  • Michael Currie
    Michael Currie almost 2 years
    This was what I always dreamed was possible... Thank you!