How do I Pandas group-by to get sum?
Solution 1
Use GroupBy.sum:
df.groupby(['Fruit','Name']).sum()
Out[31]:
               Number
Fruit   Name
Apples  Bob        16
        Mike        9
        Steve      10
Grapes  Bob        35
        Tom        87
        Tony       15
Oranges Bob        67
        Mike       57
        Tom        15
        Tony        1
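For a self-contained run, the sketch below rebuilds the question's data and sums per Fruit and Name. Passing numeric_only=True is an assumption about what you want here: depending on the pandas version, a plain .sum() may either drop the string Date column or concatenate its values.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Date": ["10/6/2016", "10/6/2016", "10/6/2016", "10/7/2016", "10/7/2016",
             "10/7/2016", "10/6/2016", "10/6/2016", "10/6/2016", "10/7/2016",
             "10/7/2016", "10/7/2016", "10/7/2016", "10/7/2016", "10/7/2016"],
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# numeric_only=True restricts the sum to numeric columns, so Date is left out.
totals = df.groupby(["Fruit", "Name"]).sum(numeric_only=True)
print(totals)
```

Fruit and Name end up in the index of the result, which is why the printed output shows them as row labels.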
Solution 2
You can also use the agg function:
df.groupby(['Name', 'Fruit'])['Number'].agg('sum')
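One thing worth checking with this form is the return type. The sketch below, which rebuilds the question's data without the Date column for brevity, suggests that single brackets give a Series and double brackets give a DataFrame.

```python
import pandas as pd

# Question data, with the Date column omitted for brevity.
df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Single brackets select one column, so the aggregate is a Series.
as_series = df.groupby(["Name", "Fruit"])["Number"].agg("sum")

# Double brackets select a one-column frame, so the aggregate is a DataFrame.
as_frame = df.groupby(["Name", "Fruit"])[["Number"]].agg("sum")

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

Both hold the same totals; pick whichever shape the next step of your code expects.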
Solution 3
If you want to keep the original columns Fruit and Name, use reset_index(). Otherwise Fruit and Name will become part of the index.
df.groupby(['Fruit','Name'])['Number'].sum().reset_index()
Fruit    Name   Number
Apples   Bob        16
Apples   Mike        9
Apples   Steve      10
Grapes   Bob        35
Grapes   Tom        87
Grapes   Tony       15
Oranges  Bob        67
Oranges  Mike       57
Oranges  Tom        15
Oranges  Tony        1
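A possible alternative to reset_index() is passing as_index=False to groupby, which keeps the group keys as ordinary columns from the start. This is a minimal sketch on the question's data (Date omitted for brevity), not a claim that one form is preferable.

```python
import pandas as pd

# Question data, with the Date column omitted for brevity.
df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Group keys become index levels, then reset_index() turns them back into columns.
via_reset = df.groupby(["Fruit", "Name"])["Number"].sum().reset_index()

# as_index=False leaves the group keys as columns in the first place.
flat = df.groupby(["Fruit", "Name"], as_index=False)["Number"].sum()

print(flat)
```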
As seen in the other answers:
df.groupby(['Fruit','Name'])['Number'].sum()
               Number
Fruit   Name
Apples  Bob        16
        Mike        9
        Steve      10
Grapes  Bob        35
        Tom        87
        Tony       15
Oranges Bob        67
        Mike       57
        Tom        15
        Tony        1
Solution 4
Both of the other answers accomplish what you want. You can also use the pivot functionality to arrange the data in a nice table:
df.groupby(['Fruit','Name'], as_index=False)['Number'].sum().pivot(index='Fruit', columns='Name', values='Number').fillna(0)
Name      Bob  Mike  Steve   Tom  Tony
Fruit
Apples   16.0   9.0   10.0   0.0   0.0
Grapes   35.0   0.0    0.0  87.0  15.0
Oranges  67.0  57.0    0.0  15.0   1.0
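If you prefer a single call, pivot_table can group, sum, and reshape in one step. The sketch below swaps in pivot_table for the groupby-then-pivot chain above; it rebuilds the question's data without the Date column, and fill_value=0 fills Fruit/Name pairs that have no rows.

```python
import pandas as pd

# Question data, with the Date column omitted for brevity.
df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# One row per Fruit, one column per Name, each cell holding the summed Number.
table = df.pivot_table(index="Fruit", columns="Name", values="Number",
                       aggfunc="sum", fill_value=0)
print(table)
```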
Solution 5
df.groupby(['Fruit','Name'])['Number'].sum()
Index the grouped object with a column name to choose which column is summed; here only Number is summed.
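If you want to name the column being summed explicitly, or compute more than one aggregate at once, named aggregation is one option. In this sketch the output names Total and Rows are made up for illustration, and the data is the question's with Date omitted.

```python
import pandas as pd

# Question data, with the Date column omitted for brevity.
df = pd.DataFrame({
    "Fruit": ["Apples"] * 5 + ["Oranges"] * 5 + ["Grapes"] * 5,
    "Name": ["Bob", "Bob", "Mike", "Steve", "Bob",
             "Bob", "Tom", "Mike", "Bob", "Tony",
             "Bob", "Tom", "Bob", "Bob", "Tony"],
    "Number": [7, 8, 9, 10, 1, 2, 15, 57, 65, 1, 1, 87, 22, 12, 15],
})

# Each keyword is an output column: (source column, aggregation).
# "Total" and "Rows" are hypothetical names chosen for this example.
summary = df.groupby(["Fruit", "Name"]).agg(
    Total=("Number", "sum"),
    Rows=("Number", "count"),
)
print(summary)
```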
Authored by Trying_hard
Updated on July 16, 2022
Comments
-
Trying_hard almost 2 years
I am using this data frame:
Fruit    Date       Name   Number
Apples   10/6/2016  Bob         7
Apples   10/6/2016  Bob         8
Apples   10/6/2016  Mike        9
Apples   10/7/2016  Steve      10
Apples   10/7/2016  Bob         1
Oranges  10/7/2016  Bob         2
Oranges  10/6/2016  Tom        15
Oranges  10/6/2016  Mike       57
Oranges  10/6/2016  Bob        65
Oranges  10/7/2016  Tony        1
Grapes   10/7/2016  Bob         1
Grapes   10/7/2016  Tom        87
Grapes   10/7/2016  Bob        22
Grapes   10/7/2016  Bob        12
Grapes   10/7/2016  Tony       15
I want to aggregate this by Name and then by Fruit to get a total number of Fruit per Name. For example: Bob, Apples, 16
I tried grouping by Name and Fruit, but how do I get the total number of Fruit?
-
Kingname over 6 years
How does pandas know that I want to sum the column named Number?
-
Steven G over 6 years
@Kingname It's the only column left once you take out Name and Fruit. If two columns were left, it would sum both of them.
-
Wassadamo over 5 years
Date is not summed because it has dtype = string, yes?
-
Gaurang Tandon almost 5 years
This differs from the accepted answer in that this returns a Series whereas the other returns a GroupBy object.
-
Jakub Kukul over 4 years
@GaurangTandon To get a DataFrame object instead (like in the accepted answer), use double square brackets around 'Number', i.e.: df.groupby(['Name', 'Fruit'])[['Number']].agg('sum')
-
SearchTools-Avi over 4 years
Very helpful in cleaning up badly-encoded query report.
-
Thomas Gak-Deluen over 4 years
How to specify which column to sum?
-
Steven G over 4 years
@tgdn df.groupby(['Name', 'Fruit'])['Number'].sum()
-
skdhfgeq2134 over 4 years
@StevenG For the answer provided to sum up a specific column, the output comes out as a pandas Series instead of a DataFrame. From the comment by Jakub Kukul (in the answer below), we can use double square brackets around 'Number' to get a DataFrame.
-
Viktor Avdulov about 3 years
@tgdn .sum('col_name')
-
DGKang over 2 years
This was exactly what I was looking for!! I was lost how to add the sum value from groupby()[].sum() to the "unique" dataframe! Thanks!
-
Michael Currie almost 2 years
This was what I always dreamed was possible... Thank you!