Adding total row to pandas DataFrame groupby

11,135

Solution 1

You can try stacking it one level at a time:

m = df.groupby(['Level','Company','Item'])['Value'].sum().unstack(level=['Company','Item'])
m = m.assign(total=m.sum(1))
m = m.stack(level='Company')
m = m.assign(total=m.sum(1))
m = m.stack(level='Item')

The output has duplicate totals though:

Level  Company  Item 
1      X        a        100.0
                b        200.0
                total    300.0
       Y        a         35.0
                b        150.0
                c         35.0
                total    220.0
       total             520.0
                total    520.0
2      X        a         48.0
                b        100.0
                c         50.0
                total    198.0
       Y        a         80.0
                total     80.0
       total             278.0
                total    278.0
dtype: float64

Solution 2

You can use:

m=df.groupby(['Level','Company','Item'])['Value'].sum().unstack()
m.assign(total=m.sum(1)).stack().to_frame('Value')

                     Value
Level Company Item        
1     X       a      100.0
              b      200.0
              total  300.0
      Y       a       35.0
              b      150.0
              c       35.0
              total  220.0
2     X       a       48.0
              b      100.0
              c       50.0
              total  198.0
      Y       a       80.0
              total   80.0

Solution 3

Try this: Basically, this is creating two new dfs from the using the sum of the two groups and concating the three data frames

level = list(map(int, list('111112222')))
company = list('XXYYYXXXY')
item = list('ababcabca')
value = [100,200,35,150,35,48,100,50,80]
col = ['Level', 'Company', 'Item', 'Value']
df = pd.DataFrame([level,company,item,value]).T
df.columns = col

df1 = (df.groupby(['Level', 'Company', 'Item'])['Value'].sum())
df2 = (df1.sum(level=0).to_frame().assign(Company='total').set_index('Company', append=True))
df3 = (df1.groupby(['Level','Company']).sum().to_frame().assign(Item='total').set_index('Item', append=True))

dfx = pd.concat([df1.to_frame().reset_index(),
                 df2.reset_index(),
                 df3.reset_index()],sort=False)
print(dfx)

Output:

   Level Company   Item  Value
0      1       X      a    100
1      1       X      b    200
2      1       Y      a     35
3      1       Y      b    150
4      1       Y      c     35
5      2       X      a     48
6      2       X      b    100
7      2       X      c     50
8      2       Y      a     80
0      1   total    NaN    520
1      2   total    NaN    278
0      1       X  total    300
1      1       Y  total    220
2      2       X  total    198
3      2       Y  total     80

This is not sorted though as you expect. If I concat the 3 dfs without resetting the index I'm getting the expected sort order, but the index is a multi-index column

dfx = pd.concat([df1.to_frame(), df2, df3]).sort_index()

Output

               Value
(1, X, a)        100
(1, X, b)        200
(1, X, total)    300
(1, Y, a)         35
(1, Y, b)        150
(1, Y, c)         35
(1, Y, total)    220
(1, total)       520
(2, X, a)         48
(2, X, b)        100
(2, X, c)         50
(2, X, total)    198
(2, Y, a)         80
(2, Y, total)     80
(2, total)       278

I am not sure how to convert this to columns as in your df.

Share:
11,135
user12392864
Author by

user12392864

Updated on June 13, 2022

Comments

  • user12392864
    user12392864 almost 2 years

    I am aware of this link but I didn't manage to solve my problem.

    I have this below DataFrame from pandas.DataFrame.groupby().sum():

                                                              Value
    Level      Company         Item
        1            X            a                             100
                                  b                             200
                     Y            a                              35
                                  b                             150
                                  c                              35
        2            X            a                              48
                                  b                             100
                                  c                              50
                     Y            a                              80
    

    and would like to add total rows for each level of index that I have to get:

                                                              Value
    Level      Company         Item
        1            X            a                             100
                                  b                             200
                              Total                             300
                     Y            a                              35
                                  b                             150
                                  c                              35
                              Total                             520
                 Total                                          820
        2            X            a                              48
                                  b                             100
                                  c                              50
                 Total                                          198
                     Y            a                              80
                              Total                              80
                   Total                                        278
    Total                                                      1098
    

    As request

    level = list(map(int, list('111112222')))
    company = list('XXYYYXXXY')
    item = list('ababcabca')
    value = [100,200,35,150,35,48,100,50,80]
    col = ['Level', 'Company', 'Item', 'Value']
    df = pd.DataFrame([level,company,item,value]).T
    df.columns = col
    df.groupby(['Level', 'Company', 'Item']).sum()
    
  • davidbilla
    davidbilla over 4 years
    Is this what you're expecting?
  • user12392864
    user12392864 over 4 years
    Exactly what I wanted, thanks! Could you help on stackoverflow.com/questions/59364298/… ? @Yen
  • user12392864
    user12392864 over 4 years
    The last dataframe has a KeyError: ‘Level’
  • davidbilla
    davidbilla over 4 years
    Do you have a column called 'Level' in your dataframe? Can you post which statement exactly you are getting the error?
  • user12392864
    user12392864 over 4 years
    I do, on the initial one. The error is on the 4th line. df1 is a Series so you can’t groupby I guess this is the issue.
  • davidbilla
    davidbilla over 4 years
    Using your example, I am able to run this statement print(df1.groupby(['Level','Company']).sum().to_frame().assi‌​gn(Company='total').‌​set_index('Company', append=True)) without any errors.
  • user12392864
    user12392864 over 4 years
    is the df1 on df1.groupby(['Level','Company']).sum().to_frame().assign(Com‌​pany='total').set_in‌​dex('Company', append=True) the same than the df1 in this following line: df1 = (df.groupby(['Level', 'Company', 'Item'])['Value'].sum()). If yes, how could we groupby df1 once more ? After the first line, df1 becomes a pandas.Series and we can't groupby pandas.Series.
  • davidbilla
    davidbilla over 4 years
    Yes, df1 is the result of df1 = (df.groupby(['Level', 'Company', 'Item'])['Value'].sum()). See pandas.Series.groupby
  • user12392864
    user12392864 over 4 years
    Can I see your output please?
  • user12392864
    user12392864 over 4 years
  • user12392864
    user12392864 over 4 years
    thanks. It doesn’t work on Jupyter but does work on Pycharm.
  • davidbilla
    davidbilla over 4 years
    Super. That may be due to version differences. Are you using the same pandas version in both IDEs?