Pandas : Sum multiple columns and get results in multiple columns

10,006

Solution 1

When you pass a dictionary or callable to groupby it gets applied to an axis. I specified axis one which is columns.

d = dict(A='AB', B='AB', C='CD', D='CD')
df.groupby(d, axis=1).sum()

Solution 2

Use concat with sum:

df = df.set_index('idx')
df = pd.concat([df[['A', 'B']].sum(1), df[['C', 'D']].sum(1)], axis=1, keys=['AB','CD'])
print( df)
     AB  CD
idx        
J     3   4
K     9   8
L    15  12
M     3   7
N     9  11
O    15  15

Solution 3

Does this do what you need? By using axis=1 with DataFrame.apply, you can use the data that you want in a row to construct a new column. Then you can drop the columns that you don't want anymore.

In [1]: import pandas as pd
In [5]: df = pd.DataFrame(columns=['A', 'B', 'C', 'D'], data=[[1, 2, 3, 4], [1, 2, 3, 4]])

In [6]: df
Out[6]:
   A  B  C  D
0  1  2  3  4
1  1  2  3  4

In [7]: df['CD'] = df.apply(lambda x: x['C'] + x['D'], axis=1)

In [8]: df
Out[8]:
   A  B  C  D  CD
0  1  2  3  4   7
1  1  2  3  4   7

In [13]: df.drop(['C', 'D'], axis=1)
Out[13]:
   A  B  CD
0  1  2   7
1  1  2   7
Share:
10,006
Akio Omi
Author by

Akio Omi

Created website by my own. I was using Ruby for 10 years. One day in 2016, I converted to Python, and my impression is I can't go back to Ruby any more (even Ruby is newer than Python).

Updated on June 14, 2022

Comments

  • Akio Omi
    Akio Omi almost 2 years

    I have a "sample.txt" like this.

    idx A   B   C   D   cat
    J   1   2   3   1   x
    K   4   5   6   2   x
    L   7   8   9   3   y
    M   1   2   3   4   y
    N   4   5   6   5   z
    O   7   8   9   6   z
    

    With this dataset, I want to get sum in row and column. In row, it is not a big deal. I made result like this.

    ### MY CODE ###
    import pandas as pd
    
    df = pd.read_csv('sample.txt',sep="\t",index_col='idx')
    df.info()
    
    df2 = df.groupby('cat').sum()
    print( df2 )
    

    The result is like this.

          A   B   C   D
    cat                
    x     5   7   9   3
    y     8  10  12   7
    z    11  13  15  11
    

    But I don't know how to write a code to get result like this. (simply add values in column A and B as well as column C and D)

        AB  CD
    J   3   4
    K   9   8
    L   15  12
    M   3   7
    N   9   11
    O   15  15
    

    Could anybody help how to write a code?

    By the way, I don't want to do like this. (it looks too dull, but if it is the only way, I'll deem it)

    df2 = df['A'] + df['B']
    df3 = df['C'] + df['D']
    df = pd.DataFrame([df2,df3],index=['AB','CD']).transpose()
    print( df )
    
  • Bharath
    Bharath over 6 years
    Well this is new for me
  • Akio Omi
    Akio Omi over 6 years
    Thank you very much for your answer. Your answer is what I was looking for few hours.
  • Akio Omi
    Akio Omi over 6 years
    Thank you very much for your answer.
  • jezrael
    jezrael over 6 years
    I think pir solution is much better, so use it ;)
  • Akio Omi
    Akio Omi over 6 years
    But sometimes, I need to sum 'A' and 'B' and 'C'. Sometimes sum 'B' and 'C' and 'D'. Which columns and how many columns are variable.
  • Akio Omi
    Akio Omi over 6 years
    In your case, I have to keep 'A' and 'B' fixed.
  • Akio Omi
    Akio Omi over 6 years
    Yes, and piRSquared's was what I was looking for. But please allow me to say thank you to your quick response.
  • Akio Omi
    Akio Omi over 6 years
    Thank you for your quick answer. Though I take piRSquared's answer, I want to say thank you to you.