Pandas : Sum multiple columns and get results in multiple columns
Solution 1
When you pass a dictionary or callable to groupby
it gets applied to an axis. I specified axis one which is columns.
d = dict(A='AB', B='AB', C='CD', D='CD')
df.groupby(d, axis=1).sum()
Solution 2
Use concat
with sum
:
df = df.set_index('idx')
df = pd.concat([df[['A', 'B']].sum(1), df[['C', 'D']].sum(1)], axis=1, keys=['AB','CD'])
print( df)
AB CD
idx
J 3 4
K 9 8
L 15 12
M 3 7
N 9 11
O 15 15
Solution 3
Does this do what you need? By using axis=1 with DataFrame.apply, you can use the data that you want in a row to construct a new column. Then you can drop the columns that you don't want anymore.
In [1]: import pandas as pd
In [5]: df = pd.DataFrame(columns=['A', 'B', 'C', 'D'], data=[[1, 2, 3, 4], [1, 2, 3, 4]])
In [6]: df
Out[6]:
A B C D
0 1 2 3 4
1 1 2 3 4
In [7]: df['CD'] = df.apply(lambda x: x['C'] + x['D'], axis=1)
In [8]: df
Out[8]:
A B C D CD
0 1 2 3 4 7
1 1 2 3 4 7
In [13]: df.drop(['C', 'D'], axis=1)
Out[13]:
A B CD
0 1 2 7
1 1 2 7
Akio Omi
Created website by my own. I was using Ruby for 10 years. One day in 2016, I converted to Python, and my impression is I can't go back to Ruby any more (even Ruby is newer than Python).
Updated on June 14, 2022Comments
-
Akio Omi almost 2 years
I have a "sample.txt" like this.
idx A B C D cat J 1 2 3 1 x K 4 5 6 2 x L 7 8 9 3 y M 1 2 3 4 y N 4 5 6 5 z O 7 8 9 6 z
With this dataset, I want to get sum in row and column. In row, it is not a big deal. I made result like this.
### MY CODE ### import pandas as pd df = pd.read_csv('sample.txt',sep="\t",index_col='idx') df.info() df2 = df.groupby('cat').sum() print( df2 )
The result is like this.
A B C D cat x 5 7 9 3 y 8 10 12 7 z 11 13 15 11
But I don't know how to write a code to get result like this. (simply add values in column A and B as well as column C and D)
AB CD J 3 4 K 9 8 L 15 12 M 3 7 N 9 11 O 15 15
Could anybody help how to write a code?
By the way, I don't want to do like this. (it looks too dull, but if it is the only way, I'll deem it)
df2 = df['A'] + df['B'] df3 = df['C'] + df['D'] df = pd.DataFrame([df2,df3],index=['AB','CD']).transpose() print( df )
-
Bharath over 6 yearsWell this is new for me
-
Akio Omi over 6 yearsThank you very much for your answer. Your answer is what I was looking for few hours.
-
Akio Omi over 6 yearsThank you very much for your answer.
-
jezrael over 6 yearsI think
pir
solution is much better, so use it ;) -
Akio Omi over 6 yearsBut sometimes, I need to sum 'A' and 'B' and 'C'. Sometimes sum 'B' and 'C' and 'D'. Which columns and how many columns are variable.
-
Akio Omi over 6 yearsIn your case, I have to keep 'A' and 'B' fixed.
-
Akio Omi over 6 yearsYes, and piRSquared's was what I was looking for. But please allow me to say thank you to your quick response.
-
Akio Omi over 6 yearsThank you for your quick answer. Though I take piRSquared's answer, I want to say thank you to you.