How to sum negative and positive values separately when using groupby in pandas?

13,478

Solution 1

In [14]:
df.groupby(df['A'])['C'].agg([('negative' , lambda x : x[x < 0].sum()) , ('positive' , lambda x : x[x > 0].sum())])
Out[14]:
     negative   positive
A       
bar -1.418788   2.603452
foo -0.504695   2.880512

Solution 2

You may groupby on A and df['C'] > 0, and unstack the result:

>>> right = df.groupby(['A', df['C'] > 0])['C'].sum().unstack()
>>> right = right.rename(columns={True:'positive', False:'negative'})
>>> right
C    negative  positive
A                      
bar   -3.4423       NaN
foo   -2.6277     0.857

The NaN value is because all the A == bar rows have negative value for C.

if you want to add these to the original frame corresponding to values of groupby key, i.e. A, it would require a left join:

>>> df.join(right, on='A', how='left')
     A      B       C       D  negative  positive
0  foo    one  0.3742  0.3197   -2.6277     0.857
1  bar    one -0.3563 -0.6296   -3.4423       NaN
2  foo    two -0.3902 -1.3879   -2.6277     0.857
3  bar  three -0.7834 -0.9597   -3.4423       NaN
4  foo    two -1.2686 -0.2509   -2.6277     0.857
5  bar    two -2.3025 -1.2960   -3.4423       NaN
6  foo    one -0.9688  1.2477   -2.6277     0.857
7  foo  three  0.4828  1.0047   -2.6277     0.857
Share:
13,478
Alireza
Author by

Alireza

Updated on June 26, 2022

Comments

  • Alireza
    Alireza almost 2 years

    How to sum positive and negative values differently in pandas and put them let's say in positive and negative columns?

    I have this dataframe like below:

    df = pandas.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
                       'B' : ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
                       'C' : np.random.randn(8), 'D' : np.random.randn(8)})
    

    Output is as below:

    df
         A      B         C         D
    0  foo    one  0.374156  0.319699
    1  bar    one -0.356339 -0.629649
    2  foo    two -0.390243 -1.387909
    3  bar  three -0.783435 -0.959699
    4  foo    two -1.268622 -0.250871
    5  bar    two -2.302525 -1.295991
    6  foo    one -0.968840  1.247675
    7  foo  three  0.482845  1.004697
    

    I used the below code to get negatives:

    df['negative'] = df.groupby('A')['C'].apply(lambda x: x[x<0].sum()).reset_index()]
    

    But the problem is when I want to add it to one of dataframe columns called negative it gives error:

    ValueError: Wrong number of items passed 2, placement implies 1
    

    Again I know what it says that groupby has returned more than one column and cannot assign it to df['negatives'] but I don't know how to solve this part of the problem. I need to have positive col too.

    The desired outcome would be:

        A      Positive   Negative
    0  foo     0.374156  -0.319699
    1  bar     0.356339  -0.629649
    

    What is the right solution to the problem?

  • Alireza
    Alireza over 8 years
    The final df should be grouped by A too. I dont want to group by two criteria, I just want to group by column A in your right dataframe.
  • Alireza
    Alireza over 8 years
    When I groupby A I expect to see just foo and bar in the result. Your solution doesn't meet that. I dont need column B if it is removed I would be OK with that
  • behzad.nouri
    behzad.nouri over 8 years
    @AlirezaHos you do not make sense. you need to edit the question and show what is the output you are looking for
  • Nader Hisham
    Nader Hisham over 8 years
    post a sample of your desired result , and it would be better not to generate random data so that we make sure we have the same result
  • Alireza
    Alireza over 8 years
    I've added The desired outcome would part to question, please have a look.
  • PV8
    PV8 almost 4 years
    this solution is a way faster then the other one! thx for that