Groupby column and find min and max of each group

Solution 1

You can use assign + abs, followed by groupby + agg:

df = (df.assign(Data_Value=df['Data_Value'].abs())
       .groupby(['Day'])['Data_Value'].agg([('Min' , 'min'), ('Max', 'max')])
       .add_prefix('Day'))

df 
       DayMin  DayMax
Day                  
01-01       0     115
01-02       0      79
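
As a self-contained variant of the snippet above, here is a minimal sketch that assumes pandas 0.25 or later, where agg accepts keyword-style named aggregation. The DataFrame is a small subset of the data from the question, and the name result is only illustrative.

```python
import pandas as pd

# Subset of the sample data from the question
df = pd.DataFrame({
    'Day': ['01-01', '01-01', '01-01', '01-01',
            '01-02', '01-02', '01-02', '01-02'],
    'Element': ['TMAX', 'TMAX', 'TMIN', 'TMIN',
                'TMAX', 'TMAX', 'TMIN', 'TMIN'],
    'Data_Value': [112, 115, 60, 0, 70, 79, 0, 60],
})

# Take absolute values, group by Day, and name the output columns directly.
# Note: this ignores Element and aggregates over all rows of each day.
result = (df.assign(Data_Value=df['Data_Value'].abs())
            .groupby('Day')['Data_Value']
            .agg(DayMin='min', DayMax='max'))

print(result)
```

Naming the columns inside agg makes the add_prefix step unnecessary; whether that reads better is a matter of taste.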

Solution 2

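Use groupby + apply with a custom function, so that the min is taken over the TMIN rows only and the max over the TMAX rows only: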

In [5265]: def maxmin(x):
      ...:     mx = x[x.Element == 'TMAX'].Data_Value.max()
      ...:     mn = x[x.Element == 'TMIN'].Data_Value.min()
      ...:     return pd.Series({'DayMin': mn, 'DayMax': mx})
      ...:

In [5266]: df.groupby('Day').apply(maxmin)
Out[5266]:
       DayMax  DayMin
Day
01-01     115       0
01-02      79       0

Also, chain reset_index to turn Day back into a regular column:

In [5268]: df.groupby('Day').apply(maxmin).reset_index()
Out[5268]:
     Day  DayMax  DayMin
0  01-01     115       0
1  01-02      79       0

Or use query in place of the boolean mask, i.e. x.query("Element == 'TMAX'") instead of x[x.Element == 'TMAX'].
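
The query variant could look roughly like the sketch below. It assumes the same column names as the question and uses a small subset of its data. Selecting the columns before apply is an addition here, intended to keep the grouping column out of the function's input on recent pandas versions.

```python
import pandas as pd

# Subset of the sample data from the question
df = pd.DataFrame({
    'Day': ['01-01', '01-01', '01-01', '01-01',
            '01-02', '01-02', '01-02', '01-02'],
    'Element': ['TMAX', 'TMAX', 'TMIN', 'TMIN',
                'TMAX', 'TMAX', 'TMIN', 'TMIN'],
    'Data_Value': [112, 115, 60, 0, 70, 79, 0, 60],
})

def maxmin(x):
    # Filter each group's rows with query instead of a boolean mask
    mx = x.query("Element == 'TMAX'")['Data_Value'].max()
    mn = x.query("Element == 'TMIN'")['Data_Value'].min()
    return pd.Series({'DayMin': mn, 'DayMax': mx})

# One row per Day, with the min of TMIN and the max of TMAX
result = df.groupby('Day')[['Element', 'Data_Value']].apply(maxmin)

print(result)
```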

Solution 3

Create duplicate columns and find the min and max using agg, i.e.:

ndf = df.assign(DayMin = df['Data_Value'].abs(),DayMax=df['Data_Value'].abs()).groupby('Day')\
     .agg({'DayMin':'min','DayMax':'max'})

       DayMax  DayMin
Day                  
01-01     115       0
01-02      79       0

In case you want results for both TMIN and TMAX, group by both columns: groupby(['Day','Element']).
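
A minimal sketch of grouping by both columns, assuming the same column names as the question and a small subset of its data. The name summary is only illustrative.

```python
import pandas as pd

# Subset of the sample data from the question
df = pd.DataFrame({
    'Day': ['01-01', '01-01', '01-01', '01-01',
            '01-02', '01-02', '01-02', '01-02'],
    'Element': ['TMAX', 'TMAX', 'TMIN', 'TMIN',
                'TMAX', 'TMAX', 'TMIN', 'TMIN'],
    'Data_Value': [112, 115, 60, 0, 70, 79, 0, 60],
})

# One row per (Day, Element) pair, with the min and max of each
summary = df.groupby(['Day', 'Element'])['Data_Value'].agg(['min', 'max'])

print(summary)
```

The result has a (Day, Element) MultiIndex, so a single value can be read with, for example, summary.loc[('01-01', 'TMAX'), 'max'].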

Author: The Cat

Updated on June 05, 2022

Comments

  • The Cat
    The Cat almost 2 years

    I have the following dataset,

            Day    Element  Data_Value
    6786    01-01   TMAX    112
    9333    01-01   TMAX    101
    9330    01-01   TMIN    60
    11049   01-01   TMIN    0
    6834    01-01   TMIN    25
    11862   01-01   TMAX    113
    1781    01-01   TMAX    115
    11042   01-01   TMAX    105
    1110    01-01   TMAX    111
    651     01-01   TMIN    44
    11350   01-01   TMIN    83
    1798    01-02   TMAX    70
    4975    01-02   TMAX    79
    12774   01-02   TMIN    0
    3977    01-02   TMIN    60
    2485    01-02   TMAX    73
    4888    01-02   TMIN    31
    11836   01-02   TMIN    26
    11368   01-02   TMAX    71
    2483    01-02   TMIN    26
    

    I want to group by the Day and then find the overall min of TMIN and the max of TMAX and put these into a data frame, so I get an output like...

    Day    DayMin    DayMax
    01-01  0         115
    01-02  0         79
    

    I know I need to do,

    df.groupby(by='Day')
    

    but I am stuck on the next step: should I create columns to store the TMAX and TMIN values?

  • Bharath
    Bharath over 6 years
    Silly me. I totally forgot we can pass a list to agg :). It's OK, at least I got the column names as a defence.
  • cs95
    cs95 over 6 years
    @Bharathshetty Sorry, I didn't see your answer when editing mine. I believe you need to take the absolute-value condition into account as well.
  • Zero
    Zero over 6 years
    Are you assuming that TMIN will never have a value greater than TMAX?
  • cs95
    cs95 over 6 years
    @Zero I didn't read anything into the data. Actually, I might've misread the question.
  • The Cat
    The Cat over 6 years
    It is also possible that TMIN is negative. Can I avoid using abs() here?
  • cs95
    cs95 over 6 years
    @TheCat Welp, you did say you wanted the absolute... hence the use of abs. If you don't want it, just remove the assign call. Everything else remains.
  • The Cat
    The Cat over 6 years
    Oh yeah, my bad. I meant the overall min/max rather than any kind of average.
  • cs95
    cs95 over 6 years
    @TheCat No problem. The rest should still be the same.
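
Following the comment thread, here is one possible sketch of the abs-free version that also keeps TMIN and TMAX separate. The negative values below are hypothetical and are not part of the original data; the names tmin, tmax and result are illustrative.

```python
import pandas as pd

# Hypothetical data with negative TMIN readings (not from the question)
df = pd.DataFrame({
    'Day': ['01-01', '01-01', '01-01', '01-01',
            '01-02', '01-02', '01-02', '01-02'],
    'Element': ['TMAX', 'TMAX', 'TMIN', 'TMIN',
                'TMAX', 'TMAX', 'TMIN', 'TMIN'],
    'Data_Value': [112, 115, -30, 25, 70, 79, -5, 60],
})

# Min over TMIN rows only and max over TMAX rows only, with no abs() call
tmin = df[df['Element'] == 'TMIN'].groupby('Day')['Data_Value'].min()
tmax = df[df['Element'] == 'TMAX'].groupby('Day')['Data_Value'].max()

# Align the two Series on Day and name the columns
result = pd.concat([tmin.rename('DayMin'), tmax.rename('DayMax')], axis=1)

print(result)
```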