Groupby column and find min and max of each group
Solution 1
You can use assign + abs, followed by groupby + agg:
df = (df.assign(Data_Value=df['Data_Value'].abs())
        .groupby(['Day'])['Data_Value'].agg([('Min', 'min'), ('Max', 'max')])
        .add_prefix('Day'))
df
DayMin DayMax
Day
01-01 0 115
01-02 0 79
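If absolute values are not wanted (for instance, when TMIN can be negative), the assign step can simply be dropped. Below is a minimal, self-contained sketch of that variant. The small `df` is a hand-picked subset of the question's rows, and the rename call is just one way to get the column names; neither is part of the original answer. Like Solution 1, it ignores the Element column, so it assumes the daily minimum comes from a TMIN row and the daily maximum from a TMAX row.

```python
import pandas as pd

# Illustrative subset of the question's data (an assumption, not the full set).
df = pd.DataFrame({
    'Day': ['01-01', '01-01', '01-01', '01-02', '01-02', '01-02'],
    'Element': ['TMAX', 'TMIN', 'TMAX', 'TMAX', 'TMIN', 'TMIN'],
    'Data_Value': [112, 0, 115, 79, 0, 60],
})

# Same groupby + agg pipeline as above, without the assign/abs step.
out = (df.groupby('Day')['Data_Value']
         .agg(['min', 'max'])
         .rename(columns={'min': 'DayMin', 'max': 'DayMax'}))
print(out)
```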
Solution 2
Use groupby + apply with a custom function:
In [5265]: def maxmin(x):
      ...:     mx = x[x.Element == 'TMAX'].Data_Value.max()
      ...:     mn = x[x.Element == 'TMIN'].Data_Value.min()
      ...:     return pd.Series({'DayMin': mn, 'DayMax': mx})
      ...:
In [5266]: df.groupby('Day').apply(maxmin)
Out[5266]:
DayMax DayMin
Day
01-01 115 0
01-02 79 0
Also, to turn Day back into a regular column, chain reset_index:
In [5268]: df.groupby('Day').apply(maxmin).reset_index()
Out[5268]:
Day DayMax DayMin
0 01-01 115 0
1 01-02 79 0
Or use query: write x.query("Element == 'TMAX'") instead of x[x.Element == 'TMAX'].
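Put together, the query variant might look like the sketch below. The function name `maxmin_query` and the small sample `df` (a subset of the question's rows) are assumptions made for illustration. Selecting the two value columns before apply is an optional choice that keeps the grouping column out of each group.

```python
import pandas as pd

# Illustrative subset of the question's data (an assumption, not the full set).
df = pd.DataFrame({
    'Day': ['01-01', '01-01', '01-01', '01-01',
            '01-02', '01-02', '01-02', '01-02'],
    'Element': ['TMAX', 'TMAX', 'TMIN', 'TMIN',
                'TMAX', 'TMAX', 'TMIN', 'TMIN'],
    'Data_Value': [112, 115, 60, 0, 70, 79, 0, 60],
})

def maxmin_query(x):
    # Max over the TMAX rows and min over the TMIN rows of one group.
    mx = x.query("Element == 'TMAX'")['Data_Value'].max()
    mn = x.query("Element == 'TMIN'")['Data_Value'].min()
    return pd.Series({'DayMin': mn, 'DayMax': mx})

result = df.groupby('Day')[['Element', 'Data_Value']].apply(maxmin_query)
print(result)
```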
Solution 3
Create duplicate columns and find the min and max using agg, i.e.:
ndf = df.assign(DayMin = df['Data_Value'].abs(),DayMax=df['Data_Value'].abs()).groupby('Day')\
.agg({'DayMin':'min','DayMax':'max'})
DayMax DayMin
Day
01-01 115 0
01-02 79 0
In case you want the min and max for both TMIN and TMAX, group by both columns: groupby(['Day','Element']).
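As a rough sketch of that last suggestion, grouping by both Day and Element gives a min and max for every (Day, Element) pair, from which the min of TMIN and the max of TMAX can be picked out. The sample `df` (a subset of the question's rows), the use of xs, and the names `both` and `summary` are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Illustrative subset of the question's data (an assumption, not the full set).
df = pd.DataFrame({
    'Day': ['01-01', '01-01', '01-01', '01-01',
            '01-02', '01-02', '01-02', '01-02'],
    'Element': ['TMAX', 'TMAX', 'TMIN', 'TMIN',
                'TMAX', 'TMAX', 'TMIN', 'TMIN'],
    'Data_Value': [112, 115, 60, 0, 70, 79, 0, 60],
})

# Min and max of Data_Value for every (Day, Element) pair.
both = df.groupby(['Day', 'Element'])['Data_Value'].agg(['min', 'max'])

# Take the min from the TMIN rows and the max from the TMAX rows.
summary = pd.DataFrame({
    'DayMin': both.xs('TMIN', level='Element')['min'],
    'DayMax': both.xs('TMAX', level='Element')['max'],
})
print(summary)
```

Unlike taking the overall min and max per day, this does not assume that TMIN values are always below TMAX values.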
Author by
The Cat
Updated on June 05, 2022
Comments
-
The Cat almost 2 years
I have the following dataset,
         Day Element  Data_Value
6786   01-01    TMAX         112
9333   01-01    TMAX         101
9330   01-01    TMIN          60
11049  01-01    TMIN           0
6834   01-01    TMIN          25
11862  01-01    TMAX         113
1781   01-01    TMAX         115
11042  01-01    TMAX         105
1110   01-01    TMAX         111
651    01-01    TMIN          44
11350  01-01    TMIN          83
1798   01-02    TMAX          70
4975   01-02    TMAX          79
12774  01-02    TMIN           0
3977   01-02    TMIN          60
2485   01-02    TMAX          73
4888   01-02    TMIN          31
11836  01-02    TMIN          26
11368  01-02    TMAX          71
2483   01-02    TMIN          26
I want to group by Day, then find the overall min of TMIN and the overall max of TMAX and put these into a data frame, so I get an output like:
Day DayMin DayMax
01-01 0 115
01-02 0 79
I know I need to do,
df.groupby(by='Day')
but I am stuck on the next step: should I create columns to store the TMAX and TMIN values?
-
Bharath over 6 years
Silly me, I totally forgot we can pass a list to agg :). It's OK, at least I got the column names as a defence.
-
cs95 over 6 years
@Bharathshetty Sorry, I didn't see your answer when editing mine. I believe you need to take the absolute value condition into account as well.
-
Zero over 6 years
You are making an assumption that TMIN will not have a value greater than TMAX?
-
cs95 over 6 years
@Zero I didn't read anything into the data. Actually, I might've misread the question.
-
The Cat over 6 years
It is also possible that TMIN is negative. Can I avoid using abs() here?
-
cs95 over 6 years
@TheCat Welp, you did say you wanted the absolute... hence the use of abs. If you don't want it, just remove the assign call. Everything else remains.
-
The Cat over 6 years
Oh yeah, my bad, I meant the overall min/max rather than any kind of average.
-
cs95 over 6 years
@TheCat No problem. The rest should still be the same.