Selecting a particular row from a groupby object in Python


I cobbled this together from the answers to "Python : Getting the Row which has the max value in groups using groupby".

Group by the 'id' column, then call transform on the 'year' column. This returns each group's max year aligned to the original rows, so comparing it with df['year'] gives a boolean index that is True wherever a row's year equals the max year for its 'id':

In [103]:

df[df.groupby('id')['year'].transform('max') == df['year']]
Out[103]:
   id  marks  year
0   1     18  2013
2   3     16  2014
4   1     19  2013
6   2     18  2014
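
To make the one-liner easier to follow, here is a minimal self-contained sketch that splits it into its three steps. The DataFrame is rebuilt from the sample data in the question; the intermediate names `latest_year`, `mask` and `result` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id":    [1, 1, 3, 2, 1, 3, 2],
    "marks": [18, 25, 16, 16, 19, 25, 18],
    "year":  [2013, 2012, 2014, 2013, 2013, 2013, 2014],
})

# Step 1: for every row, the max year of that row's 'id' group.
# transform keeps the original index, so the result lines up with df.
latest_year = df.groupby("id")["year"].transform("max")

# Step 2: boolean mask, True where the row's year is its group's max.
mask = df["year"] == latest_year

# Step 3: boolean indexing returns the full rows, not just the flags.
result = df[mask]
print(result)
```

Because the mask is applied to the original frame, every row that ties for the latest year in its group is kept, which is why id 1 appears twice in the output above.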
Author: Shiva Prakash

Data Science enthusiast with hands on experience in Python, R, SQL and Tableau.

Updated on July 18, 2022

Comments

  • Shiva Prakash, almost 2 years ago
    id    marks  year 
    1     18      2013
    1     25      2012
    3     16      2014
    2     16      2013
    1     19      2013
    3     25      2013
    2     18      2014
    

    Suppose I now group the above on id with this Python command:
    grouped = file.groupby(file.id)

    I would like to get a new file containing only the row in each group with the most recent year, that is, the highest year in the group.

    Please let me know the command. I am trying with apply, but it only gives me a boolean expression. I want the entire row with the latest year.
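
If exactly one row per group is wanted, one possible variant (a sketch, not taken from the original thread) is to look up the index label of each group's latest year with `idxmax` and select those rows with `.loc`. The frame is named `file` as in the question. The name `one_per_group` is hypothetical, and keeping only the first row on ties is an assumption about what is wanted.

```python
import pandas as pd

# Sample data from the question, using the question's variable name.
file = pd.DataFrame({
    "id":    [1, 1, 3, 2, 1, 3, 2],
    "marks": [18, 25, 16, 16, 19, 25, 18],
    "year":  [2013, 2012, 2014, 2013, 2013, 2013, 2014],
})

# For each id, the index label of the first row holding the max year.
latest_idx = file.groupby("id")["year"].idxmax()

# Select the complete rows at those labels (one row per id).
one_per_group = file.loc[latest_idx]
print(one_per_group)
```

Note that id 1 has two rows for 2013. This variant keeps only the first of them, whereas a boolean mask built with transform keeps both.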