Python Pandas sorting after groupby and aggregate

14,325

You can use sort_values, but first reset_index and then set_index:

#simplier aggregation
days_off_yearly = persons.groupby(["from_year", "name"])['out_days'].sum()
print(days_off_yearly)
from_year  name 
2010       John     17
2011       John     15
           John1    18
2012       John     10
           John4    11
           John6     4
Name: out_days, dtype: int64

print (days_off_yearly.reset_index()
                      .sort_values(['from_year','out_days'],ascending=False)
                      .set_index(['from_year','name']))
                 out_days
from_year name           
2012      John4        11
          John         10
          John6         4
2011      John1        18
          John         15
2010      John         17
Share:
14,325
Tomas Rasymas
Author by

Tomas Rasymas

Updated on June 16, 2022

Comments

  • Tomas Rasymas
    Tomas Rasymas almost 2 years

    I am trying to sort data (Pandas) after grouping and aggregating and I am stuck. My data:

    data = {'from_year': [2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012],
        'name': ['John', 'John1', 'John', 'John', 'John4', 'John', 'John1', 'John6'],
        'out_days': [11, 8, 10, 15, 11, 6, 10, 4]}
    persons = pd.DataFrame(data, columns=["from_year", "name", "out_days"])
    
    days_off_yearly = persons.groupby(["from_year", "name"]).agg({"out_days": [np.sum]})
    
    print(days_off_yearly)
    

    After that I have my data sorted:

                    out_days
                         sum
    from_year name          
    2010      John        17
    2011      John        15
              John1       18
    2012      John        10
              John4       11
              John6        4
    

    I want to sort my data by from_year and out_days sum and expecting data to be:

                    out_days
                         sum
    from_year name          
    2012      John4       11
              John        10
              John6        4    
    2011      John1       18
              John        15
    2010      John        17
    

    I am trying

    print(days_off_yearly.sort_values(["from_year", ("out_days", "sum")], ascending=False).head(10))
    

    But getting KeyError: 'from_year'.

    Any help appreciated.