Reindexing a level of a MultiIndex to arbitrary order in Pandas

You need to provide a complete MultiIndex that lists the rows in the order you want, and pass it to reindex.

In [36]: index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                   ['one', 'two', 'three']],
                           labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                                   [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                           names=['first', 'second'])

In [37]: df = DataFrame(np.random.randn(10, 3), index=index,
                               columns=Index(['A', 'B', 'C'], name='exp'))

In [38]: df
Out[38]: 
exp                  A         B         C
first second                              
foo   one    -1.007742  2.594146  1.211697
      two     1.280218  0.799940  0.039380
      three  -0.501615 -0.136437  0.997753
bar   one    -0.201222  0.060552  0.480552
      two    -0.758227  0.457597 -0.648014
baz   two    -0.326620  1.046366 -2.047380
      three   0.395894  1.128850 -1.126649
qux   one    -0.353886 -1.200079  0.493888
      two    -0.124532  0.114733  1.991793
      three  -1.042094  1.079344 -0.153037

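Simulate the reordering by sorting on the second level (sortlevel was the method in the pandas version used here; later releases use sort_index(level=...) instead):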

In [39]: idx = df.sortlevel(level='second').index

In [40]: idx
Out[40]: 
MultiIndex
[(u'foo', u'one'), (u'bar', u'one'), (u'qux', u'one'), (u'foo', u'two'), (u'bar', u'two'), (u'baz', u'two'), (u'qux', u'two'), (u'foo', u'three'), (u'baz', u'three'), (u'qux', u'three')]

In [41]: df.reindex(idx)
Out[41]: 
exp                  A         B         C
first second                              
foo   one    -1.007742  2.594146  1.211697
bar   one    -0.201222  0.060552  0.480552
qux   one    -0.353886 -1.200079  0.493888
foo   two     1.280218  0.799940  0.039380
bar   two    -0.758227  0.457597 -0.648014
baz   two    -0.326620  1.046366 -2.047380
qux   two    -0.124532  0.114733  1.991793
foo   three  -0.501615 -0.136437  0.997753
baz   three   0.395894  1.128850 -1.126649
qux   three  -1.042094  1.079344 -0.153037

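A different ordering, built by combining two slices of the index (in the pandas version used here, + on an Index acted as a set union, which is why the result comes back sorted):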

In [42]: idx = idx[5:] + idx[:5]

In [43]: idx
Out[43]: 
MultiIndex
[(u'bar', u'one'), (u'bar', u'two'), (u'baz', u'three'), (u'baz', u'two'), (u'foo', u'one'), (u'foo', u'three'), (u'foo', u'two'), (u'qux', u'one'), (u'qux', u'three'), (u'qux', u'two')]

In [44]: df.reindex(idx)
Out[44]: 
exp                  A         B         C
first second                              
bar   one    -0.201222  0.060552  0.480552
      two    -0.758227  0.457597 -0.648014
baz   three   0.395894  1.128850 -1.126649
      two    -0.326620  1.046366 -2.047380
foo   one    -1.007742  2.594146  1.211697
      three  -0.501615 -0.136437  0.997753
      two     1.280218  0.799940  0.039380
qux   one    -0.353886 -1.200079  0.493888
      three  -1.042094  1.079344 -0.153037
      two    -0.124532  0.114733  1.991793
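
The session above was run on an old pandas release. As a rough sketch of the same idea on a current release, the block below builds the target MultiIndex explicitly and passes it to reindex. It assumes a pandas version where the constructor keyword is codes rather than labels; the names second_order, rank and target, and the integer data, are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same index layout as in the answer, using the newer `codes` keyword.
index = pd.MultiIndex(
    levels=[['foo', 'bar', 'baz', 'qux'], ['one', 'two', 'three']],
    codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
           [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
    names=['first', 'second'],
)

# Illustrative integer data (not the random values shown above).
df = pd.DataFrame(
    np.arange(30).reshape(10, 3),
    index=index,
    columns=pd.Index(['A', 'B', 'C'], name='exp'),
)

# Hypothetical arbitrary order for the 'second' level.
second_order = ['three', 'one', 'two']
rank = {label: pos for pos, label in enumerate(second_order)}

# Sort the existing index tuples by the desired rank of the second level.
# sorted() is stable, so rows sharing a 'second' label keep their
# original relative order.
target = pd.MultiIndex.from_tuples(
    sorted(df.index, key=lambda t: rank[t[1]]),
    names=df.index.names,
)

# Reindexing against the full MultiIndex reorders the rows.
result = df.reindex(target)
print(result)
```

The same pattern should carry over to the question's case: sort the tuples of the grouped result's index by the position of the agecat label in ['child', 'adolescent', 'adult', 'senior'] and reindex against that.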

Author: Chris Fonnesbeck

Updated on September 16, 2022

Comments

  • Chris Fonnesbeck over 1 year

    I have some code that summarizes a DataFrame containing the famous Titanic dataset as follows:

    titanic['agecat'] = pd.cut(titanic.age, [0, 13, 20, 64, 100], 
                   labels=['child', 'adolescent', 'adult', 'senior'])
    titanic.groupby(['agecat', 'pclass', 'sex'])['survived'].mean()
    

    This produces the following Series with a MultiIndex based on the groupby call:

    agecat      pclass  sex   
    adolescent  1       female    1.000000
                        male      0.200000
                2       female    0.923077
                        male      0.117647
                3       female    0.542857
                        male      0.125000
    adult       1       female    0.965517
                        male      0.343284
                2       female    0.868421
                        male      0.078125
                3       female    0.441860
                        male      0.159184
    child       1       female    0.000000
                        male      1.000000
                2       female    1.000000
                        male      1.000000
                3       female    0.483871
                        male      0.324324
    senior      1       female    1.000000
                        male      0.142857
                2       male      0.000000
                3       male      0.000000
    Name: survived, dtype: float64
    

    However, I want the agecat level of the MultiIndex to be in its natural order rather than alphabetical order, that is: ['child', 'adolescent', 'adult', 'senior']. If I try using reindex to do this:

    titanic.groupby(['agecat', 'pclass','sex'])['survived'].mean().reindex(
        ['child', 'adolescent', 'adult', 'senior'], level='agecat')
    

    it has no effect on the MultiIndex of the result. Should this be working, or am I using the wrong approach?

  • Jeff over 10 years
    I think what you are proposing should work; see the comment here: github.com/pydata/pandas/blob/master/pandas/core/index.py#L1346. Please open an enhancement issue for this.
  • Chris Warth over 9 years
    Unfortunately the OP is right: DataFrame.reindex() is broken when using the level keyword, even in the latest pandas development branch as of this date. See github.com/pydata/pandas/issues/4088