Reindexing a level of a MultiIndex to arbitrary order in Pandas
You need to provide a MultiIndex that carries the new ordering, then reindex with it.
In [36]: index = MultiIndex(levels=[['foo', 'bar', 'baz', 'qux'],
                                    ['one', 'two', 'three']],
                            labels=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3],
                                    [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
                            names=['first', 'second'])
In [37]: df = DataFrame(np.random.randn(10, 3), index=index,
                        columns=Index(['A', 'B', 'C'], name='exp'))
In [38]: df
Out[38]:
exp                  A         B         C
first second
foo   one    -1.007742  2.594146  1.211697
      two     1.280218  0.799940  0.039380
      three  -0.501615 -0.136437  0.997753
bar   one    -0.201222  0.060552  0.480552
      two    -0.758227  0.457597 -0.648014
baz   two    -0.326620  1.046366 -2.047380
      three   0.395894  1.128850 -1.126649
qux   one    -0.353886 -1.200079  0.493888
      two    -0.124532  0.114733  1.991793
      three  -1.042094  1.079344 -0.153037
Simulate the reordering by sorting on the second level and taking the resulting index:
In [39]: idx = df.sortlevel(level='second').index
In [40]: idx
Out[40]:
MultiIndex
[(u'foo', u'one'), (u'bar', u'one'), (u'qux', u'one'), (u'foo', u'two'), (u'bar', u'two'), (u'baz', u'two'), (u'qux', u'two'), (u'foo', u'three'), (u'baz', u'three'), (u'qux', u'three')]
In [41]: df.reindex(idx)
Out[41]:
exp                  A         B         C
first second
foo   one    -1.007742  2.594146  1.211697
bar   one    -0.201222  0.060552  0.480552
qux   one    -0.353886 -1.200079  0.493888
foo   two     1.280218  0.799940  0.039380
bar   two    -0.758227  0.457597 -0.648014
baz   two    -0.326620  1.046366 -2.047380
qux   two    -0.124532  0.114733  1.991793
foo   three  -0.501615 -0.136437  0.997753
baz   three   0.395894  1.128850 -1.126649
qux   three  -1.042094  1.079344 -0.153037
A different ordering works the same way:
In [42]: idx = idx[5:] + idx[:5]
In [43]: idx
Out[43]:
MultiIndex
[(u'bar', u'one'), (u'bar', u'two'), (u'baz', u'three'), (u'baz', u'two'), (u'foo', u'one'), (u'foo', u'three'), (u'foo', u'two'), (u'qux', u'one'), (u'qux', u'three'), (u'qux', u'two')]
In [44]: df.reindex(idx)
Out[44]:
exp                  A         B         C
first second
bar   one    -0.201222  0.060552  0.480552
      two    -0.758227  0.457597 -0.648014
baz   three   0.395894  1.128850 -1.126649
      two    -0.326620  1.046366 -2.047380
foo   one    -1.007742  2.594146  1.211697
      three  -0.501615 -0.136437  0.997753
      two     1.280218  0.799940  0.039380
qux   one    -0.353886 -1.200079  0.493888
      three  -1.042094  1.079344 -0.153037
      two    -0.124532  0.114733  1.991793
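The session above appears to come from an older pandas release: it uses `labels=` and `sortlevel`, which later releases renamed or removed. As a minimal sketch of the same idea with current constructors, the block below builds a MultiIndex whose first level follows an arbitrary order and then reindexes with it. The names `desired`, `rank`, `new_index`, and `reordered`, the particular order chosen, and the seeded random data are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same index shape as in the session above, built from explicit tuples.
index = pd.MultiIndex.from_tuples(
    [("foo", "one"), ("foo", "two"), ("foo", "three"),
     ("bar", "one"), ("bar", "two"),
     ("baz", "two"), ("baz", "three"),
     ("qux", "one"), ("qux", "two"), ("qux", "three")],
    names=["first", "second"],
)
rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
df = pd.DataFrame(rng.standard_normal((10, 3)), index=index,
                  columns=pd.Index(["A", "B", "C"], name="exp"))

# Hypothetical target order for the 'first' level.
desired = ["qux", "foo", "baz", "bar"]
rank = {label: pos for pos, label in enumerate(desired)}

# sorted() is stable, so rows keep their original order within each
# first-level label; only the labels themselves are rearranged.
new_index = pd.MultiIndex.from_tuples(
    sorted(df.index, key=lambda tup: rank[tup[0]]),
    names=df.index.names,
)

# Reindexing with the full MultiIndex applies the new row order.
reordered = df.reindex(new_index)
print(reordered)
```

Because `new_index` contains exactly the tuples already present in `df.index`, the reindex only rearranges rows and introduces no missing values.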
Author by
Chris Fonnesbeck
Updated on September 16, 2022
Comments
-
Chris Fonnesbeck over 1 year
I have some code that summarizes a DataFrame containing the famous Titanic dataset as follows:

    titanic['agecat'] = pd.cut(titanic.age, [0, 13, 20, 64, 100], labels=['child', 'adolescent', 'adult', 'senior'])
    titanic.groupby(['agecat', 'pclass', 'sex'])['survived'].mean()

This produces the following Series with a MultiIndex based on the groupby call:

    agecat      pclass  sex
    adolescent  1       female    1.000000
                        male      0.200000
                2       female    0.923077
                        male      0.117647
                3       female    0.542857
                        male      0.125000
    adult       1       female    0.965517
                        male      0.343284
                2       female    0.868421
                        male      0.078125
                3       female    0.441860
                        male      0.159184
    child       1       female    0.000000
                        male      1.000000
                2       female    1.000000
                        male      1.000000
                3       female    0.483871
                        male      0.324324
    senior      1       female    1.000000
                        male      0.142857
                2       male      0.000000
                3       male      0.000000
    Name: survived, dtype: float64

However, I want the agecat level of the MultiIndex to be naturally ordered rather than alphabetically ordered, that is: ['child', 'adolescent', 'adult', 'senior']. If I try using reindex to do this:

    titanic.groupby(['agecat', 'pclass', 'sex'])['survived'].mean().reindex(
        ['child', 'adolescent', 'adult', 'senior'], level='agecat')

it does not have any effect on the result's MultiIndex. Should this be working, or am I using the wrong approach?
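One possible workaround for the ordering described in the question, sketched under stated assumptions: flatten the index, turn `agecat` into an ordered categorical, sort, and rebuild the MultiIndex. This is not the method proposed in the thread. The values below are only the first-class (`pclass` 1) rows copied from the table above, and the names `survived`, `flat`, and `result` are illustrative.

```python
import pandas as pd

# Natural order requested in the question.
order = ["child", "adolescent", "adult", "senior"]

# First-class rows from the question's table, in their alphabetical order.
idx = pd.MultiIndex.from_product(
    [["adolescent", "adult", "child", "senior"], [1], ["female", "male"]],
    names=["agecat", "pclass", "sex"],
)
survived = pd.Series(
    [1.000000, 0.200000, 0.965517, 0.343284,
     0.000000, 1.000000, 1.000000, 0.142857],
    index=idx, name="survived",
)

# Move the index levels into columns so 'agecat' can be given an explicit order.
flat = survived.reset_index()
flat["agecat"] = pd.Categorical(flat["agecat"], categories=order, ordered=True)

# Sorting an ordered categorical follows the category order, not the alphabet.
result = (
    flat.sort_values(["agecat", "pclass", "sex"])
        .set_index(["agecat", "pclass", "sex"])["survived"]
)
print(result)
```

The round trip through columns costs a little more than a direct reindex, but it does not depend on how `reindex` handles the `level` keyword.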
-
Jeff over 10 years
I think what you are proposing should work; see the comment here: github.com/pydata/pandas/blob/master/pandas/core/index.py#L1346. Please open an enhancement issue for this.
-
Chris Warth over 9 years
Unfortunately the OP is right: DataFrame.reindex() is broken when using the level keyword, even in the latest pandas development branch as of this date. See github.com/pydata/pandas/issues/4088