selecting from multi-index pandas
Solution 1
One way is to use the get_level_values
Index method:
In [11]: df
Out[11]:
0
A B
1 4 1
2 5 2
3 6 3
In [12]: df.iloc[df.index.get_level_values('A') == 1]
Out[12]:
0
A B
1 4 1
In 0.13 you'll be able to use xs
with drop_level
argument:
df.xs(1, level='A', drop_level=False) # axis=1 if columns
Note: if this were column MultiIndex rather than index, you could use the same technique:
In [21]: df1 = df.T
In [22]: df1.iloc[:, df1.columns.get_level_values('A') == 1]
Out[22]:
A 1
B 4
0 1
Solution 2
You can also use query
which is very readable in my opinion and straightforward to use:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [10, 20, 50, 80], 'C': [6, 7, 8, 9]})
df = df.set_index(['A', 'B'])
C
A B
1 10 6
2 20 7
3 50 8
4 80 9
For what you had in mind you can now simply do:
df.query('A == 1')
C
A B
1 10 6
You can also have more complex queries using and
df.query('A >= 1 and B >= 50')
C
A B
3 50 8
4 80 9
and or
df.query('A == 1 or B >= 50')
C
A B
1 10 6
3 50 8
4 80 9
You can also query on different index levels, e.g.
df.query('A == 1 or C >= 8')
will return
C
A B
1 10 6
3 50 8
4 80 9
If you want to use variables inside your query, you can use @
:
b_threshold = 20
c_threshold = 8
df.query('B >= @b_threshold and C <= @c_threshold')
C
A B
2 20 7
3 50 8
Solution 3
You can use DataFrame.xs()
:
In [36]: df = DataFrame(np.random.randn(10, 4))
In [37]: df.columns = [np.random.choice(['a', 'b'], size=4).tolist(), np.random.choice(['c', 'd'], size=4)]
In [38]: df.columns.names = ['A', 'B']
In [39]: df
Out[39]:
A b a
B d d d d
0 -1.406 0.548 -0.635 0.576
1 -0.212 -0.583 1.012 -1.377
2 0.951 -0.349 -0.477 -1.230
3 0.451 -0.168 0.949 0.545
4 -0.362 -0.855 1.676 -2.881
5 1.283 1.027 0.085 -1.282
6 0.583 -1.406 0.327 -0.146
7 -0.518 -0.480 0.139 0.851
8 -0.030 -0.630 -1.534 0.534
9 0.246 -1.558 -1.885 -1.543
In [40]: df.xs('a', level='A', axis=1)
Out[40]:
B d d
0 -0.635 0.576
1 1.012 -1.377
2 -0.477 -1.230
3 0.949 0.545
4 1.676 -2.881
5 0.085 -1.282
6 0.327 -0.146
7 0.139 0.851
8 -1.534 0.534
9 -1.885 -1.543
If you want to keep the A
level (the drop_level
keyword argument is only available starting from v0.13.0):
In [42]: df.xs('a', level='A', axis=1, drop_level=False)
Out[42]:
A a
B d d
0 -0.635 0.576
1 1.012 -1.377
2 -0.477 -1.230
3 0.949 0.545
4 1.676 -2.881
5 0.085 -1.282
6 0.327 -0.146
7 0.139 0.851
8 -1.534 0.534
9 -1.885 -1.543
Solution 4
Understanding how to access multi-indexed pandas DataFrame can help you with all kinds of task like that.
Copy paste this in your code to generate example:
# hierarchical indices and columns
index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
names=['year', 'visit'])
columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
names=['subject', 'type'])
# mock some data
data = np.round(np.random.randn(4, 6), 1)
data[:, ::2] *= 10
data += 37
# create the DataFrame
health_data = pd.DataFrame(data, index=index, columns=columns)
health_data
Will give you table like this:
Standard access by column
health_data['Bob']
type HR Temp
year visit
2013 1 22.0 38.6
2 52.0 38.3
2014 1 30.0 38.9
2 31.0 37.3
health_data['Bob']['HR']
year visit
2013 1 22.0
2 52.0
2014 1 30.0
2 31.0
Name: HR, dtype: float64
# filtering by column/subcolumn - your case:
health_data['Bob']['HR']==22
year visit
2013 1 True
2 False
2014 1 False
2 False
health_data['Bob']['HR'][2013]
visit
1 22.0
2 52.0
Name: HR, dtype: float64
health_data['Bob']['HR'][2013][1]
22.0
Access by row
health_data.loc[2013]
subject Bob Guido Sue
type HR Temp HR Temp HR Temp
visit
1 22.0 38.6 40.0 38.9 53.0 37.5
2 52.0 38.3 42.0 34.6 30.0 37.7
health_data.loc[2013,1]
subject type
Bob HR 22.0
Temp 38.6
Guido HR 40.0
Temp 38.9
Sue HR 53.0
Temp 37.5
Name: (2013, 1), dtype: float64
health_data.loc[2013,1]['Bob']
type
HR 22.0
Temp 38.6
Name: (2013, 1), dtype: float64
health_data.loc[2013,1]['Bob']['HR']
22.0
Slicing multi-index
idx=pd.IndexSlice
health_data.loc[idx[:,1], idx[:,'HR']]
subject Bob Guido Sue
type HR HR HR
year visit
2013 1 22.0 40.0 53.0
2014 1 30.0 52.0 45.0
Solution 5
You can use DataFrame.loc
:
>>> df.loc[1]
Example
>>> print(df)
result
A B C
1 1 1 6
2 9
2 1 8
2 11
2 1 1 7
2 10
2 1 9
2 12
>>> print(df.loc[1])
result
B C
1 1 6
2 9
2 1 8
2 11
>>> print(df.loc[2, 1])
result
C
1 7
2 10
Related videos on Youtube
silencer
Updated on November 08, 2021Comments
-
silencer over 2 years
I have a multi-index data frame with columns 'A' and 'B'.
Is there is a way to select rows by filtering on one column of the multi-index without resetting the index to a single column index?
For Example.
# has multi-index (A,B) df #can I do this? I know this doesn't work because the index is multi-index so I need to specify a tuple df.ix[df.A ==1]
-
Andy Hayden over 10 yearspossible duplicate of How to update a subset of a MultiIndexed pandas DataFrame
-
cs95 almost 4 yearsRelated: Select rows in pandas MultiIndex DataFrame (a broad discussion on the same topic by me).
-
-
Andy Hayden over 10 yearsHa, I had just updated my answer with that, Note: only available in 0.13.
-
Phillip Cloud over 10 yearsOh, good to know. I never remember which little conveniences are added in each version.
-
Andy Hayden over 10 yearsLol, in fact this question is a dupe of the one which inspired that convenience! :)
-
obchardon almost 6 yearsGreat answer, way more readable indeed. Do you know if it is possible to query two field on different index level like:
df.query('A == 1 or C >= 8')
-
Cleb almost 6 years@obchardon: That seems to work fine; I edited my answer using your example.
-
Solly over 5 yearsI have times and strings as multiindex which makes problems in the string expression. However,
df.query()
works just fine with variables if they are refered to with an '@' inside the expression in query, e.g.df.query('A == @var
) for a variablevar
in the environment. -
Cleb over 5 years@Solly: Thanks, I added this to the answer.
-
M__ over 4 yearsThis is the best of the modern approaches IMO, where df.loc[2, 1]['result'] will handle multi-columns
-
Lamma about 4 yearsWhere is the multi indexing here though?
-
Cleb about 4 years@Lamma:
A
andB
are the multi-index;C
is just a "normal" column. -
Lamma about 4 years@Cleb But you have not set it as a multi index? Or am in wrng assume you need
pd.MultiIndex.from_xxx
for it to be counded as multi index? -
Cleb about 4 years@Lamma:
df.set_index(['A', 'B'])
takes care of the multi-index :) -
Lamma about 4 years@Cleb How does that make them a sub column of C?
-
Lamma about 4 years@Cleb Secondly do oyu know how to make a column that is multi index with no lower level index to the the row index so can can call
df['apples']
and it iwll show the row with apples in the indexed column? -
Cleb about 4 years@Lamma: I don't know what you mean by "sub column". I think it is best if you opened a new question describing the problem in detail using an example. The comment section is not the best place for that.
-
Coddy over 3 yearsthis gives
ValueError: cannot handle a non-unique multi-index!
error -
Coddy over 3 yearsthis works with any number of integers for some reason. e.g.
df.loc[0], df.loc[1]....df.loc[n]
-
ASB about 3 yearsIt is really readable and straightforward, but I timed it and found it's considerably slower than the other alternatives.
-
Cleb about 3 years@ASB: The efficiency depends on the size of your dataset; see detailed timings in this answer.
-
user3697498 about 3 yearsin your access by column, how would you do say Bob&HR with Guido &HR in one go?
-
Hrvoje about 3 years@user3697498 you can use pandas query with multiple conditions: kanoki.org/2020/01/21/…
-
Pascal about 3 yearsI am wondering if this would also allow to select multiple items from one multi-index level? Trying something like
df.xs(['a','b'], level='A', axis=1)
leads to an error:KeyError: 'b'
-
Alexis Cllmb almost 2 yearsOh wow
query
is so cool! I tried passing an@col_name
and it didn't work, however you can do:values = [1, 2] col_name = 'var_1' df.query(f'{col_name} == @values[0] or {col_name} == @values[1]')
and it works like a charm!