Pandas Python, select columns based on rows conditions
Solution 1
Use gt
and any
to filter the df:
In [287]:
df.ix[:,df.gt(2).any()]
Out[287]:
2
0 1.590124
1 2.500397
Here we use ix
to select all rows, the first :
and the next arg is a boolean mask of the columns that meet the condition:
In [288]:
df.gt(2)
Out[288]:
0 1 2 3
0 False False False False
1 False False True False
In [289]:
df.gt(2).any()
Out[289]:
0 False
1 False
2 True
3 False
dtype: bool
In your example what you did was select the cell value for the first row and second column, you then tried to use this to mask the columns but this just returned the first column hence why it didn't work:
In [291]:
df.iloc[(0,1)]
Out[291]:
1.3296030000000001
In [293]:
df.columns[df.iloc[(0,1)]>2]
Out[293]:
'0'
Solution 2
Use mask
created with df > 2
with any
and then select columns by ix
:
import pandas as pd
np.random.seed(18)
df = pd.DataFrame(np.random.randn(2, 4))
print(df)
0 1 2 3
0 0.079428 2.190202 -0.134892 0.160518
1 0.442698 0.623391 1.008903 0.394249
print ((df>2).any())
0 False
1 True
2 False
3 False
dtype: bool
print (df.ix[:, (df>2).any()])
1
0 2.190202
1 0.623391
EDIT by comment:
You can check your solution per partes:
It seems it works, but it always select second column (1
, python count from 0
) column if condition True
:
print (df.iloc[(0,1)])
2.19020235741
print (df.iloc[(0,1)] > 2)
True
print (df.columns[df.iloc[(0,1)]>2])
1
print (df[df.columns[df.iloc[(0,1)]>2]])
0 2.190202
1 0.623391
Name: 1, dtype: float64
And first column (0
) column if False
, because boolean True
and False
are casted to 1
and 0
:
np.random.seed(15)
df = pd.DataFrame(np.random.randn(2, 4))
print (df)
0 1 2 3
0 -0.312328 0.339285 -0.155909 -0.501790
1 0.235569 -1.763605 -1.095862 -1.087766
print (df.iloc[(0,1)])
0.339284706046
print (df.iloc[(0,1)] > 2)
False
print (df.columns[df.iloc[(0,1)]>2])
0
print (df[df.columns[df.iloc[(0,1)]>2]])
0 -0.312328
1 0.235569
Name: 0, dtype: float64
If change column names:
np.random.seed(15)
df = pd.DataFrame(np.random.randn(2, 4))
df.columns = ['a','b','c','d']
print (df)
a b c d
0 -0.312328 0.339285 -0.155909 -0.501790
1 0.235569 -1.763605 -1.095862 -1.087766
print (df.iloc[(0,1)] > 2)
False
print (df[df.columns[df.iloc[(0,1)]>2]])
0 -0.312328
1 0.235569
Name: a, dtype: float64
Solution 3
Quick update, as .ix
is now deprecated (since 0.20.0
). For lastest versions of pandas, .loc
will do the trick:
df.loc[:, df.gt(2).any()]
hans glick
Updated on June 23, 2022Comments
-
hans glick almost 2 years
I have a dataframe:
import pandas as pd df = pd.DataFrame(np.random.randn(2, 4)) print(df) 0 1 2 3 0 1.489198 1.329603 1.590124 1.123505 1 0.024017 0.581033 2.500397 0.156280
I want to select the columns which for there is at least one row with a value greater than
2
. I tried the following, but it did not work as expected.df[df.columns[df.iloc[(0,1)]>2]]
In this toy example my expected output would be:
2 1.590124 2.500397
-
hans glick almost 8 yearsOk thkx to both of you, anyone knows what my script is supposed to do?
-
hans glick almost 8 yearsOk thkx to both of you, anyone knows what my script is supposed to do?
-
EdChum almost 8 yearsthis
df.iloc[(0,1)]
selected the first row and second column cell value and tested if it was greater than 2 which it's not so it won't work -
jezrael almost 8 yearsI try explain it by samples, please check it.
-
hans glick almost 8 yearsthk you EdChum for your answer
-
hans glick almost 8 yearsEdChum, what if I want to select columns which for at least one row is LOWER than a specific value or is EQUAL to a specific value? Or how to reverse a dataframe of boolean entries?
-
EdChum almost 8 yearsuse
le
or<=
if you prefer