Pandas Python, select columns based on rows conditions

python pandas dataframe conditional-statements

18,105

Solution 1

Use gt and any to filter the df:

In [287]:
df.ix[:,df.gt(2).any()]

Out[287]:
          2
0  1.590124
1  2.500397

Here we use ix to select all rows, the first : and the next arg is a boolean mask of the columns that meet the condition:

In [288]:
df.gt(2)

Out[288]:
       0      1      2      3
0  False  False  False  False
1  False  False   True  False

In [289]:
df.gt(2).any()

Out[289]:
0    False
1    False
2     True
3    False
dtype: bool

In your example what you did was select the cell value for the first row and second column, you then tried to use this to mask the columns but this just returned the first column hence why it didn't work:

In [291]:
df.iloc[(0,1)]

Out[291]:
1.3296030000000001

In [293]:
df.columns[df.iloc[(0,1)]>2]

Out[293]:
'0'

Solution 2

Use mask created with df > 2 with any and then select columns by ix:

import pandas as pd
np.random.seed(18)
df = pd.DataFrame(np.random.randn(2, 4))
print(df)
          0         1         2         3
0  0.079428  2.190202 -0.134892  0.160518
1  0.442698  0.623391  1.008903  0.394249

print ((df>2).any())
0    False
1     True
2    False
3    False
dtype: bool

print (df.ix[:, (df>2).any()])
          1
0  2.190202
1  0.623391

EDIT by comment:

You can check your solution per partes:

It seems it works, but it always select second column (1, python count from 0) column if condition True:

print (df.iloc[(0,1)])
2.19020235741

print (df.iloc[(0,1)] > 2)
True

print (df.columns[df.iloc[(0,1)]>2])
1

print (df[df.columns[df.iloc[(0,1)]>2]])
0    2.190202
1    0.623391
Name: 1, dtype: float64

And first column (0) column if False, because boolean True and False are casted to 1 and 0:

np.random.seed(15)
df = pd.DataFrame(np.random.randn(2, 4))
print (df)
          0         1         2         3
0 -0.312328  0.339285 -0.155909 -0.501790
1  0.235569 -1.763605 -1.095862 -1.087766

print (df.iloc[(0,1)])
0.339284706046

print (df.iloc[(0,1)] > 2)
False

print (df.columns[df.iloc[(0,1)]>2])
0

print (df[df.columns[df.iloc[(0,1)]>2]])
0   -0.312328
1    0.235569
Name: 0, dtype: float64

If change column names:

np.random.seed(15)
df = pd.DataFrame(np.random.randn(2, 4))
df.columns = ['a','b','c','d']
print (df)
          a         b         c         d
0 -0.312328  0.339285 -0.155909 -0.501790
1  0.235569 -1.763605 -1.095862 -1.087766

print (df.iloc[(0,1)] > 2)
False

print (df[df.columns[df.iloc[(0,1)]>2]])
0   -0.312328
1    0.235569
Name: a, dtype: float64

Solution 3

Quick update, as .ix is now deprecated (since 0.20.0). For lastest versions of pandas, .loc will do the trick:

df.loc[:, df.gt(2).any()]

18,105

Author by

hans glick

Updated on June 23, 2022

Comments

hans glick almost 2 years

I have a dataframe:

import pandas as pd
df = pd.DataFrame(np.random.randn(2, 4))
print(df)
          0         1         2         3
0  1.489198  1.329603  1.590124  1.123505
1  0.024017  0.581033  2.500397  0.156280

I want to select the columns which for there is at least one row with a value greater than 2. I tried the following, but it did not work as expected.

df[df.columns[df.iloc[(0,1)]>2]]

In this toy example my expected output would be:

       2
1.590124  
2.500397

hans glick almost 8 years

Ok thkx to both of you, anyone knows what my script is supposed to do?
hans glick almost 8 years

Ok thkx to both of you, anyone knows what my script is supposed to do?
EdChum almost 8 years

this df.iloc[(0,1)] selected the first row and second column cell value and tested if it was greater than 2 which it's not so it won't work
jezrael almost 8 years

I try explain it by samples, please check it.
hans glick almost 8 years

thk you EdChum for your answer
hans glick almost 8 years

EdChum, what if I want to select columns which for at least one row is LOWER than a specific value or is EQUAL to a specific value? Or how to reverse a dataframe of boolean entries?
EdChum almost 8 years

use le or <= if you prefer