Masking multiple columns on a pandas dataframe in python
You can use the pandas all method and boolean logic. As EdChum commented I am a bit unclear still on your exact example but a similar example is
In [1]: df = DataFrame([[1,2],[-3,5]], index=[0,1], columns=['a','b'])
In [2]: df
Out [2]:
a b
0 1 2
1 -3 5
In [3]: msk = (df>1) & (df<5)
In [4]: msk
Out [4]:
a b
0 False True
1 False False
In [5]: msk.all(axis=1)
Out [5]:
0 False
1 False
dtype: bool
If you wanted to index the original dataframe by the mask you could do
In [6]: df[msk]
Out [6]:
a b
0 NaN 2
1 NaN NaN
Or as you originally indicated rows where all the rows are true
In [7]: idx = msk.all(axis=1)
In [8]: df[idx]
Out [8]:
Empty DataFrame
Columns: [a,b]
Index: []
Or if one row was true
In [9]: idx[0] = True
In [10]: df[idx]
Out [10]:
a b
0 1 2
Edit: Just to address the original question after clarification from the comments, where we want different filtering criteria for different columns
In [10]: msk1 = df[['a']] < 0
In [11]: msk2 = df[['b']] > 3
In [12]: msk = concat((msk1, msk2), axis=1)
In [12]: slct = msk.all(axis=1)
In [13]: df.ix[slct]
Out [13]:
a b
1 -3 5
Jonas
Machine learning engineer / Data Scientist Favorite ML applications : Reinforcement learning Also good in BI Most valuable /asset/: alignment of the business value and the models loss function
Updated on July 09, 2022Comments
-
Jonas almost 2 years
i am looking to apply multiply masks on each column of a pandas dataset (respectively to it's properties) in python. In the next step i want to find (a) row(s) in the dataframe that fits all conditions. therefore i have:
df Out[27]: DE FL GA IA ID 0 0 1 0 0 0 1 1 0 1 0 1 2 0 0 1 0 0 3 0 1 0 0 0 4 0 0 0 0 0 mask_list = [] for i in range(0,5): if i % 2==0: mask_list.append(df[[i]]>0) else: mask_list.append(df[[i]]<1) concat_frame = pa.DataFrame() for mask in mask_list: concat_frame =pa.concat((concat_frame, mask), axis=1) concat_frame Out[48]: DE FL GA IA ID 0 False False False True False 1 True True True True True 2 False True True True False 3 False False False True False 4 False True False True False [5 rows x 5 columns]
update expected outcome:outcome Out[60]: DE FL GA IA ID 1 1 0 1 0 1
Here comes the question :
how can i apply the concat_mask on df , so that i select rows, in which all Boolean criteria are matched (are True)? -
Jonas almost 10 yearsthank You that helped. I wanted to do: index = s_mask.all(axis=1) select = df.ix[index] out: Out[21]: a b 1 -3 5 . Can You insert that into your answer? Then i will marked it as answered :)
-
Jonas almost 10 yearsThanks for Your answer. In the proper code i actually iterate throw all columns and apply various of diffenrent conditions to mask each column. This is all what the code ment to express. And i actually want to extract data of the original df dataframe....
-
mgilbert almost 10 yearsNot exactly sure I undersand, if I do the above I would get an empty dataframe since index is False for both rows, does your s_mask refer to msk in my example? so I'm unsure how your getting Out[21] as in your comment?
-
ely almost 10 yearsThe approach that I give does extract the data from the original DataFrame as needed.
-
Jonas almost 10 yearsoh yes, i jumped ahead there. I applied: msk1 = df[[0]]<0 msk2 = df[[1]]>3 s_mask = pandas.concat((msk1, msk2), axis=1)
-
mgilbert almost 10 yearsHey @Antihead let me know if that last edit addressed your concern
-
kva1966 about 5 yearsGeneral caution for combining masks: Not immediately obvious to me initially is that because of precedence rules, combining masks using the operators
&
and|
, requires the individual relational expressions to be parenthesised because of precedence rules, otherwise you either get errors, or worse misleading, crazy bitwise-op'ed results because&
and|
have higher precedence than the relational ops! (ref: docs.python.org/3/reference/…)