pandas: multiple conditions while indexing data frame - unexpected behavior

476,019

Solution 1

As you can see, the AND operator drops every row in which at least one value equals -1. On the other hand, the OR operator requires both values to be equal to -1 to drop them.

That's right. Remember that you're writing the condition in terms of what you want to keep, not in terms of what you want to drop. For df1:

df1 = df[(df.a != -1) & (df.b != -1)]

You're saying "keep the rows in which df.a isn't -1 and df.b isn't -1", which is the same as dropping every row in which at least one value is -1.

For df2:

df2 = df[(df.a != -1) | (df.b != -1)]

You're saying "keep the rows in which either df.a or df.b is not -1", which is the same as dropping rows where both values are -1.

PS: chained access like df['a'][1] = -1 can get you into trouble. It's better to get into the habit of using .loc and .iloc.

Solution 2

You can also use query(), i.e.:

df_filtered = df.query('a == 4 & b != 2')

Solution 3

A little mathematical logic theory here:

"NOT a AND NOT b" is the same as "NOT (a OR b)", so:

"a NOT -1 AND b NOT -1" is equivalent of "NOT (a is -1 OR b is -1)", which is opposite (Complement) of "(a is -1 OR b is -1)".

So if you want exact opposite result, df1 and df2 should be as below:

df1 = df[(df.a != -1) & (df.b != -1)]
df2 = df[(df.a == -1) | (df.b == -1)]
Share:
476,019

Related videos on Youtube

Wojciech Walczak
Author by

Wojciech Walczak

Updated on April 28, 2022

Comments

  • Wojciech Walczak
    Wojciech Walczak about 2 years

    I am filtering rows in a dataframe by values in two columns.

    For some reason the OR operator behaves like I would expect AND operator to behave and vice versa.

    My test code:

    import pandas as pd
    
    df = pd.DataFrame({'a': range(5), 'b': range(5) })
    
    # let's insert some -1 values
    df['a'][1] = -1
    df['b'][1] = -1
    df['a'][3] = -1
    df['b'][4] = -1
    
    df1 = df[(df.a != -1) & (df.b != -1)]
    df2 = df[(df.a != -1) | (df.b != -1)]
    
    print pd.concat([df, df1, df2], axis=1,
                    keys = [ 'original df', 'using AND (&)', 'using OR (|)',])
    

    And the result:

          original df      using AND (&)      using OR (|)    
                 a  b              a   b             a   b
    0            0  0              0   0             0   0
    1           -1 -1            NaN NaN           NaN NaN
    2            2  2              2   2             2   2
    3           -1  3            NaN NaN            -1   3
    4            4 -1            NaN NaN             4  -1
    
    [5 rows x 6 columns]
    

    As you can see, the AND operator drops every row in which at least one value equals -1. On the other hand, the OR operator requires both values to be equal to -1 to drop them. I would expect exactly the opposite result. Could anyone explain this behavior, please?

    I am using pandas 0.13.1.

  • Phillip Cloud
    Phillip Cloud about 10 years
    DataFrame.query() works nicely here too. df.query('a != -1 or b != -1').
  • stoves
    stoves about 7 years
    Happen to know why pandas wants & and | over and and or?
  • DSM
    DSM about 7 years
    @stoves: in normal Python code, and and or have basic Python semantics that can't be modified. & and |, on the other hand, have corresponding special methods which control their behaviour. (In query strings, of course, we're free to apply any parsing we like.)
  • 3pitt
    3pitt about 6 years
    interestingly, it seems like df[True & False] fails but df[(True) & (False)] succeeds (not tested on this example)
  • tommy.carstensen
    tommy.carstensen over 5 years
    Would it be possible to break this kind of syntax across multiple lines? What would be most PEP8?
  • Aus_10
    Aus_10 about 4 years
    I have a situation where I think this syntax makes more sense e.g.: df.query(''(a==4 & b!=2) | c== 3")
  • SeaBean
    SeaBean almost 3 years
    As & has a higher operator precedence than ==, is it special for .query() not to interpret the boolean as something like a == (4 & b) ?