Logical OR/bitwise OR in pandas DataFrame

As far as I understand this issue (I come from a C++ background and am currently learning Python for data science), the bitwise operators (&, |) can be overloaded in classes, just as in C++.

On plain numbers, these bitwise operators compare the individual bits and give you the result. So for instance, if you have the following:

1 | 2 # will result in 3

What Python will actually do is compare the bits of these numbers:

00000001 | 00000010

The result will be:

00000011 (because 0 | 0 is False, ergo 0; while 0 | 1 and 1 | 0 are True, ergo 1)

As an integer: 3

It compares each pair of corresponding bits and returns the combined result (eight bit positions are shown here for illustration; Python integers are not limited to eight bits). This is the normal behaviour of these operators.
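The integer case above can be checked directly. This is just a small sketch; the variable names are made up for illustration, and the 8-digit padding is only a display choice.

```python
# Bitwise OR on plain integers, shown with 8-digit binary formatting.
a, b = 1, 2
result = a | b

print(format(a, "08b"))       # 00000001
print(format(b, "08b"))       # 00000010
print(format(result, "08b"))  # 00000011
print(result)                 # 3
```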

Enter pandas. Since these operators can be overloaded, pandas makes use of that. What the bitwise operators do when applied to pandas DataFrame columns is the following:

(dataframe1['column'] == "expression") & (dataframe1['column'] != "another expression")

In this case, pandas first creates a Series of True and False values for each of the == and != operations (be careful: you have to put parentheses around each comparison, because Python's bitwise operators bind more tightly than the comparison operators, so & would otherwise be evaluated first!). So it compares each value in the column to the expression and outputs either True or False.
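The precedence point can be seen even without pandas. The sketch below uses plain integers chosen only for illustration: without parentheses, & grabs its neighbours first and the rest becomes a chained comparison.

```python
# Without parentheses this parses as 1 == (1 & 2) == 2,
# i.e. the chained comparison 1 == 0 == 2, which is False.
without_parens = 1 == 1 & 2 == 2

# With parentheses each comparison is evaluated first: True & True.
with_parens = (1 == 1) & (2 == 2)

print(without_parens)  # False
print(with_parens)     # True
```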

Then you have two same-length Series of True and False values. pandas THEN takes these two Series and combines them element by element with either "and" (&) or "or" (|), and finally returns one single boolean Series that says, for each row, whether the combined condition is fulfilled.
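Here is a minimal sketch of those steps. The DataFrame contents and the names is_expr, not_other, mask and selected are invented for this example; only the == / != / & pattern comes from the expression above.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame(
    {"column": ["expression", "another expression", "something else"]}
)

# Each comparison yields a boolean Series of the same length as the column.
is_expr = df["column"] == "expression"
not_other = df["column"] != "another expression"

# & combines the two Series element by element into one boolean mask.
mask = is_expr & not_other

# Indexing with the mask keeps only the rows where it is True.
selected = df[mask]

print(mask.tolist())  # [True, False, False]
print(selected)
```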

To go even further, what I think is happening under the hood is that the & operator calls a method on the left-hand Series (Python's __and__ hook) and passes it the right-hand Series. pandas then combines the two element by element, returning True or False for each pair of values.
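To illustrate the overloading mechanism itself, here is a toy class. BoolList is a hypothetical stand-in I made up for this sketch, not how pandas is actually implemented; it only shows that & and | end up calling __and__ and __or__.

```python
class BoolList:
    """Hypothetical toy stand-in for a boolean Series (not pandas code)."""

    def __init__(self, values):
        self.values = list(values)

    def __and__(self, other):
        # Called for `self & other`: combine the values pairwise with "and".
        return BoolList(a and b for a, b in zip(self.values, other.values))

    def __or__(self, other):
        # Called for `self | other`: combine the values pairwise with "or".
        return BoolList(a or b for a, b in zip(self.values, other.values))


left = BoolList([True, False, True])
right = BoolList([True, True, False])

both = left & right    # dispatches to left.__and__(right)
either = left | right  # dispatches to left.__or__(right)

print(both.values)    # [True, False, False]
print(either.values)  # [True, True, True]
```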

This is basically the same principle they've used for all other operators as well (>, <, >=, <=, ==, !=).

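Why go through the trouble of using a separate & expression when you have the nice and neat "and"? That seems to be because "and" (like "or") is hard-coded into the language and cannot be overloaded: it asks each operand for a single truth value, and a pandas Series refuses to give one, raising "ValueError: The truth value of a Series is ambiguous".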
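A small sketch of the difference, using two made-up boolean Series (left and right are names invented for this example): "or" fails because it needs one truth value for the whole Series, while | works element by element.

```python
import pandas as pd

# Hypothetical boolean masks, for illustration only.
left = pd.Series([True, False, False])
right = pd.Series([False, False, True])

# `or` asks `left` for a single True/False, which pandas refuses to give.
try:
    combined = left or right
except ValueError as err:
    message = str(err)
    print(message)

# `|` is overloaded by pandas and combines the masks element by element.
elementwise = left | right
print(elementwise.tolist())  # [True, False, True]
```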

Hope that helps!

Author by

BernardL

Updated on June 04, 2022

Comments

  • BernardL
    BernardL almost 2 years

    I am trying to use a Boolean mask to get a match from 2 different dataframes.

    Using the logical OR operator:

    x = df[(df['A'].isin(df2['B']))
          or df['A'].isin(df2['C'])]
    
    Output:
    
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
    

    However using the bitwise OR operator, the results are returned successfully.

    x = df[(df['A'].isin(df2['B']))
          | df['A'].isin(df2['C'])]
    
    Output: x
    

    Is there a difference between the two, and would bitwise OR be the best option here? Why doesn't the logical OR work?