Using logical operators in building a Pandas DataFrame

14,429

Solution 1

The & operator binds more tightly than == (or any comparison operator). See the documentation. A simpler example is:

>>> 2 == 2 & 3 == 3
False

This is because the expression is grouped as 2 == (2 & 3) == 3, and comparison chaining then evaluates it as (2 == 2) and (2 == 3), which is False. The same thing is happening in your case, so you need to put parentheses around each comparison.

data = all_data[((all_data['Source'] == 2) &
                 np.isfinite(all_data[self.design_metric])) |
                ((all_data['Source'] != 2) &
                 np.isfinite(all_data[self.actual_metric]))]

Note the extra parentheses around the == and != comparisons.
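As a rough check, the parenthesised mask can be compared against the list comprehension from the question on a small made-up DataFrame. The data and the column names "design" and "actual" are assumptions standing in for self.design_metric and self.actual_metric; this is only a sketch, not the asker's real data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "design" and "actual" stand in for
# self.design_metric and self.actual_metric from the question.
all_data = pd.DataFrame({
    "Source": [2, 2, 1, 1],
    "design": [1.0, np.nan, 3.0, np.nan],
    "actual": [np.nan, 2.0, np.nan, 4.0],
})

# Vectorised mask with every comparison wrapped in parentheses.
mask = (((all_data["Source"] == 2) & np.isfinite(all_data["design"])) |
        ((all_data["Source"] != 2) & np.isfinite(all_data["actual"])))
data = all_data[mask]

# Row-by-row version using and/or, as in snippet 1 of the question.
expected = [
    bool((np.isfinite(all_data["design"][i]) and all_data["Source"][i] == 2)
         or (np.isfinite(all_data["actual"][i]) and all_data["Source"][i] != 2))
    for i in range(len(all_data))
]

print(mask.tolist() == expected)  # the two approaches agree on this data
print(list(data.index))
```

On this sample both versions select the first and last rows, which is the behaviour the question expected from snippet 2.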

Solution 2

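Besides precedence, there is a difference between the and operator and the & operator: the former is the boolean (logical) operator and the latter is the binary bitwise operator. You also need to be aware of how boolean expressions are evaluated: and and or return one of their operands rather than necessarily True or False.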

See examples in the following snippet:

logical expressions

>>> 1 and 2
2

>>> '1' and '2'
'2'

>>> 0 == 1 and 2 == 0 or 0
0
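The logical operators return one of their operands rather than a strict True or False, which is what the results above reflect. A small sketch with arbitrary values chosen purely for illustration:

```python
# `and` returns the first falsy operand, or the last operand if all are truthy.
# `or` returns the first truthy operand, or the last operand if all are falsy.
results = [
    1 and 2,           # both truthy, so the last operand is returned
    0 and 2,           # 0 is falsy, so evaluation stops there
    1 or 2,            # 1 is truthy, so evaluation stops there
    0 or 2,            # 0 is falsy, so the second operand is returned
    "" or "fallback",  # the empty string is falsy
]
print(results)  # [2, 0, 1, 2, 'fallback']
```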

bitwise operators

>>> 1 & 2
0

>>> '1' & '2'
Traceback (most recent call last):
  ...
TypeError: unsupported operand type(s) for &: 'str' and 'str'

>>> 0 == 1 & 2 == 0 | 0
True
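The distinction matters for pandas because & is applied element by element to a Series, whereas and asks the whole Series for a single truth value. The sketch below uses a made-up Series to show both behaviours; it assumes a reasonably recent pandas, where the second case raises ValueError.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])  # illustrative data only

# & combines the two boolean Series element by element.
elementwise = (s > 1) & (s < 4)

# `and` needs a single truth value for each Series, which pandas
# refuses to provide because it would be ambiguous.
try:
    (s > 1) and (s < 4)
    raised = False
except ValueError:
    raised = True

print(elementwise.tolist())  # [False, True, True, False]
print(raised)                # True
```

This is why the list comprehension in the question can use and/or on scalar values, while the vectorised version has to use & and | with parenthesised comparisons.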
Author: Jamie Bull

Manager at Baringa Partners LLP, working on climate change transition risk modelling. Also and just for the love of it, I develop websites and bots using Django, Flask, Heroku, AWS and more.

Updated on June 05, 2022

Comments

  • Jamie Bull, almost 2 years

    I have two snippets of pandas code which I think should be equivalent, but the second one doesn't do what I expect.

    # snippet 1
        data = all_data[[((np.isfinite(all_data[self.design_metric][i]) 
                        and all_data['Source'][i] == 2)) 
                        or ((np.isfinite(all_data[self.actual_metric][i]) 
                        and all_data['Source'][i] != 2))
                        for i in range(len(all_data))]]
    
    
    # snippet 2
        data = all_data[(all_data['Source'] == 2 &
                        np.isfinite(all_data[self.design_metric])) |
                        (all_data['Source'] != 2 &
                        np.isfinite(all_data[self.actual_metric]))]
    

    Each section (e.g. all_data['Source'] == 2) does what I expect on its own, but I seem to be doing something wrong with the logical operators, because the final result differs from that of the list comprehension version.