pandas multiple conditions based on multiple columns using np.where

Solution 1

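Build each condition as an element-wise Boolean Series, combine them with & (each comparison wrapped in parentheses), and pass the resulting mask to np.where: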

df['color'] = np.where(((df.A < borderE) & ((df.B - df.C) < ex)), 'r', 'b')

>>> df
   A   B   C color
0  0  11  20     r
1  1  12  19     r
2  2  13  18     r
3  3  14  17     b
4  4  15  16     b
5  5  16  15     b
6  6  17  14     b
7  7  18  13     b
8  8  19  12     b
9  9  20  11     b
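For completeness, here is a self-contained sketch of the same approach that can be run as-is. It rebuilds the DataFrame from the question and uses the question's values for borderE and ex; the intermediate name mask is only an illustrative choice, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "A": range(10),
    "B": range(11, 21),
    "C": range(20, 10, -1),
})
borderE = 3.0
ex = 0.0

# Each comparison yields a Boolean Series; & combines them element-wise.
# The parentheses matter because & binds more tightly than < in Python.
mask = (df["A"] < borderE) & ((df["B"] - df["C"]) < ex)  # "mask" is an illustrative name

# np.where picks 'r' where the mask is True and 'b' elsewhere
df["color"] = np.where(mask, "r", "b")
print(df)
```

Keeping the mask in its own variable makes it easier to inspect which rows satisfy both conditions before assigning the column.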

Solution 2

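Wrap the if statement in a function and apply it to each row. Note that this version tests A > borderE, as in the question's description, whereas the question's code tests A < borderE: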

def color(row):
    borderE = 3.
    ex = 0.
    # row holds scalars, so Python's plain "and" is fine here
    if (row.A > borderE) and (row.B - row.C < ex):
        return "somestring"
    else:
        return "otherstring"

# axis=1 passes each row to color()
df.loc[:, 'color'] = df.apply(color, axis=1)

Yields:

   A   B   C        color
0  0  11  20  otherstring
1  1  12  19  otherstring
2  2  13  18  otherstring
3  3  14  17  otherstring
4  4  15  16   somestring
5  5  16  15  otherstring
6  6  17  14  otherstring
7  7  18  13  otherstring
8  8  19  12  otherstring
9  9  20  11  otherstring
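The row-wise function can also be written as a single vectorized np.where call. The sketch below is one possible way to check that both give the same labels on the question's data; the names by_apply, by_where and same are hypothetical and used only for this comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": range(10),
    "B": range(11, 21),
    "C": range(20, 10, -1),
})
borderE = 3.0
ex = 0.0


def color(row):
    # Row-wise version: operates on scalars, so "and" works
    if (row.A > borderE) and (row.B - row.C < ex):
        return "somestring"
    return "otherstring"


# Row-by-row evaluation with apply
by_apply = df.apply(color, axis=1)

# Vectorized evaluation of the same condition with & and np.where
by_where = np.where(
    (df["A"] > borderE) & ((df["B"] - df["C"]) < ex),
    "somestring",
    "otherstring",
)

# Compare the two results element by element
same = by_apply.tolist() == by_where.tolist()
print(same)
```

On larger frames the vectorized form is usually faster, since apply with axis=1 calls the Python function once per row.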
Author by

Robert

Updated on July 09, 2022

Comments

  • Robert

    I am trying to color points of a pandas dataframe depending on TWO conditions. Example:

    If the value of col1 > a (a float) AND the value of col2 minus the value of col3 < b (a float), then col4 should be one string; otherwise, another string.

    I have tried many different approaches, and everything I found online depended on only one condition.

    My example code always raises the error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

    Here's the code. Tried several variations without success.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame()
    
    df['A'] = range(10)
    df['B'] = range(11,21,1)
    df['C'] = range(20,10,-1)
    
    borderE = 3.
    ex = 0.
    
    #print df
    
    df['color'] = np.where(all([df.A < borderE, df.B - df.C < ex]), 'r', 'b')
    

    By the way, I understand what the error says, but not how to handle it. Thanks in advance!
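As a hedged illustration of where the error comes from: Python's built-in all() asks each list element for a single truth value, and a Series with several entries has none, so pandas raises ValueError. The sketch below reproduces that on the question's data and then shows one possible element-wise replacement, np.logical_and.reduce, which is not used in the answers above and is offered only as an alternative to chaining &. The names conditions and error_message are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": range(10),
    "B": range(11, 21),
    "C": range(20, 10, -1),
})
borderE = 3.0
ex = 0.0

# Two element-wise conditions, each a Boolean Series
conditions = [df["A"] < borderE, (df["B"] - df["C"]) < ex]

try:
    # Built-in all() calls bool() on each Series, which pandas refuses
    all(conditions)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Element-wise AND across the list of conditions (alternative to chaining &)
mask = np.logical_and.reduce(conditions)
df["color"] = np.where(mask, "r", "b")
print(df)
```

With only two conditions, writing them out with & is probably clearer; the reduce form mainly helps when the list of conditions is built dynamically.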