pandas multiple conditions based on multiple columns using np.where
Solution 1
Build the selection criteria as a Boolean mask, combining the conditions element-wise with & and wrapping each comparison in parentheses:
df['color'] = np.where(((df.A < borderE) & ((df.B - df.C) < ex)), 'r', 'b')
>>> df
   A   B   C color
0  0  11  20     r
1  1  12  19     r
2  2  13  18     r
3  3  14  17     b
4  4  15  16     b
5  5  16  15     b
6  6  17  14     b
7  7  18  13     b
8  8  19  12     b
9  9  20  11     b
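For reference, here is a self-contained sketch of the snippet above. It assumes the DataFrame and thresholds from the question (borderE = 3., ex = 0.); the name mask is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Data and thresholds taken from the question.
df = pd.DataFrame({
    "A": range(10),
    "B": range(11, 21),
    "C": range(20, 10, -1),
})
borderE = 3.0
ex = 0.0

# Each comparison yields a Boolean Series; & combines them element-wise.
# The parentheses matter because & binds more tightly than < in Python.
mask = (df["A"] < borderE) & ((df["B"] - df["C"]) < ex)

# np.where picks 'r' where the mask is True and 'b' elsewhere.
df["color"] = np.where(mask, "r", "b")
print(df)
```

Keeping the mask in its own variable makes it easy to inspect (for example with mask.sum()) before it is used for the assignment.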
Solution 2
Wrap the condition in a function and apply it row by row:
def color(row):
    borderE = 3.
    ex = 0.
    if (row.A > borderE) and (row.B - row.C < ex):
        return "somestring"
    else:
        return "otherstring"

df.loc[:, 'color'] = df.apply(color, axis=1)
Yields:
   A   B   C        color
0  0  11  20  otherstring
1  1  12  19  otherstring
2  2  13  18  otherstring
3  3  14  17  otherstring
4  4  15  16   somestring
5  5  16  15  otherstring
6  6  17  14  otherstring
7  7  18  13  otherstring
8  8  19  12  otherstring
9  9  20  11  otherstring
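The row-wise function works because each row.A, row.B and row.C is a single number inside the function, so plain and is unambiguous. As a rough check, the sketch below compares it with the same condition written on whole columns with np.where. It assumes the data from the question; by_apply and by_where are illustrative names, and the thresholds are moved out of the function for brevity.

```python
import numpy as np
import pandas as pd

# Data and thresholds taken from the question.
df = pd.DataFrame({
    "A": range(10),
    "B": range(11, 21),
    "C": range(20, 10, -1),
})
borderE = 3.0
ex = 0.0

def color(row):
    # row.A, row.B and row.C are scalars here, so `and` behaves normally.
    if row.A > borderE and (row.B - row.C) < ex:
        return "somestring"
    return "otherstring"

# Row-by-row evaluation (one Python function call per row).
by_apply = df.apply(color, axis=1)

# The same condition evaluated on whole columns at once.
by_where = np.where(
    (df["A"] > borderE) & ((df["B"] - df["C"]) < ex),
    "somestring",
    "otherstring",
)

# Both approaches should label every row identically.
print(by_apply.tolist() == by_where.tolist())
```

On large frames the column-wise form is usually much faster, since apply with axis=1 calls the Python function once per row.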
Author by
Robert
Updated on July 09, 2022
Comments
Robert (almost 2 years ago):
I am trying to color the points of a pandas DataFrame depending on TWO conditions. Example:
If the value of col1 > a (float) AND the value of col2 - the value of col3 < b (float), then the value of col4 = string, else: other string.
I have tried many different approaches, and everything I found online depended on only one condition.
My example code always raises the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Here is the code. I tried several variations without success.
df = pd.DataFrame()
df['A'] = range(10)
df['B'] = range(11,21,1)
df['C'] = range(20,10,-1)
borderE = 3.
ex = 0.
#print df
df['color'] = np.where(all([df.A < borderE, df.B - df.C < ex]), 'r', 'b')
By the way, I understand what the error says, but not how to handle it. Thanks in advance!
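The error in the question's code comes from Python's built-in all(), which asks each Series for a single True/False value; pandas refuses because a Series holds one Boolean per row. The sketch below reproduces the error and shows two element-wise alternatives. It assumes the question's data; cond_a, cond_b, combined and same are names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Data and thresholds taken from the question.
df = pd.DataFrame({
    "A": range(10),
    "B": range(11, 21),
    "C": range(20, 10, -1),
})
borderE = 3.0
ex = 0.0

cond_a = df["A"] < borderE
cond_b = (df["B"] - df["C"]) < ex

# Built-in all() calls bool() on each Series, which pandas rejects.
try:
    all([cond_a, cond_b])
except ValueError as err:
    print("ValueError:", err)

# Element-wise alternatives that keep one Boolean per row:
combined = cond_a & cond_b                       # operator form
same = np.logical_and.reduce([cond_a, cond_b])   # handy for a list of conditions

df["color"] = np.where(combined, "r", "b")
print(df)
```

The np.logical_and.reduce form is mainly convenient when the number of conditions is not fixed in advance; for two conditions the & operator is the shorter choice.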