multiple if else conditions in pandas dataframe and derive multiple columns
Solution 1
You need chained comparison using upper and lower bound
def flag_df(df):
if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
return 'Red'
elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
return 'Yellow'
elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
return 'Orange'
elif (df['height'] > 8):
return np.nan
df2['Flag'] = df2.apply(flag_df, axis = 1)
student score height trigger1 trigger2 trigger3 Flag
0 A 100 7 84 99 114 Yellow
1 B 96 4 95 110 125 Red
2 C 80 9 15 30 45 NaN
3 D 105 5 78 93 108 Yellow
4 E 156 3 16 31 46 Orange
Note: You can do this with a very nested np.where but I prefer to apply a function for multiple if-else
Edit: answering @Cecilia's questions
- what is the returned object is not strings but some calculations, for example, for the first condition, we want to return df['height']*2
Not sure what you tried but you can return a derived value instead of string using
def flag_df(df):
if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
return df['height']*2
elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
return df['height']*3
elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
return df['height']*4
elif (df['height'] > 8):
return np.nan
- what if there are 'NaN' values in osome columns and I want to use df['xxx'] is None as a condition, the code seems like not working
Again not sure what code did you try but using pandas isnull
would do the trick
def flag_df(df):
if pd.isnull(df['height']):
return df['height']
elif (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
return df['height']*2
elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
return df['height']*3
elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
return df['height']*4
elif (df['height'] > 8):
return np.nan
Solution 2
Here is a way to use numpy.select()
for doing this in a non cluttered way:
conditions = [
(df2['trigger1'] <= df2['score']) & (df2['score'] < df2['trigger2']) & (df2['height'] < 8),
(df2['trigger2'] <= df2['score']) & (df2['score'] < df2['trigger3']) & (df2['height'] < 8),
(df2['trigger3'] <= df2['score']) & (df2['height'] < 8),
(df2['height'] > 8)
]
choices = ['Red','Yellow','Orange', np.nan]
df['Flag1'] = np.select(conditions, choices, default=np.nan)
Solution 3
you can use also apply with a custom function on axis 1 like this :
def color_selector(x):
if (x['trigger1'] <= x['score'] < x['trigger2']) and (x['height'] < 8):
return 'Red'
elif (x['trigger2'] <= x['score'] < x['trigger3']) and (x['height'] < 8):
return 'Yellow'
elif (x['trigger3'] <= x['score']) and (x['height'] < 8):
return 'Orange'
elif (x['height'] > 8):
return ''
df2 = df2.assign(flag=df2.apply(color_selector, axis=1))
you will get something like this :
Kumar AK
Updated on November 29, 2021Comments
-
Kumar AK over 2 years
I have a dataframe like below.
import pandas as pd import numpy as np raw_data = {'student':['A','B','C','D','E'], 'score': [100, 96, 80, 105,156], 'height': [7, 4,9,5,3], 'trigger1' : [84,95,15,78,16], 'trigger2' : [99,110,30,93,31], 'trigger3' : [114,125,45,108,46]} df2 = pd.DataFrame(raw_data, columns = ['student','score', 'height','trigger1','trigger2','trigger3']) print(df2)
I need to derive Flag column based on multiple conditions.
i need to compare score and height columns with trigger 1 -3 columns.
Flag Column:
if Score greater than equal trigger 1 and height less than 8 then Red --
if Score greater than equal trigger 2 and height less than 8 then Yellow --
if Score greater than equal trigger 3 and height less than 8 then Orange --
if height greater than 8 then leave it as blank
How to write if else conditions in pandas dataframe and derive columns?
Expected Output
student score height trigger1 trigger2 trigger3 Flag 0 A 100 7 84 99 114 Yellow 1 B 96 4 95 110 125 Red 2 C 80 9 15 30 45 NaN 3 D 105 5 78 93 108 Yellow 4 E 156 3 16 31 46 Orange
For other column Text1 in my original question I have tried this one but the integer columns not converting the string when concatenation using astype(str) any other approach?
def text_df(df): if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8): return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger1'].astype(str) + " and less than height 5" elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8): return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger2'].astype(str) + " and less than height 5" elif (df['trigger3'] <= df['score']) and (df['height'] < 8): return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger3'].astype(str) + " and less than height 5" elif (df['height'] > 8): return np.nan
-
Kumar AK about 6 yearsThanks for the quick reply.. can you please help me to derive 2 more columns based on this condition. my original question was to derive 3 columns but edited because of confusing. can you able to see first post before editing?
-
Vaishali about 6 yearsChange your return statement to this: return '{} score {} greater than {} and less than height 5'.format(df['student'], df['score'], df['trigger1']). I have done it for first condition, use similar format for the rest of the conditions
-
wawawa over 2 yearsHi what is the returned object is not strings but some calculations, for example, for the first condition, we want to return
df['height']*2
? -
wawawa over 2 yearsand what if there'are 'NaN' values in osome columns and I want to use
df['xxx'] is None
as a condition, the code seems like not working -
Vaishali over 2 years@Cecilia, please see the edits