multiple if else conditions in pandas dataframe and derive multiple columns

65,686

Solution 1

You need chained comparison using upper and lower bound

def flag_df(df):
    
    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return 'Red'
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return 'Yellow'
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return 'Orange'
    elif (df['height'] > 8):
        return np.nan
    
df2['Flag'] = df2.apply(flag_df, axis = 1)

    student score   height  trigger1    trigger2    trigger3    Flag
0   A       100     7       84          99          114         Yellow
1   B       96      4       95          110         125         Red
2   C       80      9       15          30          45          NaN
3   D       105     5       78          93          108         Yellow
4   E       156     3       16          31          46          Orange

Note: You can do this with a very nested np.where but I prefer to apply a function for multiple if-else

Edit: answering @Cecilia's questions

  1. what is the returned object is not strings but some calculations, for example, for the first condition, we want to return df['height']*2

Not sure what you tried but you can return a derived value instead of string using

def flag_df(df):

    if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan
  1. what if there are 'NaN' values in osome columns and I want to use df['xxx'] is None as a condition, the code seems like not working

Again not sure what code did you try but using pandas isnull would do the trick

def flag_df(df):

    if pd.isnull(df['height']):
        return df['height']
    elif (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
        return df['height']*2
    elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
        return df['height']*3
    elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
        return df['height']*4
    elif (df['height'] > 8):
        return np.nan

Solution 2

Here is a way to use numpy.select() for doing this in a non cluttered way:

conditions = [
    (df2['trigger1'] <= df2['score']) & (df2['score'] < df2['trigger2']) & (df2['height'] < 8),
    (df2['trigger2'] <= df2['score']) & (df2['score'] < df2['trigger3']) & (df2['height'] < 8),
    (df2['trigger3'] <= df2['score']) & (df2['height'] < 8),
    (df2['height'] > 8)
]

choices = ['Red','Yellow','Orange', np.nan]
df['Flag1'] = np.select(conditions, choices, default=np.nan)

Solution 3

you can use also apply with a custom function on axis 1 like this :

def color_selector(x):
    if (x['trigger1'] <= x['score'] < x['trigger2']) and (x['height'] < 8):
        return 'Red'
    elif (x['trigger2'] <= x['score'] < x['trigger3']) and (x['height'] < 8):
        return 'Yellow'
    elif (x['trigger3'] <= x['score']) and (x['height'] < 8):
        return 'Orange'
    elif (x['height'] > 8):
        return ''
df2 = df2.assign(flag=df2.apply(color_selector, axis=1))

you will get something like this : enter image description here

Share:
65,686
Kumar AK
Author by

Kumar AK

Updated on November 29, 2021

Comments

  • Kumar AK
    Kumar AK over 2 years

    I have a dataframe like below.

    import pandas as pd
    import numpy as np
    raw_data = {'student':['A','B','C','D','E'],
            'score': [100, 96, 80, 105,156], 
        'height': [7, 4,9,5,3],
        'trigger1' : [84,95,15,78,16],
        'trigger2' : [99,110,30,93,31],
        'trigger3' : [114,125,45,108,46]}
    
    df2 = pd.DataFrame(raw_data, columns = ['student','score', 'height','trigger1','trigger2','trigger3'])
    
    print(df2)
    

    I need to derive Flag column based on multiple conditions.

    i need to compare score and height columns with trigger 1 -3 columns.

    Flag Column:

    1. if Score greater than equal trigger 1 and height less than 8 then Red --

    2. if Score greater than equal trigger 2 and height less than 8 then Yellow --

    3. if Score greater than equal trigger 3 and height less than 8 then Orange --

    4. if height greater than 8 then leave it as blank

    How to write if else conditions in pandas dataframe and derive columns?

    Expected Output

      student  score  height  trigger1  trigger2  trigger3    Flag
    0       A    100       7        84        99       114  Yellow
    1       B     96       4        95       110       125     Red
    2       C     80       9        15        30        45     NaN
    3       D    105       5        78        93       108  Yellow
    4       E    156       3        16        31        46  Orange
    

    For other column Text1 in my original question I have tried this one but the integer columns not converting the string when concatenation using astype(str) any other approach?

    def text_df(df):
    
        if (df['trigger1'] <= df['score'] < df['trigger2']) and (df['height'] < 8):
            return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger1'].astype(str) + " and less than height 5"
        elif (df['trigger2'] <= df['score'] < df['trigger3']) and (df['height'] < 8):
            return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger2'].astype(str) + " and less than height 5"
        elif (df['trigger3'] <= df['score']) and (df['height'] < 8):
            return df['student'] + " score " + df['score'].astype(str) + " greater than " + df['trigger3'].astype(str) + " and less than height 5"
        elif (df['height'] > 8):
            return np.nan
    
  • Kumar AK
    Kumar AK about 6 years
    Thanks for the quick reply.. can you please help me to derive 2 more columns based on this condition. my original question was to derive 3 columns but edited because of confusing. can you able to see first post before editing?
  • Vaishali
    Vaishali about 6 years
    Change your return statement to this: return '{} score {} greater than {} and less than height 5'.format(df['student'], df['score'], df['trigger1']). I have done it for first condition, use similar format for the rest of the conditions
  • wawawa
    wawawa over 2 years
    Hi what is the returned object is not strings but some calculations, for example, for the first condition, we want to return df['height']*2?
  • wawawa
    wawawa over 2 years
    and what if there'are 'NaN' values in osome columns and I want to use df['xxx'] is None as a condition, the code seems like not working
  • Vaishali
    Vaishali over 2 years
    @Cecilia, please see the edits