python: pandas np.where vs. df.loc with multiple conditions


I think your booleans are not strings, so you need to remove the quotes (') around True:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA','AAA','ABC','CDE'],
                   'checked': ['0','0','1','0'],
                   'duplicate': [True, True, False, False]})

df['flag'] = np.where((df['checked'] == 'Y') & (df['duplicate'] == True), 'Y', '0')
print(df)
  Column_A checked  duplicate flag
0      AAA       0       True    0
1      AAA       0       True    0
2      ABC       1      False    0
3      CDE       0      False    0

Or, when comparing against a boolean column, the == True can be omitted:

df['flag'] = np.where((df['checked'] == 'Y') & df['duplicate'], 'Y', '0')
print(df)
  Column_A checked  duplicate flag
0      AAA       0       True    0
1      AAA       0       True    0
2      ABC       1      False    0
3      CDE       0      False    0

Also, to test whether checked is 0 you need the quotes ('0'), because the values in that column are strings:

df['flag'] = np.where((df['checked'] == '0') & (df['duplicate'] == True), 'Y', '0')
print(df)
  Column_A checked  duplicate flag
0      AAA       0       True    Y
1      AAA       0       True    Y
2      ABC       1      False    0
3      CDE       0      False    0
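To see why the quotes matter for one column but not the other, it can help to inspect what each column actually holds. The following is only a small sketch on the same sample frame; it assumes `checked` was built from strings and `duplicate` from real booleans, as in the constructor above.

```python
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAA', 'ABC', 'CDE'],
                   'checked': ['0', '0', '1', '0'],
                   'duplicate': [True, True, False, False]})

# Each value in 'checked' is a Python str, so it must be compared with '0'
checked_is_str = isinstance(df['checked'].iloc[0], str)

# 'duplicate' has a boolean dtype, so it can be used directly as a mask
duplicate_is_bool = pd.api.types.is_bool_dtype(df['duplicate'])

# Number of rows where checked equals the string '0'
n_checked_zero = int((df['checked'] == '0').sum())

print(checked_is_str, duplicate_is_bool, n_checked_zero)
```

If your real data stores `duplicate` as the strings 'True' and 'False' instead, the comparison has to use the quoted form for that column as well.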

EDIT:

Solution with loc:

df['flag'] = '0'
mask = (df['checked'] == '0') & df['duplicate']
df.loc[mask, 'flag'] = 'Y'
print(df)
  Column_A checked  duplicate flag
0      AAA       0       True    Y
1      AAA       0       True    Y
2      ABC       1      False    0
3      CDE       0      False    0
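Both approaches should give the same column when they share one mask. Here is a minimal sketch that checks this on the sample data; the column names `flag_where` and `flag_loc` are made up for the comparison and are not part of the original question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAA', 'ABC', 'CDE'],
                   'checked': ['0', '0', '1', '0'],
                   'duplicate': [True, True, False, False]})

# One boolean mask shared by both approaches
mask = (df['checked'] == '0') & df['duplicate']

# np.where: choose 'Y' where the mask is True, '0' elsewhere
df['flag_where'] = np.where(mask, 'Y', '0')

# loc: start from the default value, then overwrite the masked rows
df['flag_loc'] = '0'
df.loc[mask, 'flag_loc'] = 'Y'

# True when the two columns agree on every row
same = bool((df['flag_where'] == df['flag_loc']).all())
print(same)
```

np.where builds the whole column in one step, while loc is handy when only some rows of an existing column need to change.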

Author: jeangelj

BY DAY: Data Analyst BY NIGHT: Assassin's Creed & Tomb Raider player "The Matrix has you ..." - Trinity

Updated on October 13, 2022

Comments

  • jeangelj
    jeangelj over 1 year

    np.where has been giving me a lot of errors, so I am looking for a solution with df.loc instead.

    This is the np.where error I have been getting:

    C:\Users\xxx\AppData\Local\Continuum\Anaconda2\lib\site-packages\ipykernel\__main__.py:1: SettingWithCopyWarning: 
    A value is trying to be set on a copy of a slice from a DataFrame.
    Try using .loc[row_indexer,col_indexer] = value instead
    
    See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
      if __name__ == '__main__':
    

    I am working with the following dataframe df:

    df = pd.DataFrame({'Column_A': ['AAA','AAA','ABC','CDE'],'checked': ['0','0','1','0'],'duplicate': ['True','True','False','False']})
    
        Column_A    checked   duplicate
    0   AAA             0      True
    1   AAA             0      True
    2   ABC             1      False
    3   CDE             0      False
    

    I want to create an additional flag, if checked is 0 and duplicate is True.

    I tried this and it didn't work:

    df['flag'] = (np.where((df['checked'] == 'Y') &(df['duplicate'] == 'True'), 'Y', '0'))
    
    TypeError: invalid type comparison
    

    I tried it with df.loc:

    df['flag'] = (df.loc[df['checked'] == 'Y']& df.loc[df['duplicate'] == 'True'], 'Y','0')
    
    TypeError: invalid type comparison
    

    and I get the same error!

  • jeangelj
    jeangelj almost 7 years
    oh - I see! how would I do an OR statement instead of &?
  • jeangelj
    jeangelj almost 7 years
    thank you very much - is there a way to do it with df.loc?
  • jezrael
    jezrael almost 7 years
    I think yes, it depends on what you need. & is bitwise AND and | is bitwise OR.
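For the OR case asked about in the comments, a minimal sketch on the same sample data could look like this. It flags a row when either condition holds; the name `mask_or` is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAA', 'ABC', 'CDE'],
                   'checked': ['0', '0', '1', '0'],
                   'duplicate': [True, True, False, False]})

# | is the element-wise OR; the parentheses around the comparison are
# required because | binds more tightly than == in Python
mask_or = (df['checked'] == '0') | df['duplicate']

# np.where version
df['flag'] = np.where(mask_or, 'Y', '0')

# Equivalent loc version
df['flag'] = '0'
df.loc[mask_or, 'flag'] = 'Y'

print(df)
```

Here rows 0, 1 and 3 are flagged, because row 3 has checked equal to '0' even though duplicate is False.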