Conditionally calculated column for a Pandas DataFrame


You can do:

data['column_c'] = data['column_a'].where(data['column_a'] == 0, data['column_b'])

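This is vectorised. where keeps the values of column_a wherever the condition is True (here, the zeros) and takes the values from column_b everywhere else. Your attempts failed because if expects a single truth value and doesn't know how to treat an array of boolean values, hence the error.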
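To make the failure mode concrete, here is a minimal sketch on a small made-up frame (the values are illustrative assumptions, not taken from the question). It shows the ValueError raised by a plain if, followed by the where call on the same column names:

```python
import pandas as pd

# Illustrative data; these values are assumptions, not from the original question.
data = pd.DataFrame({'column_a': [0, 1, 0, 2], 'column_b': [10, 20, 30, 40]})

# A plain `if` needs one True/False, but the comparison yields a whole Series.
raised = False
try:
    if data['column_a'] == 0:
        pass
except ValueError as err:
    raised = True
    print(err)

# where() keeps column_a (i.e. 0) where the condition is True, else uses column_b.
data['column_c'] = data['column_a'].where(data['column_a'] == 0, data['column_b'])
print(data['column_c'].tolist())  # prints [0, 20, 0, 40]
```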

Example:

In [81]:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5,3), columns=list('abc'))
df

Out[81]:
          a         b         c
0 -1.065074 -1.294718  0.165750
1 -0.041167  0.962203  0.741852
2  0.714889  0.056171  1.197534
3  0.741988  0.836636 -0.660314
4  0.074554 -1.246847  0.183654

In [82]:
df['d'] = df['b'].where(df['b'] < 0, df['c'])
df

Out[82]:
          a         b         c         d
0 -1.065074 -1.294718  0.165750 -1.294718
1 -0.041167  0.962203  0.741852  0.741852
2  0.714889  0.056171  1.197534  1.197534
3  0.741988  0.836636 -0.660314 -0.660314
4  0.074554 -1.246847  0.183654 -1.246847
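As an alternative that is not part of the answer above, numpy.where expresses the same choice with the condition first and both branches spelled out. The sketch below uses fixed values, rounded from the output above, so that it is reproducible; the names d_where and d_np are only illustrative:

```python
import numpy as np
import pandas as pd

# Fixed values (rounded from the example output) so the result is reproducible.
df = pd.DataFrame({'b': [-1.29, 0.96, 0.06, 0.84, -1.25],
                   'c': [0.17, 0.74, 1.20, -0.66, 0.18]})

# Series.where: keep b where b < 0, otherwise take c.
d_where = df['b'].where(df['b'] < 0, df['c'])

# numpy.where: pick b where the condition is True, c where it is False.
d_np = pd.Series(np.where(df['b'] < 0, df['b'], df['c']), index=df.index)

# Both approaches select the same values row by row.
print(d_where.tolist() == d_np.tolist())
```

Series.where reads naturally when the column itself is the "keep" branch; numpy.where may be clearer when neither branch is the column being tested.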

Author: Edward J. Stembler

Updated on September 16, 2022

Comments

  • Edward J. Stembler, over 1 year ago

    I have a calculated column in a Pandas DataFrame which needs to be assigned based upon a condition. For example:

    if(data['column_a'] == 0):
        data['column_c'] = 0
    else:
        data['column_c'] = data['column_b']
    

    However, that returns an error:

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
    

    I have a feeling this has something to do with the fact that it must be done in a matrix style. Changing the code to a ternary expression doesn't work either:

    data['column_c'] = 0 if data['column_a'] == 0 else data['column_b']
    

    Does anyone know the proper way to achieve this? Using apply with a lambda? I could iterate via a loop, but I'd rather do this the preferred Pandas way.