How to write a lambda function that is conditional on two variables (columns) in python
Solution 1
Use where:
df['dummyVar'] = df['x'].where((df['x'] > 100) & (df['y'] < 50), df['y'])
Rows where the condition holds keep the value of x; all other rows take the value of y. Because this is vectorised, it will be much faster than performing an apply operation.
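As a minimal sketch of how the where call behaves, here it is on a small made-up DataFrame. The sample values are an assumption for illustration; only the column names x and y come from the question.

```python
import pandas as pd

# Hypothetical sample data; the column names x and y follow the question.
df = pd.DataFrame({"x": [150, 150, 80, 101], "y": [10, 60, 20, 49]})

# Boolean mask: True only where both conditions hold for the row.
mask = (df["x"] > 100) & (df["y"] < 50)

# Keep x where the mask is True, otherwise fall back to y.
df["dummyVar"] = df["x"].where(mask, df["y"])

print(df["dummyVar"].tolist())  # [150, 60, 20, 101]
```

Note the parentheses around each comparison: & binds more tightly than > and <, so leaving them out changes the meaning of the expression.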
Solution 2
Like this:
f = lambda x, y: x if x>100 and y<50 else y
A lambda in Python is equivalent to a normal function definition:
def f(x, y):
    return x if x>100 and y<50 else y
NB: The body of a lambda must be a single valid expression. This means you cannot use statements such as return; a lambda returns the value of that expression.
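A quick way to convince yourself that the two forms behave the same is to call both on a few inputs. This is only a sketch: the name g and the test pairs are made up for illustration.

```python
# The lambda from the answer.
f = lambda x, y: x if x > 100 and y < 50 else y

# The same logic as a named function (g is a hypothetical name).
def g(x, y):
    return x if x > 100 and y < 50 else y

# Made-up (x, y) pairs covering both branches of the condition.
pairs = [(150, 10), (150, 60), (80, 20)]

lambda_results = [f(x, y) for x, y in pairs]
def_results = [g(x, y) for x, y in pairs]

print(lambda_results)                 # [150, 60, 20]
print(lambda_results == def_results)  # True
```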
Solution 3
There's now a pretty easy way to do this. Just use apply on the DataFrame, passing axis=1 so the lambda receives one row at a time:
df['dummy'] = df.apply(lambda row: row['x'] if row['x'] > 100 and row['y'] < 50 else row['y'], axis=1)
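If you already have a two-argument function and want to run it over every row, one possible approach is to wrap it in a row-wise apply. The sketch below assumes made-up sample data and a hypothetical helper named pick; as the comments further down point out, this is slower than a vectorised method on large data.

```python
import pandas as pd

# Hypothetical sample data; the column names x and y follow the question.
df = pd.DataFrame({"x": [150, 150, 80], "y": [10, 60, 20]})

# Hypothetical helper holding the two-variable condition.
def pick(x, y):
    return x if x > 100 and y < 50 else y

# axis=1 passes each row to the lambda; without it, apply works on
# whole columns and row["x"] would raise a KeyError.
df["dummy"] = df.apply(lambda row: pick(row["x"], row["y"]), axis=1)

print(df["dummy"].tolist())  # [150, 60, 20]
```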
seeiespi
Updated on June 18, 2022
Comments
-
seeiespi almost 2 years
I have a data set, df, with two variables, x and y. I want to write a function that does the following:
x if x>100 and y<50 else y
I am used to doing data analysis in Stata, so I'm relatively new to pandas. If it helps, in Stata it would look like:
replace x = cond(x>100 & y<50, x, y)
In other words, the function is conditional on two columns in df and will return a value from one variable or the other in each row depending on whether the condition is met.
So far I have been creating new variables through new functions like:
df['dummyVar'] = df['x'].apply(lambda x: 1 if x>100 else 0)
Using StackOverflow and the documentation I have only been able to find how to apply a function dependent on a single variable to more than one column (using the axis option). Please help.
-
seeiespi almost 10 years
This is exactly what I needed. And this is great because I can already see how I can expand it to conditionals on 3 or more variables. Thank you!
-
seeiespi almost 10 years
I had actually written a function like this but was unable to implement it so that it would run through each row without a loop. The answer provided by EdChum does exactly that. If you know of a way to accomplish that using this defined function, I'm sure I could make use of it in the future. Thank you for your input :)
-
EdChum almost 10 years
@seeiespi You're welcome. You can accept this as the answer; there will be a tick mark underneath the voting buttons. Using apply and iterating should always be the last choice. If possible, find a method that operates on the whole dataframe.
-
James Mills almost 10 years
@seeiespi You originally asked "How to create a lambda function that takes two arguments?" This is how :) EdChum provided you with an answer that is more aligned with what your intentions are/were with your dataset(s) and pandas.
-
Manish Singla over 3 years
I needed to add axis=1 to make it work.