Conditional If Statement: If value in row contains string ... set another column equal to string
Solution 1
The np.where solution (Solution 2 below) behaves incorrectly if your df contains NaN values. In that case I recommend the following code, which worked for me:
import numpy as np

# Replace missing values with a placeholder so str.contains never returns NaN
temp = df.Activity.fillna("0")
df['Activity_2'] = np.where(temp.str.contains("0"), "None",
                   np.where(temp.str.contains("email"), "email",
                   np.where(temp.str.contains("conference"), "conference",
                   np.where(temp.str.contains("call"), "call", "task"))))
Solution 2
I assume you are using pandas. If so, you can use numpy.where, which is a vectorized version of if/else, with the condition constructed by str.contains:
import numpy as np  # pd.np is deprecated; import numpy directly

df['Activity_2'] = np.where(df.Activity.str.contains("email"), "email",
                   np.where(df.Activity.str.contains("conference"), "conference",
                   np.where(df.Activity.str.contains("call"), "call", "task")))
df
#             Activity  Activity_2
# 0      email personA       email
# 1  attend conference  conference
# 2         send email       email
# 3           call Sam        call
# 4        random text        task
# 5        random text        task
# 6       lwantto call        call
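When the list of keywords grows, the nesting gets hard to read. A minimal sketch of a flatter alternative uses `numpy.select`, which takes a list of conditions and a matching list of choices and returns the choice for the first condition that is true. This is a swapped-in technique, not the answer's own code; the `keywords` list and the sample data are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the example above
df = pd.DataFrame({"Activity": ["email personA", "attend conference", "send email",
                                "call Sam", "random text"]})

keywords = ["email", "conference", "call"]
# One boolean condition per keyword; regex=False matches the keyword literally
conditions = [df["Activity"].str.contains(k, regex=False) for k in keywords]
# The first matching condition wins; rows matching nothing get the default
df["Activity_2"] = np.select(conditions, keywords, default="task")
```

Adding another activity then only means appending to `keywords`.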
Solution 3
This also works:
df.loc[df['Activity'].str.contains('email'), 'Activity_2'] = 'email'
df.loc[df['Activity'].str.contains('conference'), 'Activity_2'] = 'conference'
df.loc[df['Activity'].str.contains('call'), 'Activity_2'] = 'call'
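Two things to keep in mind with this approach: rows that match no keyword are left as NaN, and later assignments overwrite earlier ones, so the last matching keyword wins. A small sketch that sets the default first and orders the keywords by priority (the sample data and the priority order are assumptions):

```python
import pandas as pd

# Hypothetical sample data; the last row matches two keywords
df = pd.DataFrame({"Activity": ["email personA", "attend conference",
                                "call Sam", "random text", "email about call"]})

# Start from the default so unmatched rows are not left as NaN
df["Activity_2"] = "task"
# Later assignments overwrite earlier ones, so the highest-priority keyword goes last
for keyword in ["call", "conference", "email"]:
    df.loc[df["Activity"].str.contains(keyword), "Activity_2"] = keyword
```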
Solution 4
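You have invalid syntax for checking strings. Python uses the `in` operator for substring tests. Try using:
for idx, text in df2['Activity'].items():
    if 'email' in text:
        # Assign only this row; df2['Activity_2'] = 'email' would overwrite the whole column
        df2.loc[idx, 'Activity_2'] = 'email'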
Solution 5
- Your code had bugs: there were no colons on the "elif" lines.
- You didn't mention you were using Pandas, but that's the assumption I'm going with.
- My answer handles defaults, follows Python conventions, and is easily adaptable for additional activities.
import pandas as pd

DEFAULT_ACTIVITY = 'task'

def assign_activity(todo_item):
    """Assign an activity label to a raw-text TODO item."""
    activities = ['email', 'conference', 'call']
    for activity in activities:
        if activity in todo_item:
            return activity
    # No keyword matched: fall back to the default value
    return DEFAULT_ACTIVITY

df = pd.DataFrame({'Activity': ['email person A', 'attend conference', 'call Charly'],
                   'Colleague': ['Knor', 'Koen', 'Hedge']})
# You should really come up with a better name than 'Activity_2', like 'Labels' or something.
df["Activity_2"] = df["Activity"].apply(assign_activity)
Author by
PineNuts0
Updated on November 24, 2021
Comments
-
PineNuts0 over 2 years
EDIT MADE:
I have the 'Activity' column filled with strings and I want to derive the values in the 'Activity_2' column using an if statement.
So Activity_2 shows the desired result. Essentially I want to call out what type of activity is occurring.
I tried to do this using my code below but it won't run (please see screen shot below for error). Any help is greatly appreciated!
for i in df2['Activity']:
    if i contains 'email':
        df2['Activity_2'] = 'email'
    elif i contains 'conference'
        df2['Activity_2'] = 'conference'
    elif i contains 'call'
        df2['Activity_2'] = 'call'
    else:
        df2['Activity_2'] = 'task'

Error:
    if i contains 'email':
       ^
SyntaxError: invalid syntax
-
mtkilic almost 7 years
Did you try
if i == 'email': df2['Activity_2'] = 'email'
-
donkopotamus almost 7 years
"won't run" is very unhelpful
-
PineNuts0 almost 7 years
Thanks for the quick response. When I try your code above, there is no 'Activity_2' column in my dataframe.
-
PineNuts0 almost 7 years
@donk: I have posted my error in my message.
-
Dave Liu over 2 years
You have a bunch of missing colons on the lines with "elif" statements.
-
-
Admin over 5 years
@Psidom can you help me with one of my questions? stackoverflow.com/questions/52819833/…
-
Hatt about 5 years
I realize this is a couple of years old, but I have thousands of lines like this. How would you implement them efficiently?
-
Dave Liu over 2 years
Tried this, but all values were just the default.
-
Dave Liu over 2 years
Finally, a solution that works and accounts for defaults / NAs.
-
Hedge92 over 2 years
@DaveLiu the example works perfectly in my Jupyter notebook instance. Can you explain your issue further? Did you copy this one-to-one, or what did you try?
-
Dave Liu over 2 years
I don't recall the issue, maybe a pandas/numpy versioning discrepancy.
-
seakyourpeak about 2 years
One does not need to call np from pandas. If you do, you get the following message: "The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly." Just using np.where() should do the job. It is a good solution suggested by @Psidom. Thank you, Psidom!