Conditional If Statement: If value in row contains string ... set another column equal to string

92,738

Solution 1

The plain np.where solution misbehaves if your df contains NaN values, because str.contains returns NaN for missing entries instead of True or False. In that case I recommend the following code, which worked for me:

import numpy as np
import pandas as pd

# Fill missing values with an empty string so str.contains always returns True/False
temp = df.Activity.fillna("")
df['Activity_2'] = np.where(df.Activity.isna(), "None",
                   np.where(temp.str.contains("email"), "email",
                   np.where(temp.str.contains("conference"), "conference",
                   np.where(temp.str.contains("call"), "call", "task"))))
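A self-contained sketch of the same idea on made-up sample data. Instead of filling NaN with a placeholder string, it passes na=False to Series.str.contains so missing entries simply count as "no match", and it labels them separately with isna(). The data and the "None" label are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including one missing value
df = pd.DataFrame({'Activity': ['email personA', None, 'attend conference',
                                'call Sam', 'random text']})

act = df['Activity']
# na=False makes str.contains return False (not NaN) for missing entries
df['Activity_2'] = np.where(act.isna(), "None",
                   np.where(act.str.contains("email", na=False), "email",
                   np.where(act.str.contains("conference", na=False), "conference",
                   np.where(act.str.contains("call", na=False), "call", "task"))))

print(df['Activity_2'].tolist())
```

Using na=False avoids picking a sentinel value that a real activity string might happen to contain.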

Solution 2

Assuming you are using pandas, you can use numpy.where, a vectorized version of if/else, with the conditions built by str.contains:

# Import NumPy directly; the pd.np alias is deprecated
import numpy as np

df['Activity_2'] = np.where(df.Activity.str.contains("email"), "email",
                   np.where(df.Activity.str.contains("conference"), "conference",
                   np.where(df.Activity.str.contains("call"), "call", "task")))

df

#   Activity            Activity_2
#0  email personA       email
#1  attend conference   conference
#2  send email          email
#3  call Sam            call
#4  random text         task
#5  random text         task
#6  lwantto call        call
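With more keywords the nested calls get hard to read. numpy.select (a swap-in, not part of this answer) takes a list of boolean conditions and a matching list of choices and, like the nested np.where, uses the first condition that is True, falling back to default. A sketch that rebuilds the example frame from the output above:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the output shown above
df = pd.DataFrame({'Activity': ['email personA', 'attend conference', 'send email',
                                'call Sam', 'random text', 'random text',
                                'lwantto call']})

choices = ['email', 'conference', 'call']
# One boolean Series per keyword, in priority order
conditions = [df['Activity'].str.contains(word, na=False) for word in choices]

# np.select picks the choice of the first True condition; otherwise the default
df['Activity_2'] = np.select(conditions, choices, default='task')
print(df)
```

Adding a new activity then only means appending one keyword to the list.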

Solution 3

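This also works. Each line overwrites the previous ones, so when a row contains several keywords the last matching assignment wins: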

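# Default label for rows that match none of the keywords
df['Activity_2'] = 'task'
# na=False treats missing values as "no match" so .loc does not fail on NaN
df.loc[df['Activity'].str.contains('email', na=False), 'Activity_2'] = 'email'
df.loc[df['Activity'].str.contains('conference', na=False), 'Activity_2'] = 'conference'
df.loc[df['Activity'].str.contains('call', na=False), 'Activity_2'] = 'call'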
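Because the .loc assignments run in sequence, their order decides which label wins. A small sketch on made-up rows (the first one mentions both "email" and "call") that loops over the keywords instead of repeating the line, with a default label and na=False for the missing entry:

```python
import pandas as pd

# Hypothetical rows; the first one contains both 'email' and 'call'
df = pd.DataFrame({'Activity': ['email about the call', 'attend conference',
                                None, 'random text']})

df['Activity_2'] = 'task'  # default label for rows that match nothing
for word in ['email', 'conference', 'call']:
    # Later keywords overwrite earlier ones on rows that match both
    df.loc[df['Activity'].str.contains(word, na=False), 'Activity_2'] = word

print(df['Activity_2'].tolist())
```

Here the first row ends up as "call", whereas the nested np.where version gives priority to the first keyword ("email"). Reverse the list if you want the same priority as np.where.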

Solution 4

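Your code has invalid syntax for checking strings: Python has no "contains" keyword, so use the "in" operator instead.

Try using:

for idx, text in df2['Activity'].items():
    if 'email' in text:
        # Assign to this row only; df2['Activity_2'] = 'email' would overwrite the whole column
        df2.loc[idx, 'Activity_2'] = 'email'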
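If you want to keep an explicit if/elif loop like the one in the question, one option (a sketch, not the answerer's code) is to compute one label per row and assign the whole list to the new column at the end. The sample frame is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data
df2 = pd.DataFrame({'Activity': ['email personA', 'attend conference',
                                 'call Sam', 'random text']})

labels = []
for text in df2['Activity']:
    # Python tests for a substring with the `in` operator
    if 'email' in text:
        labels.append('email')
    elif 'conference' in text:
        labels.append('conference')
    elif 'call' in text:
        labels.append('call')
    else:
        labels.append('task')

# One assignment at the end, instead of overwriting the column on every iteration
df2['Activity_2'] = labels
print(df2)
```

This is easy to read but runs row by row in Python, so the vectorized answers will be faster on large frames.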

Solution 5

  1. Your code had bugs: the "elif" lines are missing colons, and "i contains 'email'" is not valid Python (use "'email' in i").
  2. You didn't mention that you were using pandas, but that's the assumption I'm going with.
  3. My answer handles defaults, follows standard Python conventions, and is easy to adapt for additional activities.

import pandas as pd

DEFAULT_ACTIVITY = 'task'


def assign_activity(todo_item):
    """Assign an activity label to a raw-text TODO item."""
    activities = ['email', 'conference', 'call']

    for activity in activities:
        if activity in todo_item:
            return activity

    # Only fall back to the default after every activity has been checked
    return DEFAULT_ACTIVITY


df = pd.DataFrame({'Activity': ['email person A', 'attend conference', 'call Charly'],
                   'Colleague': ['Knor', 'Koen', 'Hedge']})

# You should really come up with a better name than 'Activity_2', like 'Labels' or something.
df["Activity_2"] = df["Activity"].apply(assign_activity)
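If the Activity column can contain missing values, a test like 'email' in todo_item raises TypeError for NaN or None. A self-contained variant that guards against that is sketched below; the function name label_activity and the sample rows are hypothetical, and it returns the default only once no keyword has matched.

```python
import pandas as pd

DEFAULT_ACTIVITY = 'task'
ACTIVITIES = ['email', 'conference', 'call']  # checked in this order


def label_activity(todo_item):
    """Return the first activity found in todo_item, else the default."""
    if not isinstance(todo_item, str):
        # Missing values (NaN/None) are not strings; `in` would raise TypeError
        return DEFAULT_ACTIVITY
    for activity in ACTIVITIES:
        if activity in todo_item:
            return activity
    # Reached only when no activity matched
    return DEFAULT_ACTIVITY


# Hypothetical data with a missing entry
df = pd.DataFrame({'Activity': ['email person A', 'attend conference',
                                'call Charly', None, 'random text']})
df['Activity_2'] = df['Activity'].apply(label_activity)
print(df)
```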

Author: PineNuts0

Updated on November 24, 2021

Comments

  • PineNuts0
    PineNuts0 over 2 years

    EDIT MADE:

    I have the 'Activity' column filled with strings and I want to derive the values in the 'Activity_2' column using an if statement.

    So Activity_2 shows the desired result. Essentially I want to call out what type of activity is occurring.

    I tried to do this with the code below, but it won't run (the error is shown below). Any help is greatly appreciated!

        for i in df2['Activity']:
            if i contains 'email':
                df2['Activity_2'] = 'email'
            elif i contains 'conference'
                df2['Activity_2'] = 'conference'
            elif i contains 'call'
                df2['Activity_2'] = 'call'
            else:
                df2['Activity_2'] = 'task'
    
    
    Error: if i contains 'email':
                    ^
    SyntaxError: invalid syntax
    
    • mtkilic
      mtkilic almost 7 years
      did you try if i == 'email': df2['Activity_2'] = 'email'
    • donkopotamus
      donkopotamus almost 7 years
      "won't run" is very unhelpful
    • PineNuts0
      PineNuts0 almost 7 years
      Thanks for the quick response. When I try your code above, there is no 'Activity_2' column in my dataframe.
    • PineNuts0
      PineNuts0 almost 7 years
      @donk: I have posted my error in my message
    • Dave Liu
      Dave Liu over 2 years
      You have a bunch of missing colons on the lines with "elif" statements
  • Admin
    Admin over 5 years
    @Psidom can you help me with one of my questions? stackoverflow.com/questions/52819833/…
  • Hatt
    Hatt about 5 years
    I realize this is a couple of years old, but I have thousands of lines like this. How would you implement them efficiently?
  • Dave Liu
    Dave Liu over 2 years
    Tried this, but all values were just default
  • Dave Liu
    Dave Liu over 2 years
    Finally, a solution that works and accounts for defaults / NAs
  • Hedge92
    Hedge92 over 2 years
    @DaveLiu the example works perfectly in my Jupyter notebook instance. Can you explain your issue further? Did you copy this verbatim, or what did you try?
  • Dave Liu
    Dave Liu over 2 years
    I don't recall the issue, maybe a pandas/numpy versioning discrepancy
  • seakyourpeak
    seakyourpeak about 2 years
    One does not need to call np from pandas. If you do, you get the following message: " The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly. " Just using np.where() should do the job. It is a good solution suggested by @Psidom. Thank you Psidom!
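For the question above about handling thousands of keywords efficiently, one option (an assumption, not taken from the answers) is to join all keywords into a single regular expression and let Series.str.extract return the matching keyword for each row, then fill non-matches with the default. re.escape protects against keywords that contain regex metacharacters. Note the semantics differ slightly: extract returns the match that starts earliest in the text, not the first keyword in the list.

```python
import re

import pandas as pd

# Hypothetical keyword list; in practice this could hold thousands of entries
keywords = ['email', 'conference', 'call']
pattern = '(' + '|'.join(re.escape(k) for k in keywords) + ')'

# Hypothetical data with a missing entry
df = pd.DataFrame({'Activity': ['email personA', 'attend conference',
                                'call Sam', 'random text', None]})

# expand=False returns a Series holding the captured keyword (NaN when none matches)
df['Activity_2'] = df['Activity'].str.extract(pattern, expand=False).fillna('task')
print(df)
```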