Creating New Column In Pandas Dataframe Using Regex


This will do the trick:

import re

search = []
for values in df['col']:
    # first run of digits in each string, converted to int
    search.append(int(re.search(r'\d+', values).group()))

df['col1'] = search

The output looks like this:

            col    col1
0  foo 12 bar 8      12
1     bar 3 foo       3
2  bar 32bar 98      32
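
The same result can usually be had without an explicit loop. The following is a minimal sketch, not part of the original answer, that assumes every row contains at least one digit and uses pandas' `Series.str.extract` with a single capture group; the sample frame is rebuilt here only so the snippet runs on its own.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"col": ["foo 12 bar 8", "bar 3 foo", "bar 32bar 98"]})

# str.extract returns the first match of the capture group for each row;
# expand=False gives back a Series instead of a one-column DataFrame.
# astype(int) assumes every row matched (no missing values).
df["col1"] = df["col"].str.extract(r"(\d+)", expand=False).astype(int)
```

If some rows might contain no digits, `astype(int)` would fail on the resulting missing values; wrapping the extracted Series in `pd.to_numeric` instead leaves those rows as NaN.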
Author by

Admin

Updated on June 15, 2022

Comments

  • Admin
    Admin almost 2 years

    I have a column in a pandas df of type object that I want to parse to get the first number in the string, and create a new column containing that number as an int.

    For example:

    Existing df

        col
        'foo 12 bar 8'
        'bar 3 foo'
        'bar 32bar 98'
    

    Desired df

        col               col1
        'foo 12 bar 8'    12
        'bar 3 foo'       3
        'bar 32bar 98'    32
    

    I have code that works on any individual cell in the column series:

    int(re.search(r'\d+', df.iloc[0]['col']).group())

    The above code works fine and returns 12 as it should. But when I try to create a new column using the whole series:

    df['col1'] = int(re.search(r'\d+', df['col']).group())

    I get the following error:

    TypeError: expected string or bytes-like object

    I tried wrapping str() around df['col'], which got rid of the error but yielded all 0's in col1.

    I've also tried converting col to a list of strings and iterating through the list, which only yields the same error. Does anyone know what I'm doing wrong? Help would be much appreciated.

  • RolfBly
    RolfBly almost 2 years
    The list has the same name as the regex method (search), which I find a bit confusing. mylist = [], mylist.append... would imho make it a clearer example.
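
On the two failed attempts described in the question, the sketch below reproduces both symptoms on the sample data. It is an illustration rather than the asker's exact code; the variable names `raised` and `first_match` are made up for this example.

```python
import re
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"col": ["foo 12 bar 8", "bar 3 foo", "bar 32bar 98"]})

# re.search expects one string, so passing the whole Series raises TypeError.
try:
    re.search(r"\d+", df["col"])
    raised = False
except TypeError:
    raised = True

# str(df["col"]) is the printed form of the Series, which begins with the
# row label 0, so the first run of digits found is the index, not the data.
first_match = re.search(r"\d+", str(df["col"])).group()

# Assigning that single scalar broadcasts it to every row: all 0s.
df["col1"] = int(first_match)
```

In other words, `re.search` has to be called once per cell (in a loop, or through pandas' string methods) rather than once on the column as a whole.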