Creating New Column In Pandas Dataframe Using Regex

python regex pandas

11,220

This will do the trick:

```
import re

search = []
for values in df['col']:
    # re.search needs one string at a time; int() turns '12' into 12
    search.append(int(re.search(r'\d+', values).group()))

df['col1'] = search
```

The output looks like this:

```
            col    col1
0  foo 12 bar 8      12
1     bar 3 foo       3
2  bar 32bar 98      32
```
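As a sketch of a vectorized alternative (not the method used in the answer above), pandas' `Series.str.extract` runs the regex on every cell in one call. The capture group `(\d+)` grabs the first run of digits in each string. This assumes every cell contains at least one digit: a cell without a match comes back as NaN, and `astype(int)` would then raise.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({'col': ['foo 12 bar 8', 'bar 3 foo', 'bar 32bar 98']})

# str.extract applies the pattern to each cell and keeps the first match of
# the capture group; expand=False returns a Series instead of a DataFrame
df['col1'] = df['col'].str.extract(r'(\d+)', expand=False).astype(int)

print(df)
```

Compared with the loop, this keeps the per-cell iteration inside pandas and yields an integer column directly.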



Updated on June 15, 2022

Admin almost 2 years
I have a column in a pandas df of type object that I want to parse to get the first number in the string, and create a new column containing that number as an int.

For example:

Existing df
```
    col
    'foo 12 bar 8'
    'bar 3 foo'
    'bar 32bar 98'
```
Desired df
```
    col               col1
    'foo 12 bar 8'    12
    'bar 3 foo'       3
    'bar 32bar 98'    32
```
I have code that works on any individual cell in the column Series:

```
int(re.search(r'\d+', df.iloc[0]['col']).group())
```

The above code works fine and returns 12 as it should. But when I try to create a new column using the whole series:

```
df['col1'] = int(re.search(r'\d+', df['col']).group())
```

I get the following error:

```
TypeError: expected string or bytes-like object
```

I tried wrapping str() around df['col'], which got rid of the error but yielded all 0s in col1.
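For context on the two failures described above, here is a minimal sketch (rebuilding the example `df` from the question) showing that `re.search` rejects a Series outright, and that `str(df['col'])` searches the Series' printed text, where the first number is the index label 0. Assigning that single int to a new column then broadcasts it to every row.

```python
import re

import pandas as pd

df = pd.DataFrame({'col': ['foo 12 bar 8', 'bar 3 foo', 'bar 32bar 98']})

# re.search only accepts a str (or bytes); a whole Series raises TypeError
error = None
try:
    re.search(r'\d+', df['col'])
except TypeError as exc:
    error = exc
print('TypeError:', error)

# str() gives the Series' printed form, whose first line starts with the
# index label 0, so the "first number" found in that text is always 0
text = str(df['col'])
print(text.splitlines()[0])
first = int(re.search(r'\d+', text).group())

# Assigning one scalar to a column broadcasts it to every row: all 0s
df['col1'] = first
print(df)
```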

I've also tried converting col to a list of strings and iterating through the list, which only yields the same error. Does anyone know what I'm doing wrong? Help would be much appreciated.
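Since the single-cell expression already works, one option is to wrap it in a function and hand it to `Series.apply`, which calls it once per cell. `first_int` is a hypothetical helper name, and the sketch assumes every cell contains at least one digit.

```python
import re

import pandas as pd

df = pd.DataFrame({'col': ['foo 12 bar 8', 'bar 3 foo', 'bar 32bar 98']})


def first_int(text):
    """Hypothetical helper: return the first run of digits in text as an int.

    Assumes text contains a digit; otherwise re.search returns None and
    .group() raises AttributeError.
    """
    return int(re.search(r'\d+', text).group())


# Series.apply calls first_int once per cell, so re.search always
# receives a plain string rather than the whole Series
df['col1'] = df['col'].apply(first_int)
print(df)
```

If the list itself was passed to `re.search`, that would fail for the same reason as passing the Series: the function expects one string per call, so the iteration has to happen outside it.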
RolfBly almost 2 years

The list has the same name as the regex method (search), which I find a bit confusing. Calling it mylist (mylist = [], mylist.append(...)) would IMHO make it a clearer example.