Creating New Column In Pandas Dataframe Using Regex
This will do the trick:
import re

search = []
for values in df['col']:
    # .group() returns the first run of digits as a string
    search.append(re.search(r'\d+', values).group())
df['col1'] = search
The output looks like this:
            col col1
0  foo 12 bar 8   12
1     bar 3 foo    3
2  bar 32bar 98   32
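As an alternative to the explicit loop, the same column can probably be built with pandas' vectorized string methods. The sketch below assumes every row contains at least one digit (otherwise the integer conversion would fail on missing values) and rebuilds the sample frame from the question so that it runs on its own:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"col": ["foo 12 bar 8", "bar 3 foo", "bar 32bar 98"]})

# str.extract returns the first match of the capture group for each row;
# expand=False gives back a Series instead of a one-column DataFrame.
# astype(int) assumes every row matched (no missing values).
df["col1"] = df["col"].str.extract(r"(\d+)", expand=False).astype(int)

print(df)
```

Unlike the loop above, this version stores the numbers as integers rather than strings, which is what the question asks for.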
Admin
Updated on June 15, 2022
Comments
Admin almost 2 years
I have a column in a pandas df of type object that I want to parse to get the first number in the string, and create a new column containing that number as an int. For example:
Existing df
col
'foo 12 bar 8'
'bar 3 foo'
'bar 32bar 98'
Desired df
col              col1
'foo 12 bar 8'   12
'bar 3 foo'      3
'bar 32bar 98'   32
I have code that works on any individual cell in the column series:
int(re.search(r'\d+', df.iloc[0]['col']).group())
The above code works fine and returns 12 as it should. But when I try to create a new column using the whole series:
df['col1'] = int(re.search(r'\d+', df['col']).group())
I get the following error:
TypeError: expected string or bytes-like object
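For context, this error is consistent with re.search being handed the whole Series rather than a single string. The sketch below reproduces it on the sample data and then applies the same expression cell by cell; first_int is a hypothetical helper name, not something from the original post:

```python
import re
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"col": ["foo 12 bar 8", "bar 3 foo", "bar 32bar 98"]})

# re.search expects one string, so passing the Series itself fails
try:
    re.search(r"\d+", df["col"])
except TypeError as exc:
    error_message = str(exc)

def first_int(text):
    # Hypothetical helper: first run of digits in one string, as an int
    return int(re.search(r"\d+", text).group())

# Series.apply calls the helper once per cell
df["col1"] = df["col"].apply(first_int)
```

This assumes every cell is a string containing at least one digit; a cell without digits would make re.search return None and the .group() call fail.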
I tried wrapping a str() around df['col'], which got rid of the error but yielded all 0's in col1. I've also tried converting col to a list of strings and iterating through the list, which only yields the same error. Does anyone know what I'm doing wrong? Help would be much appreciated.
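A likely explanation for the all-zero column: str() on a Series returns its printed representation, which starts with the index label 0, so the regex finds that 0 first and the single result is broadcast to every row. A minimal sketch, assuming the default integer index and the sample data from the question:

```python
import re
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"col": ["foo 12 bar 8", "bar 3 foo", "bar 32bar 98"]})

# str() gives the printed form of the Series, beginning with index label 0
printed = str(df["col"])
first_match = re.search(r"\d+", printed).group()

# Assigning one scalar fills every row of the new column with it
df["col1"] = int(first_match)
```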
RolfBly almost 2 years
The list has the same name as the regex method (search), which I find a bit confusing. mylist = [], mylist.append... would imho make it a clearer example.