How to remove non-alpha-numeric characters from strings within a dataframe column in Python?

python regex pandas dataframe

39,255

Solution 1

Use str.replace.

df
  strings
0  a#bc1!
1   a(b$c

df.strings.str.replace('[^a-zA-Z]', '', regex=True)
0    abc
1    abc
Name: strings, dtype: object

To retain alphanumeric characters (not just letters, as your expected output suggests), you'll need the following. Note that \W also keeps underscores:

df.strings.str.replace(r'\W', '', regex=True)
0    abc1
1     abc
Name: strings, dtype: object
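
To make the difference between \W and an explicit character class concrete, here is a minimal sketch. The third sample string (with an underscore and a space) is made up for illustration and is not from the question.

```python
import pandas as pd

# Hypothetical sample data: the third string is added only to show the underscore case.
s = pd.Series(['a#bc1!', 'a(b$c', 'a_b 1'])

# \W removes everything that is not a "word" character.
# The underscore counts as a word character, so it is kept.
word_chars = s.str.replace(r'\W', '', regex=True)

# An explicit class keeps only ASCII letters and digits, so the underscore goes too.
ascii_alnum = s.str.replace(r'[^a-zA-Z0-9]', '', regex=True)

print(word_chars.tolist())   # ['abc1', 'abc', 'a_b1']
print(ascii_alnum.tolist())  # ['abc1', 'abc', 'ab1']
```

If underscores should be stripped as well, the explicit class is probably the safer choice.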

Since you wrote alphanumeric, you need to add 0-9 to the regex. But maybe you only wanted alphabetic characters...

import pandas as pd

ded = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})

ded.strings.str.replace('[^a-zA-Z0-9]', '', regex=True)

That said, it's basically what COLDSPEED wrote.
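
One thing worth spelling out: str.replace returns a new Series and leaves the DataFrame untouched, so the result has to be assigned back. A minimal sketch, assuming the column is named strings as in the question:

```python
import pandas as pd

df = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})

# str.replace returns a new Series; assign it back to keep the cleaned values.
# regex=True is passed explicitly because newer pandas versions treat the
# pattern as a literal string by default.
df['strings'] = df['strings'].str.replace(r'[^a-zA-Z0-9]', '', regex=True)

print(df['strings'].tolist())  # ['abc1', 'abc']
```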

You can also use the re module directly:

import re

regex = re.compile(r'[^a-zA-Z]')

l = ["a#bc1!", "a(b$c"]

print([regex.sub('', i) for i in l])

['abc', 'abc']
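
The same compiled pattern can be applied to the DataFrame column rather than a plain list. This is only a sketch: the variable name non_letters and the cleaned column name are made up for the example.

```python
import re
import pandas as pd

df = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})

# Compile once, reuse for every row.
non_letters = re.compile(r'[^a-zA-Z]')

# List comprehension over the column, stored in a new column.
# 'cleaned' is a hypothetical column name chosen for this example.
df['cleaned'] = [non_letters.sub('', s) for s in df['strings']]

print(df['cleaned'].tolist())  # ['abc', 'abc']
```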


Delete Me

Updated on July 05, 2022

TheSaint321 almost 2 years
I have a DataFrame column that contains many strings. I need to remove all non-alphanumeric characters from that column, e.g.:
```
df['strings'] = ["a#bc1!","a(b$c"]
```
Expected output:
```
print(df['strings'])  # ['abc', 'abc']
```
I've tried:
```
df['strings'].replace([',','.','/','"',':',';','!','@','#','$','%',"'","*","(",")","&",],"")
```
But this didn't work, and I feel there should be a more efficient way to do this using regex. Any help would be greatly appreciated.
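
For context on why the attempt above has no visible effect: as far as I can tell, Series.replace without regex=True compares each listed value against the whole cell, not against substrings, and its result also has to be assigned back. A minimal sketch, with a third cell ('#') added purely for illustration:

```python
import pandas as pd

# Hypothetical data: the '#' cell is added only to show a whole-cell match.
df = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c', '#']})

# Series.replace compares each listed value against the entire cell,
# so only the cell that is exactly '#' changes.
whole_cell = df['strings'].replace(['#', '!', '(', '$'], '')

# Series.str.replace with a regex works on substrings inside each cell.
substrings = df['strings'].str.replace(r'[^a-zA-Z0-9]', '', regex=True)

print(whole_cell.tolist())  # ['a#bc1!', 'a(b$c', '']
print(substrings.tolist())  # ['abc1', 'abc', '']
```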
TheSaint321 over 6 years

That is correct. I had to add 0-9 and also spaces, since I wanted to keep those, but coldspeed's answer was first and is the correct method.
citynorman over 2 years

[^0-9a-zA-Z.,-/ ] was what I was after personally.
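
A quick sketch of what that pattern keeps, using made-up sample strings. One subtlety: inside a character class, ,-/ is read as a range from "," to "/", which happens to cover exactly the characters , - . and / so the pattern behaves as intended. Moving the hyphen to the end makes the intent explicit.

```python
import pandas as pd

# Hypothetical sample strings with spaces and punctuation, for illustration only.
s = pd.Series(['a#b c1!', '1,5-2.0/x (y)'])

# Inside a character class ",-/" is a range from "," to "/",
# which covers the characters , - . and /
as_posted = s.str.replace(r'[^0-9a-zA-Z.,-/ ]', '', regex=True)

# Same set of kept characters, with the hyphen moved to the end so it is literal.
explicit = s.str.replace(r'[^0-9a-zA-Z.,/ -]', '', regex=True)

print(as_posted.tolist())  # ['ab c1', '1,5-2.0/x y']
print(explicit.tolist())   # ['ab c1', '1,5-2.0/x y']
```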