How to remove non-alpha-numeric characters from strings within a dataframe column in Python?
Solution 1
Use str.replace.
df
strings
0 a#bc1!
1 a(b$c
df.strings.str.replace(r'[^a-zA-Z]', '', regex=True)
0 abc
1 abc
Name: strings, dtype: object
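A self-contained sketch of the same call. Since pandas 2.0, str.replace treats the pattern as a literal string by default (regex=False), so regex=True has to be passed explicitly. Assigning the result back to the column is an assumption about what you want to do with it.

```python
import pandas as pd

df = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})

# regex=True is required on pandas >= 2.0, where the default is a literal match.
# [^a-zA-Z] matches every character that is NOT an ASCII letter.
df['strings'] = df['strings'].str.replace(r'[^a-zA-Z]', '', regex=True)

print(df['strings'].tolist())  # ['abc', 'abc']
```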
To retain alphanumeric characters (not just letters, as your expected output suggests), you'll need the following (strictly, \W removes non-word characters, so underscores are kept too):
df.strings.str.replace(r'\W', '', regex=True)
0 abc1
1 abc
Name: strings, dtype: object
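A small sketch comparing \W with an explicit ASCII class. The sample strings are made up for illustration: one contains an underscore and one a non-ASCII letter, which is where the two patterns differ.

```python
import pandas as pd

s = pd.Series(['a_b#c1!', 'é-x'])

# \W means "not a word character"; word characters include the underscore
# and, for str patterns, Unicode letters such as 'é'.
word_chars = s.str.replace(r'\W', '', regex=True)

# An explicit ASCII class drops the underscore and 'é' as well.
ascii_alnum = s.str.replace(r'[^a-zA-Z0-9]', '', regex=True)

# Adding _ to the negated word class strips underscores but still keeps 'é'.
no_underscore = s.str.replace(r'[\W_]', '', regex=True)

print(word_chars.tolist())     # ['a_bc1', 'éx']
print(ascii_alnum.tolist())    # ['abc1', 'x']
print(no_underscore.tolist())  # ['abc1', 'éx']
```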
Solution 2
Since you wrote alphanumeric, you need to add 0-9 to the character class, though maybe you only wanted letters.
import pandas as pd
ded = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})
ded.strings.str.replace(r'[^a-zA-Z0-9]', '', regex=True)
This is essentially what COLDSPEED wrote.
Solution 3
You can also use the re module directly:
import re
regex = re.compile('[^a-zA-Z]')
l = ["a#bc1!","a(b$c"]
print([regex.sub('', i) for i in l])
['abc', 'abc']
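The same compiled pattern can be applied to the DataFrame column itself rather than to a plain list. A sketch, assuming the column is called strings as in the question: Series.str.replace accepts a compiled pattern, but only with regex=True.

```python
import re

import pandas as pd

# Compile once and reuse; [^a-zA-Z] matches anything that is not an ASCII letter.
regex = re.compile(r'[^a-zA-Z]')

df = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})

# A compiled pattern cannot be combined with regex=False, so pass regex=True.
df['strings'] = df['strings'].str.replace(regex, '', regex=True)

print(df['strings'].tolist())  # ['abc', 'abc']
```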
Comments
-
TheSaint321 almost 2 years
I have a DataFrame column that contains many strings. I need to remove all non-alphanumeric characters from that column, i.e.:
df['strings'] = ["a#bc1!","a(b$c"]
Desired output:
print(df['strings'])  # ['abc', 'abc']
I've tried:
df['strings'].replace([',','.','/','"',':',';','!','@','#','$','%',"'","*","(",")","&",],"")
This didn't work, and I feel there should be a more efficient way to do this using regex. Any help would be much appreciated.
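The attempt above fails because Series.replace with a list of strings only replaces cell values that are exactly equal to one of the list items; it does not search inside strings. A sketch of the difference, with an extra illustrative cell '#' added to show the one case the literal version does catch; with regex=True, the pattern is substituted inside each string instead.

```python
import pandas as pd

df = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c', '#']})

# Without regex, only a cell whose whole value is '#', '!', '(' or '$' is replaced.
literal = df['strings'].replace(['#', '!', '(', '$'], '')

# With regex=True, every match of the pattern inside each string is removed.
cleaned = df['strings'].replace(r'[^a-zA-Z0-9]', '', regex=True)

print(literal.tolist())  # ['a#bc1!', 'a(b$c', '']
print(cleaned.tolist())  # ['abc1', 'abc', '']
```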
-
TheSaint321 over 6 years
That is correct. I had to add 0-9 and also the space, since I wanted to keep those, but coldspeed's answer was first and is the correct method.
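Keeping digits and spaces, as described in the comment above, only needs a space inside the character class. A sketch on made-up sample strings:

```python
import pandas as pd

s = pd.Series(['a#b c1!', 'a(b $c'])

# The space inside the class means spaces survive along with letters and digits.
kept = s.str.replace(r'[^a-zA-Z0-9 ]', '', regex=True)

print(kept.tolist())  # ['ab c1', 'ab c']
```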
-
citynorman over 2 years
[^0-9a-zA-Z.,-/ ]
was what I was after, personally.
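Inside a character class, ,-/ is read as a range from ',' to '/', which in ASCII covers ',', '-', '.' and '/'. The class above therefore does keep the hyphen, but through the range rather than as a literal; putting the hyphen last makes that explicit. A quick check using plain re and an illustrative input string:

```python
import re

text = 'a-b.c,d/e f#g!'

# ',-/' is a range: ',' (0x2C) through '/' (0x2F), i.e. , - . /
ranged = re.sub(r'[^0-9a-zA-Z.,-/ ]', '', text)

# Same set written out, with the hyphen last so it is a literal character.
explicit = re.sub(r'[^0-9a-zA-Z.,/ -]', '', text)

print(ranged == explicit)  # True
print(explicit)            # a-b.c,d/e fg
```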