How to remove non-alpha-numeric characters from strings within a dataframe column in Python?

Solution 1

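Use str.replace with a regular expression. Pass regex=True explicitly: since pandas 2.0 the default is regex=False, which treats the pattern as a literal string.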

df
  strings
0  a#bc1!
1   a(b$c

df.strings.str.replace(r'[^a-zA-Z]', '', regex=True)
0    abc
1    abc
Name: strings, dtype: object

To retain alphanumeric characters (not just letters, as your expected output suggests), you'll need the following. Note that \W also keeps underscores.

df.strings.str.replace(r'\W', '', regex=True)
0    abc1
1     abc
Name: strings, dtype: object
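
The difference between the patterns above is easier to see side by side. The following is a minimal sketch, not part of the original answer: the third row ('x_y 2') and the variable names are made up for illustration, and it assumes a pandas version that accepts the regex keyword.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row with an
# underscore and a space to show where the patterns differ.
df = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c', 'x_y 2']})

# Keep ASCII letters only.
letters_only = df['strings'].str.replace(r'[^a-zA-Z]', '', regex=True)

# Keep "word" characters: letters, digits, and the underscore.
word_chars = df['strings'].str.replace(r'\W', '', regex=True)

# Keep ASCII letters and digits only (the underscore is removed too).
alnum_only = df['strings'].str.replace(r'[^a-zA-Z0-9]', '', regex=True)

print(letters_only.tolist())  # ['abc', 'abc', 'xy']
print(word_chars.tolist())    # ['abc1', 'abc', 'x_y2']
print(alnum_only.tolist())    # ['abc1', 'abc', 'xy2']
```

If underscores should go as well, the explicit character class is the safer choice over \W.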

Solution 2

Since you wrote alphanumeric, you need to add 0-9 to the regex. But maybe you only wanted letters...

import pandas as pd

ded = pd.DataFrame({'strings': ['a#bc1!', 'a(b$c']})

ded.strings.str.replace(r'[^a-zA-Z0-9]', '', regex=True)

But it's basically what COLDSPEED wrote.

Solution 3

You can also use the re module directly:

import re

regex = re.compile('[^a-zA-Z]')

l = ["a#bc1!","a(b$c"]

print([regex.sub('', i) for i in l])

['abc', 'abc']
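
As a variation on the re approach, here is a small sketch that keeps digits and, optionally, spaces. The helper name strip_non_alnum, its keep_spaces parameter, and the third sample string are hypothetical and only serve as illustration.

```python
import re

# Precompiled patterns: everything except ASCII letters and digits,
# with or without the space character allowed.
ALNUM = re.compile(r'[^a-zA-Z0-9]')
ALNUM_SPACE = re.compile(r'[^a-zA-Z0-9 ]')

def strip_non_alnum(text, keep_spaces=False):
    """Remove every character that is not an ASCII letter or digit."""
    pattern = ALNUM_SPACE if keep_spaces else ALNUM
    return pattern.sub('', text)

samples = ["a#bc1!", "a(b$c", "a b-2"]

cleaned = [strip_non_alnum(s) for s in samples]
cleaned_with_spaces = [strip_non_alnum(s, keep_spaces=True) for s in samples]

print(cleaned)              # ['abc1', 'abc', 'ab2']
print(cleaned_with_spaces)  # ['abc1', 'abc', 'a b2']
```

The same function can be applied to a DataFrame column with df['strings'].map(strip_non_alnum), though the vectorized str.replace is usually the more idiomatic choice in pandas.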

Author: TheSaint321

Updated on July 05, 2022

Comments

  • TheSaint321 almost 2 years

    I have a DataFrame column containing many strings. I need to remove all non-alphanumeric characters from that column, e.g.:

    df['strings'] = ["a#bc1!","a(b$c"]
    

    Run code:

    Print(df['strings']): ['abc','abc']
    

    I've tried:

    df['strings'].replace([',','.','/','"',':',';','!','@','#','$','%',"'","*","(",")","&",],"")
    

    But this didn't work, and I feel there should be a more efficient way to do this using regex. Any help would be greatly appreciated.

  • TheSaint321 over 6 years
    That is correct. I had to add 0-9 and a space to the pattern since I wanted to keep those, but coldspeed's answer was first and is the correct method.
  • citynorman over 2 years
    [^0-9a-zA-Z.,-/ ] was what I was after personally.