Python - Replace non-ascii character in string (»)
Solution 1
To replace text with str.replace() in Python 2, first decode the byte string to Unicode, then replace the text, and finally encode the result back to the original encoding:
>>> a = "hi »"
>>> a.decode('utf-8').replace("»".decode('utf-8'), "").encode('utf-8')
'hi '
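On Python 3, where str is already Unicode, the same decode, replace, encode round trip only applies when you start from bytes. A minimal sketch, assuming the data arrives as UTF-8 encoded bytes (the variable names are illustrative, not part of the answer above):

```python
# Python 3 sketch: bytes play the role of the Python 2 byte string above.
raw = "hi »".encode("utf-8")           # assumed input: UTF-8 encoded bytes

text = raw.decode("utf-8")             # bytes -> Unicode text
cleaned = text.replace("\u00bb", "")   # drop the "»" character (U+00BB)
result = cleaned.encode("utf-8")       # Unicode text -> UTF-8 bytes again

print(result)
```

If you already hold a str in Python 3, the decode and encode steps are unnecessary and a plain replace() call is enough.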
You may also use the following regex to remove all the non-ascii characters from the string:
>>> import re
>>> re.sub(r'[^\x00-\x7f]',r'', 'hi »')
'hi '
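If the regex is applied to many strings, it can be compiled once and wrapped in a small helper. A minimal sketch for Python 3; the names NON_ASCII and strip_non_ascii are hypothetical and do not come from the answer:

```python
import re

# Matches any character outside the ASCII range 0x00-0x7F.
NON_ASCII = re.compile(r'[^\x00-\x7f]')

def strip_non_ascii(text):
    """Return text with every non-ASCII character removed."""
    return NON_ASCII.sub('', text)

print(strip_non_ascii('hi »'))
```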
Solution 2
@Moinuddin Quadri's answer fits your use-case better, but in general, an easy way to remove non-ASCII characters from a given string is by doing the following:
# the characters '¡' and '¢' are non-ASCII
string = "hello, my name is ¢arl... ¡Hola!"
all_ascii = ''.join(char for char in string if ord(char) < 128)
This results in:
>>> print(all_ascii)
hello, my name is arl... Hola!
You could also do this:
''.join(filter(lambda c: ord(c) < 128, string))
But that's about 30% slower than the char for char ... approach.
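The relative speed depends on the Python version and the input, so it is worth measuring on your own data. A rough sketch using the standard timeit module; the function names and the repeat count are arbitrary choices for illustration:

```python
import timeit

string = "hello, my name is ¢arl... ¡Hola!"

def with_generator(s):
    # Generator expression passed to join.
    return ''.join(char for char in s if ord(char) < 128)

def with_filter(s):
    # filter() with a lambda predicate.
    return ''.join(filter(lambda c: ord(c) < 128, s))

# Both approaches should produce the same result.
assert with_generator(string) == with_filter(string)

# Time each approach; absolute numbers vary by machine and interpreter.
gen_time = timeit.timeit(lambda: with_generator(string), number=10000)
filter_time = timeit.timeit(lambda: with_filter(string), number=10000)
print("generator: %.4fs, filter: %.4fs" % (gen_time, filter_time))
```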
Hyperion
Updated on January 21, 2020
Comments
-
Hyperion over 4 years
I need to replace the character "»" in a string with whitespace, but I keep getting an error. This is the code I use:
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
# other code
soup = BeautifulSoup(data, 'lxml')
mystring = soup.find('a').text.replace(' »','')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in position 13: ordinal not in range(128)
But if I test it with this other script:
# -*- coding: utf-8 -*-
a = "hi »"
b = a.replace('»','')
It works. Why is that?
-
Ma0 over 7 years
Googling the error exactly as it appears gives me this: stackoverflow.com/questions/5141559/… There should be something there you can use.
-
user almost 5 years
The regex version is the fastest. Instead of using [^\x00-\x7f] I used [^\x20-\x7E] to also remove the ASCII control characters from 0 up to 31 and 127.
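The variant from this comment can be sketched as follows for Python 3. Note that it also strips tabs and newlines, since they fall in the control range; the names NON_PRINTABLE and keep_printable_ascii are hypothetical:

```python
import re

# Keep only printable ASCII (space through tilde); control characters go too.
NON_PRINTABLE = re.compile(r'[^\x20-\x7E]')

def keep_printable_ascii(text):
    """Return text with everything outside 0x20-0x7E removed."""
    return NON_PRINTABLE.sub('', text)

print(repr(keep_printable_ascii('hi »\t\x7f!')))
```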