'ascii' codec can't encode character at position * ord not in range(128)
Solution 1
You can do it with smart_str from Django's django.utils.encoding module. Just try this:
from django.utils.encoding import smart_str
text = u'\u2019'
print(smart_str(text))
You can install Django by starting a command shell with administrator privileges and running this command:
pip install Django
Solution 2
Your data is unicode data. To write that to a file, use .encode():
text = text.encode('ascii', 'ignore')
but that would remove anything that isn't ASCII. Perhaps you wanted to encode to a more suitable encoding, like UTF-8, instead?
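To make the trade-off concrete, here is a minimal sketch comparing the two encodings on a short string. The sample text and variable names are illustrative assumptions, not taken from the question's data.

```python
text = u'It\u2019s done'   # sample text containing U+2019 (right single quotation mark)

# 'ignore' silently drops every character that ASCII cannot represent.
ascii_bytes = text.encode('ascii', 'ignore')

# UTF-8 can represent every Unicode character, so nothing is lost.
utf8_bytes = text.encode('utf-8')

# Decoding the UTF-8 bytes gives back the original text.
round_trip = utf8_bytes.decode('utf-8')
```

With the ASCII version the apostrophe is gone for good, while the UTF-8 bytes can be decoded back to the original string.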
You may want to read up on Python and Unicode:
The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
Pragmatic Unicode by Ned Batchelder
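A pattern that Pragmatic Unicode recommends is to decode bytes as soon as they come in, work with Unicode text in between, and encode only when writing out. The sketch below is one way that might look for this question. The helper name save_text and its source_encoding parameter are hypothetical, and it assumes the downloaded pages are UTF-8, which may not hold for every site.

```python
import io
import os
import tempfile


def save_text(raw_bytes, path, source_encoding='utf-8'):
    """Decode raw bytes once, then write the result out as UTF-8 text."""
    # Decode at the input boundary; 'replace' substitutes U+FFFD for
    # any byte sequence that is not valid in the assumed source encoding.
    text = raw_bytes.decode(source_encoding, 'replace')
    # io.open encodes on write, so no manual .encode() call is needed.
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(text)
    return text


# Stand-in for the bytes that urllib's read() would return.
raw = u'It\u2019s done'.encode('utf-8')
path = os.path.join(tempfile.mkdtemp(), 'out.txt')
saved = save_text(raw, path)
```

Because the file object handles the encoding, the text is never passed through the implicit ASCII codec that produces the error in the question.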
minocha
Updated on June 15, 2022
Comments
-
minocha almost 2 years
There are a few threads on Stack Overflow, but I couldn't find a solution that addresses the problem as a whole.
I have collected a large amount of text with the urllib read function and stored it in pickle files.
Now I want to write this data to a file. While writing, I get errors like this:
'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in range(128)
and a lot of data is being lost.
I suppose the data returned by urllib's read is byte data.
I've tried:
1. text=text.decode('ascii','ignore')
2. s=filter(lambda x: x in string.printable, s)
3. text=u''+text
   text=text.decode().encode('utf-8')
but I'm still ending up with similar errors. Can somebody point out a proper solution? Also, would codecs strip work? I have no issue if the conflicting bytes are not written to the file, so that loss is acceptable.
-
Kos about 11 years
And perhaps also Say Hello to Unicode (shameless plug :-))
-
minocha about 11 years
@martijn-pieters I noticed that I used the encode function while writing to the db, and that I was unnecessarily encoding and decoding again while dumping the contents to a text file. There are still a few errors, however, maybe because of UTF-8 and UTF-16 conflicts. Is there any resolution you might suggest?
-
minocha about 11 years
@martijn-pieters Any idea what I should do if the error still persists?
-
minocha about 11 years
The results are strikingly similar to the encode('ascii','ignore')
-
Martijn Pieters about 11 years
@minocha: read the linked articles carefully, and the documentation for whatever database you are using. Perhaps the database handles Unicode natively, for example.
-
minocha about 9 years
Two years down the line, I can say how important this answer is for Python 2.7. Sorry for the inconsiderate remark above^