'ascii' codec can't encode character at position * ord not in range(128)

python unicode decode encode

12,728

Solution 1

You can do this with smart_str from Django's django.utils.encoding module. Try this:

from django.utils.encoding import smart_str

text = u'\u2019'
print smart_str(text)

You can install Django by starting a command shell with administrator privileges and running this command:

pip install Django
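
If pulling in Django only for this feels heavy, the effect on a unicode string under Python 2 is roughly "encode to UTF-8 bytes". Here is a minimal stand-in sketch, not Django's actual implementation: the helper name to_utf8_bytes is hypothetical, the UTF-8 default is an assumption, and it only handles text and byte strings.

```python
def to_utf8_bytes(value, encoding="utf-8"):
    """Hypothetical helper: return `value` as an encoded byte string."""
    if isinstance(value, bytes):
        # Already a byte string: return it unchanged.
        return value
    # Text (unicode) value: encode it with the chosen codec.
    return value.encode(encoding)


text = u"\u2019"
encoded = to_utf8_bytes(text)
```

The result is a byte string that can be written to a file opened in binary mode without going through the ascii codec.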

Solution 2

Your data is unicode data. To write that to a file, use .encode():

text = text.encode('ascii', 'ignore')

but that would remove anything that isn't ASCII. Perhaps you wanted to encode to a more suitable encoding, like UTF-8, instead?
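
To make the difference concrete, here is a small sketch comparing the two encodings and writing the text out as UTF-8. The sample string and the temporary file name out.txt are made up for illustration; io.open is used because it accepts an encoding argument on both Python 2.7 and Python 3.

```python
import io
import os
import tempfile

text = u"It\u2019s a test"

# 'ignore' silently drops every character that ASCII cannot represent.
ascii_only = text.encode("ascii", "ignore")

# UTF-8 can represent every Unicode character, so nothing is lost.
utf8_bytes = text.encode("utf-8")

# io.open encodes on write and decodes on read, so the program
# itself only ever handles unicode text.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(text)

with io.open(path, "r", encoding="utf-8") as handle:
    round_tripped = handle.read()
```

With the ASCII variant the apostrophe is gone from the output; with UTF-8 the text read back from the file is identical to what was written.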

You may want to read up on Python and Unicode:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder


Author by

minocha

Python, search, scraping, nlp, eye-tracking and everything nice...

Updated on June 15, 2022

Comments

minocha almost 2 years
There are a few threads on Stack Overflow, but I couldn't find a valid solution to the problem as a whole.

I have collected a large amount of text data with the urllib read function and stored it in pickle files.

Now I want to write this data to a file. While writing, I'm getting errors similar to:
```
'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in range(128)
```
and a lot of data is being lost.

I assume the data returned by urllib's read is byte data.

I've tried:
```
   1. text=text.decode('ascii','ignore')
   2. s=filter(lambda x: x in string.printable, s)
   3. text=u''+text
      text=text.decode().encode('utf-8')
```
but I still end up with similar errors. Can somebody point out a proper solution? Also, would stripping via codecs work? I don't mind if the conflicting bytes are not written to the file, so that loss is acceptable.
Kos about 11 years

And perhaps also Say Hello to Unicode (shameless plug :-))
minocha about 11 years

@martijn-pieters I noticed that while writing to the DB I used the encode function, and I was unnecessarily doing the encode/decode again while dumping the contents to a text file. There are still a few errors, however, maybe because of UTF-8 and UTF-16 conflicts. Any resolution you might suggest?
minocha about 11 years

@martjin-pieters - Any idea what I should do if the error still persists?
minocha about 11 years

The results are strikingly similar to those of encode('ascii', 'ignore').
Martijn Pieters about 11 years

@minocha: read the linked articles carefully, and the documentation for whatever database you are using. Perhaps the database handles Unicode natively, for example.
minocha about 9 years

Two years down the line, I can say how important this answer is for Python 2.7. Sorry for the inconsiderate remark above ^