'ascii' codec can't encode character at position * ord not in range(128)

12,728

Solution 1

You can do it with smart_str from Django's django.utils.encoding module. Try this:

from django.utils.encoding import smart_str

text = u'\u2019'
# On Python 2, smart_str returns a UTF-8 encoded byte string
print smart_str(text)
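
If installing Django only for this is more than you want, the same idea can be sketched without it. The helper name to_utf8_bytes below is hypothetical, not a Django API. It assumes UTF-8 is the encoding you want, and it only approximates what smart_str does for text input on Python 2:

```python
def to_utf8_bytes(value):
    """Return value as UTF-8 bytes (hypothetical helper, not part of Django)."""
    if isinstance(value, bytes):
        # Already a byte string: pass it through untouched.
        return value
    # Text (unicode on Python 2, str on Python 3): encode explicitly
    # instead of letting Python fall back to the ASCII codec.
    return value.encode('utf-8')


encoded = to_utf8_bytes(u'\u2019')
```

The result is a byte string that can be written to a file opened in binary mode without triggering the ASCII codec.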

You can install Django by starting a command shell with administrator privileges and running this command:

pip install Django

Solution 2

Your data is unicode data. To write that to a file, use .encode():

text = text.encode('ascii', 'ignore')

but that would remove anything that isn't ASCII. Perhaps you wanted to encode to a more suitable encoding, like UTF-8, instead?
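
As a rough illustration of that difference, the sketch below encodes the same text once with the lossy ASCII option and once as UTF-8 while writing to a file. The sample text and the file name output.txt are assumptions made for the example; io.open is used so the snippet behaves the same on Python 2 and Python 3:

```python
import io
import os
import tempfile

text = u'It\u2019s here'  # sample text containing a right single quotation mark

# Lossy: characters outside ASCII are silently dropped.
ascii_only = text.encode('ascii', 'ignore')

# Lossless: io.open encodes the text as UTF-8 while writing.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')  # hypothetical file name
with io.open(path, 'w', encoding='utf-8') as handle:
    handle.write(text)

# Reading back with the same encoding returns the original text.
with io.open(path, 'r', encoding='utf-8') as handle:
    round_tripped = handle.read()
```

Here ascii_only has lost the quotation mark, while round_tripped still contains it.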

You may want to read up on Python and Unicode:

Author by

minocha

Python, search, scraping, nlp, eye-tracking and everything nice...

Updated on June 15, 2022

Comments

  • minocha
    minocha almost 2 years

    There are a few threads on Stack Overflow, but I couldn't find a valid solution to the problem as a whole.

    I have collected a large amount of text data with the urllib read function and stored it in pickle files.

    Now I want to write this data to a file. While writing, I'm getting errors similar to:

    'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in range(128)
    

    and a lot of data is being lost.

    I suppose the data returned by the urllib read is byte data.

    I've tried:

       1. text=text.decode('ascii','ignore')
       2. s=filter(lambda x: x in string.printable, s)
       3. text=u''+text
          text=text.decode().encode('utf-8')
    

    but I'm still ending up with similar errors. Can somebody point out a proper solution? Also, would codecs strip work? I don't mind if the conflicting bytes are not written to the file, so that loss is acceptable.

  • Kos
    Kos about 11 years
    And perhaps also Say Hello to Unicode (shameless plug :-))
  • minocha
    minocha about 11 years
    @martijn-pieters I noticed that while writing to the DB I used the encode function, and I was unnecessarily encoding and decoding again while dumping the contents to a text file. There are still a few errors, however, maybe because of UTF-8 and UTF-16 conflicts. Any resolution you might suggest?
  • minocha
    minocha about 11 years
    @martijn-pieters - Any idea what I should do if the error still persists?
  • minocha
    minocha about 11 years
    The results are strikingly similar to the encode('ascii','ignore')
  • Martijn Pieters
    Martijn Pieters about 11 years
    @minocha: read the linked articles carefully, and the documentation for whatever database you are using. Perhaps the database handles Unicode natively, for example.
  • minocha
    minocha about 9 years
    Two years down the line I can say how important this answer is for Python 2.7. Sorry for the inconsiderate remark above ^