What is unicode_literals used for?


Your terminal or console is failing to let Python know it supports UTF-8.

Without the from __future__ import unicode_literals line, you are building a byte string that holds UTF-8 encoded bytes. With the import, you are building a unicode string.
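To make the difference between the two kinds of string concrete, here is a minimal sketch that is not part of the original question. It is written to run on both Python 2 and 3; the names unicode_name and byte_name are made up for illustration, and byte_name only approximates what a plain '...' literal holds in a UTF-8 encoded Python 2 source file.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# With unicode_literals, this literal is text: a sequence of code points.
unicode_name = 'helló wörld from example'

# Encoding it gives the UTF-8 bytes that a plain literal would hold in
# Python 2 without the import (assuming a UTF-8 encoded source file).
byte_name = unicode_name.encode('utf-8')

# The text has 24 characters; the byte string has 26 bytes, because
# ó and ö each take two bytes in UTF-8.
print(type(unicode_name).__name__, len(unicode_name))
print(type(byte_name).__name__, len(byte_name))
```

The type names printed differ between Python 2 (unicode and str) and Python 3 (str and bytes), but the lengths are the same on both.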

print has to treat these two values differently; a byte string is written to sys.stdout unchanged. A unicode string is encoded to bytes first, and Python consults sys.stdout.encoding for that. If your system doesn't correctly tell Python what codec it supports, the default is to use ASCII.

Your system failed to tell Python what codec to use; sys.stdout.encoding is set to ASCII, and encoding the unicode value to print failed.
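The encoding step described above can be imitated by hand. This is only a sketch under assumptions: encode_for_output is a hypothetical helper, not something print exposes, and it simply falls back to ASCII when no encoding is known, as described above.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import sys

text = 'helló wörld from example'  # a unicode string on Python 2 and 3

# What the console reported to Python; this can be None in Python 2
# when output is piped or redirected.
print('stdout encoding:', sys.stdout.encoding)


def encode_for_output(value, encoding):
    """Hypothetical stand-in for the encoding step print performs."""
    return value.encode(encoding or 'ascii')


# With a UTF-8 codec the text encodes without trouble.
utf8_bytes = encode_for_output(text, 'utf-8')

# With the ASCII fallback, the first non-ASCII character raises.
try:
    encode_for_output(text, 'ascii')
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print('ASCII fallback fails:', exc)
```

The exception raised in the last step is the same UnicodeEncodeError the question reports, pointing at position 4 (the ó).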

You can verify this by manually encoding to UTF-8 when printing:

# encoding: utf-8
from __future__ import unicode_literals
name = 'helló wörld from example'
print name.encode('utf8')

You can also reproduce the issue without the from __future__ import statement by writing the unicode literal explicitly:

# encoding: utf-8
name = u'helló wörld from example'
print name

where u'..' is a unicode literal too.

Without details on what your environment is, it is hard to say what the solution is; this depends very much on the OS and console or terminal used.
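One option that is sometimes used when the console itself cannot be fixed is the PYTHONIOENCODING environment variable, which overrides the encoding Python chooses for stdin, stdout and stderr. Whether that is the right fix for you depends on your environment, so treat the following as a sketch only: it starts a child interpreter with the variable set and reports what that child sees as sys.stdout.encoding. The helper name child_stdout_encoding is made up for this example.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and return
    the value it reports for sys.stdout.encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    output = subprocess.check_output(
        [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
        env=env,
    )
    return output.decode('ascii').strip()


# The child's stdout is a pipe here, yet it still reports the
# encoding requested through the environment variable.
reported = child_stdout_encoding('utf-8')
print(reported)
```

Setting the variable in the shell before running the script has the same effect on that script; it does not change what the terminal can actually display.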

Author: ssj

Updated on July 09, 2022

Comments

  • ssj
    ssj almost 2 years

    I get a weird problem with __future__.unicode_literals in Python. Without importing unicode_literals I get the correct output:

    # encoding: utf-8
    # from __future__ import unicode_literals
    name = 'helló wörld from example'
    print name
    

    But when I add the unicode_literals import:

    # encoding: utf-8
    from __future__ import unicode_literals
    name = 'helló wörld from example'
    print name
    

    I got this error:

    UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 4: ordinal not in range(128)
    

    Does unicode_literals encode every string as UTF-8? What should I do to get rid of this error?

    • Martijn Pieters
      Martijn Pieters almost 10 years
      The import only has an effect on Python 2; it makes Python 2 behave as Python 3 does when it comes to string literals. It makes your code cross-Python-version compatible.
    • roippi
      roippi almost 10 years
      The issue is with your terminal; it is unable to display non-ASCII characters.
    • Martijn Pieters
      Martijn Pieters almost 10 years
      Since you are using print as a statement you must be using Python 2 instead; I've removed the python-3.x tag that threw me off.
    • Martijn Pieters
      Martijn Pieters almost 10 years
      @roippi: no, it is perfectly capable of showing already-encoded UTF-8 bytes. It is not communicating to Python that it is using UTF-8.
    • roippi
      roippi almost 10 years
      Yes, I could have phrased that better.