Python: Using .format() on a Unicode-escaped string
Solution 1
Just make the format string a unicode string as well:
>>> s = u'\u2265'
>>> print s
≥
>>> print "{0}".format(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)
>>> print u"{0}".format(s)
≥
>>>
Solution 2
unicode strings need unicode format strings:
>>> print u'{0}'.format(s)
≥
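If a whole module should use unicode format strings without writing the u prefix on every literal, one option is from __future__ import unicode_literals, which makes plain string literals unicode on Python 2. The sketch below assumes Python 2.6 or later (on Python 3 the import is accepted but changes nothing). The names GE and at_least are hypothetical, chosen only for illustration, and the source is kept ASCII-only so it needs no encoding declaration.

```python
# A __future__ import must come before any other statement in the module
# (comments are allowed above it).
from __future__ import unicode_literals

# With unicode_literals this is a unicode string on Python 2 as well.
# GE is a hypothetical name used only for this example.
GE = '\u2265'


def at_least(limit):
    """Hypothetical helper: build a label such as GE followed by ' 5'."""
    # The template '{0} {1}' is also unicode here, so format() never has to
    # encode GE to ASCII.
    return '{0} {1}'.format(GE, limit)


label = at_least(5)
```

The trade-off is that every unmarked literal in the module changes type, so code that genuinely needs byte strings must then mark them explicitly with b'...'.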
Solution 3
A bit more information on why that happens.
>>> s = u'\u2265'
>>> print s
works because print automatically encodes the unicode string using your terminal's encoding, which was likely set to UTF-8. (You can check with import sys; print sys.stdout.encoding.)
>>> print "{0}".format(s)
fails because format tries to match the type of the string it is called on (I couldn't find documentation on this, but this is the behavior I've noticed). In Python 2, plain string literals are byte strings (type str), so format has to convert s to bytes, and it does so with the default ASCII codec, which then results in that exception. Observe:
>>> s = u'\u2265'
>>> s.encode('ascii')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)
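So that is why these two approaches work: either make the format string unicode too, or encode s to UTF-8 bytes before formatting (the second one only displays correctly if your terminal expects UTF-8). On Python 2.6, use an explicit field number such as {0}; the bare {} form was only added in Python 2.7.
>>> s = u'\u2265'
>>> print u'{0}'.format(s)
≥
>>> print '{0}'.format(s.encode('utf-8'))
≥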
The source character set is defined by the encoding declaration; it is ASCII if no encoding declaration is given in the source file (https://docs.python.org/2/reference/lexical_analysis.html#string-literals)
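The codec behaviour behind that exception can be checked directly. A minimal sketch, written to run on both Python 2.6+ and Python 3: ASCII cannot represent U+2265, while UTF-8 encodes it as three bytes that decode back to the same character. The variable names are illustrative only.

```python
s = u'\u2265'  # GREATER-THAN OR EQUAL TO, code point 8805

# ASCII only defines code points 0-127, so encoding this character fails
# with the same "ordinal not in range(128)" reason seen in the tracebacks.
try:
    s.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc.reason

# UTF-8 can represent every code point; U+2265 becomes three bytes.
utf8_bytes = s.encode('utf-8')

# Decoding with the same codec gives the original text back.
round_trip = utf8_bytes.decode('utf-8')
```

On Python 3 the encode-first approach does not carry over: '{0}'.format(s.encode('utf-8')) inserts the bytes' repr, b'\xe2\x89\xa5', rather than the character, so there you would format s directly.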
Kit
Updated on March 19, 2020
Comments
- Kit (about 4 years ago):
I am using Python 2.6.5. My code requires the use of the "greater than or equal to" sign. Here it goes:
>>> s = u'\u2265'
>>> print s
≥
>>> print "{0}".format(s)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)
Why do I get this error? Is there a right way to do this? I need to use the .format() function.
- Philipp (almost 14 years ago): @Kit: If you want all literals to be Unicode (like in Python 3), put from __future__ import unicode_literals at the beginning of your source files.
- Hylidan (about 9 years ago): Yeah, this will bite you if you're used to % formatting: "%s" % u"\u2265" works, but "{}".format(u"\u2265") throws an exception.
- Iosu S. (almost 7 years ago): What a simple thing. What a terrible headache I had until I found this bit of enlightenment.
- lps (about 5 years ago): Oh, and I found this to be of great help in understanding Unicode in Python, and text representation in computer systems in general: nedbatchelder.com/text/unipain.html
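The contrast between % formatting and .format() noted in the comments can be checked with a small version-aware sketch (assumes Python 2.6+ or Python 3; variable names are illustrative). On Python 2, % promotes the result to unicode when an argument is unicode, while str.format() on a byte-string template encodes the argument with ASCII and fails. On Python 3 both succeed, since every str is already Unicode text.

```python
import sys

s = u'\u2265'  # GREATER-THAN OR EQUAL TO

# %-formatting: on Python 2 a unicode argument promotes the whole result to
# unicode, so this succeeds; on Python 3 it succeeds trivially.
percent_result = "%s" % s

# .format() on a plain (byte-string) template: on Python 2 the unicode
# argument is encoded with ASCII and raises; on Python 3 it simply works.
try:
    format_result = "{0}".format(s)
    format_failed = False
except UnicodeEncodeError:
    format_result = None
    format_failed = True

# Expected: format_failed is True on Python 2 and False on Python 3.
is_python2 = sys.version_info[0] == 2
```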