Unicode latin1 string encode / decode
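I guess the string was incorrectly converted to lowercase at some point, changing \xc3 to \xe3. The lowercase conversion assumed latin1 encoding when the data was actually UTF-8: in latin1, \xc3 is 'Ã', whose lowercase form is 'ã' (\xe3). Before that step, the bytes decode correctly as UTF-8: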
>>> print 'gr\xc3\xa9gory'.decode('utf8')
grégory
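The same diagnosis can be reproduced in Python 3, where str is always Unicode. The sketch below assumes the damage was exactly this: UTF-8 bytes decoded as latin1, then passed through str.lower(). Under that assumption, the only non-ASCII bytes that change are UTF-8 two-byte lead bytes 0xC2 to 0xDE (except 0xD7), which latin1 lowercasing shifts up by 0x20. The hypothetical helper fix_lowercased_mojibake shifts them back wherever they do not begin a valid three- or four-byte UTF-8 sequence; corrupt is likewise a hypothetical helper that simulates the damage. This is a heuristic, not a library function, and it cannot restore ASCII capitals that the same lower() call removed.

```python
def corrupt(text: str) -> str:
    """Simulate the suspected damage: UTF-8 bytes read as latin1, then lowercased."""
    return text.encode("utf-8").decode("latin-1").lower()


def _starts_valid_sequence(buf: bytearray, i: int) -> bool:
    """True if a genuine 3- or 4-byte UTF-8 sequence starts at buf[i]."""
    lead = buf[i]
    if 0xE0 <= lead <= 0xEF:
        length = 3
    elif 0xF0 <= lead <= 0xF4:
        length = 4
    else:
        return False
    try:
        # Strict decoding raises on truncated or malformed sequences.
        bytes(buf[i:i + length]).decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def fix_lowercased_mojibake(s: str) -> str:
    """Heuristically undo corrupt(). Hypothetical helper, not a library API."""
    # Every character of the mojibake is in U+0000..U+00FF, so this is one byte each.
    buf = bytearray(s.encode("latin-1"))
    for i in range(len(buf) - 1):
        lead, nxt = buf[i], buf[i + 1]
        # A lowercased 2-byte lead (0xC2..0xDE -> 0xE2..0xFE, never 0xF7)
        # is still followed by a continuation byte 0x80..0xBF.
        if 0xE2 <= lead <= 0xFE and lead != 0xF7 and 0x80 <= nxt <= 0xBF:
            if not _starts_valid_sequence(buf, i):
                buf[i] = lead - 0x20  # undo the lowercase shift, e.g. 0xE3 -> 0xC3
    return buf.decode("utf-8")


damaged = corrupt("grégory")
print(ascii(damaged))  # 'gr\xe3\xa9gory', the value from the question
print(fix_lowercased_mojibake(damaged) == "grégory")  # True
```

Genuine three- and four-byte characters (for example '€', encoded as E2 82 AC) are left alone, because their lead bytes are already lowercase in latin1 and they form valid sequences.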
Author: user3203201
Updated on June 04, 2022
While migrating data from an unknown, old and inconsistently encoded MySQL database to a UTF-8 PostgreSQL database using the Python (Django) ORM, I sometimes end up with incorrectly encoded data.
Target: grégory
>>> a
u'gr\xe3\xa9gory'
>>> print a
grã©gory
I tried several decode/encode tricks without success:
>>> print a.encode('utf-8').decode('latin1')
grã©gory
>>> print a.decode('latin-1')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-3: ordinal not in range(128)
Even some unicode_escape tricks did not help.
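The failed attempts can be illustrated with a Python 3 sketch. In Python 2, calling .decode() on a unicode object first encodes it with the default ASCII codec, which is why a.decode('latin-1') raised UnicodeEncodeError. In Python 3, str has no .decode() method, and the usual repair for text decoded with the wrong codec is to re-encode it as latin1 and decode the resulting bytes as UTF-8 (latin1_roundtrip below is a hypothetical name for that step). That works on plain mojibake, but not on the lowercased value from the question, because 0xE3 0xA9 followed by 'g' is not valid UTF-8.

```python
def latin1_roundtrip(s: str) -> str:
    """Recover the original bytes via latin1, then decode them as UTF-8."""
    return s.encode("latin-1").decode("utf-8")


# Plain mojibake ('Ã' is \xc3): the round trip recovers the text.
print(latin1_roundtrip("grÃ©gory") == "grégory")  # True

# Lowercased mojibake ('ã' is \xe3): the bytes are no longer valid UTF-8.
try:
    latin1_roundtrip("grã©gory")
except UnicodeDecodeError as exc:
    print("still broken:", exc.reason)
```

This is consistent with the lowercase diagnosis: the round trip only fails because one byte changed from \xc3 to \xe3.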