Decode string with hex characters in python 2

25,077

Solution 1

Assuming Python 2.6,

>>> print('kitap ara\xfet\xfdrmas\xfd'.decode('iso-8859-9'))
kitap araştırması
>>> 'kitap ara\xfet\xfdrmas\xfd'.decode('iso-8859-9').encode('utf-8')
'kitap ara\xc5\x9ft\xc4\xb1rmas\xc4\xb1'

Solution 2

Try(Python 3.x):

import codecs
codecs.decode("707974686f6e2d666f72756d2e696f", "hex").decode('utf-8')

From here.

Solution 3

Try

hex_string.decode("cp1254").encode("utf-8")

(cp1254 or iso-8859-9 are the Turkish codepages, the former being the usual name on Windows platforms, but in Python, both work equally well)

Solution 4

First you need to decode it from the encoded bytes you have. That appears to be ISO-8859-9 (latin-5), or, if you are using Windows, probably code page 1254, which is based on latin-5.

>>> 'kitap ara\xfet\xfdrmas\xfd'.decode('cp1254')
u'kitap ara\u015ft\u0131rmas\u0131' # u'kitap araştırması'

If you are using Windows, then depending on where you are getting those bytes, it might be more appropriate to decode them as mbcs, which translates to ‘whichever code page the local system is using’. If the string is just sitting in a .py file, you would be better off just writing u'kitap araştırması' in the source and setting a -*- coding declaration to direct Python to decode it. See PEP 263.

As to how to encode unicode strings to UTF-8 for the database, well, if you want to you can do it manually:

>>> u'kitap ara\u015ft\u0131rmas\u0131'.encode('utf-8')
'kitap ara\xc5\x9ft\xc4\xb1rmas\xc4\xb1'

but a good data access layer is likely to do that automatically for you, if you've got the COLLATION of the tables the data is going into right.

Share:
25,077
user260223
Author by

user260223

Updated on October 13, 2020

Comments

  • user260223
    user260223 about 3 years

    I have a hex string and i want to convert it utf8 to insert mysql. (my database is utf8)

    hex_string = 'kitap ara\xfet\xfdrmas\xfd'
    ...
    result = 'kitap araştırması'
    

    How can I do that?