String in Python with my Unicode?

Solution 1

There is nothing wrong with your string! You have just confused encode() and decode(). A string is a sequence of meaningful symbols. To turn it into bytes that can be stored in a file or transmitted over the Internet, call encode() with an encoding such as UTF-8. Each encoding is a scheme for converting meaningful symbols into flat bytes of output.

When the time comes to do the opposite, taking raw bytes from a file or a socket and turning them into symbols like letters and numbers, you decode the bytes using the decode() method of bytes objects in Python 3.

>>> str_version = 'នយោបាយ'
>>> str_version.encode('utf-8')
b'\xe1\x9e\x93\xe1\x9e\x99\xe1\x9f\x84\xe1\x9e\x94\xe1\x9e\xb6\xe1\x9e\x99'

See that long line of bytes? Those are the bytes UTF-8 uses to represent your string when you need to transmit it over a network or store it in a document. Many other encodings are in use, but UTF-8 seems to be the most popular. Each encoding can turn meaningful symbols like ន and យោ into bytes, the little 8-bit numbers with which computers communicate.
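To make the point about different encodings concrete, here is a small sketch that encodes the same string two ways and compares the results. UTF-16 (little-endian) is not mentioned in the answer; it is picked here only as a second encoding for comparison.

```python
text = 'នយោបាយ'

# The same code points, encoded two different ways.
utf8_bytes = text.encode('utf-8')
utf16_bytes = text.encode('utf-16-le')  # assumption: chosen only for comparison

print(len(text))         # number of code points in the str
print(len(utf8_bytes))   # number of bytes under UTF-8
print(len(utf16_bytes))  # number of bytes under UTF-16 (little-endian)

# The byte sequences differ, but each decodes back with its own codec.
print(utf8_bytes == utf16_bytes)
print(utf8_bytes.decode('utf-8') == text)
print(utf16_bytes.decode('utf-16-le') == text)
```

The str itself is the same in both cases; only the byte representation changes with the encoding.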

>>> rawbytes = str_version.encode('utf-8')
>>> rawbytes
b'\xe1\x9e\x93\xe1\x9e\x99\xe1\x9f\x84\xe1\x9e\x94\xe1\x9e\xb6\xe1\x9e\x99'
>>> rawbytes.decode('utf-8')
'នយោបាយ'
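The same round trip applies when the bytes actually go to a file. The following is a minimal sketch, assuming Python 3; the temporary directory and the file name khmer.txt are hypothetical and exist only for this illustration.

```python
import os
import tempfile

text = 'នយោបាយ'

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'khmer.txt')

# Binary mode: the file wants bytes, so we encode the str ourselves.
with open(path, 'wb') as f:
    f.write(text.encode('utf-8'))

# Binary mode again: we get raw bytes back and decode them ourselves.
with open(path, 'rb') as f:
    raw = f.read()
restored = raw.decode('utf-8')

# Text mode with an explicit encoding lets open() do the decoding for us.
with open(path, encoding='utf-8') as f:
    also_restored = f.read()

print(restored == text, also_restored == text)
```

Passing the encoding explicitly in text mode avoids depending on the platform default, which may not be UTF-8.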

Solution 2

You're reading the 2.x docs. str.decode() (and bytes.encode()) were dropped in 3.x. In 3.x, str is already a Unicode string; there's no need to decode it.
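This is easy to check from a Python 3 interpreter. The sketch below inspects which methods each type has; the to_text helper is hypothetical, not part of any library, and only illustrates handling a value that may arrive as either bytes or str.

```python
# In Python 3, each type keeps only the conversion that makes sense for it.
print(hasattr(str, 'encode'), hasattr(str, 'decode'))      # str converts to bytes only
print(hasattr(bytes, 'decode'), hasattr(bytes, 'encode'))  # bytes converts to str only


def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return a str whether given bytes or str."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text, so there is nothing to decode


print(to_text(b'\xe1\x9e\x93') == to_text('ន'))
```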

Author by

kn3l

rust developer

Updated on July 29, 2022

Comments

  • kn3l
    kn3l almost 2 years
    Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32
    Type "copyright", "credits" or "license()" for more information.
    >>> str_version = 'នយោបាយ'
    >>> type(str_version)
    <class 'str'>
    >>> print (str_version)
    នយោបាយ
    >>> unicode_version = 'នយោបាយ'.decode('utf-8')
    Traceback (most recent call last):
      File "<pyshell#3>", line 1, in <module>
        unicode_version = 'នយោបាយ'.decode('utf-8')
    AttributeError: 'str' object has no attribute 'decode'
    >>> 
    

    What is the problem with my Unicode string?

  • kn3l
    kn3l about 13 years
    Still not clear. Could you explain more clearly? Thanks, Brandon Craig Rhodes.
  • Brandon Rhodes
    Brandon Rhodes about 13 years
    I have added another paragraph, and some code samples — do those make it any clearer?
  • kn3l
    kn3l about 13 years
    Now it's clear. I understand it now from your example. Thank you so much, @Brandon Craig Rhodes.