Using extended ASCII codes with Python

Use unichr instead of chr. In Python 2, chr produces a string containing a single byte, whereas unichr produces a string containing a single Unicode character. Do lookups using unicode strings too: d[u'é'], because d['é'] will look up the UTF-8 encoding of é.
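A minimal sketch of the same idea in Python 3, where str plays the role of Python 2's unicode type and bytes holds encoded data. It assumes that the "extended ASCII" table in the question is code page 437 (where é is 130 and è is 138); the question does not say which code page is meant, so treat that codec as an assumption. The name `table` is hypothetical.

```python
# Python 3 sketch. Assumption: the "extended ASCII" table is code page 437.
table = {}
for byte_value in range(128, 166):
    # Decode the single byte into a Unicode character and use that as the key.
    key = bytes([byte_value]).decode('cp437')
    table[key] = 'extended ascii'

# è written as an escape, so the source file encoding does not matter.
e_grave = '\u00e8'

print(table[e_grave])                    # lookup with a Unicode string finds the key
print(e_grave.encode('utf-8') in table)  # the UTF-8 bytes are a different object, not a key
```

Keeping every key as a Unicode string and decoding bytes once, at the boundary, means lookups never depend on how the source file or terminal happens to encode é.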

You have 3 things in your code: a latin-1 encoded str, a utf-8 encoded str, and a unicode string. Getting it clear in your head which you've got at any point in time requires a lot of knowledge about how Python works and a decent understanding of Unicode and encodings.
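One way to see the difference between these representations is to encode the same character several ways. This is a Python 3 sketch, not the asker's code. The cp437 line assumes that the table in which é is 130 is code page 437, which is not stated in the question.

```python
# Python 3 sketch: one character, several byte representations.
text = '\u00e9'  # é as a Unicode string (Python 2: u'\xe9')

latin1_bytes = text.encode('latin-1')  # one byte
utf8_bytes = text.encode('utf-8')      # two bytes
cp437_bytes = text.encode('cp437')     # one byte (assumed code page)

print(ord(text), len(text))   # code point and length of the Unicode string
print(list(latin1_bytes))     # [233]
print(list(utf8_bytes))       # [195, 169]
print(list(cp437_bytes))      # [130]

# Decoding with the matching codec gives back the same Unicode string.
assert latin1_bytes.decode('latin-1') == utf8_bytes.decode('utf-8') == text
```

The value 233 is the Unicode code point of é (which Latin-1 shares), 130 is its byte value in code page 437, and the "string of length 2" error comes from the two-byte UTF-8 form.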

No answer about encodings and Unicode is complete without a link to Joel Spolsky's article on the matter: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Author: lilawood

Updated on June 04, 2022

Comments

  • lilawood
    lilawood almost 2 years

    I've created a dictionary with Python, but I've got problems with extended ASCII codes.

    The loop that creates the dictionary is below (ASCII numbers 128 to 164: é, à, etc.):

    # extended ASCII codes
    dictionnary = {}
    i = 128
    while i <= 165 :
        dictionnary[chr(i)] = 'extended ascii'
        i = i + 1
    

    But when I try to use the dictionary:

    >>> dictionnary['è']
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: '\xc3\xa8'
    

    I've got # -*- coding: utf-8 -*- in the header of the Python script. I've tried encode, decode, etc., but the result is always wrong.

    To understand what happens, I tried:

    >>> ord('é')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: ord() expected a character, but string of length 2 found
    

    and

    >>> ord(u'é')
    233
    

    I'm confused by ord(u'é'), because 'é' is number 130 in the extended ASCII table, not 233.

    I understand that extended ASCII codes contain "two characters", but I don't understand how to solve the problem with the dictionary.

    Thanks in advance ! :-)

    • glglgl
      glglgl over 10 years
      There is no such thing as "extended ASCII". There are a lot of encodings (cpXXXX in Windows, latinXX, iso-8859-XX and others in the real world) where 247 can mean different things.
    • M T Head
      M T Head over 6 years
      Extended ASCII is the set of characters in the range 128 and above. ASCII = 0-127, extended ASCII = 128-255. This dates back to the 60s and 70s. Now it is not important except for its residual effects, like when you can't print characters above 128 but you can print those below 128. That dates back to dumb terminals.
  • Drew Delano
    Drew Delano over 12 years
    Did you mean to say, "No answer about encodings and Unicode is complete without a link..."?
  • lilawood
    lilawood over 12 years
    Thanks for your reply. I've installed Python 3 and it works perfectly :-)