Encoding characters with ISO 8859-1 in Python


Solution 1

When you're starting with a Unicode string, you need to encode rather than decode.

>>> def char_code(c):
...     return ord(c.encode('iso-8859-1'))
...
>>> print char_code(u'à')
224
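The session above is Python 2 (`print` statement, `u''` literals). Here is a minimal Python 3 sketch of the same `char_code` idea, assuming the source file is saved as UTF-8 (the Python 3 default). `str.encode` returns a one-byte `bytes` object for any ISO-8859-1 character, and `ord()` accepts a length-1 `bytes`. Characters outside ISO-8859-1 raise `UnicodeEncodeError` instead of returning a code.

```python
def char_code(c: str) -> int:
    """Return the ISO-8859-1 byte value (0-255) of a single character.

    Raises UnicodeEncodeError if c has no ISO-8859-1 representation.
    """
    if len(c) != 1:
        raise ValueError(f"expected one character, got {len(c)}")
    # ISO-8859-1 is a single-byte encoding, so the result is exactly one byte.
    encoded = c.encode("iso-8859-1")
    # ord() accepts a bytes object of length 1 and returns its integer value.
    return ord(encoded)


print(char_code("à"))  # 224
print(char_code("ÿ"))  # 255

try:
    char_code("€")  # the euro sign is not part of ISO-8859-1
except UnicodeEncodeError as exc:
    print("not encodable:", exc.reason)
```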

For ISO-8859-1 in particular, you don't even need to encode at all, because the first 256 Unicode code points (U+0000 to U+00FF) are the same characters, in the same order, as ISO-8859-1.

>>> print ord(u'à')
224
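The claim that the first 256 code points coincide with ISO-8859-1 can be checked exhaustively. A short Python 3 sketch: decode every byte value 0-255 as ISO-8859-1 and compare the result with `chr()` of the same number.

```python
# Decode each single byte as ISO-8859-1 and compare it to the Unicode
# character whose code point has the same number.
mismatches = [
    b for b in range(256)
    if bytes([b]).decode("iso-8859-1") != chr(b)
]
print(mismatches)  # prints [] because every byte maps to the same-numbered code point

# Consequently, for a character in that range, ord() already gives the
# ISO-8859-1 code without any encoding step (indexing bytes yields an int).
print(ord("à"), "à".encode("iso-8859-1")[0])  # 224 224
```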

Edit: I see the problem now. You've given a source code encoding comment that says the source is in ISO-8859-1. However, I'll bet that your editor is actually saving the file as UTF-8. The source code will be misinterpreted: in UTF-8, à is stored as two bytes, and each byte is read as a separate ISO-8859-1 character, so the single-character string you think you created will actually be two characters. Try the following to see:

print len(u'à')

If your encoding is correct, it will return 1, but in your case it's probably 2.
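To see why `len(u'à')` can come out as 2, here is a Python 3 sketch that reproduces the misreading: take the UTF-8 bytes an editor would write for `à` and decode them as ISO-8859-1, which is what a `coding: iso-8859-1` header tells the interpreter to do.

```python
# What a UTF-8 editor actually saves for the character 'à' (U+00E0).
utf8_bytes = "à".encode("utf-8")
print(utf8_bytes)  # b'\xc3\xa0'

# What the interpreter builds when told the file is ISO-8859-1:
# each byte becomes its own character.
misread = utf8_bytes.decode("iso-8859-1")
print(len(misread))                 # 2
print([ord(ch) for ch in misread])  # [195, 160], i.e. 'Ã' and a no-break space

# Encoding that two-character string back to ISO-8859-1 yields two bytes,
# which is why ord() complains about a string of length 2.
```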

Solution 2

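ord() works for any single Unicode character, not just those below 128 or 256. As you might expect, ord(u'💩') works fine, provided you can represent the character properly in your source and/or read it in a known encoding (and, on Python 2, provided your interpreter is a wide Unicode build; a narrow build stores that character as two UTF-16 code units, so ord() rejects it).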
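A Python 3 sketch of the point above. Python 3.3 and later store every code point as one character, so there is no narrow/wide build distinction. `ord()` returns values far above 255 for characters such as U+1F4A9, and such characters have no ISO-8859-1 encoding at all. The UTF-16 line shows the surrogate pair of two 16-bit units that a narrow Python 2 build holds for this character.

```python
pile = "\U0001F4A9"  # PILE OF POO, outside the Basic Multilingual Plane

print(len(pile))       # 1 on Python 3.3+
print(ord(pile))       # 128169
print(hex(ord(pile)))  # 0x1f4a9

# UTF-16 needs two 16-bit code units (a surrogate pair) for this code point.
print(pile.encode("utf-16-be").hex())  # d83ddca9

# ISO-8859-1 only covers U+0000..U+00FF, so encoding must fail.
try:
    pile.encode("iso-8859-1")
except UnicodeEncodeError:
    print("not representable in ISO-8859-1")
```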

Your error message vaguely suggests that coding: iso-8859-1 is not actually true, and the file's encoding is actually something else (UTF-8 or UTF-16 would be my guess).
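One way to test that suspicion from Python 3 is the heuristic sketch below, which is not a definitive check. It reads the declared encoding with the standard library's `tokenize.detect_encoding`, then checks whether the raw bytes also happen to be valid UTF-8 with non-ASCII content. A file that declares ISO-8859-1 but decodes cleanly as UTF-8 was very likely saved by a UTF-8 editor. The function name `suspect_mismatch` is hypothetical.

```python
import io
import tokenize


def suspect_mismatch(source: bytes) -> bool:
    """Heuristic: does the file declare a non-UTF-8 encoding but look like UTF-8?"""
    # detect_encoding reads the PEP 263 "coding:" cookie from the first two lines.
    declared, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    if declared.startswith("utf-8"):
        return False
    if source.isascii():
        return False  # pure ASCII reads the same under either encoding
    try:
        source.decode("utf-8")
    except UnicodeDecodeError:
        return False  # not valid UTF-8, so the declaration is plausible
    return True  # non-ASCII bytes that form valid UTF-8: probably a UTF-8 file


header = b"#!/usr/bin/python\n# coding: iso-8859-1\n"
saved_as_utf8 = header + "print u'à'\n".encode("utf-8")
saved_as_latin1 = header + "print u'à'\n".encode("iso-8859-1")

print(suspect_mismatch(saved_as_utf8))    # True
print(suspect_mismatch(saved_as_latin1))  # False
```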

The canonical must-read on character encoding in Python is http://nedbatchelder.com/text/unipain.html


Author: Drimades Boy

Updated on August 21, 2020

Comments

Drimades Boy over 3 years
With ord(ch) you can get a numerical code for a character ch up to 127. Is there any function that returns a number from 0 to 255, so as to also cover ISO 8859-1 characters?
Edit: Here is my latest version of the code and the error I get:
```
#!/usr/bin/python
# coding: iso-8859-1

import sys
reload(sys)
sys.setdefaultencoding('iso-8859-1')
print sys.getdefaultencoding()  # prints "iso-8859-1" 

def char_code(c):
    return ord(c.encode('iso-8859-1'))
print char_code(u'à')
```
I get an error: TypeError: ord() expected a character, but string of length 2 found
Drimades Boy over 8 years

Using print char_code(u'💩') I get: Non-ASCII character '\xf0' in file unicode.py on line 4, but no encoding declared;
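The byte named in that message is a clue. A quick Python 3 check shows that `\xf0` is the lead byte of the four-byte UTF-8 sequence for U+1F4A9, which suggests the file on disk is UTF-8 rather than ISO-8859-1.

```python
utf8 = "\U0001F4A9".encode("utf-8")
print(utf8)          # b'\xf0\x9f\x92\xa9'
print(hex(utf8[0]))  # 0xf0, the byte named in the error message
print(bin(utf8[0]))  # 0b11110000: the 11110 prefix marks a 4-byte UTF-8 sequence
```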
Rafael Telles over 8 years

This character does not exist in ISO-8859-1; check the table.
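The table lookup can also be done programmatically. A small Python 3 sketch: a character is representable in ISO-8859-1 exactly when its code point is at most 0xFF. The helper name `non_latin1` is hypothetical.

```python
def non_latin1(text: str) -> list:
    """Return (index, character) pairs that ISO-8859-1 cannot represent."""
    # ISO-8859-1 maps one-to-one onto U+0000..U+00FF, so anything above 0xFF is out.
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 0xFF]


print(non_latin1("café"))             # []
print(non_latin1("à \U0001F4A9 €"))  # [(2, '💩'), (4, '€')]
```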
Rafael Telles over 8 years

And you should specify an encoding header.
tripleee over 8 years

The error message suggests the coding: header is wrong. If you declare ISO-8859-1 encoding, but the actual encoding of the file is UTF-8 (or UTF-16), that's the error message you would expect.
tripleee over 8 years

Maybe see the character-encoding tag wiki for some hints.
Drimades Boy over 8 years

I tried both ways you suggested, but I still get the same error.
Mark Ransom over 8 years

@DrimadesBoy then your example is incorrect, please update it with code that actually demonstrates the error.
Drimades Boy over 8 years

Solved. I'm using Geany on Ubuntu and changed the file encoding from UTF-8 to ISO-8859-1 via Document > Set Encoding > Western European > ISO-8859-1.
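For reference, a hedged Python 3 sketch of the same conversion done outside the editor, assuming the file is currently valid UTF-8 and contains only characters that ISO-8859-1 can represent (anything else raises an error rather than being silently altered). The function names `utf8_to_latin1` and `convert_file` are hypothetical.

```python
from pathlib import Path


def utf8_to_latin1(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as ISO-8859-1 bytes."""
    # Decoding fails with UnicodeDecodeError if the input is not valid UTF-8.
    text = data.decode("utf-8")
    # Encoding fails with UnicodeEncodeError for characters above U+00FF.
    return text.encode("iso-8859-1")


def convert_file(path: Path) -> None:
    """Rewrite a file in place, assuming it is currently UTF-8."""
    path.write_bytes(utf8_to_latin1(path.read_bytes()))


print(utf8_to_latin1("print u'à'\n".encode("utf-8")))  # b"print u'\xe0'\n"
```

The opposite fix also works: keep the file in UTF-8 and change the header to `# coding: utf-8`, so that the declaration matches the bytes on disk.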
Mark Ransom over 8 years

@DrimadesBoy if it's solved, please use the checkbox so everybody knows it. And an upvote would be nice.