Encoding characters with ISO 8859-1 in Python
Solution 1
When you're starting with a Unicode string, you need to encode rather than decode.
>>> def char_code(c):
...     return ord(c.encode('iso-8859-1'))
...
>>> print char_code(u'à')
224
For ISO-8859-1 in particular, you don't even need to encode it at all, since Unicode uses the ISO-8859-1 characters for its first 256 code points.
>>> print ord(u'à')
224
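The equivalence between ISO-8859-1 byte values and the first 256 Unicode code points can be checked directly. The following is a minimal sketch, not part of the original answer: the helper name latin1_code is hypothetical, the code is written to run on both Python 2 and Python 3, and the character is written as an escape so the result does not depend on the source file's encoding.

```python
def latin1_code(ch):
    """Return the ISO-8859-1 byte value (0-255) of a one-character Unicode string."""
    # Raises UnicodeEncodeError if ch has no ISO-8859-1 representation.
    encoded = ch.encode('iso-8859-1')
    # Slicing (rather than indexing) yields a length-1 byte string on both
    # Python 2 and Python 3, which ord() accepts.
    return ord(encoded[0:1])

a_grave = u'\xe0'  # a-grave, written as an escape to avoid source-encoding issues

print(latin1_code(a_grave))                  # 224
print(latin1_code(a_grave) == ord(a_grave))  # True
```

For characters outside ISO-8859-1 the encode step fails, whereas plain ord() still returns the code point.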
Edit: I see the problem now. You've given a source code encoding comment that indicates the source is in ISO-8859-1. However, I'll bet that your editor is actually working in UTF-8. The source code will be mis-interpreted, and the single-character string you think you created will actually be two characters. Try the following to see:
print len(u'à')
If your encoding is correct, it will print 1, but in your case it's probably 2.
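The length of 2 can be reproduced without touching the editor settings. This is an illustrative sketch (the variable names are my own) that simulates what happens when an editor saves the character as UTF-8 while the interpreter reads those bytes as ISO-8859-1; it runs on both Python 2 and Python 3.

```python
a_grave = u'\xe0'  # a-grave as a single code point, written as an escape

# What an editor working in UTF-8 writes to disk: two bytes, 0xC3 0xA0.
utf8_bytes = a_grave.encode('utf-8')

# What the interpreter sees if it is told the file is ISO-8859-1:
# each byte becomes its own character.
misread = utf8_bytes.decode('iso-8859-1')

print(len(a_grave))                    # 1
print(len(misread))                    # 2
print([hex(ord(c)) for c in misread])  # ['0xc3', '0xa0']
```

Passing the two-character string to ord() is what produces "expected a character, but string of length 2 found".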
Solution 2
You can get ord() for anything. As you might expect, ord(u'💩') works fine, provided you can represent the character properly in your source, and/or read it in a known encoding.
Your error message vaguely suggests that the coding: iso-8859-1 declaration is not actually true, and the file's encoding is actually something else (UTF-8 or UTF-16 would be my guess).
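One rough way to test that suspicion is to check whether the file's raw bytes decode cleanly as UTF-8. The sketch below is an assumption-laden heuristic, not a definitive detector: looks_like_utf8 is a hypothetical helper, and the two byte strings stand in for file contents you would normally obtain with open(path, 'rb').read(). It runs on both Python 2 and Python 3.

```python
def looks_like_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

# Stand-ins for the bytes of the same one-line script saved in two encodings.
latin1_source = u"print u'\xe0'".encode('iso-8859-1')
utf8_source = u"print u'\xe0'".encode('utf-8')

print(looks_like_utf8(latin1_source))  # False
print(looks_like_utf8(utf8_source))    # True
```

The check only works in this direction: every byte sequence is valid ISO-8859-1, and a pure-ASCII file decodes cleanly under both encodings, so a True result means "plausibly UTF-8" rather than proof.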
The canonical must-read on character encoding in Python is http://nedbatchelder.com/text/unipain.html
Drimades Boy
Updated on August 21, 2020
Comments
-
Drimades Boy over 3 years
With ord(ch) you can get a numerical code for character ch up to 127. Is there any function that returns a number from 0-255, so as to also cover ISO 8859-1 characters?
Edit: Here is the latest version of my code and the error I get:
#!/usr/bin/python
# coding: iso-8859-1
import sys
reload(sys)
sys.setdefaultencoding('iso-8859-1')
print sys.getdefaultencoding()  # prints "iso-8859-1"
def char_code(c):
    return ord(c.encode('iso-8859-1'))
print char_code(u'à')
I get an error: TypeError: ord() expected a character, but string of length 2 found
-
Drimades Boy over 8 years
Using print char_code(u'💩') I get: Non-ASCII character '\xf0' in file unicode.py on line 4, but no encoding declared;
-
Rafael Telles over 8 years
This character does not exist in ISO-8859-1; check the table.
-
Rafael Telles over 8 years
And you should specify an encoding header.
-
tripleee over 8 years
The error message suggests the coding: header is wrong. If you declare ISO-8859-1 encoding, but the actual encoding of the file is UTF-8 (or UTF-16), that's the error message you would expect.
-
tripleee over 8 years
Maybe see the character-encoding tag wiki for some hints.
-
Drimades Boy over 8 years
I tried both ways you suggest, but I still get the same error.
-
Mark Ransom over 8 years
@DrimadesBoy then your example is incorrect; please update it with code that actually demonstrates the error.
-
Drimades Boy over 8 years
Solved. I'm using Geany in Ubuntu and changed the file encoding from 'utf-8' to 'iso-8859-1' via Document > Set Encoding > Western European > ISO-8859-1.
-
Mark Ransom over 8 years
@DrimadesBoy if it's solved, please use the checkbox so everybody knows it. And an upvote would be nice.