UnicodeDecodeError on python3
The file appears to contain bytes that are not valid UTF-8, so try reading it with the latin-1 encoding instead:
file = open('exampleFileName', 'r', encoding='latin-1')
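As a minimal sketch of that suggestion, the snippet below writes a small hypothetical sample file (the bytes and the temporary path are illustrative assumptions, not the asker's data) and reads it back two ways: decoded as latin-1, and decoded as UTF-8 with `errors="replace"` so that undecodable bytes become U+FFFD instead of raising.

```python
import os
import tempfile

# Hypothetical sample: "café" encoded as latin-1; the lone 0xe9 byte is not valid UTF-8.
sample = b"caf\xe9\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

# Option 1: decode as latin-1. Every byte value maps to a character, so this never raises.
with open(path, "r", encoding="latin-1") as f:
    latin1_lines = list(f)

# Option 2: keep UTF-8 but substitute U+FFFD for each undecodable byte.
with open(path, "r", encoding="utf-8", errors="replace") as f:
    replaced_lines = list(f)

os.remove(path)

print(latin1_lines)
print(replaced_lines)
```

Note that latin-1 decoding always succeeds, so a clean read does not by itself prove the file was really written as latin-1.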
Author by
EliteKaffee
Updated on June 26, 2022
Comments
-
EliteKaffee over 1 year
I'm currently trying to run some simple regexes on a very large .txt file (a couple of million lines of text). The simplest code that reproduces the problem:
file = open("exampleFileName", "r")
for line in file:
    pass
The error message:
Traceback (most recent call last):
  File "example.py", line 34, in <module>
    example()
  File "example.py", line 16, in example
    for line in file:
  File "/usr/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 7332: invalid continuation byte
How can I fix this? Is UTF-8 the wrong encoding? And if it is, how do I know which one is right?
Thanks and best regards!
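One way to see what the traceback is pointing at is to decode raw bytes yourself and inspect the `start` and `end` attributes of the resulting `UnicodeDecodeError`. The helper name `find_first_bad_byte` and the sample bytes below are hypothetical, chosen only for illustration. For a real file you would pass in the result of `open(path, "rb").read()`, or a chunk of it if the file is too large to hold in memory.

```python
def find_first_bad_byte(data: bytes, encoding: str = "utf-8", context: int = 10):
    """Return (offset, surrounding_bytes) for the first undecodable byte, or None."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte the codec could not decode.
        lo = max(err.start - context, 0)
        return err.start, data[lo:err.end + context]
    return None


# Hypothetical sample: "Buenos días" written as latin-1, so the í is the single byte 0xed.
sample = b"Buenos d\xedas\n"
result = find_first_bad_byte(sample)
print(result)
```

Looking at the bytes around the reported offset often makes it clear which language or legacy encoding the text came from.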
-
Jeff about 7 years
Possibly related to stackoverflow.com/questions/5552555/…
-
Admin about 7 years
Post the output of file -bi [your_filename]. You'll get an encoding. After that, provide the encoding argument to open().
-
Reihan_amn over 5 years
What does the file -bi command do?
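For context, on Linux `file -b` omits the filename from the output and `-i` prints a MIME type along with a guessed charset. A rough Python approximation of that kind of guess, as a sketch only, is to try a short list of candidate encodings in order. The function name and the candidate list here are assumptions for illustration. A successful decode shows only that the bytes are valid in that encoding, not that it is the one the file was written in.

```python
def first_working_encoding(data: bytes, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes data without error, or None."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Hypothetical samples
print(first_working_encoding(b"hello"))
print(first_working_encoding(b"Buenos d\xedas"))
print(first_working_encoding(b"\x81"))
```

Because latin-1 assigns a character to every byte value, it works as a last-resort fallback, which is why it is listed last.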
-
chivorotkiv almost 6 years
Do you know how to do the same when reading from the command line? I use the input() function. Is there a way to configure its encoding, or is there some other configurable function?
-
Reihan_amn over 5 years
How did you figure out to use latin-1 encoding?
-
mic4ael over 5 years
0xed is the í character, which you can find in the latin-1 encoding.
-
Reihan_amn over 5 years
So confused! After Unicode came onto the scene to cover all ~2 m code points, why is latin-1 encoding still here? Shouldn't latin-1 be a subset of UTF? Shouldn't all the codes defined in latin-1 now be part of UTF? If so, why can't UTF support it? (Sorry, I am kind of new to this field.)
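The distinction behind this question is between the character set and the byte encoding. Every latin-1 character is in Unicode, and its code point equals its latin-1 byte value. UTF-8, however, writes any code point above 0x7F as a multi-byte sequence, so the single latin-1 byte 0xed is not a complete UTF-8 sequence on its own. A short sketch with the í character mentioned above:

```python
ch = "\u00ed"  # the character í, Unicode code point U+00ED

latin1_bytes = ch.encode("latin-1")  # one byte, equal to the code point
utf8_bytes = ch.encode("utf-8")      # two bytes in UTF-8

print(latin1_bytes, utf8_bytes)

# Decoding the lone latin-1 byte as UTF-8 fails: 0xed announces a three-byte
# sequence, but no continuation bytes follow it.
try:
    latin1_bytes.decode("utf-8")
    utf8_decode_failed = False
except UnicodeDecodeError:
    utf8_decode_failed = True

print(utf8_decode_failed)
```

So UTF-8 does support í. The file in the question simply stores it in a different byte form than the UTF-8 decoder expects.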