How to write a check in Python to see if a file is valid UTF-8?
Solution 1
You could do something like
import codecs

try:
    # errors='strict' raises UnicodeDecodeError on the first invalid byte sequence
    with codecs.open(filename, encoding='utf-8', errors='strict') as f:
        for line in f:
            pass
    print("Valid utf-8")
except UnicodeDecodeError:
    print("invalid utf-8")
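If you want a yes/no answer rather than a printed message, the same idea can be wrapped in a function. This is only a sketch and assumes Python 3, where the built-in open() accepts encoding and errors directly; the name is_valid_utf8_file is made up for illustration and is not part of the answer above.

```python
def is_valid_utf8_file(path):
    """Return True if the file at `path` decodes as UTF-8, else False.

    Hypothetical helper name; assumes Python 3's built-in open().
    """
    try:
        # errors="strict" raises UnicodeDecodeError on invalid bytes;
        # newline="" turns off newline translation while reading.
        with open(path, encoding="utf-8", errors="strict", newline="") as f:
            for _ in f:
                pass  # iterating forces the whole file to be decoded
    except UnicodeDecodeError:
        return False
    return True
```

Iterating line by line keeps memory use low on large files, at the cost of still reading the whole file once.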
Solution 2
def try_utf8(data):
    "Returns a Unicode object on success, or None on failure"
    try:
        return data.decode('utf-8')
    except UnicodeDecodeError:
        return None

# f is the file object opened in binary mode
data = f.read()
udata = try_utf8(data)
if udata is None:
    # Not UTF-8. Do something else
    pass
else:
    # Handle unicode data
    pass
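Since the question is about a file object that is already open as a binary stream, one possible variation is to validate it chunk by chunk instead of calling read() on the whole thing. The sketch below assumes Python 3 and uses the standard library's codecs.getincrementaldecoder; the function name stream_is_valid_utf8 and the chunk_size default are assumptions made for illustration.

```python
import codecs
import io


def stream_is_valid_utf8(stream, chunk_size=64 * 1024):
    """Return True if the binary stream contains only valid UTF-8.

    Hypothetical helper; reads the stream in chunks so the whole file
    never has to be held in memory.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            # The decoder buffers a multi-byte sequence split across chunks
            decoder.decode(chunk)
        # final=True turns a truncated sequence at end of input into an error
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(stream_is_valid_utf8(io.BytesIO("h\u00e9llo".encode("utf-8"))))
    print(stream_is_valid_utf8(io.BytesIO(b"\xff\xfe")))
```

The final decode call matters: without it, a file that ends in the middle of a multi-byte character would be reported as valid.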
Comments
-
Jox over 1 year
As stated in the title, I would like to check whether a given file object (opened as a binary stream) is a valid UTF-8 file.
Anyone?
Thanks
-
Jox over 13 years
Obviously I didn't do my homework well enough, given that there is more than one solution as simple as this :( Thanks!
-
colidyre over 4 years
Could be simpler by using only one line:
codecs.open("path/to/file", encoding="utf-8", errors="strict").readlines()
instead of 3.