Python script to convert from UTF-8 to ASCII
Solution 1
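data = b"UTF-8 DATA"  # bytes holding UTF-8 encoded text
udata = data.decode("utf-8")  # decode the bytes to Unicode text first
asciidata = udata.encode("ascii", "ignore")  # then encode to ASCII, dropping what ASCII cannot represent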
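For anyone on Python 3, here is a minimal sketch of the same decode-then-encode idea applied to a whole file. The function name `utf8_file_to_ascii` is made up for illustration, and the sketch assumes the input really is valid UTF-8 (otherwise `decode` raises `UnicodeDecodeError`).

```python
def utf8_file_to_ascii(src_path, dst_path):
    """Copy src_path to dst_path, dropping every non-ASCII character.

    Hypothetical helper for illustration; assumes src_path is valid UTF-8.
    """
    # Read raw bytes so that line endings are preserved exactly.
    with open(src_path, "rb") as src:
        raw = src.read()

    # Step 1: decode the UTF-8 bytes into text.
    text = raw.decode("utf-8")

    # Step 2: encode to ASCII; "ignore" silently discards anything
    # that has no ASCII representation (this is the lossy part).
    with open(dst_path, "wb") as dst:
        dst.write(text.encode("ascii", "ignore"))


# Example call, using the file names from the question:
# utf8_file_to_ascii("test.lrc", "tempASCII")
```

Reading and writing in binary mode keeps the two steps explicit: the only transformation applied is the decode followed by the lossy encode.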
Solution 2
UTF-8 is a superset of ASCII. Either your UTF-8 file is ASCII, or it can't be converted without loss.
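To check which of the two cases applies to your data, something like the following should do. It is only a sketch, and `is_pure_ascii` is a name invented here: if the check passes, the UTF-8 bytes are already valid ASCII and no conversion is needed.

```python
def is_pure_ascii(raw):
    """Return True if the bytes decode cleanly as ASCII (hypothetical helper)."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        # At least one byte is >= 0x80, so conversion would lose data.
        return False
    return True


print(is_pure_ascii("hello".encode("utf-8")))       # True
print(is_pure_ascii("h\u00e9llo".encode("utf-8")))  # False ("\u00e9" is e with acute accent)
```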
Authored by Nicolas
Updated on January 07, 2020
Comments
-
Nicolas over 4 years
I'm trying to write a script in Python to convert UTF-8 files into ASCII files:
#!/usr/bin/env python
# *-* coding: iso-8859-1 *-*

import sys
import os

filePath = "test.lrc"
fichier = open(filePath, "rb")
contentOfFile = fichier.read()
fichier.close()

fichierTemp = open("tempASCII", "w")
fichierTemp.write(contentOfFile.encode("ASCII", 'ignore'))
fichierTemp.close()
When I run this script, I get the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 13: ordinal not in range(128)
I thought I could ignore errors with the 'ignore' parameter of the encode method, but apparently not.
I'm open to other ways to convert.
-
Ignacio Vazquez-Abrams over 13 years
I think he's aware of that, otherwise he wouldn't be trying to use 'ignore'.
-
Tobu over 13 years
@Ignacio True. But this one left me wondering what the asker is trying to achieve. They could be cargo-culting, or maybe their need is best met by something like urlencode, or being lossy is just acceptable.
-
tchrist over 13 years
Sounds like a bad recipe for data loss.
-
tchrist over 13 years
I am afraid of the cargo-culting. Culling all characters that you don’t have an appreciation for is really insensitive.
-
Utku Zihnioglu over 13 years
You should expect data loss if you wish to convert from an 8-bit encoding to a 7-bit one.
-
tchrist over 13 years
@Ignacio: Imagine being addressed as Vzquez-Abrams. :(
-
Ignacio Vazquez-Abrams over 13 years
@tchrist: That's why I never use it.
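The "Vzquez-Abrams" outcome can often be softened by decomposing accented letters before encoding, so that only the accent is dropped and the base letter survives. This is a sketch of that idea using the standard `unicodedata` module, not something proposed in the thread; the function name is invented here, and characters with no decomposition (for example "ø" or "ß") are still lost.

```python
import unicodedata


def to_ascii_keep_base_letters(text):
    """Drop accents but keep base letters (hypothetical helper)."""
    # NFKD splits "a with acute" into "a" followed by U+0301 (combining acute).
    decomposed = unicodedata.normalize("NFKD", text)
    # The combining marks are non-ASCII, so "ignore" removes only them.
    return decomposed.encode("ascii", "ignore").decode("ascii")


# "\u00e1" is "a" with an acute accent.
print(to_ascii_keep_base_letters("V\u00e1zquez-Abrams"))  # Vazquez-Abrams
```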
-
Nicolas over 13 years
I didn't know that I had to decode first. It works now, thanks. To answer the questions: I want to do this because my MP3 player can only display lyrics files encoded in ASCII.
-
JSBach about 7 years
You can have a look at this solution: stackoverflow.com/a/517974/1463812
-
Kovalex over 2 years
Sometimes you can convert UTF-8 to ASCII without loss: for instance, single quotes or apostrophes and, in a few other cases, arithmetic operators exist both as multi-byte UTF-8 characters and as single ASCII symbols.
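A rough sketch of that substitution idea, applied before the ASCII encode so these symbols are replaced instead of dropped. The mapping below is an illustrative, deliberately incomplete assumption (choose your own entries), and both names are invented here.

```python
# Hypothetical, incomplete mapping from typographic symbols to ASCII look-alikes.
ASCII_EQUIVALENTS = {
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / typographic apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2212": "-",  # minus sign
    "\u00d7": "*",  # multiplication sign
    "\u00f7": "/",  # division sign
}
TABLE = str.maketrans(ASCII_EQUIVALENTS)


def replace_known_symbols(text):
    """Swap mapped symbols for their ASCII stand-ins; leave everything else alone."""
    return text.translate(TABLE)


print(replace_known_symbols("it\u2019s 3 \u00d7 4 \u2212 2"))  # it's 3 * 4 - 2
```

Anything not in the table passes through unchanged, so a final `.encode("ascii", "ignore")` is still needed to guarantee ASCII output.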