Python script to convert from UTF-8 to ASCII

Solution 1

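data = b"UTF-8 DATA"                         # raw UTF-8 bytes (a bytes literal, so this also runs on Python 3)
udata = data.decode("utf-8")                 # decode the bytes into text
asciidata = udata.encode("ascii", "ignore")  # encode as ASCII, silently dropping non-ASCII characters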
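Here is a minimal Python 3 sketch of Solution 1. It assumes the input is already available as UTF-8 bytes, and the sample string is only illustrative. It also contrasts the `ignore` error handler with `replace`, which marks each dropped character with `?` instead of removing it silently:

```python
# Solution 1 in Python 3: bytes -> str (decode) -> bytes (encode).
data = "Ignacio V\u00e1zquez-Abrams".encode("utf-8")  # illustrative UTF-8 input bytes
udata = data.decode("utf-8")                          # bytes -> str
dropped = udata.encode("ascii", "ignore")             # non-ASCII characters are removed
marked = udata.encode("ascii", "replace")             # non-ASCII characters become '?'

print(dropped)  # b'Ignacio Vzquez-Abrams'
print(marked)   # b'Ignacio V?zquez-Abrams'
```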

Solution 2

UTF-8 is a superset of ASCII. Either your UTF-8 file is ASCII, or it can't be converted without loss.
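Put differently, UTF-8 encodes the ASCII range (0x00 to 0x7F) as the same single bytes, so a lossless conversion is possible only when every byte is below 0x80, and then no transcoding is needed at all. Below is a minimal sketch of that check. The function names are hypothetical:

```python
def is_ascii(raw: bytes) -> bool:
    # ASCII covers 0x00-0x7F; UTF-8 encodes those code points as identical single bytes.
    return all(byte < 0x80 for byte in raw)


def utf8_to_ascii_lossless(raw: bytes) -> bytes:
    """Return raw unchanged if it is pure ASCII; otherwise refuse to convert."""
    if not is_ascii(raw):
        raise ValueError("non-ASCII characters present; converting would lose data")
    # Pure-ASCII UTF-8 is already byte-for-byte valid ASCII.
    return raw


print(utf8_to_ascii_lossless(b"plain text"))  # b'plain text'
try:
    utf8_to_ascii_lossless("caf\u00e9".encode("utf-8"))
except ValueError as exc:
    print("not convertible:", exc)
```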

Author: Nicolas

Updated on January 07, 2020

Comments

  • Nicolas
    Nicolas over 4 years

    I'm trying to write a Python script to convert UTF-8 files into ASCII files:

    #!/usr/bin/env python
    # *-* coding: iso-8859-1 *-*
    
    import sys
    import os
    
    filePath = "test.lrc"
    fichier = open(filePath, "rb")
    contentOfFile = fichier.read()
    fichier.close()
    
    fichierTemp = open("tempASCII", "w")
    fichierTemp.write(contentOfFile.encode("ASCII", 'ignore'))
    fichierTemp.close()
    

    When I run this script, I get the following error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 13: ordinal not in range(128)

    I thought I could ignore errors with the 'ignore' parameter of the encode method, but it seems not.

    I'm open to other ways to convert.

  • Ignacio Vazquez-Abrams
    Ignacio Vazquez-Abrams over 13 years
    I think he's aware of that, otherwise he wouldn't be trying to use 'ignore'.
  • Tobu
    Tobu over 13 years
    @Ignacio True. But this one left me wondering what the asker is trying to achieve. They could be cargo-culting, or maybe their need is best met by something like urlencode, or being lossy is just acceptable.
  • tchrist
    tchrist over 13 years
    Sounds like a bad recipe for data loss.
  • tchrist
    tchrist over 13 years
    I am afraid of the cargo-culting. Culling all characters that you don’t have an appreciation for is really insensitive.
  • Utku Zihnioglu
    Utku Zihnioglu over 13 years
    You should expect data loss if you want to convert from an 8-bit encoding to a 7-bit one.
  • tchrist
    tchrist over 13 years
    @Ignacio: Imagine being addressed as Vzquez-Abrams. :(
  • Ignacio Vazquez-Abrams
    Ignacio Vazquez-Abrams over 13 years
    @tchrist: That's why I never use it.
  • Nicolas
    Nicolas over 13 years
    I didn't know that I had to decode first. It works now, thanks. To answer the questions: I want to do this because my MP3 player can only display lyrics files encoded in ASCII.
  • JSBach
    JSBach about 7 years
    You can have a look at this solution: stackoverflow.com/a/517974/1463812
  • Kovalex
    Kovalex over 2 years
    Sometimes you can convert UTF-8 to ASCII without loss. For instance, single quotes and apostrophes, and in a few other cases arithmetic operators, exist both as multi-byte UTF-8 characters and as single ASCII symbols.
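Putting the thread together, here is a hedged Python 3 sketch of the asker's script: decode the file as UTF-8 first, then encode to ASCII. Instead of plain `ignore`, it uses a different, swapped-in technique: Unicode NFKD normalization (from the standard `unicodedata` module) reduces accented letters to their base letters, and a small hand-written mapping covers the typographic quotes and arithmetic signs mentioned in the last comment. The mapping table and the function names are assumptions, not part of the original answers. Any character they do not cover is still dropped.

```python
import unicodedata

# Hypothetical mapping for a few typographic characters that NFKD does not
# reduce to ASCII (assumption: extend it for your own files).
PUNCTUATION_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (typographic apostrophe)
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2212": "-",  # minus sign
    "\u00d7": "x",  # multiplication sign
})


def to_ascii(text: str) -> str:
    """Approximate text in ASCII: map known symbols, strip accents, drop the rest."""
    text = text.translate(PUNCTUATION_MAP)
    # NFKD splits e.g. U+00E1 into 'a' + U+0301 (combining acute accent). The
    # combining mark, like any other remaining non-ASCII character, is then
    # removed by the "ignore" error handler.
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read src_path as UTF-8 text and write an ASCII approximation to dst_path."""
    # Decode first: open in text mode with an explicit encoding instead of "rb".
    # newline="" leaves the original line endings untouched on both sides.
    with open(src_path, encoding="utf-8", newline="") as src:
        content = src.read()
    with open(dst_path, "w", encoding="ascii", newline="") as dst:
        dst.write(to_ascii(content))
```

Usage with the file names from the question: `convert_file("test.lrc", "tempASCII")`. For example, `to_ascii("Ignacio V\u00e1zquez-Abrams")` returns `"Ignacio Vazquez-Abrams"` rather than the `Vzquez` that plain `ignore` produces.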