UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)

21,018

str(cell) is trying to convert cell to ASCII. ASCII only supports characters with ordinals less than 255. What is cell?

If cell is a unicode string, just do cell.encode("utf8"), and that will return a bytestring encoded as utf 8

...or really iirc. If you pass mysql unicode, then the database will automagically convert it to utf8...

You could also try,

cell = unicode(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "")

or just use a 3rd party library. There is a good one that will fix text for you.

Share:
21,018
user3422637
Author by

user3422637

Updated on July 13, 2022

Comments

  • user3422637
    user3422637 almost 2 years

    I am using Python 2.7 and MySQLdb 1.2.3. I tried everything I found on stackoverflow and other forums to handle encoding errors my script is throwing. My script reads data from all tables in a source MySQL DB, writes them in a python StringIO.StringIO object, and then loads that data from StringIO object to Postgres database (which apparently is in UTF-8 encoding format. I found this by looking into Properties--Definition of database in pgadmin) using psycopg2 library's copy_from command.

    I found out that my source MySQL database has some tables in latin1_swedish_ci encoding while others in utf_8 encoding format (Found this from TABLE_COLLATION in information_schema.tables).

    I wrote all this code on the top of my Python script based on my research on the internet.

    db_conn = MySQLdb.connect(host=host,user=user,passwd=passwd,db=db, charset="utf8", init_command='SET NAMES UTF8' ,use_unicode=True) 
    db_conn.set_character_set('utf8') 
    db_conn_cursor = db_conn.cursor()
    db_conn_cursor.execute('SET NAMES utf8;')
    db_conn_cursor.execute('SET CHARACTER SET utf8;')
    db_conn_cursor.execute('SET character_set_connection=utf8;')
    

    I still get the UnicodeEncodeError below with this line: cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") #Remove unwanted characters from column value,

    UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)
    

    I wrote the following line of code to clean cells in every table of source MySQL database when writing to StringIO object.

    cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") #Remove unwanted characters from column value
    

    Please help.