UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)
str(cell)
is trying to convert cell
to ASCII. ASCII only supports characters with ordinals less than 255. What is cell?
If cell
is a unicode string, just do cell.encode("utf8")
, and that will return a bytestring encoded as utf 8
...or really iirc. If you pass mysql unicode, then the database will automagically convert it to utf8...
You could also try,
cell = unicode(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "")
or just use a 3rd party library. There is a good one that will fix text for you.
user3422637
Updated on July 13, 2022Comments
-
user3422637 almost 2 years
I am using Python 2.7 and MySQLdb 1.2.3. I tried everything I found on stackoverflow and other forums to handle encoding errors my script is throwing. My script reads data from all tables in a source MySQL DB, writes them in a python
StringIO.StringIO
object, and then loads that data fromStringIO
object to Postgres database (which apparently is in UTF-8 encoding format. I found this by looking into Properties--Definition of database in pgadmin) using psycopg2 library's copy_from command.I found out that my source MySQL database has some tables in latin1_swedish_ci encoding while others in utf_8 encoding format (Found this from TABLE_COLLATION in information_schema.tables).
I wrote all this code on the top of my Python script based on my research on the internet.
db_conn = MySQLdb.connect(host=host,user=user,passwd=passwd,db=db, charset="utf8", init_command='SET NAMES UTF8' ,use_unicode=True) db_conn.set_character_set('utf8') db_conn_cursor = db_conn.cursor() db_conn_cursor.execute('SET NAMES utf8;') db_conn_cursor.execute('SET CHARACTER SET utf8;') db_conn_cursor.execute('SET character_set_connection=utf8;')
I still get the
UnicodeEncodeError
below with this line:cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") #Remove unwanted characters from column value
,UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)
I wrote the following line of code to clean cells in every table of source MySQL database when writing to StringIO object.
cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") #Remove unwanted characters from column value
Please help.