How to fetch and print utf-8 data from mysql DB using Python?

11,605

Your problem is with your terminal (sys.stdout) encoding (cf http://en.wikipedia.org/wiki/Code_page_862), which depends on your system's settings. The best solution (as explained here : https://stackoverflow.com/a/15740694/41316) is to explicitely encode your unicode data before printing them to sys.stdout.

If you can't use a more usable encoding (utf-8 comes to mind, as it has been designed to handle all unicode characters), you can at least use an alternative error handling like "replace" (replaces non-encodable characters with '?') or "ignore" (suppress non-encodable characters).

Here's a corrected version of your code, you can play with the encoding and on_error settings to find out what solution works for you:

import sys
import MySQLdb

# set desired output encoding here
# it looks like your default encoding is "cp862"
# but you may want to first try 'utf-8' first
# encoding = "cp862"
encoding = "utf-8" 

# what do when we can't encode to the desired output encoding
# options are:
# - 'strict' : raises a UnicodeEncodeError (default)
# - 'replace': replaces missing characters with '?'
# - 'ignore' : suppress missing characters
on_error = "replace" 

db = MySQLdb.connect(
   "localhost","matan","pass","youtube", 
   charset='utf8',
   use_unicode=True
   )
cursor = db.cursor()
sql = "SELECT * FROM VIDEOS"
try:
   cursor.execute(sql)
   for i, row in enumerate(cursor):
      try:
         # encode unicode data to the desired output encoding
         title = row[0].encode(encoding, on_error)
         link = row[1].encode(encoding, on_error)
      except UnicodeEncodeError as e:
         # only if on_error='strict'
         print >> sys.stderr, "failed to encode row #%s - %s" % (i, e)
      else:
         print "title=%s\nlink=%s\n\n" % (title, link))
finally:
   cursor.close()
   db.close()

NB : you may also want to read this (specially the comments) http://drj11.wordpress.com/2007/05/14/python-how-is-sysstdoutencoding-chosen/ for more on Python, strings, unicode, encoding, sys.stdout and terminal issues.

Share:
11,605
Admin
Author by

Admin

Updated on June 04, 2022

Comments

  • Admin
    Admin almost 2 years

    I'm having problems with reading utf-8 data from a MySQL DB using Python. My DB contains one table named Videos, and the table contains at least one row that has Unicode chars, i.e.

    [KR] Samsung Galaxy Beam 2 간단 리뷰 [4K]

    The collation of the table is utf8_general_ci, just like the collation of the fields in the table.

    This is the code I wrote in order to fetch all the data from my table:

    # Open database connection
    db = MySQLdb.connect("localhost","matan","pass","youtube", charset = 'utf8',use_unicode=True)
    
    # prepare a cursor object using cursor() method
    cursor = db.cursor()
    
    # Prepare SQL query to INSERT a record into the database.
    sql = "SELECT * FROM VIDEOS"
    try:
       # Execute the SQL command
       cursor.execute(sql)
       # Fetch all the rows in a list of lists.
       results = cursor.fetchall()
       for row in results:
          title = row[0]
          link = row[1]
          # Now print fetched result
          print ("title=%s\nlink=%s\n\n" % \
                (title, link))
    except:
       print "Error: unable to fecth data"
    
    # disconnect from server
    db.close()
    

    When I run the above code, it prints all the rows that contain only "ascii" chars, but when it gets to a row that contains Unicode char (i.e. the line I mentioned above), it prints :

    File "C:\Users\Matan\Dropbox\Code\Python\youtube.py", line 28, in printall (title, link)) File "C:\Python27\lib\encodings\cp862.py", line 12, in encode return codecs.charmap_encode(input,errors,encoding_map) UnicodeEncodeError: 'charmap' codec can't encode characters in position 33-34: c haracter maps to <undefined>

    and doesn't continue to the next row.

    I'm using PhpMyAdmin version 4.1.14, MySQL version 5.6.17, and Python version 2.7.8 .

    Edit: I dropped the except clause, and updated the error I've got.