MySQL "incorrect string value" error when save unicode string in Django

137,282

Solution 1

I just figured out one method to avoid above errors.

Save to database

user.first_name = u'Rytis'.encode('unicode_escape')
user.last_name = u'Slatkevičius'.encode('unicode_escape')
user.save()
>>> SUCCEED

print user.last_name
>>> Slatkevi\u010dius
print user.last_name.decode('unicode_escape')
>>> Slatkevičius

Is this the only method to save strings like that into a MySQL table and decode it before rendering to templates for display?

Solution 2

None of these answers solved the problem for me. The root cause being:

You cannot store 4-byte characters in MySQL with the utf-8 character set.

MySQL has a 3 byte limit on utf-8 characters (yes, it's wack, nicely summed up by a Django developer here)

To solve this you need to:

  1. Change your MySQL database, table and columns to use the utf8mb4 character set (only available from MySQL 5.5 onwards)
  2. Specify the charset in your Django settings file as below:

settings.py

DATABASES = {
    'default': {
        'ENGINE':'django.db.backends.mysql',
        ...
        'OPTIONS': {'charset': 'utf8mb4'},
    }
}

Note: When recreating your database you may run into the 'Specified key was too long' issue.

The most likely cause is a CharField which has a max_length of 255 and some kind of index on it (e.g. unique). Because utf8mb4 uses 33% more space than utf-8 you'll need to make these fields 33% smaller.

In this case, change the max_length from 255 to 191.

Alternatively you can edit your MySQL configuration to remove this restriction but not without some django hackery

UPDATE: I just ran into this issue again and ended up switching to PostgreSQL because I was unable to reduce my VARCHAR to 191 characters.

Solution 3

I had the same problem and resolved it by changing the character set of the column. Even though your database has a default character set of utf-8 I think it's possible for database columns to have a different character set in MySQL. Here's the SQL QUERY I used:

    ALTER TABLE database.table MODIFY COLUMN col VARCHAR(255)
    CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

Solution 4

If you have this problem here's a python script to change all the columns of your mysql database automatically.

#! /usr/bin/env python
import MySQLdb

host = "localhost"
passwd = "passwd"
user = "youruser"
dbname = "yourdbname"

db = MySQLdb.connect(host=host, user=user, passwd=passwd, db=dbname)
cursor = db.cursor()

cursor.execute("ALTER DATABASE `%s` CHARACTER SET 'utf8' COLLATE 'utf8_unicode_ci'" % dbname)

sql = "SELECT DISTINCT(table_name) FROM information_schema.columns WHERE table_schema = '%s'" % dbname
cursor.execute(sql)

results = cursor.fetchall()
for row in results:
  sql = "ALTER TABLE `%s` convert to character set DEFAULT COLLATE DEFAULT" % (row[0])
  cursor.execute(sql)
db.close()

Solution 5

If it's a new project, I'd just drop the database, and create a new one with a proper charset:

CREATE DATABASE <dbname> CHARACTER SET utf8;
Share:
137,282

Related videos on Youtube

Blemone
Author by

Blemone

Updated on January 21, 2020

Comments

  • Blemone
    Blemone over 4 years

    I got strange error message when tried to save first_name, last_name to Django's auth_user model.

    Failed examples

    user = User.object.create_user(username, email, password)
    user.first_name = u'Rytis'
    user.last_name = u'Slatkevičius'
    user.save()
    >>> Incorrect string value: '\xC4\x8Dius' for column 'last_name' at row 104
    
    user.first_name = u'Валерий'
    user.last_name = u'Богданов'
    user.save()
    >>> Incorrect string value: '\xD0\x92\xD0\xB0\xD0\xBB...' for column 'first_name' at row 104
    
    user.first_name = u'Krzysztof'
    user.last_name = u'Szukiełojć'
    user.save()
    >>> Incorrect string value: '\xC5\x82oj\xC4\x87' for column 'last_name' at row 104
    

    Succeed examples

    user.first_name = u'Marcin'
    user.last_name = u'Król'
    user.save()
    >>> SUCCEED
    

    MySQL settings

    mysql> show variables like 'char%';
    +--------------------------+----------------------------+
    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | utf8                       | 
    | character_set_connection | utf8                       | 
    | character_set_database   | utf8                       | 
    | character_set_filesystem | binary                     | 
    | character_set_results    | utf8                       | 
    | character_set_server     | utf8                       | 
    | character_set_system     | utf8                       | 
    | character_sets_dir       | /usr/share/mysql/charsets/ | 
    +--------------------------+----------------------------+
    8 rows in set (0.00 sec)
    

    Table charset and collation

    Table auth_user has utf-8 charset with utf8_general_ci collation.

    Results of UPDATE command

    It didn't raise any error when updating above values to auth_user table by using UPDATE command.

    mysql> update auth_user set last_name='Slatkevičiusa' where id=1;
    Query OK, 1 row affected, 1 warning (0.00 sec)
    Rows matched: 1  Changed: 1  Warnings: 0
    
    mysql> select last_name from auth_user where id=100;
    +---------------+
    | last_name     |
    +---------------+
    | Slatkevi?iusa | 
    +---------------+
    1 row in set (0.00 sec)
    

    PostgreSQL

    The failed values listed above can be updated into PostgreSQL table when I switched the database backend in Django. It's strange.

    mysql> SHOW CHARACTER SET;
    +----------+-----------------------------+---------------------+--------+
    | Charset  | Description                 | Default collation   | Maxlen |
    +----------+-----------------------------+---------------------+--------+
    ...
    | utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 | 
    ...
    

    But from http://www.postgresql.org/docs/8.1/interactive/multibyte.html, I found the following:

    Name Bytes/Char
    UTF8 1-4
    

    Is it means unicode char has maxlen of 4 bytes in PostgreSQL but 3 bytes in MySQL which caused above error?

  • Blemone
    Blemone over 14 years
    @Thomas, i tried exactly as what you said but it still raise same errors.
  • muudscope
    muudscope almost 14 years
    I'm having a similar problem, but I don't agree that this is a valid solution. When you .encode('unicode_escape') you're not actually storing unicode characters in the database. You're forcing all the clients to unencode before using them, which means it won't work properly with django.admin or all sorts of other things.
  • markpasc
    markpasc almost 13 years
    Ugh, I changed all the character sets on everything I could until I really re-read this answer: columns can have their own character sets, independent of the tables and the database. That's crazy and also was exactly my problem.
  • madprops
    madprops over 12 years
    This worked for me as well, using mysql with the defaults, in a TextField model.
  • markpasc
    markpasc almost 12 years
    While it seems distasteful to store escape codes instead of characters, this is probably one of the few ways to save 4-byte UTF-8 characters such as emoji in MySQL 5.1's 3-byte utf8 character set.
  • Chris
    Chris over 11 years
    This solution solved all my issues with a django app which was storing file and directory paths. Toss in dbname as your django database and let it run. Worked like a charm!
  • Mark Erdmann
    Mark Erdmann about 11 years
    This code didn't work for me until I added db.commit() before db.close().
  • CatShoes
    CatShoes about 11 years
    Does this solution avoid the issue discussed in @markpasc comment: '...4-byte UTF-8 characters such as emoji in MySQL 5.1's 3-byte utf8 character set'
  • Mihai Danila
    Mihai Danila over 10 years
    There is an encoding called utf8mb4 that allows more than the Basic Multilingual Plane to be stored. I know, you'd think "UTF8" is all that's needed to store Unicode fully. Well, whaddaya know, it's not. See dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html
  • Javier Vieira
    Javier Vieira over 10 years
    the solution help me when i was deleting a record trough django admin, i didn't had any problem when creating o editing...weird! I was even able to delete directly in the db
  • donturner
    donturner over 10 years
    @jack you might want to consider changing the accepted answer to one which is more useful
  • Marcelo Sardelich
    Marcelo Sardelich about 10 years
    it is a feasible workaround, but I do not recommend using it too (as advocated by @muudscope). I still can't store, for example, emoji to mysql databases. Has anybody accomplished it?
  • Michael Bylstra
    Michael Bylstra almost 10 years
    this answer needs way, way, way more upvotes. Thanks! The real problem is your application may run fine for years until someone tries to enter a 4byte character.
  • Vanuan
    Vanuan almost 10 years
    Should I do this every time I change the Model?
  • Carlos
    Carlos almost 10 years
    Fantastic, took me ages to find this.
  • stratis
    stratis over 9 years
    @madprops Thanks! Worked like a charm!
  • Xerion
    Xerion over 8 years
    This is absolutely the right answer. The OPTIONS setting is critical to make django decode emoji characters and store them in MySQL. Just changing mysql charset to utf8mb4 via SQL commands is not enough!
  • 1man
    1man over 8 years
    Thank you so much. You helped me a lot.
  • Rockallite
    Rockallite over 7 years
    There's no need to update the character set of the whole table to utf8mb4. Just update character set of necessary columns. Also the 'charset': 'utf8mb4' option in Django settings is critical, as @Xerion said. Finally, the index problem is a mess. Remove the index on the column, or make its length no more than 191, or use a TextField instead!
  • King
    King over 6 years
    Hi kindly help check this question stackoverflow.com/questions/46348817/…
  • followben
    followben about 6 years
    In my case, our db is created by docker so to fix I added the following to the db:command: instruction in my compose file: - --character-set-server=utf8
  • Enkum
    Enkum almost 6 years
    As simple as that. Thanks @Vanuan
  • Michal Przysucha
    Michal Przysucha over 4 years
    This solved my problem. The only change I did was to use utf8mb4 and utf8mb4_general_ci instead of utf8 / utf8_general_ci.
  • Mohammad Reza
    Mohammad Reza over 4 years
    if this is not a new project, we get backup from db, drop it and recreate it with utf8 charset and then restore backup. I did it in my project that was not new...
  • Qback
    Qback over 4 years
    I love your link to this quote: This is just another case of MySQL being purposefully and irreversibly brain-damaged. :)
  • alstr
    alstr over 3 years
    Note that if using mysql.connector.django as the database backend, you must also set 'collation': 'utf8mb4_unicode_ci' in OPTIONS.
  • bilbohhh
    bilbohhh over 3 years
    @followben thank you, that resolved my problem. Though I had to use --character-set-server=utf8mb4
  • user1953366
    user1953366 almost 3 years
    Not straightforward: Error Code: 1118. Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. This includes storage overhead, check the manual. You have to change some columns to TEXT or BLOBs . I can change it, but then Django will mess with it
  • user1953366
    user1953366 almost 3 years
    It helped me, also I needed to change it in MySQL: ALTER TABLE <<table_name>> CONVERT TO CHARACTER SET 'utf8mb4' Which also required me to limit size of certain columns. But it worked. Yeee
  • huyxdong
    huyxdong about 2 years
    This line sql = "ALTER TABLE `%s` convert to character set DEFAULT COLLATE DEFAULT" % (row[0]) should be changed to sql = "ALTER TABLE `%s` convert to character set 'utf8' COLLATE 'utf8_unicode_ci' " % (row[0]) . Thank you for very best answer.