enforce column encoding with sqlalchemy
Solution 1
To specify a specific collation per column, use the collation
parameter on the data type:
class MyTableName(Base):
__tablename__ = "mytablename2"
test_column = Column(Unicode(2),
primary_key=True)
test_column2 = Column(Unicode(2, collation='utf8_bin'))
# ^^^^^^^^^^^^^^^^^^^^
Mind that MySQL understands this as both the set of codepoints to describe the text as well as the sort order the text will be indexed with; the usual suspects like 'utf8' or 'utf-8' won't be familiar to MySQL (use SHOW COLLATION
to see the full list)
mysql> show full columns from mytablename2;
+--------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+--------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
| test_column | varchar(2) | latin1_swedish_ci | NO | PRI | NULL | | select,insert,update,references | |
| test_column2 | varchar(2) | utf8_bin | YES | | NULL | | select,insert,update,references | |
+--------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+
2 rows in set (0.00 sec)
mysql>
Solution 2
For me the collation parameter did not work.
My connection string was:
db = create_engine('mysql+pymysql://user:pass@dbhost/schema?charset=utf8')
Pymysql was executing set names utf8 because of the charset, and the database was converting the utf8 to the encoding of the table, resulting in data loss.
If i left the charset out, the charset was defaulting to latin1, and pymysql was trying to encode my utf8 strings to latin1 before sending them to the database, thus throwing UnicodeEncode errors.
This worked for me :session.execute(text("SET NAMES latin1"))
to make the database assume the utf8 strings i was sending did not need to be converted.
Jealie
Updated on June 03, 2022Comments
-
Jealie almost 2 years
I am using sqlalchemy to create the schema of my database. I have no success in enforcing the use of utf-8, no matter what I tried.
Here is a minimal python script that recreates my problem:
from sqlalchemy import create_engine, Column, Unicode from sqlalchemy.ext.declarative import declarative_base engine = create_engine('mysql+mysqldb://user:password@localhost/multidic?charset=utf8', echo=True) Base = declarative_base() class MyTableName(Base): __tablename__ = "mytablename" test_column = Column(Unicode(2),primary_key=True) Base.metadata.create_all(engine)
After running this script, when I look into the database, I see that the encoding is latin1 instead of utf-8:
mysql> SHOW FULL COLUMNS FROM mytablename; +-------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+ | Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment | +-------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+ | test_column | varchar(2) | latin1_swedish_ci | NO | PRI | NULL | | select,insert,update,references | | +-------------+------------+-------------------+------+-----+---------+-------+---------------------------------+---------+ 1 row in set (0.00 sec)
I have tried changing the type of the column created (String instead of Unicode), and tried also to add the argument encoding = "utf8" in the call to create_engine, but none of it worked.
So, my question is:
How to enforce the use of a given character encoding (utf-8 in my case) in MySQL, with sqlalchemy ?
Thank you :)
Notes:
I am using sqlalchemy 0.7 and python 2.7; I can possibly upgrade one or both, but only if it is the only solution!
I have mysql 5, and it supports utf-8:
mysql> show character set where charset="utf8"; +---------+---------------+-------------------+--------+ | Charset | Description | Default collation | Maxlen | +---------+---------------+-------------------+--------+ | utf8 | UTF-8 Unicode | utf8_general_ci | 3 | +---------+---------------+-------------------+--------+ 1 row in set (0.00 sec)
-
Jealie over 10 yearsTrying
test_column2 = Column(Unicode(2, collation='utf8_bin'))
throws an error:TypeError: __init__() got an unexpected keyword argument 'collation'
... In addition, tryingtest_column2 = Column(Unicode(2), collation='utf8_bin')
throws another error:sqlalchemy.exc.ArgumentError: Unknown arguments passed to Column: ['collation']
-
flow over 10 yearsexactly what the world needs—A DATABASE THAT CONFUSES ENCODING WITH COLLATION, AND CALLS ENCODING 'CHARSET'. did you know Latin-1 is seriously OUTDATED ever since it got replaced by Latin-9 = ISO-8859-15 in 1999? Latin-1 doesn't even have a Euro-sign.
-
Jealie over 10 yearsUpgrading to the latest version of SQLalchemy (i.e. 0.8.2) made your solution work... Thank you !!