Fetching UTF-8 text from MySQL in R returns "????"

12,618

Solution 1

Thanks to @chooban I found out the connection session was using latin1 instead of utf8. Here are two solutions I found:

  • For RMySQL, after connecting run the query SET NAMES utf8 to change the connection character set.
  • For RODBC, connect using CharSet=utf8 in the DSN string. I was not able to run SET NAMES via ODBC.

This question pointed me in the right direction.

Solution 2

This worked for me. Here is a full example:

con = dbConnect(drv = MySQL(), user = user, password = password,
                dbname = dbname, host=host)

dbSendQuery(con, "SET NAMES utf8mb4;")
dbSendQuery(con, "SET CHARACTER SET utf8mb4;")
dbSendQuery(con, "SET character_set_connection=utf8mb4;")


dbGetQuery(con, "SELECT * FROM WHATEVER")

Solution 3

Here's something to try at least. After you've connected, run "SHOW VARIABLES LIKE 'character_set_%'" and print out the results. If nothing else it's a useful check to see if the character set options you've specified have taken.

Share:
12,618
Daniel Dickison
Author by

Daniel Dickison

I love Lisp, finding Swift pretty great, and I'd probably get hooked on Rust if I found some time to get to know it. Also: I don't hate JavaScript.

Updated on June 03, 2022

Comments

  • Daniel Dickison
    Daniel Dickison almost 2 years

    I'm stuck trying to fetch UTF-8 text in a MySQL database from R. I'm running R on OS X (tried both via the GUI and command line), where the default locale is en_US.UTF-8, and no matter what I try, the query result shows "?" for all non-ASCII characters.

    I've tried setting options(encoding='UTF-8'), DBMSencoding='UTF-8' when connecting via ODBC, setting Encoding(res$str) <- 'UTF-8' after fetching the results, as well as 'utf8' variants of each of those, all to no avail. Running the query from the command line mysql client shows the results correctly.

    I'm totally stumped. Any ideas why it's not working, or other things I should try?

    Here's a fairly minimal test case:

    $ mysql -u root
    mysql> CREATE DATABASE test;
    mysql> USE test;
    mysql> CREATE TABLE test (str VARCHAR(10)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    Query OK, 0 rows affected (0.02 sec)
    
    mysql> INSERT INTO test (str) VALUES ('こんにちは');
    Query OK, 1 row affected (0.00 sec)
    
    mysql> select * from test;
    +-----------------+
    | str             |
    +-----------------+
    | こんにちは      |
    +-----------------+
    1 row in set (0.00 sec)
    

    Querying the table in R using both RODBC and RMySQL shows "?????" for the str column:

    > con <- odbcDriverConnect('DRIVER=mysql;user=root', DBMSencoding='UTF-8')
    > sqlQuery(con, 'SELECT * FROM rtest.test')
        str
    1 ?????
    > library(RMySQL)
    Loading required package: DBI
    > con <- dbConnect(MySQL(), user='root')
    > dbGetQuery(con, 'SELECT * FROM rtest.test')
        str
    1 ?????
    

    For completeness, here's my sessionInfo:

    > sessionInfo()
    R version 2.15.1 (2012-06-22)
    Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
    
    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
    
    attached base packages:
    [1] stats     graphics  grDevices utils     datasets  methods   base     
    
    other attached packages:
    [1] RMySQL_0.9-3 DBI_0.2-5    RODBC_1.3-6