Values in UTF-8 being encoded as NULL in JSON


Solution 1

The reason could be the current client character set. A simple solution is to set it with mysql_query('SET CHARACTER SET utf8') before running the SELECT query.

Update (June 2014)

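The mysql extension is deprecated as of PHP 5.5.0, and mysqli is now the recommended replacement. Also, on further reading, the above way of setting the client character set should be avoided, for reasons including security: a SET NAMES or SET CHARACTER SET query changes the character set on the server without informing the client library, so escaping functions such as mysql_real_escape_string() keep using the old one.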

I haven't tested it, but this should be an acceptable substitute:

$mysqli = new mysqli("localhost", "my_user", "my_password", "my_db");
if (!$mysqli->set_charset('utf8')) {
    printf("Error loading character set utf8: %s\n", $mysqli->error);
} else {
    printf("Current character set: %s\n", $mysqli->character_set_name());
}

or, in procedural style, passing the connection explicitly:

$conn = mysqli_connect("localhost", "my_user", "my_password", "my_db");
if (!mysqli_set_charset($conn, "utf8")) {
    // Unable to set the character set: report the error and stop
    printf("Error loading character set utf8: %s\n", mysqli_error($conn));
    exit;
}

Solution 2

I tried your code sample like this:

[~]> cat utf.php 
<?php
$arr = array('Coffee', 'Cappuccino', 'Café');
print json_encode($arr);
[~]> php utf.php 
["Coffee","Cappuccino","Caf\u00e9"]
[~]>

Based on that, I would say that if the source data is really UTF-8, then json_encode works just fine. If it's not, then that's where you get null. Why it's not, I cannot tell based on this information.
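One way to check whether the source data really is UTF-8 is to test each string before encoding it. The sketch below relies on preg_match() with the /u modifier, which returns false when the subject is not a valid UTF-8 byte sequence. The helper name is_valid_utf8() is hypothetical, and "Caf\xE9" is an assumed stand-in for a Latin-1 string coming back from the database.

```php
<?php
// Hypothetical helper: preg_match() with the /u modifier returns false
// (instead of 0 or 1) when the subject is not valid UTF-8.
function is_valid_utf8($s) {
    return preg_match('//u', $s) === 1;
}

$utf8   = "Caf\xC3\xA9"; // "Café" encoded as UTF-8 (é = C3 A9)
$latin1 = "Caf\xE9";     // "Café" encoded as ISO-8859-1 (é = E9)

var_dump(is_valid_utf8($utf8));   // bool(true)
var_dump(is_valid_utf8($latin1)); // bool(false)

// Valid UTF-8 encodes exactly as in the shell session above.
echo json_encode(array($utf8)), "\n"; // ["Caf\u00e9"]
```

Running this check over the keywords as they come out of the database should show which values are not UTF-8.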

Solution 3

json_encode seems to be turning strings that contain invalid UTF-8 byte sequences into null. It is likely that your UTF-8 data is not arriving in the proper form from your database.

Looking at the examples you give, my wild guess would be that your database connection is not UTF-8 encoded and serves ISO-8859-1 characters instead.

Can you try a SET NAMES utf8; after initializing the connection?
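The null values can be reproduced without a database. This sketch assumes PHP 5.5 or later, where json_encode() returns false on invalid UTF-8 unless JSON_PARTIAL_OUTPUT_ON_ERROR is passed, in which case the offending string is replaced by null. The value "caf\xE9" is an assumed stand-in for ISO-8859-1 data served by a connection that is not set to UTF-8.

```php
<?php
// The last value is ISO-8859-1 (é = E9), not UTF-8.
$rows = array('coffee', 'cappuccino', "caf\xE9");

// Default behaviour: the whole call fails and the error can be inspected.
var_dump(json_encode($rows));                    // bool(false)
var_dump(json_last_error() === JSON_ERROR_UTF8); // bool(true)
echo json_last_error_msg(), "\n";

// Partial output: the invalid string becomes null, matching the symptom in the question.
echo json_encode($rows, JSON_PARTIAL_OUTPUT_ON_ERROR), "\n"; // ["coffee","cappuccino",null]
```

If SET NAMES utf8 fixes the connection, the same keyword should arrive as "caf\xC3\xA9" and encode normally.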

Solution 4

Try sending your array through this function before doing json_encode():

<?php

/* Convert every string in a (possibly nested) ISO-8859-1 array to UTF-8. */
function utf8json($inArray, $depth = 0) {

    /* our return object */
    $newArray = array();

    /* safety recursion limit: stop after 30 levels of nesting */
    if ($depth >= 30) {
        return false;
    }

    /* step through inArray */
    foreach ($inArray as $key => $val) {
        if (is_array($val)) {
            /* recurse into the nested array, one level deeper */
            $newArray[$key] = utf8json($val, $depth + 1);
        } else {
            /* encode string values (assumes they are ISO-8859-1) */
            $newArray[$key] = utf8_encode($val);
        }
    }

    /* return utf8 encoded array */
    return $newArray;
}
?>

Taken from a comment on php.net: http://php.net/manual/en/function.json-encode.php.

The function loops through the array elements and encodes each value individually; perhaps you ran utf8_encode() on the array itself?
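utf8_encode() is deprecated as of PHP 8.2, so on current versions a similar conversion can be sketched with array_walk_recursive() and mb_convert_encoding(). This is a swapped-in technique, not the php.net comment's code; it assumes the mbstring extension is available, and the helper name latin1_to_utf8_recursive() is hypothetical. Like utf8json(), it assumes the input really is ISO-8859-1: running it on data that is already UTF-8 double-encodes it, so "Café" turns into "CafÃ©".

```php
<?php
// Hypothetical helper: convert every string leaf of a nested array from
// ISO-8859-1 to UTF-8. Non-string values (ints, null, ...) are left untouched.
function latin1_to_utf8_recursive(array $data) {
    array_walk_recursive($data, function (&$value) {
        if (is_string($value)) {
            $value = mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
        }
    });
    return $data;
}

$fromDb = array('coffee', array('nested' => "caf\xE9"), 42);
echo json_encode(latin1_to_utf8_recursive($fromDb)), "\n";
// ["coffee",{"nested":"caf\u00e9"},42]
```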

Author: mwieczorek

Updated on July 24, 2022

Comments

  • mwieczorek
    mwieczorek almost 2 years

    I have a set of keywords that are passed through via JSON from a DB (encoded UTF-8), some of which may have special characters like é, è, ç, etc. This is used as part of an auto-completer. Example:

    array('Coffee', 'Cappuccino', 'Café');
    

    I should add that the array as it comes from the DB would be:

    array('Coffee', 'Cappuccino', 'Café');
    

    But JSON encodes as:

    ["coffee", "cappuccino", null];
    

    If I print these via print_r(), they show up fine on a UTF-8 encoded webpage, but café comes through as "café" if text/plain is used when I look at the array with print_r($array);exit();.

    If I encode using utf8_encode() before encoding to JSON, it comes through fine, but what gets printed on the webpage is "café" and not "café".

    Also strange, but json_last_error() is being seen as an undefined function, but json_decode() and json_encode() work fine.

    Any ideas on how to get UTF-8 encoded data from the database to behave the same throughout the entire process?

    EDIT: Here is the PHP function that grabs the keywords and makes them into a single array:

    private function get_keywords() 
    {
        global $db, $json;
    
        $output = array();
    
        $db->query("SELECT keywords FROM listings");
    
        while ($r = $db->get_array())
        {
            $split = explode(",", $r['keywords']);
    
            foreach ($split as $s)
            {
                $s = trim($s);
                if ($s != "" && !in_array($s, $output)) $output[] = strtolower($s);
            }
        }
    
        $json->echo_json($output);
    }
    

    The json::echo_json method just encodes, sets the header and prints it (for usage with Prototype)

    EDIT: DB Connection method:

    function connect()
    {
    
        if ($this->set['sql_connect'])
        {
            $this->connection = @mysql_connect( $this->set['sql_host'], $this->set['sql_user'], $this->set['sql_pass'])
                    OR $this->debug( "Connection Error", mysql_errno() .": ". mysql_error());
            $this->db = @mysql_select_db( $this->set['sql_name'], $this->connection)
                    OR $this->debug( "Database Error", "Cannot Select Database '". $this->set['sql_name'] ."'");
    
            $this->is_connected = TRUE;
        }
    
        return TRUE;
    }
    

    More Updates: Simple PHP script I ran:

    echo json_encode( array("Café") ); // ["Caf\u00e9"]
    echo json_encode( array("Café") ); // null
    
  • Pekka
    Pekka over 13 years
    This is a function to encode an array that is not UTF-8 into UTF-8. That is fine in itself, but probably not the solution here (the OP's incoming data is already UTF-8).
  • mwieczorek
    mwieczorek over 13 years
    "Café" comes through as "Caf\u00e9" if I utf8_encode() each keyword as it's added to array $output (see code). This works fine, but when adding to the DOM via Prototype's update() method, it comes through as "Café".
  • mwieczorek
    mwieczorek over 13 years
    I'm not exactly sure how to do this. I use a mysql class of my own to govern all SQL interaction. Is this a flag set on the connection itself, or does this have to be applied to every query I perform?
  • Pekka
    Pekka over 13 years
    @Mike it is a flag set on the connection itself, needs to be run just once. Just try hacking the call into your SQL class for a second to see whether that's the problem.
  • Pekka
    Pekka over 13 years
    @Mike yeah, just add a mysql_query("SET NAMES utf8", $this->connection); before the return
  • Anti Veeranna
    Anti Veeranna over 13 years
    Then the problem appears to be on the client side. Can you check the page (the html page, that contains your Prototype code) encoding in your browser?
  • mwieczorek
    mwieczorek over 13 years
    @Pekka - added that query, but still... "Café" as UTF-8 encoded (and straight "Café") are returning NULL. My JSON header:
  • mwieczorek
    mwieczorek over 13 years
    ...JSON header is header('Content-type: application/json');
  • Pekka
    Pekka over 13 years
    @Mike nonono, you wouldn't want to do any utf8_encode() ing here. The process you are doing should work without any additional encoding or decoding.
  • mwieczorek
    mwieczorek over 13 years
    All HTML pages are UTF-8. The 'null' values I see are showing up in Firebug - the raw JSON has these values as null. Here's a truncated version of the JSON echoed: ["amok","curries","draught beer","anchor beer","half-pound burgers",null,"british food","english food","cold beer",null,"seafood","punjabi","halal food","tandoori"]
  • Anti Veeranna
    Anti Veeranna over 13 years
    That's interesting. How is it possible that properly encoded data comes out of json_encode, yet when it gets to the browser/Firebug, it's null? Try sending the JSON header like this: header('Content-type: application/json; charset=utf-8');
  • Tiago Gouvêa
    Tiago Gouvêa about 11 years
    I had the same problem, and resolved it with this after connecting: mysql_query("SET NAMES utf8");
  • sms247
    sms247 over 7 years
    You rock for giving us the magical statement mysqli_set_charset($conn, "utf8")
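Pulling the suggestions in this thread together, a JSON endpoint can declare its charset explicitly and fail loudly instead of silently printing null. This is a minimal sketch, assuming PHP 5.5+ for json_last_error_msg(). echo_json() here is a hypothetical stand-in for the question's json::echo_json method, and it assumes the data was fetched over a connection set to UTF-8 (for example with mysqli_set_charset()).

```php
<?php
// Hypothetical stand-in for json::echo_json: send the charset with the
// Content-Type header and report encoding errors instead of emitting null.
function echo_json($data) {
    $json = json_encode($data);
    $ok = ($json !== false);
    if (!$ok) {
        // Typically JSON_ERROR_UTF8 when a value is not valid UTF-8.
        $json = json_encode(array('error' => json_last_error_msg()));
    }
    if (!headers_sent()) {
        if (!$ok) {
            http_response_code(500);
        }
        header('Content-Type: application/json; charset=utf-8');
    }
    echo $json;
}

echo_json(array('coffee', 'cappuccino', "caf\xC3\xA9")); // ["coffee","cappuccino","caf\u00e9"]
```

Returning an explicit error makes a bad connection charset show up as a failed request in Firebug rather than as null entries in the auto-completer.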