Is the PHP serialize function compatible with UTF-8?


Solution 1

The behaviour is completely correct. serialize() stores the length of a string in bytes, and the same text in two different encodings produces two different byte streams, and therefore two different serialized strings.
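A minimal sketch of that difference, assuming the mbstring extension is installed. The variable names are illustrative, and the accented characters are written as byte escapes so the result does not depend on how the script file is saved:

```php
<?php
// "Informations générales" as UTF-8 bytes ("é" is 0xC3 0xA9).
$utf8 = "Informations g\xC3\xA9n\xC3\xA9rales";

// The same text converted to ISO-8859-1 ("é" becomes the single byte 0xE9).
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// serialize() records the byte length, so the length prefixes differ.
echo strlen($latin1), "\n"; // 22: one byte per character
echo strlen($utf8), "\n";   // 24: each "é" takes two bytes
echo serialize($utf8), "\n";
```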

Solution 2

Dump the database in latin1.

On the command line, replace the charset declarations in the dump:

sed  -e 's/latin1/utf8/g' -i ./DBNAME.sql

Import the converted file into a new UTF-8 database.

Use a PHP script to update each field: run a query, loop through the rows, and recompute the length of every serialized string like this:

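// The /e modifier was removed in PHP 7, so a callback recomputes each byte length instead.
$str = preg_replace_callback('!s:(\d+):"(.*?)";!s', function ($m) { return 's:' . strlen($m[2]) . ':"' . $m[2] . '";'; }, $str);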

After that, I was able to use unserialize() and everything worked with UTF-8.
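As a rough, self-contained illustration of the length fix, the sketch below wraps the replacement in a helper and applies it to one stale value. The function name fix_serialized_lengths and the sample data are hypothetical, and no database access is shown:

```php
<?php
// Hypothetical helper: recompute the byte length of every s:N:"..."; token.
function fix_serialized_lengths(string $serialized): string
{
    $fixed = preg_replace_callback(
        '!s:(\d+):"(.*?)";!s',
        function (array $m): string {
            // strlen() counts bytes, which is what unserialize() expects.
            return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
        },
        $serialized
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $fixed ?? $serialized;
}

// A value serialized under ISO-8859-1 (length 22) whose bytes are now UTF-8.
$stale = "s:22:\"Informations g\xC3\xA9n\xC3\xA9rales\";";
$fixed = fix_serialized_lengths($stale);

var_dump(@unserialize($stale) === false); // the stale length no longer matches
var_dump(unserialize($fixed));            // the UTF-8 string, 24 bytes long
```

One limitation of this approach: the non-greedy pattern stops at the first `";` it meets, so a stored string that itself contains that sequence would be split in the wrong place.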

Solution 3

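To unserialize a UTF-8 encoded serialized array whose lengths were computed in ISO-8859-1:

$array = @unserialize($arrayFromDatabase);
if ($array === false) {
  // The lengths were computed on ISO-8859-1 bytes, so decode before unserializing.
  // Note: utf8_decode() and utf8_encode() are deprecated as of PHP 8.2.
  $array = @unserialize(utf8_decode($arrayFromDatabase));
  if (is_array($array)) {
    $array = array_map('utf8_encode', $array); // re-encode (flat arrays of strings only)
  }
}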
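On recent PHP versions the same idea can be sketched with mb_convert_encoding(), which also makes it easy to handle nested arrays. This is only an illustration: it assumes the mbstring extension, the helper name unserialize_legacy is hypothetical, and array keys are left untouched:

```php
<?php
// Hypothetical helper: unserialize data whose length prefixes were computed
// under ISO-8859-1 but whose bytes have since been converted to UTF-8.
function unserialize_legacy(string $data)
{
    $value = @unserialize($data);
    if ($value !== false || $data === serialize(false)) {
        return $value; // the lengths already match
    }

    // Convert back to ISO-8859-1 so the stored lengths match again.
    $value = @unserialize(mb_convert_encoding($data, 'ISO-8859-1', 'UTF-8'));

    if (is_string($value)) {
        return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
    }
    if (is_array($value)) {
        // Re-encode every string leaf, including those in nested arrays.
        array_walk_recursive($value, function (&$item) {
            if (is_string($item)) {
                $item = mb_convert_encoding($item, 'UTF-8', 'ISO-8859-1');
            }
        });
    }
    return $value;
}

// Sample value: lengths from the ISO-8859-1 era, bytes already in UTF-8.
$stored = "a:1:{i:0;s:22:\"Informations g\xC3\xA9n\xC3\xA9rales\";}";
$array = unserialize_legacy($stored);
var_dump($array);
```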

Solution 4

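PHP 4 and 5 do not have built-in Unicode support: strings are plain byte sequences, so strlen() and serialize() count bytes, not characters. PHP 6 was intended to add native Unicode support, but it was never released, and later versions still treat strings as bytes.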
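A short sketch of the byte-versus-character distinction, assuming the mbstring extension is available (the sample string is written with byte escapes so it is UTF-8 regardless of the file encoding):

```php
<?php
// "Informations générales" as UTF-8 bytes.
$text = "Informations g\xC3\xA9n\xC3\xA9rales";

echo strlen($text), "\n";             // 24: counts bytes
echo mb_strlen($text, 'UTF-8'), "\n"; // 22: counts characters
```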

Author by

Matthieu Napoli


Updated on August 17, 2022

Comments

  • Matthieu Napoli over 1 year

    I have a site I want to migrate from ISO-8859-1 to UTF-8.

    I have a record in the database indexed by the following primary key:

    s:22:"Informations générales";
    

    The problem is that now (with UTF-8), when I serialize the string, I get:

    s:24:"Informations générales";
    

    (notice that the size of the string is now the number of bytes, not the string length)

    So this is not compatible with the previous non-UTF-8 records!

    Did I do something wrong? How can I fix this?

    Thanks