how to detect and fix character encoding in a mysql database via php?

11,823

Solution 1

When you connect to the database remember to always use mysql_set_charset('utf8', $db_connection);

it will fix everything, it solved all my problems.

See this: http://phpanswer.com/store-french-characters-into-mysql-db-and-display/

Solution 2

I think you might be taking a more compilation approach. I received a Bulgarian database a few weeks back that was dynamically encoded in the DB, but when moving it to another database I got the funky ???

The way I solved that was by dumping the database, setting the database to utf8 collation and then importing the data as binary. This auto-converted everything to utf8 and didn't give me anymore ???.

This was in MySQL

Share:
11,823
pixeline
Author by

pixeline

I'm a self-trained web designer / web developer and since 2009 web teacher. I live in Belgium. #SOreadytohelp

Updated on June 15, 2022

Comments

  • pixeline
    pixeline almost 2 years

    I have received this database full of people names and data in French, which means, using characters such as é,è,ö,û, etc. Around 3000 entries.

    Apparently, the data inside has been encoded sometimes using utf8_encode(), and sometimes not. This result in a messed up output: at some places the characters show up fine, at others they don't.

    At first i tried to track down every place in the UI where those issues arise and use utf8_decode() where necessary, but it's really not a practicable solution.

    I did some testing and there is no reason to use utf8_encode in the first place, so i'd rather remove all that and just work in UTF8 everywhere - at the browser, middleware and database levels. So i need to clean the database, converting all misencoded data by its cleaned up version.

    Question : would it be possible to create a function in php that would check if a utf8 string is correctly encoded (without utf8_encode) or not (with utf8_encode), and, if it was, convert it back to its original state?

    In other terms: i would like to know how i could detect utf8 content that has been utf8_encode() to utf8 content that has not been utf8_encode()d.

    **UPDATE: EXAMPLE **

    Here is a good example: you take a string full of special chars and take a copy of that string and utf8_encode() it. The function i'm dreaming of takes both strings, leaves the first one untouched and the second string is now the same as string one.

    I tried this:

    $loc_fr = setlocale(LC_ALL, 'fr_BE.UTF8','fr_BE@euro', 'fr_BE', 'fr', 'fra', 'fr_FR');
    $str1= "éèöûêïà ";
    $str2 = utf8_encode($str1);
    
    function convert_charset($str) {
        $charset=  mb_detect_encoding($str);
        if( $charset=="UTF-8" ) {
            return utf8_decode($str);
        }
        else {
            return $str;
        }
    }
    function correctString($str) {
        echo "\nbefore: $str";
        $str= convert_charset($str);
        echo "\nafter: $str"; 
    }
    
    correctString($str1);
    echo('<hr/>'."\n");
    correctString($str2);
    

    And that gives me:

    before: éèöûêïà after: ������� 
    before: éèöûêïà  after: éèöûêïà 
    

    Thanks,

    Alex

  • pixeline
    pixeline over 14 years
    hi Gumbo, i've updated my question with a first (unsuccessful try). Can you please look and advise?
  • pixeline
    pixeline over 14 years
    mmh, sounds good! How exactly do you do the "importing the data as binary" part? Is it possible via phpmyadmin ?
  • Gus
    Gus over 14 years
    Sorry I took a while to answer I was away. It is possible via phpmyadmin i38.tinypic.com/1z8cgj.jpg
  • pixeline
    pixeline over 14 years
    Hi Gus. Thanks for getting back to me. I tried it and no luck. Béatrice still turns out as Béatrice both on the old and the new database.