Using str_split on a UTF-8 encoded string

Solution 1

str_split does not work with multi-byte characters. It splits by byte, so each multi-byte character is cut into separate single bytes that are not valid characters on their own. You could use mb_split.
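Here is the byte-versus-character difference in concrete terms. Å, Ä, Ö, å and ä each take two bytes in UTF-8, so the 13-character string from the question becomes 18 str_split pieces. That is why the special characters appear "duplicated". Below is a minimal sketch of splitting by character instead. It assumes PHP 7.4+ with mbstring for mb_str_split() and otherwise falls back to preg_split() with the /u modifier. The helper name utf8_chars is made up for illustration.

```php
<?php
// Sketch: split a UTF-8 string into an array of characters (code points).
// utf8_chars() is a hypothetical helper name; the input is assumed to be valid UTF-8.
function utf8_chars(string $s): array
{
    if (function_exists('mb_str_split')) {
        // mb_str_split() exists from PHP 7.4 (mbstring extension).
        return mb_str_split($s, 1, 'UTF-8');
    }
    // Fallback: with the /u modifier an empty pattern matches between characters,
    // not between bytes; PREG_SPLIT_NO_EMPTY drops the empty first and last pieces.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $chars;
}

$name = "Test \u{00C5}\u{00C4}\u{00D6} T\u{00E5}\u{00E4}n"; // "Test ÅÄÖ Tåän"
echo count(str_split($name)), "\n";  // 18: one element per byte
echo count(utf8_chars($name)), "\n"; // 13: one element per character
```

The preg_split fallback does the same job as the pattern the asker settled on in the comments (`'/(?<!^)(?!$)/u'`). An empty pattern plus PREG_SPLIT_NO_EMPTY is just a shorter way to write it.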

Solution 2

Note that the charset declaration in your connection string is reported not to work; the charset option in the DSN was ignored prior to PHP 5.3.6. In the comments on php.net I frequently see this alternative:

$dbHandle = new PDO("mysql:host=$dbHost;dbname=$dbName;charset=utf8", $dbUser, $dbPass,
                    array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'"));

Solution 3

UTF-8 Using PDO

Problems when writing international (even Chinese and Thai) characters to the database

There may be more ways to make this work. I am not an expert, just a tech-freak interested in understanding all this. On Linux and Windows I have set up a few CMSs (content management systems) using a sample from the following website:

'http://www.elated.com/articles/cms-in-an-afternoon-php-mysql'

The sample is using PDO for insert, update and delete.

It took me a few hours to find a solution. Whatever I did, I always found differences between the data in my forms and what the phpMyAdmin/HeidiSQL views showed.

I followed the hints at 'https://mathiasbynens.be/notes/mysql-utf8mb4', but still had no success.

In my CMS structure there is a file 'Config.php'. After reading this web page, I changed the line

    define( 'DB_DSN', 'mysql:host=localhost;dbname=mythings');

to

    define( 'DB_DSN', 'mysql:host=localhost;dbname=mythings;charset=utf8');

Now all works fine.
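This fix comes down to putting a valid MySQL character set name in the DSN. Below is a minimal sketch of the same idea. It assumes PHP 7+ and MySQL 5.5.3 or later, so that utf8mb4 is available. utf8mb4 is the character set the mathiasbynens.be article recommends, because MySQL's utf8 stores at most three bytes per character. The helper names utf8_mysql_dsn and connect_utf8 are hypothetical.

```php
<?php
// Sketch of a UTF-8 PDO connection for MySQL.
// utf8_mysql_dsn() and connect_utf8() are hypothetical helper names.
function utf8_mysql_dsn(string $host, string $dbName, string $charset = 'utf8mb4'): string
{
    // MySQL charset names are spelled "utf8" / "utf8mb4"; "utf-8" is not one of them.
    return "mysql:host={$host};dbname={$dbName};charset={$charset}";
}

function connect_utf8(string $host, string $dbName, string $user, string $pass): PDO
{
    return new PDO(utf8_mysql_dsn($host, $dbName), $user, $pass, [
        // Throw exceptions on errors instead of failing silently.
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Usage (needs a reachable MySQL server and the pdo_mysql extension):
// $db = connect_utf8('localhost', 'mythings', 'user', 'secret');
```

Keeping the DSN construction in its own function makes the charset easy to check without opening a connection.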

Solution 4

The str_split function splits by byte, not by character. You'll need mb_split.

Solution 5

This works for me; I hope it's useful.

Ensure that the database, Apache and every configuration are set to UTF-8.

PDO object:

    $dsn = 'mysql:host=' . Config::read('db.host')
         . ';dbname=' . Config::read('db.basename')
         . ';charset=utf8'
         . ';port=' . Config::read('db.port')
         . ';connect_timeout=15';
    $user = Config::read('db.user');
    $password = Config::read('db.password');
    $this->dbh = new PDO($dsn, $user, $password, array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'"));
    $this->dbh->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);

It works as long as you don't use another function such as str_word_count.

With str_word_count, I used utf8_decode(utf8_encode(...)):

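function cortar($str)
{
    // Extra characters str_word_count() should count as part of a word.
    // str_word_count() works byte by byte, so listing accented letters here makes
    // their UTF-8 bytes count as word characters as well.
    $charlist = '.,-0123456789()+=?¿!"<>*ñÑáéíóúÁÉÍÓÚ@|/%$#¡';

    // Short enough already: return it untouched.
    if (str_word_count($str) < 20) {
        return $str;
    }

    // Format 1 returns the words as an array; keep the first 20, space-separated.
    $words = str_word_count($str, 1, $charlist);
    $s = implode(' ', array_slice($words, 0, 20));

    // utf8_encode() followed by utf8_decode() is a lossless round trip, so this
    // returns $s unchanged (both functions are deprecated as of PHP 8.2).
    return utf8_decode(utf8_encode($s));
}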

The function returns the string cut down to its first 20 words.
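If the goal is simply to keep the first 20 words, another option is to skip str_word_count(), which works on bytes, and split on whitespace with a /u regex. The sketch below takes that approach. It is an alternative, not the answer's code, and first_words is a hypothetical name. Its notion of a "word" is any run of non-whitespace characters, so punctuation stays attached. Unlike cortar(), it also collapses runs of whitespace even when the text is shorter than the limit.

```php
<?php
// Sketch (alternative to cortar()): keep at most the first $limit words of a UTF-8 string.
// first_words() is a hypothetical name; a "word" is any run of non-whitespace characters.
function first_words(string $str, int $limit = 20): string
{
    // The /u modifier makes PCRE handle the subject as UTF-8 characters rather
    // than bytes, and makes preg_split() fail on invalid UTF-8.
    $words = preg_split('/\s+/u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($words === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return implode(' ', array_slice($words, 0, $limit));
}

echo first_words("El niño comió año tras año", 3), "\n"; // El niño comió
```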

Author: Jonathan

I'm a designer and software developer from Sweden running my own business, as well as working as a consultant for a company called EC Solutions (http://www.ecsolutions.se). I specialize in the following programming languages: PHP, JavaScript, ActionScript 2/3 and C# (Xamarin.iOS / MonoTouch). I also specialize in these markup languages: CSS 2/3 and HTML 4/5.

Updated on May 02, 2020

Comments

  • Jonathan
    Jonathan about 4 years

    I'm currently working on a project, and instead of using regular MySQL queries I thought I'd go ahead and learn how to use PDO.

    I have a table called contestants, both the database, the table, and all of the columns are in utf-8. I have ten entries in the contestant table, and their column "name" contains characters such as åäö.

    Now, when I fetch an entry from the database, and var_dump the name, I get a good result, a string with all the special characters intact. But what I need to do is to split the string by characters, to get them in an array that I then shuffle.

    For instance, I have this string: Test ÅÄÖ Tåän

    And when I run str_split I get each character in its own key in an array. The only issue is that all the special characters display as this: �, meaning the array will be like this:

    Array
    (
        [0] => T
        [1] => e
        [2] => s
        [3] => t
        [4] =>  
        [5] => �
        [6] => �
        [7] => �
        [8] => �
        [9] => �
        [10] => �
        [11] =>  
        [12] => T
        [13] => �
        [14] => �
        [15] => �
        [16] => �
        [17] => n
    )
    

    As you can see, it not only messes up the characters, but it also duplicates them in the str_split process. I've tried several ways to split the string, but they all have the same issue. When I output the string before the split, it shows the special characters just fine.

    This is my dbConn.php code:

    // Require config file:
    require_once('config.inc.php');

    // Start PDO connection:
    $dbHandle = new PDO("mysql:host=$dbHost;dbname=$dbName;charset=utf-8", $dbUser, $dbPass);
    $dbHandle -> exec("SET CHARACTER SET utf8");
    
    // Set error reporting:
    $dbHandle->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_WARNING);
    

    And this is the code that I use to fetch from the database and loop:

    // Require files:
    require_once('dbConn.php');
    
    // Get random artist:
    $artist = $dbHandle->query("SELECT * FROM ".ARTIST_TABLE." WHERE id = 11 ORDER BY RAND() LIMIT 1");
    $artist->setFetchMode(PDO::FETCH_OBJ);
    $artist = $artist->fetch();
    var_dump($artist->name);
    
    // Split name:
    $artistChars = str_split($artist->name);
    

    I'm connecting with utf-8, my php file is utf-8 without BOM and no other special characters on this page share this issue. What could be wrong, or what am I doing wrong?

    • Shane Jones
      Shane Jones about 9 years
      It should be charset=utf8 (not utf-8) in the PDO DSN.
  • Jonathan
    Jonathan over 12 years
    I tried mb_split after you mentioned it, and it seemed to work, but I could not find a proper regexp, so I ended up using preg_split: $artistChars = preg_split('/(?<!^)(?!$)/u', $artist->name); However, now I need to make all the characters lowercase, and mb_convert_case messes up the characters as well.
  • Jonathan
    Jonathan over 12 years
    Thanks :) I ended up using preg_split like this: $artistChars = preg_split('/(?<!^)(?!$)/u', $artist->name); However, now I need to make all the characters lowercase, and mb_convert_case messes up the characters as well.
  • Adam Thornton
    Adam Thornton almost 11 years
    +1, I had a similar issue where an ü in the database was being displayed as a � in php. Adding that additional parameter to my PDO connection fixed the issue.
  • NotGaeL
    NotGaeL almost 11 years
    +1, it took me 4 hours to finally find what was causing the problem. After using the exact connection script from the PHP manual example, how could that possibly be the problem? I'm so mad at the PHP development team right now...
  • Matthew Walker
    Matthew Walker over 10 years
    Oh thank you so very much for this reply. A million "thank you"s and more.
  • Dean Or
    Dean Or over 10 years
    Had a similar issue (no string splitting) where results had � between characters. This fixed it.
  • Fabian Pas
    Fabian Pas over 10 years
    Luckily I found this answer directly, saved me a lot of pain!
  • Bud Damyanov
    Bud Damyanov about 10 years
    @Jonathan, you can try the mb_ereg() function, php.net/manual/en/function.mb-ereg.php
  • bizzr3
    bizzr3 almost 10 years
    +1 for array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'"); you saved my life.
  • Pathros
    Pathros about 6 years
    That did the trick! In my very humble opinion, this should be the best answer.
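The comments above also raise a follow-up: mb_convert_case mangles the characters when lowercasing. A likely cause is that the encoding argument was omitted while mbstring's internal encoding was something other than UTF-8. This is an assumption, since the comment does not show the call. The sketch below passes 'UTF-8' explicitly, then splits and shuffles as the question intends. It assumes the mbstring extension is loaded, and shuffled_lower_chars is a hypothetical name.

```php
<?php
// Sketch: lowercase a UTF-8 string, split it into characters and shuffle them.
// shuffled_lower_chars() is a hypothetical name; requires the mbstring extension.
function shuffled_lower_chars(string $s): array
{
    // Passing the encoding explicitly avoids depending on mbstring's configured
    // internal encoding; mb_convert_case($s, MB_CASE_LOWER, 'UTF-8') works the same way.
    $lower = mb_strtolower($s, 'UTF-8');

    // Split between UTF-8 characters rather than bytes.
    $chars = preg_split('//u', $lower, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    shuffle($chars); // shuffle() reorders the array in place
    return $chars;
}

print_r(shuffled_lower_chars("Test \u{00C5}\u{00C4}\u{00D6} T\u{00E5}\u{00E4}n"));
```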