strtolower() for unicode/multibyte strings

Solution 1

Have you tried using mb_strtolower()?

Solution 2

PHP 5's built-in string functions are not UTF-8 aware, so you still need to resort to the mbstring extension. I suggest you set mbstring's internal encoding to UTF-8; then you can use its functions without specifying the charset every time:

mb_internal_encoding('UTF-8');

...

$b = mb_strtolower($a);
echo $b;

Solution 3

I found this solution:

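$string = 'Թ';
// Pass the encoding explicitly so the result does not depend on mb_internal_encoding().
echo 'Uppercase: ' . mb_convert_case($string, MB_CASE_UPPER, 'UTF-8') . PHP_EOL;
echo 'Lowercase: ' . mb_convert_case($string, MB_CASE_LOWER, 'UTF-8') . PHP_EOL;
echo 'Original: ' . $string . PHP_EOL;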

Works for me (lowercase).

Solution 4

Have you tried mb_strtolower() (http://www.php.net/manual/en/function.mb-strtolower.php) and specifying the encoding as the second parameter?

The examples on that page appear to work.

You could also try:

$str = mb_strtolower($str, mb_detect_encoding($str));
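Encoding detection is a guess made from the bytes alone, so when the encoding is already known (UTF-8 in the question), it may be safer to validate the input and pass the encoding explicitly. A minimal sketch, assuming the mbstring extension is loaded; the helper name strict_lower is hypothetical and not part of PHP:

```php
<?php
// Hypothetical helper: lowercases a string that is expected to be UTF-8.
function strict_lower(string $str): string
{
    // mb_check_encoding() returns false when the bytes are not valid UTF-8.
    if (!mb_check_encoding($str, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // The explicit second argument makes the call independent of
    // mb_internal_encoding() and of any detection heuristics.
    return mb_strtolower($str, 'UTF-8');
}

echo strict_lower('ԱԱԱ'), PHP_EOL;
```

Rejecting invalid input up front is a design choice; replacing the check with a conversion step would work too if the real source encoding is known.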

Solution 5

PHP's strtolower() does not know about UTF-8. It treats a string as a sequence of single bytes and lowercases each byte on its own. Before PHP 8.2 it used the current locale to decide which bytes are uppercase letters; since PHP 8.2 it converts only the ASCII letters A-Z. Non-ASCII letters in UTF-8 are written with two or more bytes, all with values of 0x80 or above, so none of them is an ASCII letter. Under a single-byte locale, however, some of those bytes can be taken for uppercase letters and converted one at a time. The sequence is then broken and no longer represents a valid character.
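The byte-level picture can be inspected directly. The following is a small sketch, assuming the mbstring extension is loaded and the script file is saved as UTF-8; it prints the byte and character counts and hex dumps of the string before and after mb_strtolower():

```php
<?php
$a = 'Երկիր';

// Each Armenian letter occupies two bytes in UTF-8.
echo strlen($a), ' bytes, ', mb_strlen($a, 'UTF-8'), ' characters', PHP_EOL;
// prints "10 bytes, 5 characters"

// Hex dump of the original bytes: every byte is 0x80 or above.
echo bin2hex($a), PHP_EOL;

// mb_strtolower() decodes whole characters before changing their case.
$lower = mb_strtolower($a, 'UTF-8');
echo $lower, PHP_EOL;
echo bin2hex($lower), PHP_EOL;

// The lowercased result is still valid UTF-8.
var_dump(mb_check_encoding($lower, 'UTF-8'));
```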

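To change this, you need either to configure the mbstring extension (http://www.php.net/manual/en/book.mbstring.php) to replace strtolower with mb_strtolower, or to use mb_strtolower directly. Note that the replacement feature (function overloading) was deprecated in PHP 7.2 and removed in PHP 8.0, so on current versions only the direct call remains. In any case, you need to spend some time configuring the mbstring settings to match your requirements.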
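One way to follow the "use mb_strtolower directly" advice is to route all lowercasing through a single wrapper, so the encoding is stated in one place. This is only a sketch: the name utf8_lower is hypothetical, and it assumes the application works with UTF-8 throughout.

```php
<?php
// Hypothetical wrapper; the name is illustrative and not part of PHP.
function utf8_lower(string $str): string
{
    if (!function_exists('mb_strtolower')) {
        // mbstring is an optional extension, so fail loudly rather than
        // fall back to the byte-wise strtolower().
        throw new RuntimeException('The mbstring extension is not available');
    }

    return mb_strtolower($str, 'UTF-8');
}

echo utf8_lower('Երկիր Ավելացնել'), PHP_EOL;
```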

Author: Simon

Updated on August 14, 2021

Comments

  • Simon almost 3 years

    I have some text in a non-English language on my page, but when I try to make it lowercase, its characters are converted into black diamonds containing question marks.

    $a = "Երկիր Ավելացնել";
    echo $b = strtolower($a);
    //returns  ����� ���������
    

    I've set my charset in a metatag, but this didn't fix it.

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    

    What can I do to convert my string to lowercase without corrupting it?

  • Pekka about 14 years
    @Syom did you specify UTF-8 as the encoding?
  • Nick Bastin about 14 years
    strtolower does actually work on multibyte characters; it just works off the current locale, which is not usually what you want in these cases.
  • SteelBytes about 14 years
    You might also need to call mb_internal_encoding() first.
  • Admin over 10 years
    var_dump(mb_strtolower('ԱԱԱ', mb_detect_encoding('ԱԱԱ'))); // string(6) "աաա" 100% Working!!!!
  • Dumitru over 6 years
    I had the same problem and it worked for me! Thanks a lot!
  • ESP32 over 3 years
    I lost 8 hours debugging... and finally found that strtolower was the problem.