PHP: mb_strtoupper not working


Solution 1

Use mb_convert_case() instead of strtoupper()/mb_strtoupper(), since converting to upper case is tricky across different encodings. Also make sure your string really IS UTF-8.

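$content = 'Le Courrier de Sáint-Hyácinthe';

mb_internal_encoding('UTF-8');

// Treat the string as UTF-8 only if it validates and survives a
// UTF-8 -> UTF-32 -> UTF-8 round trip unchanged.
if (!mb_check_encoding($content, 'UTF-8')
    || $content !== mb_convert_encoding(mb_convert_encoding($content, 'UTF-32', 'UTF-8'), 'UTF-8', 'UTF-32')) {

    // Assumption: non-UTF-8 input is ISO-8859-1. Without an explicit source
    // encoding, mb_convert_encoding() falls back to the internal encoding
    // (UTF-8 here) and cannot repair the bytes.
    $content = mb_convert_encoding($content, 'UTF-8', 'ISO-8859-1');
}

// LE COURRIER DE SÁINT-HYÁCINTHE
echo mb_convert_case($content, MB_CASE_UPPER, 'UTF-8');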

Working example: http://3v4l.org/enEfm#v443

See also my comment at the PHP website about the converter: http://www.php.net/manual/function.utf8-encode.php#102382
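
The check-then-convert idea above can be wrapped in a small function. This is only a sketch: the name to_upper_utf8 and the $fallbackEncoding parameter are made up for illustration, and it assumes that any input which is not valid UTF-8 is ISO-8859-1. The test strings use byte escapes so the result does not depend on how the PHP file itself is saved.

```php
<?php
// Hypothetical helper, not part of the original answer.
// Assumption: input that is not valid UTF-8 is encoded as $fallbackEncoding.
function to_upper_utf8(string $text, string $fallbackEncoding = 'ISO-8859-1'): string
{
    if (!mb_check_encoding($text, 'UTF-8')) {
        // Name the source encoding explicitly so the bytes are reinterpreted.
        $text = mb_convert_encoding($text, 'UTF-8', $fallbackEncoding);
    }

    return mb_convert_case($text, MB_CASE_UPPER, 'UTF-8');
}

// "\xC3\xA1" is á in UTF-8; "\xE1" is á in ISO-8859-1.
$fromUtf8   = to_upper_utf8("S\xC3\xA1int-Hy\xC3\xA1cinthe");
$fromLatin1 = to_upper_utf8("S\xE1int-Hy\xE1cinthe");

// Both inputs should end up as the same upper-case UTF-8 string.
var_dump($fromUtf8 === $fromLatin1);
```

If the input might be in some other legacy encoding, the fallback has to be chosen by whoever knows where the data comes from; the function cannot guess it.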

Solution 2

It works for me, but only when the PHP file itself is saved as UTF-8 and the terminal I'm in expects UTF-8. I think what is happening for you is that the file is saved as ISO-8859-1 and your terminal is expecting ISO-8859-1.

First, mb_detect_encoding doesn't actually work for this string. Even when the PHP file is not UTF-8, it still reports it as UTF-8.
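
One way to cross-check what mb_detect_encoding reports is to validate the bytes directly. The sketch below assumes the only two candidates are UTF-8 and ISO-8859-1, and writes the á as byte escapes so the file's own encoding does not matter. Note that ISO-8859-1 accepts any byte sequence, so this can only rule UTF-8 out; it cannot prove the string is ISO-8859-1.

```php
<?php
// "\xE1" is á in ISO-8859-1; on its own it is not a valid UTF-8 sequence.
$latin1 = "S\xE1int";
// "\xC3\xA1" is á in UTF-8.
$utf8 = "S\xC3\xA1int";

$latin1IsUtf8 = mb_check_encoding($latin1, 'UTF-8'); // false
$utf8IsUtf8   = mb_check_encoding($utf8, 'UTF-8');   // true

// Strict mode (third argument) rejects encodings in which the bytes are invalid.
$detected = mb_detect_encoding($latin1, ['UTF-8', 'ISO-8859-1'], true);

var_dump($latin1IsUtf8, $utf8IsUtf8, $detected);
```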

When you print the lower case string, it prints ISO-8859-1 characters and your terminal displays them just fine. Then when you convert to upper case using UTF-8, it gets mangled.

I created two versions of this file. I saved it in my text editor as ISO-8859-1 under the name iso-8859-1.php, then used iconv to convert the entire file to UTF-8 and saved it as utf-8.php:

iconv -f ISO-8859-1 -t UTF-8 iso-8859-1.php > utf-8.php

I added a line to print the encoding that mb_detect_encoding returns.

$ file iso-8859-1.php 
iso-8859-1.php: PHP script, ISO-8859 text

$ php iso-8859-1.php 
ENCODING: UTF-8
DEBUG1 Le Courrier de S�int-Hy�cinthe
DEBUG2 LE COURRIER DE S?INT-HY?CINTHE

$ file utf-8.php 
utf-8.php: PHP script, UTF-8 Unicode text

$ php utf-8.php 
ENCODING: UTF-8
DEBUG1 Le Courrier de Sáint-Hyácinthe
DEBUG2 LE COURRIER DE SÁINT-HYÁCINTHE

My terminal actually expects UTF-8 text, so when I print out ISO-8859-1 text it gets mangled. Everything works correctly when the file is saved as UTF-8 and the terminal expects UTF-8.
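
Since the terminal can hide or distort what the script really holds, it may help to look at the raw bytes instead of the rendered characters. A minimal sketch using bin2hex; the two strings are written with byte escapes to stand in for the two file encodings compared above.

```php
<?php
// Escape sequences keep this sketch independent of how the file itself is saved.
$utf8   = "S\xC3\xA1int"; // á as the two UTF-8 bytes C3 A1
$latin1 = "S\xE1int";     // á as the single ISO-8859-1 byte E1

echo bin2hex($utf8), "\n";   // 53c3a1696e74
echo bin2hex($latin1), "\n"; // 53e1696e74
```

Running bin2hex on the real variable (for example the string from the question) shows which of the two forms the script is actually working with, regardless of what the terminal displays.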

Solution 3

Actually, what works here is simply

<?php
mb_internal_encoding('UTF-8');

$x='Le Courrier de Sáint-Hyácinthe';
echo mb_strtoupper( $x ) . "\n";

outputs

LE COURRIER DE SÁINT-HYÁCINTHE

Here it works directly, but in your case you may have to add utf8_encode (which converts ISO-8859-1 to UTF-8 and is deprecated as of PHP 8.2):

$x = utf8_encode( 'Le Courrier de Sáint-Hyácinthe' );
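
Whether that extra conversion helps depends on what the string already is. The sketch below uses mb_convert_encoding with an explicit ISO-8859-1 source, which performs the same conversion as utf8_encode, and shows what happens when it is applied to text that is already UTF-8. The variable names are only for illustration.

```php
<?php
$latin1 = "S\xE1int";     // á as ISO-8859-1
$utf8   = "S\xC3\xA1int"; // á already as UTF-8

// Correct use: the input really is ISO-8859-1.
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// Incorrect use: the input is already UTF-8, so each byte of á is
// treated as a separate ISO-8859-1 character and encoded again.
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

var_dump($converted === $utf8);     // bool(true)
echo bin2hex($doubleEncoded), "\n"; // 53c383c2a1696e74
```

So the conversion should only be applied when the input is known (or has been checked) not to be UTF-8 already.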

--

An alternative that works here without mbstring:

<?php
echo strtoupper(str_replace('á', 'Á', 'Le Courrier de Sáint-Hyácinthe'));
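
If mbstring really is unavailable, the str_replace idea above could be extended to a small lookup table. A minimal sketch, assuming the file is saved as UTF-8 and that the hypothetical $accentMap is filled with whatever accented letters occur in your data; strtoupper is left to handle the plain ASCII letters.

```php
<?php
// Hypothetical mapping; extend it with the accented letters your data contains.
// Assumption: this file is saved as UTF-8, so keys and values are UTF-8 bytes.
$accentMap = [
    'á' => 'Á',
    'à' => 'À',
    'é' => 'É',
    'è' => 'È',
];

$title = 'Le Courrier de Sáint-Hyácinthe';

// strtr() swaps the mapped multi-byte sequences; strtoupper() then
// upper-cases the remaining ASCII letters.
$upper = strtoupper(strtr($title, $accentMap));

echo $upper, "\n";
```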
Author: Alasdair

Updated on June 27, 2022

Comments

  • Alasdair over 1 year

    I have a problem with UTF-8 and mb_strtoupper.

    mb_internal_encoding('UTF-8');
    $guesstitlestring='Le Courrier de Sáint-Hyácinthe';
    
    $encoding=mb_detect_encoding($guesstitlestring);
    if ($encoding!=='UTF-8') $guesstitlestring=mb_convert_encoding($guesstitlestring,'UTF-8',$encoding);
    
    echo "DEBUG1 $guesstitlestring\n";
    $guesstitlestring=mb_strtoupper($guesstitlestring);
    echo "DEBUG2 $guesstitlestring\n";
    

    Result:

    DEBUG1 Le Courrier de Sáint-Hyácinthe
    DEBUG2 LE COURRIER DE S?INT-HY?CINTHE
    

    I don't understand why this is happening. I'm trying to be as careful as I can with the encoding. The string is given as UTF-8 first, then verified and possibly reconverted to UTF-8. It's a nightmare!

    UPDATE

    So I've figured out that this was caused by a combination of my entering the arguments via the console and reading the results back out of the console. They were garbled both on the way in and on the way out. The solution is to neither pass the arguments in nor read the output back this way.

    Thank you everyone for your help in resolving this issue!

  • ozahorulia over 10 years
    Why is the á in lower case in the output?
  • emmanuel honore over 10 years
    @Hast I'm not sure. Maybe the upper-case Á exists only in the French character encoding?
  • ozahorulia over 10 years
    I just ran the example from the question in my console and it echoed: DEBUG2 LE COURRIER DE SÁINT-HYÁCINTHE
  • Déjà vu over 10 years
    By the way, no French word contains an "á"; there are words with "à". But that is not the problem here anyway...
  • emmanuel honore over 10 years
    @Hast found the solution: mb_convert_case()
  • Alasdair over 10 years
    I already tried mb_convert_case and unfortunately it does not help with my problem. The result is the same.
  • emmanuel honore over 10 years
    Because mb_detect_encoding does not work, my answer checks whether the string, converted and then converted back, is still the original string: stackoverflow.com/a/15051401/22470
  • Alasdair over 10 years
    OK. But I can't do this because the string is given as an argument to the PHP script on the console. So I need a way to force it into UTF-8 from inside the PHP script itself.
  • emmanuel honore over 10 years
    See my answer, I convert the string to UTF-8 no matter what the input string is...
  • Alasdair over 10 years
    It doesn't work for me... it still gives exactly the same result: ?
  • emmanuel honore over 10 years
    @Alasdair have you run my whole example?
  • Danyal Sandeelo over 6 years
    @powtac I referenced your code here stackoverflow.com/questions/42480477/… and upvoted you.