PHP mb_substr() not working correctly?


Solution 1

Try passing the encoding parameter to mb_substr, as such:

print mb_substr('éxxx', 0, 1, 'utf-8');

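The encoding is never detected automatically. If you omit the argument, mb_substr() falls back to the internal encoding (see mb_internal_encoding()), which may not be UTF-8.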
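To see why the original call appeared to print nothing, it can help to compare the bytes that come back under two encodings. The following is only a minimal sketch: the variable names are made up for illustration, and "é" is written as its two UTF-8 bytes so the result does not depend on how the script file itself is saved.

```php
<?php
// "é" as its two UTF-8 bytes (0xC3 0xA9), followed by "xxx".
$text = "\xC3\xA9xxx";

// Treated as a single-byte encoding, "one character" means one byte,
// so only the first half of "é" is returned (an invalid UTF-8 sequence).
$asLatin1 = mb_substr($text, 0, 1, 'ISO-8859-1');

// Treated as UTF-8, "one character" is the full two-byte "é".
$asUtf8 = mb_substr($text, 0, 1, 'UTF-8');

echo bin2hex($asLatin1), "\n"; // c3
echo bin2hex($asUtf8), "\n";   // c3a9
```

A lone 0xC3 byte is not displayable as UTF-8, which would explain the "empty space" in the question, and why asking for 2 "characters" seemed to work.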

Solution 2

In practice I've found that on some systems the multi-byte functions default to ISO-8859-1 as their internal encoding. Each byte is then treated as one character, which effectively ruins their ability to handle multi-byte text.

Setting a good default will probably fix this and some other issues:

mb_internal_encoding('UTF-8');
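
As a rough sketch of how this plays out, assuming the call is made once near the start of the script: after setting the internal encoding, mb_substr() can be called without its fourth argument. Calling mb_internal_encoding() with no arguments returns the current value, which is a quick way to check what a given system defaults to. The variable names are illustrative only.

```php
<?php
// Set the default once, before any other mb_* calls.
mb_internal_encoding('UTF-8');

// "é" as its two UTF-8 bytes, followed by "xxx".
$text = "\xC3\xA9xxx";

// No encoding argument: the internal encoding set above is used.
$first = mb_substr($text, 0, 1);

// With no arguments, mb_internal_encoding() reports the current setting.
$current = mb_internal_encoding();

echo $current, "\n";        // UTF-8
echo bin2hex($first), "\n"; // c3a9
```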
Author by

Alex


Updated on December 20, 2020

Comments

  • Alex
    Alex over 3 years

    This code

    print mb_substr('éxxx', 0, 1);

    prints an empty space :(

    It is supposed to print the first character, é. This seems to work however:

    print mb_substr('éxxx', 0, 2);

    But it's not right, because (0, 2) means 2 characters...

  • Gromski
    Gromski over 11 years
    The encoding is never detected automatically, it just always defaults to something.
  • Alvin Wong
    Alvin Wong over 11 years
    Could it be a better idea if you use mb_detect_encoding to actually try to detect the encoding?
  • Gromski
    Gromski over 11 years
    @AlvinWong No. Know what encoding you're working with, there's no other way.
  • povilasp
    povilasp over 11 years
    @Alvin Wong, that would be more correct, yes, but I could also say that using anything but utf-8 can be considered adventurous and marginal :)
  • povilasp
    povilasp over 11 years
    @deceze, wasn't sure, but thanks for the clarification, I updated the answer.
  • Alex
    Alex over 11 years
    tx that works. Can mb_substr work like substr($string, 1) without giving it the mb_strlen() argument ?
  • povilasp
    povilasp over 11 years
    @Alex, that I think is another question, but my guess would be that yes - because the parameter is optional as it is in substr.
  • Alex
    Alex over 11 years
    yes, but that UTF-8 thing has to go after that argument. Anyway nvm, I'll just use mb_strlen ..
  • Alvin Wong
    Alvin Wong over 11 years
    OK, then how about mb_internal_encoding instead of passing "utf-8" to all mb_* functions? Just like Álvaro G. Vicario has pointed out
  • povilasp
    povilasp over 11 years
    @AlvinWong is right, it's better to look at mb_internal_encoding if this isn't a one-off call and you plan to use a lot of mb_* functions throughout your code.
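
On the follow-up question in the comments about getting the substr($string, 1) behaviour while still passing the encoding: as far as I know, PHP 5.4.8 and later accept null as the length, meaning "to the end of the string", so mb_strlen() should not be needed just to reach the fourth argument. A small sketch, with illustrative variable names:

```php
<?php
// "é" as its two UTF-8 bytes, followed by "xxx".
$text = "\xC3\xA9xxx";

// null length: take everything from character offset 1 to the end.
$rest = mb_substr($text, 1, null, 'UTF-8');

// The longer form from the comments, for comparison.
$sameRest = mb_substr($text, 1, mb_strlen($text, 'UTF-8'), 'UTF-8');

echo $rest, "\n";              // xxx
var_dump($rest === $sameRest); // bool(true)
```

If mb_internal_encoding('UTF-8') has already been set, plain mb_substr($text, 1) gives the same result.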