PHP mb_substr() not working correctly?
Solution 1
Try passing the encoding parameter to mb_substr, like this:
print mb_substr('éxxx', 0, 1, 'utf-8');
The encoding is never detected automatically.
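To illustrate why the explicit encoding matters, here is a minimal sketch. It assumes the mbstring extension is loaded; the variable names are only for illustration. The string is written with byte escapes so the result does not depend on how the source file itself is saved.

```php
<?php
// "é" written as its two UTF-8 bytes (0xC3 0xA9), followed by "xxx".
$text = "\xC3\xA9xxx";

// With the encoding stated, the first character is the complete two-byte "é".
$first = mb_substr($text, 0, 1, 'UTF-8');

// Read as ISO-8859-1 (one byte per character), the same call returns only
// the first byte of "é", which is not valid UTF-8 on its own.
$broken = mb_substr($text, 0, 1, 'ISO-8859-1');

var_dump($first === "\xC3\xA9"); // bool(true)
var_dump(strlen($broken));       // int(1)
```

This would also explain why (0, 2) appeared to work in the question: under a single-byte encoding, two "characters" happen to cover both bytes of "é".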
Solution 2
In practice I've found that, on some systems, the multi-byte functions default to ISO-8859-1 as their internal encoding. That effectively ruins their ability to handle multi-byte text.
Setting a good default will probably fix this and some other issues:
mb_internal_encoding('UTF-8');
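A short sketch of how this might look in practice, assuming the mbstring extension is loaded and that the call runs once early in the script (for example in a bootstrap file, which is an assumption and not something the answer specifies):

```php
<?php
// Set the default once; later mb_* calls that omit the encoding use it.
mb_internal_encoding('UTF-8');

// "é" as its two UTF-8 bytes, so the example does not depend on file encoding.
$text = "\xC3\xA9xxx";

// No encoding argument needed any more.
$first  = mb_substr($text, 0, 1);
$length = mb_strlen($text);

var_dump($first === "\xC3\xA9"); // bool(true)
var_dump($length);               // int(4)
```

Calling mb_internal_encoding() with no argument returns the current setting, which is a quick way to check what a given system defaults to.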
Comments
-
Alex over 3 years
This code
print mb_substr('éxxx', 0, 1);
prints an empty space :(
It is supposed to print the first character, é. This seems to work, however:
print mb_substr('éxxx', 0, 2);
But it's not right, because (0, 2) means 2 characters...
-
Gromski over 11 years
The encoding is never detected automatically, it just always defaults to something.
-
Alvin Wong over 11 years
Could it be a better idea to use mb_detect_encoding to actually try to detect the encoding?
-
Gromski over 11 years
@AlvinWong No. Know what encoding you're working with, there's no other way.
-
povilasp over 11 years
@Alvin Wong, that would be more correct, yes, but I could also say that using anything but UTF-8 can be considered adventurous and marginal :)
-
povilasp over 11 years
@deceze, I wasn't sure, but thanks for the clarification. I updated the answer.
-
Alex over 11 years
Thanks, that works. Can mb_substr work like substr($string, 1), without giving it the mb_strlen() argument?
-
povilasp over 11 years
@Alex, I think that is another question, but my guess would be yes, because the parameter is optional, as it is in substr.
-
Alex over 11 years
Yes, but that UTF-8 thing has to go after that argument. Anyway, never mind, I'll just use mb_strlen.
-
Alvin Wong over 11 years
OK, then how about mb_internal_encoding instead of passing "utf-8" to all mb_* functions? Just like Álvaro G. Vicario has pointed out.
-
povilasp over 11 years
@AlvinWong is right: it's better to use mb_internal_encoding if this is not the only mb_* call and you plan to use many mb_* functions throughout your code.
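On the question in the comments about omitting the length while still passing the encoding: a minimal sketch, assuming PHP 5.4.8 or later, where passing null as the length means "to the end of the string". The mb_strlen variant from the thread is shown alongside for comparison.

```php
<?php
// "é" as its two UTF-8 bytes, so the example does not depend on file encoding.
$text = "\xC3\xA9xxx";

// null as the length extracts everything from offset 1 to the end,
// which leaves room for the encoding as the fourth argument.
$rest = mb_substr($text, 1, null, 'UTF-8');

// Equivalent result using mb_strlen() for the length, as suggested in the thread.
$restViaLength = mb_substr($text, 1, mb_strlen($text, 'UTF-8'), 'UTF-8');

var_dump($rest);                    // string(3) "xxx"
var_dump($rest === $restViaLength); // bool(true)
```

If mb_internal_encoding('UTF-8') has already been set, the simpler mb_substr($text, 1) should behave the same way.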