php sprintf() with foreign characters?
Solution 1
Strings in PHP are basically arrays of bytes (not characters). They cannot work natively with multibyte encodings (such as UTF-8).
For details see:
https://www.php.net/manual/en/language.types.string.php#language.types.string.details
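Here is a minimal illustration of the byte-versus-character point. It assumes the script is saved as UTF-8 and the mbstring extension is loaded. strlen() counts the two bytes of ö separately, mb_strlen() counts characters, and sprintf() pads by bytes.

```php
<?php
// Assumes this file is saved as UTF-8, so "ö" is stored as two bytes.
$word = "Röd";

echo strlen($word), "\n";                  // 4: R, the two bytes of ö, d
echo mb_strlen($word, 'UTF-8'), "\n";      // 3 characters
echo bin2hex("ö"), "\n";                   // c3b6
echo '[' . sprintf('%-5s', $word) . "]\n"; // [Röd ]: padded to 5 bytes, so only one space
```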
Most string functions in PHP have a multibyte equivalent (with the mb_ prefix), but sprintf does not.
There's a user comment (by "viktor at textalk dot com") with a multibyte implementation of sprintf on the function's documentation page at php.net. It may work for you:
https://www.php.net/manual/en/function.sprintf.php#89020
Solution 2
I was actually trying to find out whether PHP 7 finally has a native mb_sprintf(), but apparently not xD.
For the sake of completeness, here is a simple solution I've been using in some old projects. It just adds the difference between strlen and mb_strlen to the desired $targetLength.
The non-multibyte example is just added for the sake of easy comparison =).
$text = "Gultigkeitsprufung ist fehlgeschlagen: %{errors}";
$mbText = "Gültigkeitsprüfung ist fehlgeschlagen: %{errors}";
$mbTextRussian = "Проверка не удалась: %{errors}";
$targetLength = 60;
// add the number of "extra" bytes (bytes minus characters) to the desired width
$mbTargetLength = strlen($mbText) - mb_strlen($mbText) + $targetLength;
$mbRussianTargetLength = strlen($mbTextRussian) - mb_strlen($mbTextRussian) + $targetLength;
// "%{$targetLength}s" interpolates to "%60s"
printf("%{$targetLength}s\n", $text);
printf("%{$mbTargetLength}s\n", $mbText);
printf("%{$mbRussianTargetLength}s\n", $mbTextRussian);
result
            Gultigkeitsprufung ist fehlgeschlagen: %{errors}
            Gültigkeitsprüfung ist fehlgeschlagen: %{errors}
                              Проверка не удалась: %{errors}
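The width correction used in the snippet above can be pulled into a small helper. mb_width() is a hypothetical name, not a PHP function. It assumes UTF-8 strings and one display column per character, and it returns the byte width to pass to sprintf() so that the padded result spans the requested number of characters. For the German string, the two ü add 2 extra bytes (60 becomes 62). For the Russian one, 17 Cyrillic letters at 2 bytes each add 17 (60 becomes 77).

```php
<?php
// Hypothetical helper (not part of PHP): byte width that makes sprintf()
// pad $text to $chars characters. Assumes UTF-8 and one column per character.
function mb_width(string $text, int $chars): int
{
    $extraBytes = strlen($text) - mb_strlen($text, 'UTF-8');
    return $chars + $extraBytes;
}

$mbText = "Gültigkeitsprüfung ist fehlgeschlagen: %{errors}";
$mbTextRussian = "Проверка не удалась: %{errors}";

echo mb_width($mbText, 60), "\n";        // 62
echo mb_width($mbTextRussian, 60), "\n"; // 77

// The padded string is 62 bytes long but 60 characters long.
$padded = sprintf('%' . mb_width($mbText, 60) . 's', $mbText);
echo mb_strlen($padded, 'UTF-8'), "\n";  // 60
```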
update 2019-06-12
@flowtron made me give it another thought. A simple mb_sprintf() could look like this:
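function mb_sprintf($format, ...$args) {
    $params = $args;
    // for each width in front of an "s" conversion, add the number of extra
    // bytes (strlen - mb_strlen) of the next argument
    $callback = function ($length) use (&$params) {
        $value = array_shift($params);
        return strlen($value) - mb_strlen($value) + $length[0];
    };
    // note: an argument is only shifted when a width matches, so conversions
    // without one (e.g. a plain %s or %d) put the arguments out of step
    $format = preg_replace_callback('/(?<=%|%-)\d+(?=s)/', $callback, $format);
    return sprintf($format, ...$args);
}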
echo mb_sprintf("%-10s %-10s %10s\n", 'thüs', 'wörks', 'ök');
echo mb_sprintf("%-10s %-10s %10s\n", 'this', 'works', 'ok');
result
thüs       wörks              ök
this       works              ok
I only did some happy-path testing here, but it works for PHP >= 5.6 and should be good enough to give people an idea of how to encapsulate the behavior.
It does not work with positional (argument-number) modifiers though, e.g. %1$20s will be ignored and remain unchanged.
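There is another limit worth knowing about. The callback above shifts an argument only when it finds a width directly before s, so a conversion without a width (a plain %s, or a %d) earlier in the format makes later widths get computed from the wrong argument. The sketch below is one way around that, not a tested library. It walks every conversion specification, consumes one argument per conversion (none for %%), and widens only %s conversions that have a width and no precision. mb_sprintf_sketch() is a hypothetical name. It assumes UTF-8 strings, no positional arguments and no custom padding characters ('x).

```php
<?php
// Hypothetical helper, not a PHP built-in: sprintf() where the width of a %s
// conversion counts characters instead of bytes.
// Assumptions: UTF-8 strings, no positional arguments (%1$s), no custom
// padding characters (%'*10s) and no precision on %s conversions.
function mb_sprintf_sketch(string $format, ...$args): string
{
    $next = 0; // index of the argument the next conversion consumes

    $callback = function (array $m) use ($args, &$next): string {
        if ($m[0] === '%%') {
            return '%%'; // literal percent sign, consumes no argument
        }
        [, $flags, $width, $precision, $type] = $m;
        $value = (string) ($args[$next] ?? '');
        $next++; // every other conversion consumes exactly one argument

        if ($type === 's' && $width !== '' && $precision === '') {
            // widen by the number of extra bytes the UTF-8 value occupies
            $width = (string) ((int) $width + strlen($value) - mb_strlen($value, 'UTF-8'));
        }
        return '%' . $flags . $width . $precision . $type;
    };

    $format = preg_replace_callback('/%%|%([-+ 0]*)(\d*)(\.\d+)?([a-zA-Z])/', $callback, $format);
    return sprintf($format, ...$args);
}

echo mb_sprintf_sketch("%-10s %-10s %10s\n", 'thüs', 'wörks', 'ök');
echo mb_sprintf_sketch("%s %-6s|\n", 'äää', 'ö'); // the plain %s no longer shifts the widths
```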
Solution 3
If you're only using characters that fit in the ISO-8859-1 character set, you can convert the strings to ISO-8859-1 before formatting and convert the result back to UTF-8 when you are done:
$line = utf8_encode(sprintf("%-12s %-8s", utf8_decode($paramOne), utf8_decode($paramTwo)));
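utf8_encode() and utf8_decode() are deprecated as of PHP 8.2, so on newer versions the same round trip can be written with mb_convert_encoding(). This is a sketch under a few assumptions: every argument is UTF-8, it only contains characters that exist in ISO-8859-1 (anything else cannot survive the conversion), and PHP is 7.4+ for the arrow function. latin1_sprintf() is a hypothetical name.

```php
<?php
// Hypothetical helper: format in ISO-8859-1 (one byte per character) so that
// sprintf() widths count characters, then convert the result back to UTF-8.
// Assumes UTF-8 input limited to characters that exist in ISO-8859-1.
function latin1_sprintf(string $format, ...$args): string
{
    // convert strings only; numbers (e.g. for %8.2f) are passed through unchanged
    $toLatin1 = fn ($v) => is_string($v) ? mb_convert_encoding($v, 'ISO-8859-1', 'UTF-8') : $v;

    $formatted = sprintf($toLatin1($format), ...array_map($toLatin1, $args));
    return mb_convert_encoding($formatted, 'UTF-8', 'ISO-8859-1');
}

// Two report lines like the ones in the question now pad to the same width.
echo latin1_sprintf("%-12s %-8s -%-10s|\n", '2011-11-27', 'A1823', 'Ref. Leif');
echo latin1_sprintf("%-12s %-8s -%-10s|\n", '2011-11-30', 'A1856', 'Rättat xx');
```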
Mille
Updated on June 08, 2022
Comments
-
Mille almost 2 years
It seems like sprintf has a problem with foreign characters? Or am I doing something wrong? It looks like it works when I remove characters like åäö from the string. Should that be necessary?
I want the following lines to be aligned correctly for a report:
2011-11-27 A1823 -Ref. Leif - 12 873,00 18.98
2011-11-30 A1856 -Rättat xx - 6 594,00 19.18
I'm using sprintf() like this: %-12s %-8s -%-10s -%20s %8.2f
Using: php-5.3.23-nts-Win32-VC9-x86
-
PleaseStand almost 11 years
This problem (that different characters consist of different numbers of bytes and different grapheme clusters consist of different numbers of characters) is somewhat similar to (but not the same as) stackoverflow.com/questions/9166698/…. The bottom line is that it might be easiest to put the data in an HTML table instead.
-
xyphoid over 10 years
Yeah, this is definitely not a duplicate: this question is about multibyte characters in sprintf(), the other one is about font display widths.
-
Gérald Croës over 10 years
This was not a duplicate question at all... You can do the trick with utf8_encode(sprintf('format', utf8_decode($yourstring)));... Of course, if several arguments are given, you'll have to convert every one of them.
-
some about 10 years
This question is about characters with a Unicode code point above 127, which use more than one byte when encoded with UTF-8. Unfortunately, sprintf and printf don't handle that. When printing a 2-character string that uses 6 bytes in UTF-8, %8s prints the wrong number of spaces: 8-6=2 instead of 8-2=6. This has NOTHING to do with the font used, which is what the supposed duplicate is about. This question is about PHP's lack of support for multibyte characters.
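The arithmetic in this comment can be checked directly. This assumes a UTF-8 source file and uses "€€" as the example, since € takes 3 bytes in UTF-8. With two such characters (6 bytes), %8s adds only 8 - 6 = 2 spaces. Adding the 4 extra bytes to the width gives the 6 spaces the comment expects.

```php
<?php
$s = "€€"; // 2 characters, 6 bytes in UTF-8 (each € is 3 bytes)

$padded = sprintf('%8s', $s);
echo strlen($s), ' bytes, ', mb_strlen($s, 'UTF-8'), " characters\n"; // 6 bytes, 2 characters
echo strlen($padded) - strlen($s), " spaces\n";                       // 2 spaces

// Widen by the extra bytes: 8 + (6 - 2) = 12
$fixed = sprintf('%' . (8 + strlen($s) - mb_strlen($s, 'UTF-8')) . 's', $s);
echo strlen($fixed) - strlen($s), " spaces\n";                        // 6 spaces
```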
-
flowtron almost 5 years
Correct explanation, but the linked function does not work for me, even after doing the mb_* function name replacements mentioned in the remarks. I'd hoped for a better solution than the one @nimmneun provided, since it's my current hacky solution too.
-
flowtron almost 5 years
I had hoped to find something less hacky, because this is the way I've been doing it too. Upvoted, since the linked routine in @Martin Prikryl's answer doesn't work (for me).
-
nimmneun almost 5 years
You made me give it another thought =)