php sprintf() with foreign characters?


Solution 1

Strings in PHP are basically arrays of bytes (not characters). They cannot work natively with multibyte encodings (such as UTF-8).

For details see:
https://www.php.net/manual/en/language.types.string.php#language.types.string.details
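
To make the byte/character distinction concrete, here is a minimal sketch. It assumes the script file is saved as UTF-8 and that the mbstring extension is loaded; the sample string is only an illustration.

```php
<?php
// "åäö" is 3 characters but 6 bytes in UTF-8 (assumes this file is saved as UTF-8).
$word = "åäö";

$bytes = strlen($word);             // counts bytes
$chars = mb_strlen($word, 'UTF-8'); // counts characters (code points)

// sprintf() pads by bytes, so it adds only 8 - 6 = 2 spaces instead of 8 - 3 = 5.
$padded = sprintf("[%-8s]", $word);

echo $bytes, " bytes, ", $chars, " characters\n";
echo $padded, "\n";
echo mb_strlen($padded, 'UTF-8'), " characters wide, 10 were intended\n";
```

The padded field ends up three characters short, which is exactly the misalignment described in the question.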

Most string functions in PHP have a multibyte equivalent (with the mb_ prefix), but sprintf does not.

There's a user comment (by "viktor at textalk dot com") with a multibyte implementation of sprintf on the function's documentation page at php.net. It may work for you:
https://www.php.net/manual/en/function.sprintf.php#89020

Solution 2

I was actually trying to find out whether PHP 7+ finally has a native mb_sprintf(), but apparently not xD.

For the sake of completeness, here is a simple solution I've been using in some old projects. It just adds the difference between strlen() and mb_strlen() to the desired $targetLength. The non-multibyte example is only there for easy comparison =).

$text = "Gultigkeitsprufung ist fehlgeschlagen: %{errors}";
$mbText = "Gültigkeitsprüfung ist fehlgeschlagen: %{errors}";
$mbTextRussian = "Проверка не удалась: %{errors}";

$targetLength = 60;
$mbTargetLength = strlen($mbText) - mb_strlen($mbText) + $targetLength;
$mbRussianTargetLength = strlen($mbTextRussian) - mb_strlen($mbTextRussian) + $targetLength;

printf("%{$targetLength}s\n", $text);
printf("%{$mbTargetLength}s\n", $mbText);
printf("%{$mbRussianTargetLength}s\n", $mbTextRussian);

result

            Gultigkeitsprufung ist fehlgeschlagen: %{errors}
            Gültigkeitsprüfung ist fehlgeschlagen: %{errors}
                              Проверка не удалась: %{errors}

update 2019-06-12


@flowtron made me give it another thought. A simple mb_sprintf() could look like this.

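function mb_sprintf($format, ...$args) {
    // Work on a copy so the untouched $args can still be passed to sprintf().
    $params = $args;

    // For each width found, add the byte/character difference of the next argument.
    $callback = function ($matches) use (&$params) {
        $value = array_shift($params);
        return strlen($value) - mb_strlen($value) + $matches[0];
    };

    // Only the width in %10s and %-10s style placeholders is matched.
    $format = preg_replace_callback('/(?<=%|%-)\d+(?=s)/', $callback, $format);

    return sprintf($format, ...$args);
}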

echo mb_sprintf("%-10s %-10s %10s\n", 'thüs', 'wörks', 'ök');
echo mb_sprintf("%-10s %-10s %10s\n", 'this', 'works', 'ok');

result

thüs       wörks              ök
this       works              ok

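I only did some happy-path testing here, but it works for PHP >= 5.6 and should be good enough to give people an idea of how to encapsulate the behavior. It does not work with argument-position specifiers though, e.g. %1$20s will be left unchanged. It also takes one argument per matched width, so a placeholder without a width (such as %d) in front of a %10s will pair that width with the wrong argument.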
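
Another way to encapsulate this, sketched here as an alternative rather than as part of the answer above, is to pad each value by character count yourself and leave only plain %s in the format. The helper name pad_utf8() and its parameters are hypothetical, and the sketch assumes UTF-8 input and the mbstring extension.

```php
<?php
// Hypothetical helper (not a PHP built-in): pad a UTF-8 string with spaces to
// $width characters, counting characters with mb_strlen() instead of bytes.
function pad_utf8($value, $width, $leftAlign = true)
{
    $missing = $width - mb_strlen($value, 'UTF-8');
    if ($missing <= 0) {
        return $value; // already wide enough; like sprintf(), never truncate
    }
    $spaces = str_repeat(' ', $missing);
    return $leftAlign ? $value . $spaces : $spaces . $value;
}

// The widths live in the helper calls, so the format only needs plain %s.
$line = sprintf(
    "%s %s %s",
    pad_utf8('thüs', 10),
    pad_utf8('wörks', 10),
    pad_utf8('ök', 10, false)
);
echo $line, "\n";
```

Because the padding happens before sprintf() sees the values, this does not depend on rewriting the format string, at the cost of moving the widths out of the format.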

Solution 3

If you're using characters that fit in the ISO-8859-1 character set, you can convert the strings before formatting, and convert the result back to UTF-8 when you are done:

utf8_encode(sprintf("%-12s %-8s", utf8_decode($paramOne), utf8_decode($paramTwo)));
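
Note that utf8_encode() and utf8_decode() are deprecated as of PHP 8.2. On newer versions the same round trip can be sketched with mb_convert_encoding(); the sample values below are made up and assume the script file is saved as UTF-8.

```php
<?php
// Sample values are made up; both fit in ISO-8859-1.
$paramOne = 'Rättat xx';
$paramTwo = 'åäö';

// UTF-8 -> ISO-8859-1, where every character is exactly one byte.
$one = mb_convert_encoding($paramOne, 'ISO-8859-1', 'UTF-8');
$two = mb_convert_encoding($paramTwo, 'ISO-8859-1', 'UTF-8');

// Byte-based padding is now correct; convert the finished line back to UTF-8.
$line = mb_convert_encoding(sprintf("%-12s %-8s", $one, $two), 'UTF-8', 'ISO-8859-1');

echo '[', $line, "]\n";
```

As with the original one-liner, this only helps when every character in the input exists in ISO-8859-1.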
Author: Mille

Updated on June 08, 2022

Comments

  • Mille
    Mille almost 2 years

    It seems like sprintf() has a problem with foreign characters? Or am I doing something wrong? It looks like it works when I remove characters like åäö from the string. Should that be necessary?

    I want the following lines to be aligned correctly for a report:

    2011-11-27   A1823    -Ref. Leif  -           12 873,00    18.98
    2011-11-30   A1856    -Rättat xx -            6 594,00    19.18
    

    I'm using sprintf() like this: %-12s %-8s -%-10s -%20s %8.2f

    Using: php-5.3.23-nts-Win32-VC9-x86

    • PleaseStand
      PleaseStand almost 11 years
      This problem (that different characters consist of different numbers of bytes and different grapheme clusters consist of different numbers of characters) is somewhat similar to (but not the same as) stackoverflow.com/questions/9166698/…. The bottom line is that it might be easiest to put the data in an HTML table instead.
    • xyphoid
      xyphoid over 10 years
      Yeah, this is definitely not a duplicate: this question is about multibyte characters in sprintf(), the other one is about font display widths.
    • Gérald Croës
      Gérald Croës over 10 years
      This was not a duplicate question at all... You can do the trick with: utf8_encode(sprintf('format', utf8_decode($yourstring)));... Of course you'll have to do this for every argument if several are given.
    • some
      some about 10 years
      This question is about characters with a Unicode code point above 127, which use more than one byte when encoded as UTF-8. Unfortunately sprintf and printf don't handle that. When printing a 2-character string that takes 6 bytes in UTF-8, %8s prints the wrong number of spaces (8-6=2) instead of (8-2=6). This has NOTHING to do with the font used, unlike the question that this one is supposed to be a duplicate of. This question is about PHP's lack of support for multibyte characters.
  • flowtron
    flowtron almost 5 years
    correct explanation, but the linked function does not work for me – even after doing the mb_* function name replacements mentioned in the remarks. I'd hoped for a better solution than @nimmneun has provided, it's my current hacky solution too.
  • flowtron
    flowtron almost 5 years
    I had hoped to find something less hacky, because this is the way I've been doing it too - upvoted since the linked routine in @Martin Prikryl doesn't work (for me).
  • nimmneun
    nimmneun almost 5 years
    you made me give it another thought =)