PHP Convert Windows-1256 encoded text to UTF-8


Solution 1

Check this: http://rayed.com/wordpress/wp-content/upload/lib.utf2win.php.txt

Apparently the author ran into a similar problem and wrote this script to convert UTF-8 to Windows-1256. If you reverse the mapping, it might work.

I reversed it for you; try this:

// $f holds the UTF-8 byte sequence for each Windows-1256 byte in $t.
$f[]="\xe2\x82\xac";  $t[]="\x80";
$f[]="\xd9\xbe";  $t[]="\x81";
$f[]="\xe2\x80\x9a";  $t[]="\x82";
$f[]="\xc6\x92";  $t[]="\x83";
$f[]="\xe2\x80\x9e";  $t[]="\x84";
$f[]="\xe2\x80\xa6";  $t[]="\x85";
$f[]="\xe2\x80\xa0";  $t[]="\x86";
$f[]="\xe2\x80\xa1";  $t[]="\x87";
$f[]="\xcb\x86";  $t[]="\x88";
$f[]="\xe2\x80\xb0";  $t[]="\x89";
$f[]="\xd9\xb9";  $t[]="\x8a";
$f[]="\xe2\x80\xb9";  $t[]="\x8b";
$f[]="\xc5\x92";  $t[]="\x8c";
$f[]="\xda\x86";  $t[]="\x8d";
$f[]="\xda\x98";  $t[]="\x8e";
$f[]="\xda\x88";  $t[]="\x8f";
$f[]="\xda\xaf";  $t[]="\x90";
$f[]="\xe2\x80\x98";  $t[]="\x91";
$f[]="\xe2\x80\x99";  $t[]="\x92";
$f[]="\xe2\x80\x9c";  $t[]="\x93";
$f[]="\xe2\x80\x9d";  $t[]="\x94";
$f[]="\xe2\x80\xa2";  $t[]="\x95";
$f[]="\xe2\x80\x93";  $t[]="\x96";
$f[]="\xe2\x80\x94";  $t[]="\x97";
$f[]="\xda\xa9";  $t[]="\x98";
$f[]="\xe2\x84\xa2";  $t[]="\x99";
$f[]="\xda\x91";  $t[]="\x9a";
$f[]="\xe2\x80\xba";  $t[]="\x9b";
$f[]="\xc5\x93";  $t[]="\x9c";
$f[]="\xe2\x80\x8c";  $t[]="\x9d";
$f[]="\xe2\x80\x8d";  $t[]="\x9e";
$f[]="\xda\xba";  $t[]="\x9f";
$f[]="\xd8\x8c";  $t[]="\xa1";
$f[]="\xda\xbe";  $t[]="\xaa";
$f[]="\xd8\x9b";  $t[]="\xba";
$f[]="\xd8\x9f";  $t[]="\xbf";
$f[]="\xdb\x81";  $t[]="\xc0";
$f[]="\xd8\xa1";  $t[]="\xc1";
$f[]="\xd8\xa2";  $t[]="\xc2";
$f[]="\xd8\xa3";  $t[]="\xc3";
$f[]="\xd8\xa4";  $t[]="\xc4";
$f[]="\xd8\xa5";  $t[]="\xc5";
$f[]="\xd8\xa6";  $t[]="\xc6";
$f[]="\xd8\xa7";  $t[]="\xc7";
$f[]="\xd8\xa8";  $t[]="\xc8";
$f[]="\xd8\xa9";  $t[]="\xc9";
$f[]="\xd8\xaa";  $t[]="\xca";
$f[]="\xd8\xab";  $t[]="\xcb";
$f[]="\xd8\xac";  $t[]="\xcc";
$f[]="\xd8\xad";  $t[]="\xcd";
$f[]="\xd8\xae";  $t[]="\xce";
$f[]="\xd8\xaf";  $t[]="\xcf";
$f[]="\xd8\xb0";  $t[]="\xd0";
$f[]="\xd8\xb1";  $t[]="\xd1";
$f[]="\xd8\xb2";  $t[]="\xd2";
$f[]="\xd8\xb3";  $t[]="\xd3";
$f[]="\xd8\xb4";  $t[]="\xd4";
$f[]="\xd8\xb5";  $t[]="\xd5";
$f[]="\xd8\xb6";  $t[]="\xd6";
$f[]="\xd8\xb7";  $t[]="\xd8";
$f[]="\xd8\xb8";  $t[]="\xd9";
$f[]="\xd8\xb9";  $t[]="\xda";
$f[]="\xd8\xba";  $t[]="\xdb";
$f[]="\xd9\x80";  $t[]="\xdc";
$f[]="\xd9\x81";  $t[]="\xdd";
$f[]="\xd9\x82";  $t[]="\xde";
$f[]="\xd9\x83";  $t[]="\xdf";
$f[]="\xd9\x84";  $t[]="\xe1";
$f[]="\xd9\x85";  $t[]="\xe3";
$f[]="\xd9\x86";  $t[]="\xe4";
$f[]="\xd9\x87";  $t[]="\xe5";
$f[]="\xd9\x88";  $t[]="\xe6";
$f[]="\xd9\x89";  $t[]="\xec";
$f[]="\xd9\x8a";  $t[]="\xed";
$f[]="\xd9\x8b";  $t[]="\xf0";
$f[]="\xd9\x8c";  $t[]="\xf1";
$f[]="\xd9\x8d";  $t[]="\xf2";
$f[]="\xd9\x8e";  $t[]="\xf3";
$f[]="\xd9\x8f";  $t[]="\xf5";
$f[]="\xd9\x90";  $t[]="\xf6";
$f[]="\xd9\x91";  $t[]="\xf8";
$f[]="\xd9\x92";  $t[]="\xfa";
$f[]="\xe2\x80\x8e";  $t[]="\xfd";
$f[]="\xe2\x80\x8f";  $t[]="\xfe";
$f[]="\xdb\x92";  $t[]="\xff";

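function win_to_utf8($str) {
  global $f, $t;
  // strtr() scans the input once, so bytes written by one replacement
  // are never rewritten by a later one (str_replace() with arrays would).
  return strtr($str, array_combine($t, $f));
}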
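
A note on applying a byte table like this one: `str_replace` with arrays applies each pair in turn to the result of the previous pair, so bytes written by an earlier replacement can be rewritten by a later one. `strtr` with an array makes a single pass over the input. The sketch below uses a hypothetical two-entry table (not the full one above) to show the difference:

```php
<?php
// Hypothetical two-entry table: Windows-1256 byte => UTF-8 bytes.
$map = [
    "\x81" => "\xd9\xbe", // U+067E ARABIC LETTER PEH
    "\xd9" => "\xd8\xb8", // U+0638 ARABIC LETTER ZAH
];

$input = "\x81"; // a single PEH in Windows-1256

// Sequential: "\x81" becomes "\xd9\xbe", then the "\xd9" just written
// is matched by the second pair and replaced again.
$sequential = str_replace(array_keys($map), array_values($map), $input);

// Single pass: replaced text is never rescanned.
$singlePass = strtr($input, $map);

echo bin2hex($sequential), "\n"; // d8b8be (corrupted)
echo bin2hex($singlePass), "\n"; // d9be   (valid UTF-8 for PEH)
```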

Solution 2

Trying

echo iconv('WINDOWS-1256', 'UTF-8', 'testÍÊ');

...on http://writecodeonline.com/php/ seems to work correctly (produces testأچأٹ)
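
The four Arabic letters in that output can be explained by how the literal is stored. Assuming the page or source file is UTF-8, 'ÍÊ' reaches `iconv` as four bytes, and each byte is converted as a separate Windows-1256 character. The sketch below uses hex escapes so it does not depend on the file's encoding; the variable names are illustrative, and it assumes the local iconv build recognizes WINDOWS-1256:

```php
<?php
// 'ÍÊ' as stored in a UTF-8 source file: four bytes, not two.
$utf8Literal = "\xc3\x8d\xc3\x8a";

// The same two characters as single bytes, which is what real
// Windows-1256 input would contain.
$rawBytes = "\xcd\xca";

$fromLiteral = iconv('WINDOWS-1256', 'UTF-8', $utf8Literal);
$fromBytes   = iconv('WINDOWS-1256', 'UTF-8', $rawBytes);

echo $fromLiteral, ' ', bin2hex($fromLiteral), "\n"; // four letters: d8a3da86d8a3d9b9
echo $fromBytes, ' ', bin2hex($fromBytes), "\n";     // two letters:  d8add8aa
```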

Solution 3

Try this, should work:

iconv("windows-1256", "utf-8//TRANSLIT//IGNORE", $text)
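
Whichever variant is used, `iconv` returns `false` on failure, so it may help to check the result instead of echoing it directly. A minimal sketch, assuming the local iconv build supports WINDOWS-1256; the helper name `cp1256_to_utf8` is hypothetical, and treating an empty result from non-empty input as a failure is a choice made here to catch the empty-string symptom reported in the question:

```php
<?php
// Hypothetical helper: converts Windows-1256 bytes to UTF-8 and fails loudly.
function cp1256_to_utf8(string $text): string
{
    $converted = iconv('WINDOWS-1256', 'UTF-8', $text);

    // false means iconv reported an error; an empty result from non-empty
    // input is treated as a failure too (an assumption of this sketch).
    if ($converted === false || ($text !== '' && $converted === '')) {
        throw new RuntimeException('iconv could not convert the input from WINDOWS-1256');
    }

    return $converted;
}

// Five Windows-1256 bytes spelling an Arabic greeting.
$utf8 = cp1256_to_utf8("\xe3\xd1\xcd\xc8\xc7");
echo $utf8, "\n";
```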

Author by

applechief

Updated on June 17, 2022

Comments

  • applechief
    applechief almost 2 years

    I am getting Windows-1256 encoded text from the web and need to convert it to UTF-8.

    I tried using mb_convert_encoding and iconv, but they don't seem to work.

    Neither of them seems to be capable of handling Windows-1256.

    How to do it?

    Edit: More details about the errors. When trying

    mb_convert_encoding($text,"utf-8", "windows-1256");
    

    I get

    Message: mb_convert_encoding() [function.mb-convert-encoding]: Illegal character encoding specified

    And when I try

    iconv("windows-1256", "utf-8", $text);
    

    I get no errors, but it returns an empty string.

  • applechief
    applechief over 12 years
    I didn't test it, but how efficient would it be for large texts?
  • Derk Arts
    Derk Arts over 12 years
    I doubt it will be much less efficient than, for instance, mb_convert_encoding or iconv. How do you reckon those functions work? Maybe at a lower level, but you still need to replace those characters. Anyway, why not give it a go and see how it performs? As I said, it's not my code, so I'm curious.
  • Frosty Z
    Frosty Z over 12 years
    The last two characters, "ÍÊ", should match Windows-1256 characters. See en.wikipedia.org/wiki/Windows-1256. That's why I get Arabic characters in the output.
  • MindRoasterMir
    MindRoasterMir over 3 years
    echo iconv('WINDOWS-1256', 'UTF-8', $text); this also worked