Problem writing UTF-8 encoded file in PHP


Solution 1

First off, don't depend on mb_detect_encoding. It's not good at working out the encoding unless the string contains byte sequences that are specific to one encoding (meaning sequences that are invalid in the other encodings).

Try just getting rid of the mb_detect_encoding line altogether.

Oh, and utf8_encode converts a Latin-1 (ISO-8859-1) string to UTF-8; it does not convert from an arbitrary charset to UTF-8, which is what you really want. You want iconv, but you need to know the source encoding, and since you can't really trust mb_detect_encoding, you'll need to figure it out some other way.

Or you can try using iconv with an empty input encoding: $str = iconv('', 'UTF-8', $str); (which may or may not work)...
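
A minimal sketch of the iconv route, assuming the non-UTF-8 input is ISO-8859-1. That source encoding is only a guess and has to be replaced with whatever your data really uses. toUtf8 is a hypothetical helper name, not a PHP built-in. It validates with mb_check_encoding instead of trying to detect, and calls iconv only when the string is not already well-formed UTF-8:

```php
<?php
// Hypothetical helper: return $str as UTF-8.
// $assumedSource is an assumption; set it to the encoding your data really uses.
function toUtf8(string $str, string $assumedSource = 'ISO-8859-1'): string
{
    // Validating is more dependable than detecting: a string either is
    // well-formed UTF-8 or it is not.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str; // already UTF-8, leave it untouched
    }

    // Not valid UTF-8, so convert from the assumed source encoding.
    $converted = iconv($assumedSource, 'UTF-8', $str);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $assumedSource to UTF-8");
    }
    return $converted;
}
```

The point of checking first is that text which is already UTF-8 never gets encoded a second time, which is one common way to end up with doubled, garbled characters.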

Solution 2

It doesn't work like that. Even if you call utf8_encode($theString), you will not CREATE a UTF-8 file.

The correct answer involves the UTF-8 byte-order mark (BOM).

To understand the issue, see:
- http://en.wikipedia.org/wiki/Byte_order_mark
- http://unicode.org/faq/utf_bom.html

The solution is the following: since the UTF-8 byte-order mark is the three bytes "\xEF\xBB\xBF", we should write them at the very start of the file.

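<?php
// Write $string to $file, prefixed with the UTF-8 byte-order mark.
function writeStringToFile($file, $string){
    $f = fopen($file, "wb");
    if ($f === false) {
        return false; // could not open the file for writing
    }
    $string = "\xEF\xBB\xBF" . $string; // prepend the UTF-8 BOM to the content
    fputs($f, $string);
    fclose($f);
    return true;
}
?>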

The $file can be any text or XML file. The $string is your UTF-8 encoded string.

Try it now and it will write a UTF-8 encoded file with your UTF-8 content (string).

writeStringToFile('test.xml', 'éèàç');
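
To confirm that a file really begins with the BOM, you can read back its first three bytes. This is only a rough sketch; hasUtf8Bom is a hypothetical helper, not part of PHP:

```php
<?php
// Hypothetical helper: true if the file at $path starts with the UTF-8 BOM.
function hasUtf8Bom(string $path): bool
{
    $f = fopen($path, 'rb');
    if ($f === false) {
        return false; // file could not be opened
    }
    $head = fread($f, 3); // the UTF-8 BOM is exactly three bytes
    fclose($f);
    return $head === "\xEF\xBB\xBF";
}
```

Calling hasUtf8Bom('test.xml') should return true for a file whose first bytes are EF BB BF, and false otherwise.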
Author by

user387302

Updated on June 04, 2022

Comments

  • user387302 almost 2 years

    I have a large file that contains world countries/regions that I'm separating into smaller files based on individual countries/regions. The original file contains entries like:

      EE.04 Järvamaa
      EE.05 Jõgevamaa
      EE.07 Läänemaa
    

    However, when I extract that and write it to a new file, the text becomes:

      EE.04  Järvamaa
      EE.05  Jõgevamaa
      EE.07  Läänemaa
    

    To save my files I'm using the following code:

    mb_detect_encoding($text, "UTF-8") == "UTF-8" ? : $text = utf8_encode($text);
    $fp = fopen(MY_LOCATION,'wb');
    fwrite($fp,$text);
    fclose($fp);
    

    I tried saving the files with and without utf8_encode() and neither seems to work. How would I go about preserving the original encoding (which is UTF-8)?

    Thank you!