How to write a file in UTF-8 format?

Solution 1

file_get_contents() and file_put_contents() will not magically convert encoding.

You have to convert the string explicitly; for example with iconv() or mb_convert_encoding().

Try this:

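$data = file_get_contents($npath);
$data = mb_convert_encoding($data, 'UTF-8', 'OLD-ENCODING'); // replace OLD-ENCODING with the real source encoding
file_put_contents('tempfolder/' . $a, $data); // $npath and $a (the current file name) come from the question's loop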

Or alternatively, with PHP's stream filters:

$fd = fopen($file, 'r');
// The filter name is convert.iconv.<from>/<to>, so the source encoding comes first.
stream_filter_append($fd, 'convert.iconv.OLD-ENCODING/UTF-8');
stream_copy_to_stream($fd, fopen($output, 'w'));
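
Applied to a whole directory, as in the question, the conversion could look roughly like the sketch below. The function name convertDirToUtf8() and the ISO-8859-1 default are assumptions for illustration only; substitute whatever encoding your files really use. Files that already validate as UTF-8 are copied unchanged, so running it twice should not double-encode anything.

```php
<?php
// Sketch only: convertDirToUtf8() is a hypothetical helper, not a PHP built-in.
// The ISO-8859-1 default is an assumption; pass the real source encoding.
function convertDirToUtf8(string $srcDir, string $dstDir, string $fromEncoding = 'ISO-8859-1'): int
{
    if (!is_dir($srcDir)) {
        throw new InvalidArgumentException("Not a directory: $srcDir");
    }
    if (!is_dir($dstDir) && !mkdir($dstDir, 0777, true)) {
        throw new RuntimeException("Cannot create directory: $dstDir");
    }

    $converted = 0;
    foreach (scandir($srcDir) as $name) {
        $path = $srcDir . '/' . $name;
        if (!is_file($path)) {
            continue; // skips '.', '..' and subdirectories
        }
        $data = file_get_contents($path);
        // Only convert data that is not already valid UTF-8.
        if (!mb_check_encoding($data, 'UTF-8')) {
            $data = mb_convert_encoding($data, 'UTF-8', $fromEncoding);
            $converted++;
        }
        file_put_contents($dstDir . '/' . $name, $data);
    }
    return $converted; // number of files that needed conversion
}
```

For the question's layout, the call would be something like convertDirToUtf8('folder', 'tempfolder').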

Solution 2

Add a UTF-8 byte order mark (BOM) at the start of the file:

file_put_contents($myFile, "\xEF\xBB\xBF" . $content);
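
Keep in mind that a BOM only labels the file; it does not convert anything. A small guard like the sketch below might help: it refuses content that is not valid UTF-8 and avoids writing a second BOM if one is already there. The helper name withUtf8Bom() is made up for this example.

```php
<?php
// Sketch only: withUtf8Bom() is a hypothetical helper name.
function withUtf8Bom(string $content): string
{
    $bom = "\xEF\xBB\xBF";

    // A BOM does not convert anything, so insist on valid UTF-8 first.
    if (!mb_check_encoding($content, 'UTF-8')) {
        throw new InvalidArgumentException('Content is not valid UTF-8; convert it first.');
    }
    // Do not prepend a second BOM if one is already present.
    if (strncmp($content, $bom, 3) === 0) {
        return $content;
    }
    return $bom . $content;
}
```

Usage would then be file_put_contents($myFile, withUtf8Bom($content));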

Solution 3

<?php
    function writeUTF8File($filename, $content) {
        $f = fopen($filename, "w");
        # Now UTF-8 - Add byte order mark
        fwrite($f, pack("CCC", 0xef, 0xbb, 0xbf));
        fwrite($f, $content);
        fclose($f);
    }
?>
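
The same idea can be adapted to UTF-16, but then the content has to be encoded as UTF-16 as well; writing only the BOM in front of UTF-8 bytes would produce a mislabeled file. A rough sketch, assuming the input string is UTF-8 and little-endian output is wanted (writeUtf16LeFile() is a hypothetical name):

```php
<?php
// Sketch only: writeUtf16LeFile() is a hypothetical helper name.
// Assumes $utf8Content is valid UTF-8.
function writeUtf16LeFile(string $filename, string $utf8Content): void
{
    $f = fopen($filename, 'w');
    if ($f === false) {
        throw new RuntimeException("Cannot open $filename for writing");
    }
    // FF FE is the byte order mark for little-endian UTF-16.
    fwrite($f, "\xFF\xFE");
    // The payload must match the BOM, so re-encode it as UTF-16LE.
    fwrite($f, mb_convert_encoding($utf8Content, 'UTF-16LE', 'UTF-8'));
    fclose($f);
}
```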

Solution 4

iconv() to the rescue: it converts a string from one character encoding to another.
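
A minimal sketch of what that could look like, assuming the source data is ISO-8859-1 (that encoding and the wrapper name toUtf8WithIconv() are assumptions, not part of the original answer):

```php
<?php
// Sketch only: toUtf8WithIconv() is a hypothetical wrapper around iconv().
function toUtf8WithIconv(string $data, string $fromEncoding): string
{
    // iconv() takes the source encoding first, then the target encoding.
    $converted = iconv($fromEncoding, 'UTF-8', $data);
    if ($converted === false) {
        throw new RuntimeException("iconv could not convert from $fromEncoding");
    }
    return $converted;
}
```

With the question's variables this would be used as file_put_contents('tempfolder/' . $a, toUtf8WithIconv(file_get_contents($npath), 'ISO-8859-1'));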

Solution 5

On Unix/Linux, you can alternatively use a simple shell command to convert all files in a given directory:

recode L1..UTF8 dir/*

You can also run it from PHP via exec().
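
Calling it from PHP might look like the sketch below. It assumes the recode binary is installed and on the PATH; the helper names are made up for illustration. File names go through escapeshellarg() so that spaces or quotes in them cannot break the command.

```php
<?php
// Sketch only: both helpers are hypothetical; recode must be installed separately.
function buildRecodeCommand(array $files): string
{
    // Quote every file name so the shell treats it as a single argument.
    return 'recode L1..UTF8 ' . implode(' ', array_map('escapeshellarg', $files));
}

function recodeFiles(array $files): bool
{
    if ($files === []) {
        return true; // nothing to do
    }
    exec(buildRecodeCommand($files), $output, $status);
    return $status === 0; // exit status 0 means the command reported success
}
```

A call such as recodeFiles(glob('dir/*')) would mirror the shell line above. As far as I know, recode rewrites the files in place, so it is probably wise to work on a copy.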

Starmaster
Author by

Starmaster

web developer, skydiver, traveller, ...

Updated on August 02, 2022

Comments

  • Starmaster
    Starmaster over 1 year

    I have a bunch of files that are not in UTF-8 encoding and I'm converting a site to UTF-8 encoding.

    I'm using a simple script for files that I want to save in UTF-8, but the files are saved in the old encoding:

    header('Content-type: text/html; charset=utf-8');
    mb_internal_encoding('UTF-8');
    $fpath="folder";
    $d=dir($fpath);
    while (False !== ($a = $d->read()))
     {
    
     if ($a != '.' and $a != '..')
      {
    
      $npath=$fpath.'/'.$a;
    
      $data=file_get_contents($npath);
    
      file_put_contents('tempfolder/'.$a, $data);
    
      }
    
     }
    

    How can I save files in utf-8 encoding?

  • Starmaster
    Starmaster about 13 years
    Didn't know about this command. Thanks! I use Linux even as a workstation, and all of my local servers are on Linux. What does L1.. in the command mean?
  • mario
    mario about 13 years
    @Starmaster: L1 is shorthand for Latin-1, the source charset.
  • cuzzea
    cuzzea almost 11 years
    I was trying to create a PHP download script that uses UTF-8 for Danish characters. This is what it was missing, ty
  • TSr
    TSr about 8 years
    This also works for UTF-16, but with these bytes: fwrite($f, pack("CC",0xff,0xfe));
  • David R.
    David R. over 6 years
    This should be the accepted answer... short and sweet, and works!
  • Jaakko Uusitalo
    Jaakko Uusitalo about 6 years
    What is the $a variable on line 3 of the first example?
  • papo
    papo over 5 years
    There is a distinction between creating a file that is recognized as UTF-8 and converting the content that goes into that file. A plain text file without special characters has the same content as UTF-8 without a BOM, and parsers that process your text may also have an encoding option. PHP uses UTF-8 itself, so if the text looks OK but the file does not seem to be UTF-8, chances are the text is already UTF-8 and adding a BOM is all you need. But that is not converting. This problem is seen often because PHP is lazy about adding a BOM, yet expects one on input.
  • zooks
    zooks over 4 years
    In case of using stream_filter_append: OLD-ENCODING/UTF-8
  • Andy Borgmann
    Andy Borgmann about 3 years
    I had a slightly different issue than the OP, but this solved it. I didn't use file_put_contents, but instead used header to download the file immediately. The data was already in UTF-8 in the database, but it wasn't working in the CSV download. This worked great. Thank you.
  • Aseel Ashraf
    Aseel Ashraf over 2 years
    @tSr you're a life saver
  • Peter Mortensen
    Peter Mortensen about 2 years
    Can you elaborate?
  • Peter Mortensen
    Peter Mortensen about 2 years
    This is similar to Alaa's answer.
  • Peter Mortensen
    Peter Mortensen about 2 years
    What is "No Mark"? Without a BOM?
  • Peter Mortensen
    Peter Mortensen about 2 years
    Why "php,txt"? Shouldn't it be "php.txt"?