Write a file with UTF-8 encoding in PHP

Solution 1

If your 3rd party program "do not support files in ANSI but UTF-8" as you mentioned in a comment then most likely it's expecting a BOM.

While the Unicode Standard does allow a BOM in UTF-8,[2] it does not require or recommend it.[3] Byte order has no meaning in UTF-8[4] so a BOM serves only to identify a text stream or file as UTF-8.

The reason the BOM is recommended against is that it defeats the ASCII back-compatibility that is part of UTF-8's design.

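So strictly speaking your 3rd party program isn't completely compliant with the standard, because the BOM should be optional. ASCII is 100% valid UTF-8, and that is one of the main drivers of UTF-8's design. Anything that can understand UTF-8 according to the standard by definition also understands ASCII. Note that text in an "ANSI" code page is valid UTF-8 only if it contains nothing but ASCII characters; accented characters still have to be converted.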
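To see the difference in practice, here is a minimal sketch that checks whether a byte string is well-formed UTF-8. The helper name `isValidUtf8` is hypothetical. It relies on `preg_match()` returning `false` when the `/u` modifier is used on a malformed UTF-8 subject.

```php
<?php
// Hypothetical helper: true when $text is well-formed UTF-8.
// preg_match() with the /u modifier returns false on malformed UTF-8 subjects.
function isValidUtf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

var_dump(isValidUtf8("plain ASCII text")); // bool(true)
var_dump(isValidUtf8("caf\xC3\xA9"));      // bool(true): "é" encoded as UTF-8
var_dump(isValidUtf8("caf\xE9"));          // bool(false): "é" as a single legacy code-page byte
```

Pure ASCII passes unchanged, while a single-byte accented character from a legacy code page does not, which is why such files need a real conversion and not just a relabel.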

Try writing "\xEF\xBB\xBF" to the front of the file and see if that solves your problem.
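A minimal sketch of that suggestion, assuming the text you are writing is already valid UTF-8. The helper name `withUtf8Bom`, the sample text, and the temporary output path are hypothetical and only for illustration.

```php
<?php
// Hypothetical helper: prepend the UTF-8 BOM unless the text already starts with one.
function withUtf8Bom(string $text): string
{
    $bom = "\xEF\xBB\xBF";
    return strncmp($text, $bom, 3) === 0 ? $text : $bom . $text;
}

// Example usage: sample text and path are illustrative only.
$csvText = "id;name\n1;Jo\xC3\xA3o\n"; // already UTF-8 ("ã" is the two bytes C3 A3)
file_put_contents(sys_get_temp_dir() . '/testecli01.csv', withUtf8Bom($csvText));
```

The BOM only labels the file; it does not convert anything, so the content must already be UTF-8 before it is written.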

Solution 2

I do not know of a database that will do the encoding conversion for you easily. For example, in MySQL, you have to reset all the character encodings for the db, tables, and columns, AND THEN convert the data.

I would suggest instead that you create your database dump and use iconv to change the encoding, whether on the command line:

iconv -f original_charset -t utf-8 dumpTextData > convertedTextData

or in PHP (taken from How to write file in UTF-8 format?)

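// Write to a separate file: opening $file itself with 'w' would truncate it before it is read.
// The '.utf8' suffix is only an illustrative name for the converted copy.
$input = fopen($file, 'r');
$output = fopen($file . '.utf8', 'w');
// Filter syntax is convert.iconv.<from>/<to>; replace OLD-ENCODING with the source charset.
stream_filter_append($input, 'convert.iconv.OLD-ENCODING/UTF-8');
stream_copy_to_stream($input, $output);
fclose($input);
fclose($output);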

NOTE: edited to avoid leaking file descriptors.
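When the files are generated from database rows, it may be simpler to convert each string before writing it than to re-encode the files afterwards. The sketch below uses PHP's `iconv()` function for that. The helper name `toUtf8`, the sample row, and the output path are hypothetical, and the `CP1252` source charset is an assumption (it is what Windows tools commonly label "ANSI" on Western European systems); substitute the charset your data actually uses.

```php
<?php
// Hypothetical helper: convert text from an assumed source charset to UTF-8.
// 'CP1252' is an assumption; use the charset your database actually returns.
function toUtf8(string $text, string $from = 'CP1252'): string
{
    $converted = iconv($from, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Could not convert text from $from to UTF-8");
    }
    return $converted;
}

// Example usage: the row and the path are illustrative only.
$row = "Jo\xE3o;S\xE3o Paulo\n"; // bytes as they might come from a CP1252 source
file_put_contents(sys_get_temp_dir() . '/testecli01.csv', toUtf8($row));
```

Because each file is written in a single call, there are no stream handles left open, which matters when the export produces thousands of files.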

Author: devasia2112

Updated on June 04, 2022

Comments

  • devasia2112
    devasia2112 almost 2 years

    P.S.: This is not a duplicate question. I'm not asking how to write contents to a file, because that is already done; I'm asking how to change the type of a file to UTF-8. There is a difference.

    How do I generate the file as UTF-8 and not ANSI? (It is not about the contents.)

    For example, most IDEs have an encoding option that lets you change the type of your file. But I'm generating a bulk export from my database, which produces a lot of individual text files, and all of them are ANSI by default. I'm just looking for a PHP function that makes it possible to change the encoding before the bulk export is generated.

    If the source code helps, I can post it here. Just let me know.

    Thanks in advance.

    EDITED

    Here is a screenshot of what I'm asking about:

    [screenshot not available]

    When I generate the file "testecli01.csv" it always gets ANSI encoding. Whatever I do in my script, it is always ANSI, and I need it in UTF-8, just that. It is simple, but I have no idea how to do it.

    • zneak
      zneak almost 13 years
      Except that you're generating files from a database, the question How to write file in UTF-8 format? quite matches yours. There is no magical call to change the encoding of a file, you have to read it, change the encoding, then write it back.
    • shelhamer
      shelhamer almost 13 years
      The above comment has it right. There's no magical, free-lunch database encoding conversion.
    • devasia2112
      devasia2112 almost 13 years
      It is not the same question; it is about the file itself and not the contents of the file. This thing freaks me out.. there are no good resources, not even the PHP docs.. I can do it by hand, but I have thousands of files ... 0_o
    • zneak
      zneak almost 13 years
      @Fernando, text files don't have an 'encoding' property. The closest thing to that, for UTF-8, is a BOM marker at the beginning of a file. But even then, you still have to convert the contents of the file to UTF-8: just throwing in a BOM isn't going to fix anything unless there are no special characters in your file, in which case they were valid UTF-8 to start with.
    • Accountant م
      Accountant م over 7 years
      Notepad++ worked like a charm for me for converting file encodings. It is a free download.
  • zneak
    zneak almost 13 years
    Your copied answer leaks file descriptors. If you have more than a few hundred files, this will cause problems.
  • shelhamer
    shelhamer almost 13 years
    @zneak thanks for pointing that out. I forget you can't trust people to know you need an fclose. Edited to include.
  • zneak
    zneak almost 13 years
    You're still leaking the file descriptor from fopen in the stream_copy_to_stream. :) I've fixed it for you.
  • shelhamer
    shelhamer almost 13 years
    ...and I need to remember how to read haha. I went to edit but you beat me to it. Thanks.
  • devasia2112
    devasia2112 almost 13 years
    @Zneak Actually it has thousands and not hundreds of files. It is not in full use yet, but I expect to use it in a production environment.. The problem is that the thousands of .txt files will be imported by a 3rd party program and it do not support files in ANSI but UTF-8. So the .txt files need to be in that encoding..
  • shelhamer
    shelhamer almost 13 years
    If you notice, the answer was already edited to not leak descriptors and is correct (see the two fclose). A file "in" utf-8 is a utf-8 file, I am not sure what the problem you are facing is.
  • devasia2112
    devasia2112 almost 13 years
    In truth, the answer is good. I did something very close to this myself, but in my specific case it will not work. I have also edited my post; I think it is easier to understand now.
  • devasia2112
    devasia2112 almost 13 years
    The 3rd party program is from the government and it is a very old program; accents are not allowed, so you can imagine what type of program it is ... It does not matter, because I generate the data. I think ANSI is correct because it has all the accents and the data is OK, but the gov program does not accept accents, so perhaps I'll remove all accents from my database.. ahahaha Thanks anyway, I didn't know about the BOM.