UTF-8 problems while reading CSV file with fgetcsv

121,921

Solution 1

I have it working now (after removing the header() call). I think the problem was that the PHP file itself was saved as ISO-8859-1. I set it to UTF-8 without BOM. I thought I had already done that, but I may have undone the change by accident.

I also ran SET NAMES 'utf8' on the database connection, so the data is now stored correctly in the database as well.
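A quick way to confirm what a file really contains is to inspect its bytes rather than trust the editor. The following is a minimal sketch, not part of the original answer: describeEncoding is a hypothetical helper name, and it assumes the mbstring extension is loaded.

```php
<?php
// Hypothetical helper (not from the original answer): reports whether a byte
// string starts with a UTF-8 byte order mark and whether it is valid UTF-8.
function describeEncoding(string $bytes): array
{
    return [
        'has_bom'    => strncmp($bytes, "\xEF\xBB\xBF", 3) === 0,
        'valid_utf8' => mb_check_encoding($bytes, 'UTF-8'),
    ];
}

// "Mäx" encoded as UTF-8 (ä = C3 A4) and as ISO-8859-1 (ä = E4).
$utf8   = "M\xC3\xA4x";
$latin1 = "M\xE4x";

var_dump(describeEncoding($utf8)['valid_utf8']);   // bool(true)
var_dump(describeEncoding($latin1)['valid_utf8']); // bool(false)
```

Passing the result of file_get_contents() for the PHP script or the CSV file to this helper would show whether either one is not the UTF-8 without BOM you expect.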

Solution 2

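Try converting each field to UTF-8 with utf8_encode after reading the row. This only helps if the file is actually ISO-8859-1:

<?php
// utf8_encode() converts ISO-8859-1 to UTF-8 only. It is deprecated as of PHP 8.2,
// where mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1') is the equivalent call.
$handle = fopen("specialchars.csv", "r");
if ($handle === false) {
    die("Cannot open specialchars.csv");
}
echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr>';
while (($data = fgetcsv($handle, 1000, ";")) !== false) {
    $data = array_map("utf8_encode", $data); // added: convert every field to UTF-8
    echo "<tr>";
    foreach ($data as $field) {
        // output data, escaped so that <, > and & in the CSV cannot break the table
        echo "<td>" . htmlspecialchars($field) . "</td>";
    }
    echo "</tr>";
}
echo "</table>";
fclose($handle);
?>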
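One caveat with calling utf8_encode on every field: if the data is already UTF-8, it gets encoded a second time and the special characters turn into garbage. A guarded conversion avoids that. This is only a sketch under assumptions: toUtf8 is a hypothetical helper name, mbstring is assumed to be available, and any input that is not valid UTF-8 is assumed to be ISO-8859-1 (the same assumption utf8_encode makes).

```php
<?php
// Hypothetical helper: converts a field to UTF-8 only when it is not valid UTF-8 yet.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function toUtf8(?string $field): string
{
    $field = $field ?? ''; // fgetcsv() yields null for the single field of a blank line
    if (mb_check_encoding($field, 'UTF-8')) {
        return $field; // already UTF-8, converting again would double-encode it
    }
    return mb_convert_encoding($field, 'UTF-8', 'ISO-8859-1');
}

// First field is ISO-8859-1 ("Mäx"), second field is already UTF-8 ("Müstermänn").
$row = ["M\xE4x", "M\xC3\xBCsterm\xC3\xA4nn"];
$row = array_map('toUtf8', $row);

echo bin2hex($row[0]), "\n"; // 4dc3a478, i.e. "Mäx" in UTF-8
```

Inside the loop above, array_map('toUtf8', $data) could stand in for array_map("utf8_encode", $data).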

Solution 3

I ran into a similar problem: parsing a CSV file with special characters like é, è, ö, etc.

The following worked fine for me:

To display the characters correctly on the HTML page, this header was needed:

header('Content-Type: text/html; charset=UTF-8');

In order to parse every character correctly, I used:

utf8_encode(fgets($file));

Don't forget to use the 'Multibyte String Functions' in all subsequent string operations, for example:

mb_strtolower($value, 'UTF-8');
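To illustrate why the multibyte functions matter, here is a small sketch comparing byte-based and character-based functions on a UTF-8 string. The sample value is made up for the illustration, and mbstring is assumed to be installed.

```php
<?php
// "MÜSTERMÄNN" written with explicit UTF-8 bytes (Ü = C3 9C, Ä = C3 84),
// so the example does not depend on how this script file itself is saved.
$value = "M\xC3\x9CSTERM\xC3\x84NN";

$bytes = strlen($value);                 // counts bytes: 12
$chars = mb_strlen($value, 'UTF-8');     // counts characters: 10
$lower = mb_strtolower($value, 'UTF-8'); // "müstermänn" as UTF-8

echo $bytes, " bytes, ", $chars, " characters\n";
echo bin2hex($lower), "\n";
```

strlen reports 12 because each umlaut takes two bytes in UTF-8, while mb_strlen reports the 10 characters a reader would count.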

Solution 4

Try putting this into the top of your file (before any other output):

<?php

header('Content-Type: text/html; charset=UTF-8');

?>

Solution 5

In my case the source file was encoded as windows-1250, and iconv printed tons of notices about illegal characters in the input string...

So this solution helped me a lot:

/**
 * getting CSV array with UTF-8 encoding
 *
 * @param   resource    &$handle
 * @param   integer     $length
 * @param   string      $separator
 *
 * @return  array|false
 */
private function fgetcsvUTF8(&$handle, $length, $separator = ';')
{
    if (($buffer = fgets($handle, $length)) !== false)
    {
        $buffer = $this->autoUTF($buffer);
        return str_getcsv($buffer, $separator);
    }
    return false;
}

/**
 * automatic conversion of a windows-1250 or iso-8859-2 string into utf-8
 *
 * @param   string  $s
 *
 * @return  string
 */
private function autoUTF($s)
{
    // detect UTF-8
    if (preg_match('#[\x80-\x{1FF}\x{2000}-\x{3FFF}]#u', $s))
        return $s;

    // detect WINDOWS-1250
    if (preg_match('#[\x7F-\x9F\xBC]#', $s))
        return iconv('WINDOWS-1250', 'UTF-8', $s);

    // assume ISO-8859-2
    return iconv('ISO-8859-2', 'UTF-8', $s);
}

In response to @manvel's answer: use str_getcsv instead of explode, because of cases like this:

some;nice;value;"and;here;comes;combinated;value";and;some;others

explode will split the string at every separator, including the ones inside the quotes:

some
nice
value
"and
here
comes
combinated
value"
and
some
others

but str_getcsv will keep the quoted value together:

some
nice
value
and;here;comes;combinated;value
and
some
others
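The difference can be reproduced with a few lines. This is a minimal sketch using the sample line from above; the enclosure and escape arguments are passed explicitly, which is a choice made here rather than something the original answer specifies.

```php
<?php
$line = 'some;nice;value;"and;here;comes;combinated;value";and;some;others';

// Naive split: every semicolon is treated as a separator, quotes are ignored.
$naive = explode(';', $line);

// CSV-aware split: the quoted field stays in one piece and loses its quotes.
$parsed = str_getcsv($line, ';', '"', '\\');

echo count($naive), "\n";  // 11
echo count($parsed), "\n"; // 7
echo $parsed[3], "\n";     // and;here;comes;combinated;value
```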
Author by

testing

Updated on July 14, 2020

Question

    I am trying to read a CSV file and echo its content, but the special characters are displayed incorrectly.

    Mäx Müstermänn -> Mäx Müstermänn

    Encoding of the CSV file is UTF-8 without BOM (checked with Notepad++).

    This is the content of the CSV file:

    "Mäx";"Müstermänn"

    My PHP script

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    </head>
    <body>
    <?php
    $handle = fopen ("specialchars.csv","r");
    echo '<table border="1"><tr><td>First name</td><td>Last name</td></tr><tr>';
    while ($data = fgetcsv ($handle, 1000, ";")) {
            $num = count ($data);
            for ($c=0; $c < $num; $c++) {
                // output data
                echo "<td>$data[$c]</td>";
            }
            echo "</tr><tr>";
    }
    ?>
    </body>
    </html>
    

    I tried to use setlocale(LC_ALL, 'de_DE.utf8'); as suggested here, without success. The content is still displayed incorrectly.

    What am I missing?

    Edit:

    An echo mb_detect_encoding($data[$c],'UTF-8'); gives me UTF-8 UTF-8.

    echo file_get_contents("specialchars.csv"); gives me "Mäx";"Müstermänn".

    And

    print_r(str_getcsv(reset(explode("\n", file_get_contents("specialchars.csv"))), ';'))
    

    gives me

    Array ( [0] => Mäx [1] => Müstermänn )

    What does it mean?