iconv UTF-8//IGNORE still produces "illegal character" error


Solution 1

The output character set (the second parameter) should differ from the input character set (the first parameter). If they are the same and the string contains illegal UTF-8 characters, iconv rejects them as illegal according to the input character set.

Solution 2

I know two ways to fix a UTF-8 string that contains illegal characters:

  1. Illegal characters will be replaced by question marks ("?"):

$message = mb_convert_encoding($message, 'UTF-8', 'UTF-8');

  2. Illegal characters will be removed:

$message = iconv('UTF-8', 'UTF-8//IGNORE', $message);

The second method is actually the one described in the question, but it doesn't produce any E_NOTICE in my case. I tested several corrupted UTF-8 strings with error_reporting(E_ALL); and the result was always as expected. Possibly something has changed since 2012. I tested on PHP 7.2.9 on Windows.
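
To compare the two methods side by side, here is a minimal sketch. The sample input is a hypothetical string with one stray byte, and the fallback to the mbstring result is an assumption about how one might handle builds where iconv returns false. It is not part of the original answer.

```php
<?php
// Hypothetical input: a lone 0xE9 byte (Latin-1 "é") is not valid UTF-8 here.
$message = "caf\xE9 ok";

// Method 1: mbstring replaces invalid sequences with the substitute
// character, which is "?" unless configured otherwise.
$replaced = mb_convert_encoding($message, 'UTF-8', 'UTF-8');

// Method 2: iconv with //IGNORE is meant to drop invalid bytes, but on some
// builds it returns false and raises a notice instead, so check the result.
// The @ operator only silences that notice for this call.
$stripped = @iconv('UTF-8', 'UTF-8//IGNORE', $message);
if ($stripped === false) {
    $stripped = $replaced; // assumption: fall back to the mbstring result
}

var_dump(mb_check_encoding($message, 'UTF-8'));  // the raw input is invalid
var_dump(mb_check_encoding($replaced, 'UTF-8'));
var_dump(mb_check_encoding($stripped, 'UTF-8'));
```

Checking the return value matters because, as the question shows, the same iconv call behaves differently across PHP versions and platforms.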

Solution 3

I am using mb_convert_encoding with the setting below, which removes invalid characters:

ini_set('mbstring.substitute_character', "none");
$string = mb_convert_encoding($string, 'UTF-8', 'UTF-8');

This works in my case. Earlier I was getting the notice below from the iconv call that follows it:

Notice: iconv(): Wrong charset, conversion from `UTF-8' to `UTF-8//IGNORE' is not allowed

$string = iconv('UTF-8', 'UTF-8//TRANSLIT//IGNORE', $string);
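
The mbstring approach in this solution changes a request-wide setting. A possible variation, sketched here under assumptions, wraps it in a helper that restores the previous substitute character afterwards. The function name strip_invalid_utf8 is hypothetical, and it uses mb_substitute_character() rather than ini_set(), which is a swap from what the answer shows.

```php
<?php
// Hypothetical helper (the name is illustrative): strip invalid UTF-8
// sequences without leaving the global mbstring setting changed.
function strip_invalid_utf8(string $string): string
{
    $previous = mb_substitute_character();  // remember the current setting
    mb_substitute_character('none');        // drop invalid sequences instead of emitting "?"
    $clean = mb_convert_encoding($string, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);     // restore it for the rest of the request
    return $clean;
}

// Sample input with a 0xFF byte, which can never appear in valid UTF-8.
$clean = strip_invalid_utf8("abc\xFFdef");
var_dump(mb_check_encoding($clean, 'UTF-8'));
```

Restoring the setting keeps other mb_* calls in the same request behaving as they did before the cleanup.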
Author: Znarkus

Updated on July 23, 2021

Comments

  • Znarkus, almost 3 years ago
    $string = iconv("UTF-8", "UTF-8//IGNORE", $string);
    

    I thought this code would remove invalid UTF-8 characters, but it produces [E_NOTICE] "iconv(): Detected an illegal character in input string". What am I missing, and how do I properly strip illegal characters from a string?

  • Znarkus, about 12 years ago
    Do you propose a solution? I've actually read that this should work
  • msgmash.com, about 12 years ago
    Yes, I've seen that link, but have a look at this github.com/EllisLab/CodeIgniter/issues/261. My understanding is that iconv doesn't do input encoding now - but I could be wrong. The link above also has a link to an alternative solution, which is at gist.github.com/1262496.
  • Znarkus, about 12 years ago
    That makes sense. I will first try mb_convert_encoding($string, "UTF-8", "UTF-8"), and if it doesn't work out I'll try the gist. Thanks!
  • clod986, over 9 years ago
    The returned string is then empty.
  • champion, over 7 years ago
    You shouldn't do that, because in some cases you can get an empty string.
  • gingerCodeNinja, about 7 years ago
    Also note that iconv relies on the locale being set correctly, so make sure you call setlocale(..); with the appropriate value first. Depending on the locale, the output will be different.
  • Christoffer Bubach, almost 4 years ago
    I don't have my code in front of me right now, but I'm pretty sure iconv() will allow it if you just specify one of the encodings as UCS.