iconv UTF-8//IGNORE still produces "illegal character" error
Solution 1
The output character set (the second parameter) should be different from the input character set (the first parameter). If they are the same and the string contains illegal UTF-8 characters, iconv will reject them as illegal according to the input character set.
Solution 2
I know two ways to fix a UTF-8 string that contains illegal characters:
- Illegal characters will be replaced by question marks ("?"):
$message = mb_convert_encoding($message, 'UTF-8', 'UTF-8');
- Illegal characters will be removed:
$message = iconv('UTF-8', 'UTF-8//IGNORE', $message);
The second method is actually the one described in the question, but it doesn't produce any E_NOTICE in my case. I tested different corrupted UTF-8 strings with error_reporting(E_ALL); and the result was always as expected. Possibly something has changed since 2012. I tested on PHP 7.2.9 on Windows.
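The two methods above can be combined into one guarded helper. This is only a minimal sketch: the function name scrub_utf8() is hypothetical, it assumes both the iconv and mbstring extensions are loaded, and it suppresses the notice with "@" because iconv behaviour with //IGNORE seems to vary between builds.

```php
<?php
// Hypothetical helper: scrub_utf8() is an illustrative name, not a PHP built-in.
// Assumes the iconv and mbstring extensions are available.
function scrub_utf8(string $input): string
{
    // Nothing to do if the string is already valid UTF-8.
    if (mb_check_encoding($input, 'UTF-8')) {
        return $input;
    }

    // Second method: ask iconv to drop invalid sequences.
    // "@" hides the notice some builds raise; iconv() returns false on failure.
    $clean = @iconv('UTF-8', 'UTF-8//IGNORE', $input);
    if ($clean !== false && $clean !== '' && mb_check_encoding($clean, 'UTF-8')) {
        return $clean;
    }

    // First method as a fallback: mbstring substitutes invalid sequences.
    return mb_convert_encoding($input, 'UTF-8', 'UTF-8');
}

$message = "abc\xFFdef"; // the byte 0xFF can never appear in valid UTF-8
$clean = scrub_utf8($message);
var_dump(mb_check_encoding($clean, 'UTF-8'));
```

Checking the return value of iconv() before using it avoids silently passing false or an empty string on to later code.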
Solution 3
I am using mb_convert_encoding with the setting below, which removes invalid characters:
ini_set('mbstring.substitute_character', "none");
$string = mb_convert_encoding($string, 'UTF-8', 'UTF-8');
This works in my case. Earlier I was getting the notice below:
Notice: iconv(): Wrong charset, conversion from `UTF-8' to `UTF-8//IGNORE' is not allowed
$string = iconv('UTF-8', 'UTF-8//TRANSLIT//IGNORE', $string);
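The mbstring approach from this answer can also be written with mb_substitute_character(), which changes the same setting at runtime and makes it easy to restore the previous value afterwards. A minimal sketch, assuming the mbstring extension is loaded; the function name strip_invalid_utf8() is hypothetical:

```php
<?php
// Hypothetical helper: strip_invalid_utf8() is an illustrative name, not a PHP built-in.
// Assumes the mbstring extension is available.
function strip_invalid_utf8(string $input): string
{
    // Remember the current substitute character so it can be restored.
    $previous = mb_substitute_character();

    // "none" tells mbstring to drop invalid sequences instead of writing "?".
    mb_substitute_character('none');
    try {
        return mb_convert_encoding($input, 'UTF-8', 'UTF-8');
    } finally {
        // Put the previous setting back for the rest of the script.
        mb_substitute_character($previous);
    }
}

$string = strip_invalid_utf8("abc\xFFdef"); // 0xFF is never valid in UTF-8
var_dump(mb_check_encoding($string, 'UTF-8'));
```

Restoring the setting keeps the change local to this one conversion, so other mbstring calls in the same request keep their usual substitution behaviour.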
Znarkus
Updated on July 23, 2021
Comments
-
Znarkus almost 3 years
$string = iconv("UTF-8", "UTF-8//IGNORE", $string);
I thought this code would remove invalid UTF-8 characters, but it produces [E_NOTICE] "iconv(): Detected an illegal character in input string". What am I missing? How do I properly strip illegal characters from a string?
-
Znarkus about 12 years
-
msgmash.com about 12 years
Yes, I've seen that link, but have a look at this: github.com/EllisLab/CodeIgniter/issues/261. My understanding is that iconv doesn't do input encoding now, but I could be wrong. The link above also has a link to an alternative solution, which is at gist.github.com/1262496.
-
Znarkus about 12 years
That makes sense. I will first try mb_convert_encoding($string, "UTF-8", "UTF-8"), and if it doesn't work out I'll try the gist. Thanks!
-
clod986 over 9 years
The string returned is then empty.
-
champion over 7 years
You shouldn't do that, because in some cases you can get an empty string.
-
gingerCodeNinja about 7 years
Also note that iconv relies on the locale being set correctly, so make sure you call setlocale(..); with the appropriate value first. Depending on the locale, the output will be different.
-
Christoffer Bubach almost 4 years
I don't have my code in front of me right now, but I'm pretty sure iconv() will allow it if you just specify one of the encodings as UCS.