mb_detect_encoding detects ASCII as UTF-8?
Solution 1
Specifying a custom order, where ASCII is detected first, works.
mb_detect_encoding($val, 'ASCII,UTF-8,ISO-8859-15');
For completeness, the list of available encodings is at http://www.php.net/manual/en/mbstring.supported-encodings.php
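As a minimal sketch of the custom-order approach, the detection and conversion can be wrapped in one helper. The function name toUtf8 and the candidate list are assumptions for illustration, not part of the original answer. It also passes true as the third (strict) argument of mb_detect_encoding, so a candidate is only accepted when the bytes are valid in that encoding.

```php
<?php
// Hypothetical helper (name is an assumption): convert a string of unknown
// encoding to UTF-8, assuming the input is ASCII, UTF-8, or ISO-8859-15.
function toUtf8(string $val): string
{
    // Order matters: the first candidate in which the bytes are valid wins.
    $candidates = ['ASCII', 'UTF-8', 'ISO-8859-15'];

    // Third argument enables strict detection.
    $source = mb_detect_encoding($val, $candidates, true);

    if ($source === false) {
        // No candidate matched; return the input unchanged rather than guess.
        return $val;
    }

    return mb_convert_encoding($val, 'UTF-8', $source);
}
```

Single-byte encodings such as ISO-8859-15 accept almost any byte sequence, so they should come last in the list; otherwise they would shadow UTF-8.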
Solution 2
You can specify the source encoding explicitly:
$val = mb_convert_encoding($val, 'UTF-8', 'ASCII');
EDIT:
$val = mb_convert_encoding($val, 'UTF-8', 'auto');
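A different technique from the ones above, shown here only as a hedged sketch: instead of detecting the encoding, check whether the string is already valid UTF-8 with mb_check_encoding and convert only when it is not. The helper name ensureUtf8 and the ISO-8859-1 fallback are assumptions; the fallback should be whatever legacy encoding the data is actually expected to use.

```php
<?php
// Hypothetical helper (name and default fallback are assumptions).
// Leaves valid UTF-8 (which includes plain ASCII) untouched and treats
// everything else as the given legacy encoding.
function ensureUtf8(string $val, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($val, 'UTF-8')) {
        // Already valid UTF-8; converting again would risk double-encoding.
        return $val;
    }

    return mb_convert_encoding($val, 'UTF-8', $fallback);
}
```

This avoids detection entirely, at the cost of assuming a single known fallback encoding for all non-UTF-8 input.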
Comments
Cobra_Fast almost 2 years
I'm trying to automatically convert imported IPTC metadata from images to UTF-8 for storage in a database, using the PHP mb_ functions.
Currently it looks like this:
$val = mb_convert_encoding($val, 'UTF-8', mb_detect_encoding($val));
However, when mb_detect_encoding() is supplied an ASCII string (special characters in the Latin1 fields from 192-255), it detects it as UTF-8, so in the subsequent attempt to convert everything to proper UTF-8 all special characters are removed.
I tried writing my own method by looking for Latin1 values, and if none occurred I would let mb_detect_encoding decide what it is. But I stopped midway when I realized that I can't be sure that other encodings don't use the same byte values for other things.
So, is there a way to properly detect ASCII to feed to mb_convert_encoding as the source encoding?