php iconv translit for removing accents: not working as expected?
Solution 1
I use this standard function to turn a string into a valid URL slug, stripping out characters that are not allowed in URLs. The magic seems to be in the line after the // remove unwanted characters comment.
This is taken from the Symfony framework documentation: http://www.symfony-project.org/jobeet/1_4/Doctrine/en/08, which in turn takes it from http://php.vrana.cz/vytvoreni-pratelskeho-url.php (but I don't speak Czech ;-)
```php
function slugify($text)
{
    // replace non letter or digits by -
    // (in a single-quoted PHP string '\\' is one backslash, so PCRE receives [^\pL\d])
    $text = preg_replace('#[^\\pL\d]+#u', '-', $text);

    // trim
    $text = trim($text, '-');

    // transliterate
    if (function_exists('iconv'))
    {
        $text = iconv('utf-8', 'us-ascii//TRANSLIT', $text);
    }

    // lowercase
    $text = strtolower($text);

    // remove unwanted characters
    $text = preg_replace('#[^-\w]+#', '', $text);

    if (empty($text))
    {
        return 'n-a';
    }

    return $text;
}

echo slugify('é'); // --> "e" (assuming a UTF-8 LC_CTYPE locale, see Solution 3)
```
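To see why that last step matters, here is a minimal sketch of just the lowercase-and-cleanup part of slugify(), fed with strings shaped like the iconv output quoted in the comments below (accent marks left next to the letter, such as a backtick, apostrophe, caret or tilde). Anything that is not a hyphen or a \w character (ASCII letters, digits, underscore) is removed, so the stray marks disappear. strip_translit_residue() is a hypothetical name used only for this illustration.

```php
<?php
// Hypothetical helper isolating the last two steps of slugify():
// lowercase, then drop everything except '-', letters, digits and '_'.
function strip_translit_residue(string $translit): string
{
    $text = strtolower($translit);
    // Backticks, quotes, carets and tildes left by //TRANSLIT are not \w,
    // so this removes them and keeps the base letters
    return preg_replace('#[^-\w]+#', '', $text);
}

// Inputs modelled on the iconv output quoted in the comments
foreach (['`e', "A'A^A~A\"A", 'e-`e'] as $raw) {
    echo $raw, ' => ', strip_translit_residue($raw), "\n";
}
```

This is why slugify() produces clean output even when iconv leaves accent marks behind: the transliteration only has to get the base letter right.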
Solution 2
Cf. @tchrist: using the intl PHP extension
http://fr2.php.net/manual/en/book.intl.php
```php
preg_replace('/\pM+/u', '', normalizer_normalize($mystring, Normalizer::FORM_D));
```
eéèêëiîïoöôuùûüaâäÅ Ἥ ŐǟǠ ǺƶƈƉųŪŧȬƀ␢ĦŁȽŦ ƀǖ becomes
eeeeeiiiooouuuuaaaA Η OaA AƶƈƉuUŧOƀ␢ĦŁȽŦ ƀu
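Wrapped as a function, the same idea might look like the sketch below. It assumes the intl extension is loaded (Normalizer is its class; normalizer_normalize() is the procedural alias). It matches \pM+ so the regex only hits actual runs of combining marks, and recomposes to NFC afterwards so that anything NFD split apart without producing a mark (for example Hangul syllables, which decompose into conjoining jamo) is put back together. remove_accents() is a hypothetical name.

```php
<?php
// Sketch: strip diacritics via canonical decomposition (requires ext-intl).
function remove_accents(string $text): string
{
    // NFD splits "é" into "e" + U+0301 COMBINING ACUTE ACCENT
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
    if (!is_string($decomposed)) {
        return $text; // normalization failed (e.g. invalid UTF-8): leave input as-is
    }
    // Remove every combining mark (Unicode general category M)
    $stripped = preg_replace('/\pM+/u', '', $decomposed);
    // Recompose whatever NFD split apart without producing a mark (e.g. Hangul)
    return Normalizer::normalize($stripped, Normalizer::FORM_C);
}

echo remove_accents('eéèêë Ő ǟ Đ'), "\n";
```

Note that Đ comes through unchanged: it has no decomposition, which is exactly the limitation discussed next.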
As tchrist emphasises, not all Unicode characters have a decomposition.
Extract from the Unicode code charts:
U0080.pdf
00CF Ï LATIN CAPITAL LETTER I WITH DIAERESIS
≡ 0049 I 0308 ¨
NB: the symbol « ≡ » indicates an available (canonical) decomposition, while « → » only marks a cross-reference to a related character.
00D0 Ð LATIN CAPITAL LETTER ETH
→ 00F0 ð latin small letter eth
→ 0110 Đ latin capital letter d with stroke
→ 0189 Ɖ latin capital letter african d
No decomposition is available, which IMHO is strange (the ASCII letter D could be considered an acceptable equivalent).
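Since Ð survives normalization unchanged, one pragmatic workaround is a small hand-written fallback table applied with strtr() after stripping the marks. The table below is an assumption chosen for illustration (rendering ð as d rather than dh, or Ɖ as D, is a judgement call), not something Unicode defines; NO_DECOMPOSITION_FALLBACK and apply_fallback() are hypothetical names.

```php
<?php
// Hypothetical fallback table for letters that have no canonical decomposition.
// The choices (e.g. ð -> d rather than dh) are illustrative assumptions.
const NO_DECOMPOSITION_FALLBACK = [
    'Ð' => 'D', 'ð' => 'd',
    'Đ' => 'D', 'đ' => 'd',
    'Ɖ' => 'D',
    'Ł' => 'L', 'ł' => 'l',
    'Ħ' => 'H', 'ħ' => 'h',
    'Ŧ' => 'T', 'ŧ' => 't',
    'ƒ' => 'f',
];

function apply_fallback(string $text): string
{
    // strtr() with an array matches whole keys (longest first), so each
    // multi-byte UTF-8 character is replaced as a unit
    return strtr($text, NO_DECOMPOSITION_FALLBACK);
}

echo apply_fallback('Ðð Đđ Ɖ ŁĦŦ ƒ'), "\n";
```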
U0100.pdf
0110 Đ LATIN CAPITAL LETTER D WITH STROKE
→ 00D0 Ð latin capital letter eth
→ 0111 đ latin small letter d with stroke
→ 0189 Ɖ latin capital letter african d
Even stranger: this one is named LATIN CAPITAL LETTER D WITH STROKE, yet it has no decomposition! Perhaps a cooler solution would be to look up the Unicode name of each character, compare it with the names of the ASCII characters, and replace accordingly. Anyone? ;-]
Cf. http://unicode.org/Public/UNIDATA/UnicodeData.txt
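The name-comparison idea can be sketched with the intl extension's IntlChar::charName() (PHP 7+), which returns a code point's Unicode name, i.e. the names listed in UnicodeData.txt. This is only an approximation under stated assumptions: it handles names of the form "LATIN CAPITAL/SMALL LETTER X WITH …" and leaves everything else (ETH, AFRICAN D, ligatures) untouched. ascii_from_name() is a hypothetical helper.

```php
<?php
// Sketch: map non-ASCII characters to an ASCII letter using their Unicode name.
// Assumes ext-intl (IntlChar). Only "LATIN ... LETTER X WITH ..." names are handled.
function ascii_from_name(string $text): string
{
    return preg_replace_callback('/[^\x00-\x7F]/u', function (array $m): string {
        $name = IntlChar::charName($m[0]);
        // e.g. "LATIN CAPITAL LETTER D WITH STROKE" -> "D"
        if (is_string($name)
            && preg_match('/^LATIN (CAPITAL|SMALL) LETTER ([A-Z]) WITH /', $name, $p)) {
            return $p[1] === 'CAPITAL' ? $p[2] : strtolower($p[2]);
        }
        return $m[0]; // no matching name: leave the character unchanged
    }, $text);
}

echo ascii_from_name('Đđ Ł ħ ƒ Ð'), "\n";
```

Because it works from names rather than decompositions, it also covers ƒ (LATIN SMALL LETTER F WITH HOOK), which the normalizer approach leaves alone; Ð still needs a table like the one above.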
Solution 3
It happened to me with plain iconv, outside PHP. The trick was to set the LANG environment variable to en_US.UTF-8 (in my case it was hu_HU.UTF-8 before). After that it worked as expected.
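Inside PHP, the counterpart of changing LANG is setting the LC_CTYPE category before calling iconv. A minimal sketch, assuming iconv is backed by glibc and an en_US.UTF-8 locale is installed; with other iconv implementations the transliteration may differ regardless of locale. translit_ascii() is a hypothetical name, and it restores the previous locale so the change does not leak into the rest of the script.

```php
<?php
// Sketch: run iconv's ASCII transliteration under a UTF-8 LC_CTYPE locale,
// then restore whatever locale was active before.
// Assumptions: glibc iconv and an installed en_US.UTF-8 locale.
function translit_ascii(string $text)
{
    // Passing "0" only queries the current LC_CTYPE setting
    $previous = setlocale(LC_CTYPE, '0');

    // Tries each name in turn; returns false (and changes nothing) if none exists
    setlocale(LC_CTYPE, 'en_US.UTF-8', 'en_US.utf8');

    // Returns the converted string, or false on failure
    $result = iconv('UTF-8', 'ASCII//TRANSLIT', $text);

    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }
    return $result;
}

var_dump(translit_ascii('è'));
```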
dynamic
Updated on July 09, 2022
Comments
-
dynamic almost 2 years
Consider this simple code:
echo iconv('UTF-8', 'ASCII//TRANSLIT', 'è');
It prints
`e
instead of just
e
Do you know what I am doing wrong?
Nothing changed after adding setlocale:
setlocale(LC_COLLATE, 'en_US.utf8'); echo iconv('UTF-8', 'ASCII//TRANSLIT', 'è');
-
Michał Leon over 10 years: Ignore tchrist, this is THE way to do it; I use it in practice. The only mistake you made is the locale category, which should be
setlocale(LC_CTYPE, 'en_US.UTF-8');
i.e. LC_CTYPE, not LC_COLLATE. Tschüss.
-
Scott over 8 years: I'm having the same problem. It is certainly not LC_TYPE; that generates an error (for me at least). I've tried LC_ALL (which is what everyone else suggests), with no effect. I'm passing in the string
CŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ
and getting
CSOEZsoez"Yyenu
A'A^A~A"AAAECE'E^E"E
I'I^I"ID~NO'O^O~O"OO
U'U^U"U'Yssa'a^a~a"aaaec
e'e^e"ei'i^i"id~n
o'o^o~o"oou'u^u"u'y"y
-
dynamic about 13 years: Same result as before with setlocale (see first post).
-
dynamic about 13 years: I know I could run a preg_replace like that after iconv transliterates... I only wanted to know whether the behaviour described in my first post is standard, or whether iconv can transliterate "better".
-
dynamic about 12 years: Sorry, but why are there 2 backslashes in the preg_replace? Shouldn't it be just [^\pL\d]?
-
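On the double-backslash question: in a single-quoted PHP string, \\ is an escaped backslash, while a backslash before any other character (such as p or d) is kept literally, so both spellings hand PCRE the same pattern. A quick check, as a sketch:

```php
<?php
// In single quotes, '\\' is one backslash and '\p' is kept as-is,
// so both literals contain exactly the same characters.
$doubled = '#[^\\pL\d]+#u';
$single  = '#[^\pL\d]+#u';

var_dump($doubled === $single);                         // bool(true)
echo preg_replace($single, '-', 'Crème brûlée!'), "\n"; // Crème-brûlée-
```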
NullPointer almost 11 years: What about the string
plƒtre francin
where ƒ does not get converted?
-
dearsina almost 5 years: This is the only one that worked for me, on vanilla PHP 7.2.