PHP: Replace umlauts with closest 7-bit ASCII equivalent in a UTF-8 string
Solution 1
iconv("utf-8","ascii//TRANSLIT",$input);
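The comments below report that this only works after setting a suitable locale (glibc's iconv() uses the LC_CTYPE locale for //TRANSLIT) and recommend touching LC_CTYPE rather than LC_ALL. A minimal sketch of that, assuming the en_US.UTF-8 locale is installed; the helper name to_ascii_iconv() is hypothetical, and because some iconv builds reject //TRANSLIT or produce "?" instead, the sketch returns null on failure rather than guessing.

```php
<?php
// Sketch only: to_ascii_iconv() is a hypothetical helper, and the default
// "en_US.UTF-8" locale is an assumption (it must exist on the machine).
function to_ascii_iconv(string $input, string $locale = 'en_US.UTF-8'): ?string
{
    // Passing "0" only queries the current LC_CTYPE setting.
    $previous = setlocale(LC_CTYPE, '0');
    // Change only LC_CTYPE; LC_ALL would also switch LC_NUMERIC and the other categories.
    setlocale(LC_CTYPE, $locale);
    // @ hides the "illegal character" notice; failure is still reported as false.
    $result = @iconv('UTF-8', 'ASCII//TRANSLIT', $input);
    // Restore the caller's locale so the change does not leak into other code.
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }
    return $result === false ? null : $result;
}

echo to_ascii_iconv('lärm andré') ?? '(conversion failed)', "\n";
```

On a glibc system with that locale installed this typically prints "larm andre"; with a German locale glibc transliterates ä to "ae" instead, as the comments point out.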
Solution 2
A little trick that doesn't require setting locales or having huge translation tables:
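function Unaccent($string)
{
    // Turn accented characters into named HTML entities, e.g. "ä" -> "&auml;"
    $string = htmlentities($string, ENT_QUOTES, 'UTF-8');
    if (strpos($string, '&') !== false)
    {
        // Keep only the base letter(s) before the accent name: "&auml;" -> "a", "&AElig;" -> "AE"
        $string = preg_replace('~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i', '$1', $string);
        // Decode the remaining entities (e.g. "&amp;", "&quot;") back into characters
        $string = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
    }
    return $string;
}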
The only requirement for it to work properly is to save your files in UTF-8 (as you should already).
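To see why the trick works, the intermediate values can be traced for a few sample words (the words are illustrative; the pattern is copied from Unaccent()). The trace also shows a side effect of the {1,2} capture: ß is encoded as &szlig;, so it becomes "sz" rather than "ss", and Æ (&AElig;) becomes the two letters "AE".

```php
<?php
// Same pattern as in Unaccent(): capture 1-2 letters followed by a known accent name.
$pattern = '~&([a-z]{1,2})(?:acute|cedil|circ|grave|lig|orn|ring|slash|tilde|uml);~i';

$traces = [];
foreach (['lärm', 'andré', 'Straße', 'Æble'] as $word) {
    // Step 1: accented letters become named entities, e.g. "ä" -> "&auml;"
    $encoded = htmlentities($word, ENT_QUOTES, 'UTF-8');
    // Step 2: replace each matching entity by its captured base letter(s)
    $stripped = preg_replace($pattern, '$1', $encoded);
    // Step 3: decode any entity the pattern did not match back into a character
    $result = html_entity_decode($stripped, ENT_QUOTES, 'UTF-8');

    $traces[$word] = [$encoded, $result];
    echo "$word -> $encoded -> $result\n";
}
```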
Solution 3
You can also try this:
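$string = "Fóø Bår";
// Any-Latin / Latin-ASCII transliterate to ASCII, NFD + "[:Nonspacing Mark:] Remove" strip
// leftover accents, Lower() lowercases the result and NFC recomposes it
$transliterator = Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: Lower(); :: NFC;', Transliterator::FORWARD);
echo $normalized = $transliterator->transliterate($string); // prints "foo bar"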
This requires the intl extension (http://php.net/manual/en/book.intl.php). Note that the Lower() rule also lowercases the output.
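If you want to keep the original case, a shorter rule chain is enough. A minimal sketch, assuming the intl extension is loaded; the helper name to_ascii_intl() is hypothetical. Because "Any-Latin" romanizes other scripts first, Cyrillic input is transliterated instead of being replaced by question marks.

```php
<?php
// Sketch assuming ext-intl is available; to_ascii_intl() is a hypothetical helper name.
function to_ascii_intl(string $input): string
{
    // "Any-Latin" romanizes non-Latin scripts, then "Latin-ASCII" maps accented
    // Latin letters to plain ASCII (e.g. ä -> a, ß -> ss); case is preserved.
    $result = transliterator_transliterate('Any-Latin; Latin-ASCII', $input);
    if ($result === false) {
        throw new RuntimeException(intl_get_error_message());
    }
    return $result;
}

if (extension_loaded('intl')) {
    echo to_ascii_intl('Fóø Bår'), "\n";
}
```

Unlike the single-character table in the question, Latin-ASCII expands some letters to two characters (ß to "ss", Æ to "AE").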
Solution 4
If you are using WordPress, you can use the built-in function remove_accents( $string )
https://codex.wordpress.org/Function_Reference/remove_accents
However, I noticed a bug: it doesn't work on a string with a single character.
Solution 5
Okay, I found an obvious solution myself, but it's not the best in terms of performance...
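// utf8_decode() cannot represent Š, Œ, Ž, š, œ, ž or Ÿ (they all become "?"), so build a
// UTF-8 map instead: preg_split('//u') splits the first table into single characters
$map = array_combine(preg_split('//u', 'ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ', -1, PREG_SPLIT_NO_EMPTY), str_split('SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy'));
echo strtr($input, $map);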
Nishan
Updated on June 02, 2020
Comments
-
Nishan about 4 years
What I want to do is remove all accents and umlauts from a string, turning "lärm" into "larm" or "andré" into "andre". I tried to utf8_decode the string and then use strtr on it, but since my source file is saved as a UTF-8 file, I can't enter the ISO-8859-15 characters for the umlauts; the editor inserts the UTF-8 characters instead.
Obviously, one solution would be to put the table in a separate include file saved as ISO-8859-15, but is there a better way than adding another required include?
echo strtr(utf8_decode($input), 'ŠŒŽšœžŸ¥µÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿ', 'SOZsozYYuAAAAAAACEEEEIIIIDNOOOOOOUUUUYsaaaaaaaceeeeiiiionoooooouuuuyy');
UPDATE: Maybe I was a bit inaccurate about what I'm trying to do: I don't actually want to remove the umlauts, but to replace them with their closest "one character ASCII" equivalent.
-
Nishan over 15 years: I had to add setlocale(LC_ALL, 'en_US'); (sadly, no locales for Germany seem to be available on my machine), but then it works. Great! :)
-
spikey about 12 years: Why does this solution return "o for ö on my machine, while the examples in the PHP reference return oe?
-
Zebooka almost 12 years: This does not work for Cyrillic characters; they are converted to ? question marks instead.
-
laurent over 11 years: It's not the best in terms of performance, and it also produces incorrect results. Letters like Œ, Æ, etc. should decompose to two letters, not one.
-
Matt about 11 years: This fails, returning false, and gives me a notice that illegal characters were encountered...
-
Michał Leon over 10 years: To spikey's comment: if you set your locale to de_*.UTF8 (de_DE.UTF8, de_CH.UTF8, etc.), umlauts will be converted to *e (ü -> ue). Set it to en_US.UTF8 to get the desired effect.
-
edditor almost 10 years: I have the same problem as spikey; the setlocale() workaround doesn't help either.
-
Piskvor left the building over 9 years: You have missed žščřďťňů, and those are just the ones I see on my keyboard. Whitelisting known characters is not the best solution.
-
PeerBr over 9 years: setlocale() depends on your operating system, is not thread-safe, and wreaks havoc if you get it wrong (such as treating commas as periods in conversions). Either be careful (use LC_CTYPE instead of LC_ALL in this case) or stay away from it unless you know exactly what you're doing.
-
Nishan over 8 years: @this.lau_ As mentioned in the question, I'm looking for the closest "one character ASCII" equivalent, so no, a two-letter decomposition would not be correct for my use case. One letter is what I need.
-
vinczemarton over 7 years: Works great for Hungarian.
-
Jose Manuel Abarca Rodríguez over 5 years: Use "ascii//translit//ignore" to prevent the "illegal characters encountered" error.
-
Constantin Groß almost 3 years: If iconv() with ASCII//TRANSLIT doesn't work for you with German umlauts (ä/ö/ü => ae/oe/ue) despite setting setlocale() to a German UTF-8 locale, this answer to another question was the solution for me: using transliterator_transliterate() with de-ASCII supplied via the transliterator build string.
-
Vladan over 2 years: Despite not being an exact answer, I appreciate this answer as I'm using WordPress. So thanks! ;)