Replacing accented characters in PHP

Solution 1

I have tried all sorts based on the variations listed in the answers, but the following worked:

// $str is the UTF-8 input string; this file must be saved as UTF-8 too.
$unwanted_array = array(
    'Š'=>'S', 'š'=>'s', 'Ž'=>'Z', 'ž'=>'z', 'À'=>'A', 'Á'=>'A', 'Â'=>'A', 'Ã'=>'A', 'Ä'=>'A', 'Å'=>'A', 'Æ'=>'A', 'Ç'=>'C', 'È'=>'E', 'É'=>'E',
    'Ê'=>'E', 'Ë'=>'E', 'Ì'=>'I', 'Í'=>'I', 'Î'=>'I', 'Ï'=>'I', 'Ñ'=>'N', 'Ò'=>'O', 'Ó'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'O', 'Ø'=>'O', 'Ù'=>'U',
    // 'ß' maps to lowercase 'ss': it is a lowercase-only letter.
    'Ú'=>'U', 'Û'=>'U', 'Ü'=>'U', 'Ý'=>'Y', 'Þ'=>'B', 'ß'=>'ss', 'à'=>'a', 'á'=>'a', 'â'=>'a', 'ã'=>'a', 'ä'=>'a', 'å'=>'a', 'æ'=>'a', 'ç'=>'c',
    'è'=>'e', 'é'=>'e', 'ê'=>'e', 'ë'=>'e', 'ì'=>'i', 'í'=>'i', 'î'=>'i', 'ï'=>'i', 'ð'=>'o', 'ñ'=>'n', 'ò'=>'o', 'ó'=>'o', 'ô'=>'o', 'õ'=>'o',
    'ö'=>'o', 'ø'=>'o', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'ü'=>'u', 'ý'=>'y', 'þ'=>'b', 'ÿ'=>'y'
);
// strtr() with an array replaces every key with its value in a single pass.
$str = strtr( $str, $unwanted_array );

Solution 2

To remove the diacritics, use iconv:

$val = iconv('ISO-8859-1','ASCII//TRANSLIT',$val);

or

$val = iconv('UTF-8','ASCII//TRANSLIT',$val);

Note that the result of //TRANSLIT depends on the current locale, so you may need to set one with setlocale() before these conversions work as expected.

Edit: tested, and it handles all of your diacritics out of the box:

$val = "á|â|à|å|ä ð|é|ê|è|ë í|î|ì|ï ó|ô|ò|ø|õ|ö ú|û|ù|ü æ ç ß abc ABC 123";
echo iconv('UTF-8','ASCII//TRANSLIT',$val); 

Output (updated 2019-12-30):

a|a|a|a|a d|e|e|e|e i|i|i|i o|o|o|o|o|o u|u|u|u ae c ss abc ABC 123

Note that ð is correctly transliterated to d here, not to o as in the accepted answer.
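
As a rough sketch of how the locale note and the //IGNORE suggestion from the comments might fit together: the helper name to_ascii() and the locale names it tries are my own assumptions, not part of the answer, and the exact transliterations you get depend on the iconv implementation installed on your system.

```php
<?php
// Hypothetical helper, not part of the original answer.
function to_ascii(string $utf8): string
{
    // Passing '0' only reads the current LC_CTYPE so it can be restored later.
    $previous = setlocale(LC_CTYPE, '0');

    // Try a few common UTF-8 locale names (assumed to exist on the host);
    // setlocale() returns false and changes nothing if none is installed.
    setlocale(LC_CTYPE, 'en_US.UTF-8', 'en_US.utf8', 'C.UTF-8');

    // //IGNORE drops characters that cannot be transliterated.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $utf8);

    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }

    if ($ascii === false) {
        // Fallback if this iconv build rejects the conversion:
        // simply strip every non-ASCII byte.
        $ascii = preg_replace('/[^\x00-\x7F]/', '', $utf8);
    }

    return $ascii;
}

echo to_ascii("\u{00C9}ric Cantona"), "\n"; // input is "Éric Cantona"
```

The locale is restored afterwards because setlocale() affects the whole process, not just this call.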

Solution 3

I just came across the answer from Lizard, which is extremely helpful, especially when you do some sorting. Isn't it beautiful how many characters we need to say mostly the same thing? ;)

If anyone else is looking for an all-in-one solution (as far as the comments above tell), here's the copy & paste:

/**
 * Replace language-specific characters by ASCII-equivalents.
 * @param string $s
 * @return string
 */
public static function normalizeChars($s) {
    $replace = array(
        'ъ'=>'-', 'Ь'=>'-', 'Ъ'=>'-', 'ь'=>'-',
        'Ă'=>'A', 'Ą'=>'A', 'À'=>'A', 'Ã'=>'A', 'Á'=>'A', 'Æ'=>'A', 'Â'=>'A', 'Å'=>'A', 'Ä'=>'Ae',
        'Þ'=>'B',
        'Ć'=>'C', 'ץ'=>'C', 'Ç'=>'C',
        'È'=>'E', 'Ę'=>'E', 'É'=>'E', 'Ë'=>'E', 'Ê'=>'E',
        'Ğ'=>'G',
        'İ'=>'I', 'Ï'=>'I', 'Î'=>'I', 'Í'=>'I', 'Ì'=>'I',
        'Ł'=>'L',
        'Ñ'=>'N', 'Ń'=>'N',
        'Ø'=>'O', 'Ó'=>'O', 'Ò'=>'O', 'Ô'=>'O', 'Õ'=>'O', 'Ö'=>'Oe',
        'Ş'=>'S', 'Ś'=>'S', 'Ș'=>'S', 'Š'=>'S',
        'Ț'=>'T',
        'Ù'=>'U', 'Û'=>'U', 'Ú'=>'U', 'Ü'=>'Ue',
        'Ý'=>'Y',
        'Ź'=>'Z', 'Ž'=>'Z', 'Ż'=>'Z',
        'â'=>'a', 'ǎ'=>'a', 'ą'=>'a', 'á'=>'a', 'ă'=>'a', 'ã'=>'a', 'Ǎ'=>'a', 'а'=>'a', 'А'=>'a', 'å'=>'a', 'à'=>'a', 'א'=>'a', 'Ǻ'=>'a', 'Ā'=>'a', 'ǻ'=>'a', 'ā'=>'a', 'ä'=>'ae', 'æ'=>'ae', 'Ǽ'=>'ae', 'ǽ'=>'ae',
        'б'=>'b', 'ב'=>'b', 'Б'=>'b', 'þ'=>'b',
        'ĉ'=>'c', 'Ĉ'=>'c', 'Ċ'=>'c', 'ć'=>'c', 'ç'=>'c', 'ц'=>'c', 'צ'=>'c', 'ċ'=>'c', 'Ц'=>'c', 'Č'=>'c', 'č'=>'c', 'Ч'=>'ch', 'ч'=>'ch',
        'ד'=>'d', 'ď'=>'d', 'Đ'=>'d', 'Ď'=>'d', 'đ'=>'d', 'д'=>'d', 'Д'=>'D', 'ð'=>'d',
        'є'=>'e', 'ע'=>'e', 'е'=>'e', 'Е'=>'e', 'Ə'=>'e', 'ę'=>'e', 'ĕ'=>'e', 'ē'=>'e', 'Ē'=>'e', 'Ė'=>'e', 'ė'=>'e', 'ě'=>'e', 'Ě'=>'e', 'Є'=>'e', 'Ĕ'=>'e', 'ê'=>'e', 'ə'=>'e', 'è'=>'e', 'ë'=>'e', 'é'=>'e',
        'ф'=>'f', 'ƒ'=>'f', 'Ф'=>'f',
        'ġ'=>'g', 'Ģ'=>'g', 'Ġ'=>'g', 'Ĝ'=>'g', 'Г'=>'g', 'г'=>'g', 'ĝ'=>'g', 'ğ'=>'g', 'ג'=>'g', 'Ґ'=>'g', 'ґ'=>'g', 'ģ'=>'g',
        'ח'=>'h', 'ħ'=>'h', 'Х'=>'h', 'Ħ'=>'h', 'Ĥ'=>'h', 'ĥ'=>'h', 'х'=>'h', 'ה'=>'h',
        'î'=>'i', 'ï'=>'i', 'í'=>'i', 'ì'=>'i', 'į'=>'i', 'ĭ'=>'i', 'ı'=>'i', 'Ĭ'=>'i', 'И'=>'i', 'ĩ'=>'i', 'ǐ'=>'i', 'Ĩ'=>'i', 'Ǐ'=>'i', 'и'=>'i', 'Į'=>'i', 'י'=>'i', 'Ї'=>'i', 'Ī'=>'i', 'І'=>'i', 'ї'=>'i', 'і'=>'i', 'ī'=>'i', 'ĳ'=>'ij', 'Ĳ'=>'ij',
        'й'=>'j', 'Й'=>'j', 'Ĵ'=>'j', 'ĵ'=>'j', 'я'=>'ja', 'Я'=>'ja', 'Э'=>'je', 'э'=>'je', 'ё'=>'jo', 'Ё'=>'jo', 'ю'=>'ju', 'Ю'=>'ju',
        'ĸ'=>'k', 'כ'=>'k', 'Ķ'=>'k', 'К'=>'k', 'к'=>'k', 'ķ'=>'k', 'ך'=>'k',
        'Ŀ'=>'l', 'ŀ'=>'l', 'Л'=>'l', 'ł'=>'l', 'ļ'=>'l', 'ĺ'=>'l', 'Ĺ'=>'l', 'Ļ'=>'l', 'л'=>'l', 'Ľ'=>'l', 'ľ'=>'l', 'ל'=>'l',
        'מ'=>'m', 'М'=>'m', 'ם'=>'m', 'м'=>'m',
        'ñ'=>'n', 'н'=>'n', 'Ņ'=>'n', 'ן'=>'n', 'ŋ'=>'n', 'נ'=>'n', 'Н'=>'n', 'ń'=>'n', 'Ŋ'=>'n', 'ņ'=>'n', 'ʼn'=>'n', 'Ň'=>'n', 'ň'=>'n',
        'о'=>'o', 'О'=>'o', 'ő'=>'o', 'õ'=>'o', 'ô'=>'o', 'Ő'=>'o', 'ŏ'=>'o', 'Ŏ'=>'o', 'Ō'=>'o', 'ō'=>'o', 'ø'=>'o', 'ǿ'=>'o', 'ǒ'=>'o', 'ò'=>'o', 'Ǿ'=>'o', 'Ǒ'=>'o', 'ơ'=>'o', 'ó'=>'o', 'Ơ'=>'o', 'œ'=>'oe', 'Œ'=>'oe', 'ö'=>'oe',
        'פ'=>'p', 'ף'=>'p', 'п'=>'p', 'П'=>'p',
        'ק'=>'q',
        'ŕ'=>'r', 'ř'=>'r', 'Ř'=>'r', 'ŗ'=>'r', 'Ŗ'=>'r', 'ר'=>'r', 'Ŕ'=>'r', 'Р'=>'r', 'р'=>'r',
        'ș'=>'s', 'с'=>'s', 'Ŝ'=>'s', 'š'=>'s', 'ś'=>'s', 'ס'=>'s', 'ş'=>'s', 'С'=>'s', 'ŝ'=>'s', 'Щ'=>'sch', 'щ'=>'sch', 'ш'=>'sh', 'Ш'=>'sh', 'ß'=>'ss',
        'т'=>'t', 'ט'=>'t', 'ŧ'=>'t', 'ת'=>'t', 'ť'=>'t', 'ţ'=>'t', 'Ţ'=>'t', 'Т'=>'t', 'ț'=>'t', 'Ŧ'=>'t', 'Ť'=>'t', '™'=>'tm',
        'ū'=>'u', 'у'=>'u', 'Ũ'=>'u', 'ũ'=>'u', 'Ư'=>'u', 'ư'=>'u', 'Ū'=>'u', 'Ǔ'=>'u', 'ų'=>'u', 'Ų'=>'u', 'ŭ'=>'u', 'Ŭ'=>'u', 'Ů'=>'u', 'ů'=>'u', 'ű'=>'u', 'Ű'=>'u', 'Ǖ'=>'u', 'ǔ'=>'u', 'Ǜ'=>'u', 'ù'=>'u', 'ú'=>'u', 'û'=>'u', 'У'=>'u', 'ǚ'=>'u', 'ǜ'=>'u', 'Ǚ'=>'u', 'Ǘ'=>'u', 'ǖ'=>'u', 'ǘ'=>'u', 'ü'=>'ue',
        'в'=>'v', 'ו'=>'v', 'В'=>'v',
        'ש'=>'w', 'ŵ'=>'w', 'Ŵ'=>'w',
        'ы'=>'y', 'ŷ'=>'y', 'ý'=>'y', 'ÿ'=>'y', 'Ÿ'=>'y', 'Ŷ'=>'y',
        'Ы'=>'y', 'ž'=>'z', 'З'=>'z', 'з'=>'z', 'ź'=>'z', 'ז'=>'z', 'ż'=>'z', 'ſ'=>'z', 'Ж'=>'zh', 'ж'=>'zh'
    );
    return strtr($s, $replace);
}

Note the slight changes regarding the German umlauts (e.g. ä => ae).

Edit: Included more characters based on the posting from user3682119 (except for the copyright symbol) and the comment from daker.

Solution 4

As of PHP 5.4, the intl extension provides a class named Transliterator.

I believe that's the best way to remove diacritics for two reasons:

  1. Transliterator is based on ICU, so you're using the tables of the ICU library. ICU is a great project, developed over the years to provide comprehensive tables and functionality. Whatever table you write yourself will never be as complete as the one from ICU.

  2. In Unicode (and therefore in UTF-8), the same character can be represented in different ways. For example, the character ñ could be saved as a single precomposed (multi-byte) character, or as the letter n followed by a combining tilde (also multi-byte). In addition to this, some characters in Unicode are homographs: they look the same while having different code points. For this reason it's also important to normalize the string.

Here's some sample code, taken from an old answer of mine:

<?php
$transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD);
$test = ['abcd', 'èe', '€', 'àòùìéëü', 'àòùìéëü', 'tiësto'];
foreach($test as $e) {
    $normalized = $transliterator->transliterate($e);
    echo $e. ' --> '.$normalized."\n";
}
?>

Result:

abcd --> abcd
èe --> ee
€ --> €
àòùìéëü --> aouieeu
àòùìéëü --> aouieeu
tiësto --> tiesto

The first argument to Transliterator::createFromRules() is the rule set; here it both removes the diacritics and normalizes the string.
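
To make the normalization point a bit more concrete, here is a small sketch (assuming PHP 7+ for the "\u{...}" escapes and a loaded intl extension). It feeds the same rule set both forms of ñ, and then tries the 'Any-Latin; Latin-ASCII' transform suggested in the comments for letters such as æ that are not plain base letter + mark combinations. The variable names are mine.

```php
<?php
// Assumes the intl extension is available.
$precomposed = "\u{00F1}";  // ñ stored as one code point
$decomposed  = "n\u{0303}"; // n followed by COMBINING TILDE

// Same rule set as in the answer: decompose, drop the marks, recompose.
$strip = Transliterator::createFromRules(
    ':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
    Transliterator::FORWARD
);

var_dump($precomposed === $decomposed);                // bool(false): the bytes differ
var_dump($strip->transliterate($precomposed) === 'n'); // bool(true)
var_dump($strip->transliterate($decomposed) === 'n');  // bool(true)

// æ has no combining mark to remove, so the rule set above leaves it alone;
// the built-in Latin-ASCII transform is one way to handle such letters.
echo transliterator_transliterate('Any-Latin; Latin-ASCII', 'olivæ'), "\n";
```

A hand-written lookup table that only lists precomposed characters would miss the second form entirely, which is the practical reason to normalize first.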

Solution 5

An updated answer based on @BurninLeo's answer:

function replace_spec_char($subject) {
    $char_map = array(
        "ъ" => "-", "ь" => "-", "Ъ" => "-", "Ь" => "-",
        "А" => "A", "Ă" => "A", "Ǎ" => "A", "Ą" => "A", "À" => "A", "Ã" => "A", "Á" => "A", "Æ" => "A", "Â" => "A", "Å" => "A", "Ǻ" => "A", "Ā" => "A", "א" => "A",
        "Б" => "B", "ב" => "B", "Þ" => "B",
        "Ĉ" => "C", "Ć" => "C", "Ç" => "C", "Ц" => "C", "צ" => "C", "Ċ" => "C", "Č" => "C", "©" => "C", "ץ" => "C",
        "Д" => "D", "Ď" => "D", "Đ" => "D", "ד" => "D", "Ð" => "D",
        "È" => "E", "Ę" => "E", "É" => "E", "Ë" => "E", "Ê" => "E", "Е" => "E", "Ē" => "E", "Ė" => "E", "Ě" => "E", "Ĕ" => "E", "Є" => "E", "Ə" => "E", "ע" => "E",
        "Ф" => "F", "Ƒ" => "F",
        "Ğ" => "G", "Ġ" => "G", "Ģ" => "G", "Ĝ" => "G", "Г" => "G", "ג" => "G", "Ґ" => "G",
        "ח" => "H", "Ħ" => "H", "Х" => "H", "Ĥ" => "H", "ה" => "H",
        "I" => "I", "Ï" => "I", "Î" => "I", "Í" => "I", "Ì" => "I", "Į" => "I", "Ĭ" => "I", "I" => "I", "И" => "I", "Ĩ" => "I", "Ǐ" => "I", "י" => "I", "Ї" => "I", "Ī" => "I", "І" => "I",
        "Й" => "J", "Ĵ" => "J",
        "ĸ" => "K", "כ" => "K", "Ķ" => "K", "К" => "K", "ך" => "K",
        "Ł" => "L", "Ŀ" => "L", "Л" => "L", "Ļ" => "L", "Ĺ" => "L", "Ľ" => "L", "ל" => "L",
        "מ" => "M", "М" => "M", "ם" => "M",
        "Ñ" => "N", "Ń" => "N", "Н" => "N", "Ņ" => "N", "ן" => "N", "Ŋ" => "N", "נ" => "N", "ʼn" => "N", "Ň" => "N",
        "Ø" => "O", "Ó" => "O", "Ò" => "O", "Ô" => "O", "Õ" => "O", "О" => "O", "Ő" => "O", "Ŏ" => "O", "Ō" => "O", "Ǿ" => "O", "Ǒ" => "O", "Ơ" => "O",
        "פ" => "P", "ף" => "P", "П" => "P",
        "ק" => "Q",
        "Ŕ" => "R", "Ř" => "R", "Ŗ" => "R", "ר" => "R", "Р" => "R", "®" => "R",
        "Ş" => "S", "Ś" => "S", "Ș" => "S", "Š" => "S", "С" => "S", "Ŝ" => "S", "ס" => "S",
        "Т" => "T", "Ț" => "T", "ט" => "T", "Ŧ" => "T", "ת" => "T", "Ť" => "T", "Ţ" => "T",
        "Ù" => "U", "Û" => "U", "Ú" => "U", "Ū" => "U", "У" => "U", "Ũ" => "U", "Ư" => "U", "Ǔ" => "U", "Ų" => "U", "Ŭ" => "U", "Ů" => "U", "Ű" => "U", "Ǖ" => "U", "Ǜ" => "U", "Ǚ" => "U", "Ǘ" => "U",
        "В" => "V", "ו" => "V",
        "Ý" => "Y", "Ы" => "Y", "Ŷ" => "Y", "Ÿ" => "Y",
        "Ź" => "Z", "Ž" => "Z", "Ż" => "Z", "З" => "Z", "ז" => "Z",
        "а" => "a", "ă" => "a", "ǎ" => "a", "ą" => "a", "à" => "a", "ã" => "a", "á" => "a", "æ" => "a", "â" => "a", "å" => "a", "ǻ" => "a", "ā" => "a", "א" => "a",
        "б" => "b", "ב" => "b", "þ" => "b",
        "ĉ" => "c", "ć" => "c", "ç" => "c", "ц" => "c", "צ" => "c", "ċ" => "c", "č" => "c", "©" => "c", "ץ" => "c",
        "Ч" => "ch", "ч" => "ch",
        "д" => "d", "ď" => "d", "đ" => "d", "ד" => "d", "ð" => "d",
        "è" => "e", "ę" => "e", "é" => "e", "ë" => "e", "ê" => "e", "е" => "e", "ē" => "e", "ė" => "e", "ě" => "e", "ĕ" => "e", "є" => "e", "ə" => "e", "ע" => "e",
        "ф" => "f", "ƒ" => "f",
        "ğ" => "g", "ġ" => "g", "ģ" => "g", "ĝ" => "g", "г" => "g", "ג" => "g", "ґ" => "g",
        "ח" => "h", "ħ" => "h", "х" => "h", "ĥ" => "h", "ה" => "h",
        "i" => "i", "ï" => "i", "î" => "i", "í" => "i", "ì" => "i", "į" => "i", "ĭ" => "i", "ı" => "i", "и" => "i", "ĩ" => "i", "ǐ" => "i", "י" => "i", "ї" => "i", "ī" => "i", "і" => "i",
        "й" => "j", "ĵ" => "j",
        "ĸ" => "k", "כ" => "k", "ķ" => "k", "к" => "k", "ך" => "k",
        "ł" => "l", "ŀ" => "l", "л" => "l", "ļ" => "l", "ĺ" => "l", "ľ" => "l", "ל" => "l",
        "מ" => "m", "м" => "m", "ם" => "m",
        "ñ" => "n", "ń" => "n", "н" => "n", "ņ" => "n", "ן" => "n", "ŋ" => "n", "נ" => "n", "ʼn" => "n", "ň" => "n",
        "ø" => "o", "ó" => "o", "ò" => "o", "ô" => "o", "õ" => "o", "о" => "o", "ő" => "o", "ŏ" => "o", "ō" => "o", "ǿ" => "o", "ǒ" => "o", "ơ" => "o",
        "פ" => "p", "ף" => "p", "п" => "p",
        "ק" => "q",
        "ŕ" => "r", "ř" => "r", "ŗ" => "r", "ר" => "r", "р" => "r", "®" => "r",
        "ş" => "s", "ś" => "s", "ș" => "s", "š" => "s", "с" => "s", "ŝ" => "s", "ס" => "s",
        "т" => "t", "ț" => "t", "ט" => "t", "ŧ" => "t", "ת" => "t", "ť" => "t", "ţ" => "t",
        "ù" => "u", "û" => "u", "ú" => "u", "ū" => "u", "у" => "u", "ũ" => "u", "ư" => "u", "ǔ" => "u", "ų" => "u", "ŭ" => "u", "ů" => "u", "ű" => "u", "ǖ" => "u", "ǜ" => "u", "ǚ" => "u", "ǘ" => "u",
        "в" => "v", "ו" => "v",
        "ý" => "y", "ы" => "y", "ŷ" => "y", "ÿ" => "y",
        "ź" => "z", "ž" => "z", "ż" => "z", "з" => "z", "ז" => "z", "ſ" => "z",
        "™" => "tm",
        "@" => "at",
        "Ä" => "ae", "Ǽ" => "ae", "ä" => "ae", "æ" => "ae", "ǽ" => "ae",
        "ĳ" => "ij", "Ĳ" => "ij",
        "я" => "ja", "Я" => "ja",
        "Э" => "je", "э" => "je",
        "ё" => "jo", "Ё" => "jo",
        "ю" => "ju", "Ю" => "ju",
        "œ" => "oe", "Œ" => "oe", "ö" => "oe", "Ö" => "oe",
        "щ" => "sch", "Щ" => "sch",
        "ш" => "sh", "Ш" => "sh",
        "ß" => "ss",
        "Ü" => "ue",
        "Ж" => "zh", "ж" => "zh",
    );
    return strtr($subject, $char_map);
}

$string = "Ħí ŧħə®ë, юßť å test!";
echo replace_spec_char($string);

Ħí ŧħə®ë, юßť å test! => Hi there, jusst a test!

This does not mix upper- and lower-case characters, except for multi-letter replacements (e.g. ss, ch, sch). It also adds @, ® and ©.

Also, if you want to build a regex that matches regardless of special characters:

rss => '[rŕřŘŗŖרŔРр](?:[sșсŜšśסşСŝ][sșсŜšśסşСŝ]|[ß])'

A Vala implementation of this: https://code.launchpad.net/~jeremy-munsch/synapse-project/ascii-smart/+merge/277477

Here is the base list you can work with. Using regex replacement (in Sublime Text) or a small script, you can build anything you need from this array.

"-" => "ъьЪЬ",
"A" => "АĂǍĄÀÃÁÆÂÅǺĀא",
"B" => "БבÞ",
"C" => "ĈĆÇЦצĊČ©ץ",
"D" => "ДĎĐדÐ",
"E" => "ÈĘÉËÊЕĒĖĚĔЄƏע",
"F" => "ФƑ",
"G" => "ĞĠĢĜГגҐ",
"H" => "חĦХĤה",
"I" => "IÏÎÍÌĮĬIИĨǏיЇĪІ",
"J" => "ЙĴ",
"K" => "ĸכĶКך",
"L" => "ŁĿЛĻĹĽל",
"M" => "מМם",
"N" => "ÑŃНŅןŊנʼnŇ",
"O" => "ØÓÒÔÕОŐŎŌǾǑƠ",
"P" => "פףП",
"Q" => "ק",
"R" => "ŔŘŖרР®",
"S" => "ŞŚȘŠСŜס",
"T" => "ТȚטŦתŤŢ",
"U" => "ÙÛÚŪУŨƯǓŲŬŮŰǕǛǙǗ",
"V" => "Вו",
"Y" => "ÝЫŶŸ",
"Z" => "ŹŽŻЗז",
"a" => "аăǎąàãáæâåǻāא",
"b" => "бבþ",
"c" => "ĉćçцצċč©ץ",
"ch" => "ч",
"d" => "дďđדð",
"e" => "èęéëêеēėěĕєəע",
"f" => "фƒ",
"g" => "ğġģĝгגґ",
"h" => "חħхĥה",
"i" => "iïîíìįĭıиĩǐיїīі",
"j" => "йĵ",
"k" => "ĸכķкך",
"l" => "łŀлļĺľל",
"m" => "מмם",
"n" => "ñńнņןŋנʼnň",
"o" => "øóòôõоőŏōǿǒơ",
"p" => "פףп",
"q" => "ק",
"r" => "ŕřŗרр®",
"s" => "şśșšсŝס",
"t" => "тțטŧתťţ",
"u" => "ùûúūуũưǔųŭůűǖǜǚǘ",
"v" => "вו",
"y" => "ýыŷÿ",
"z" => "źžżзזſ",
"tm" => "™",
"at" => "@",
"ae" => "ÄǼäæǽ",
"ch" => "Чч",
"ij" => "ĳĲ",
"j" => "йЙĴĵ",
"ja" => "яЯ",
"je" => "Ээ",
"jo" => "ёЁ",
"ju" => "юЮ",
"oe" => "œŒöÖ",
"sch" => "щЩ",
"sh" => "шШ",
"ss" => "ß",
"tm" => "™",
"ue" => "Ü",
"zh" => "Жж"
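
As an example of the "small script" idea, here is a minimal sketch that turns a base list of this shape into a map usable with strtr(). The function name build_char_map() is made up for illustration, and only a tiny excerpt of the list is used.

```php
<?php
// Hypothetical helper: inverts "replacement => characters" into
// "character => replacement", the shape strtr() expects.
function build_char_map(array $base): array
{
    $map = [];
    foreach ($base as $replacement => $chars) {
        // Split on UTF-8 characters rather than bytes (needs the u modifier).
        $split = preg_split('//u', $chars, -1, PREG_SPLIT_NO_EMPTY) ?: [];
        foreach ($split as $char) {
            $map[$char] = (string) $replacement;
        }
    }
    return $map;
}

// A small excerpt of the base list, for illustration only.
$base = [
    'A'  => 'ÀÁÂ',
    'a'  => 'àáâ',
    'ss' => 'ß',
];

echo strtr('Àb ß', build_char_map($base)), "\n"; // prints "Ab ss"
```

Because a PHP array cannot hold the same key twice, the repeated keys in the full list (for example "ch" and "tm") would need to be merged before feeding it to a function like this.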

Author: Lizard

I am a PHP Web Developer

Updated on July 08, 2022

Comments

  • Lizard
    Lizard almost 2 years

    I am trying to replace accented characters with the normal replacements. Below is what I am currently doing.

        $string = "Éric Cantona";
        $strict = strtolower($string);
    
        echo "After Lower: ".$strict;
    
        $patterns[0] = '/[á|â|à|å|ä]/';
        $patterns[1] = '/[ð|é|ê|è|ë]/';
        $patterns[2] = '/[í|î|ì|ï]/';
        $patterns[3] = '/[ó|ô|ò|ø|õ|ö]/';
        $patterns[4] = '/[ú|û|ù|ü]/';
        $patterns[5] = '/æ/';
        $patterns[6] = '/ç/';
        $patterns[7] = '/ß/';
        $replacements[0] = 'a';
        $replacements[1] = 'e';
        $replacements[2] = 'i';
        $replacements[3] = 'o';
        $replacements[4] = 'u';
        $replacements[5] = 'ae';
        $replacements[6] = 'c';
        $replacements[7] = 'ss';
    
        $strict = preg_replace($patterns, $replacements, $strict);
        echo "Final: ".$strict;
    

    This gives me:

        After Lower: éric cantona
        Final: ric cantona
    

    The above gives me ric cantona, but I want the output to be eric cantona.

    Can anyone help me see where I am going wrong?

    • Brandon Horsley
      Brandon Horsley almost 14 years
      For what it's worth, I copied and pasted, and ran this verbatim and got "eric cantona" (using php 5.2.9-4)
    • troelskn
      troelskn almost 14 years
      @brandon it will depend on the encoding that you save the file in. I assume that lizard saved it as utf-8, and you saved it as iso-8859-1.
    • Brandon Horsley
      Brandon Horsley almost 14 years
      What version of php are you using?
  • mvds
    mvds almost 14 years
    I would never take the regexp route unless there is no choice; use iconv to ASCII//TRANSLIT
  • MvanGeest
    MvanGeest almost 14 years
    @NullUserException I've heard about that, but my provider won't even upgrade to PHP 5.3 as that would 'break too many old scripts'. On an unrelated note, my favourite Perl has had UTF-8 support for years :P (though I never used it for CGI).
  • Daniel Egeberg
    Daniel Egeberg almost 14 years
    @NullUserException: The old PHP6 plans were scrapped.
  • troelskn
    troelskn almost 14 years
    @MvanGeest Note that you can use utf-8 with PHP as of today. You just need to be aware of a few pitfalls (Eg. most string-functions expect the input to be latin1). But it's certainly doable, and I would generally recommend that for any new applications.
  • Rowan
    Rowan over 13 years
    Worth noting that iconv will error and cut the string off at 'illegal characters'. To solve this, you can use iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $val)
  • Halil Özgür
    Halil Özgür about 12 years
    Add these for Turkish support: 'Ğ'=>'G', 'İ'=>'I', 'Ş'=>'S', 'ğ'=>'g', 'ı'=>'i', 'ş'=>'s', 'ü'=>'u',
  • Vlad
    Vlad about 11 years
    Add these for Romanian support: 'ă'=>'a', 'Ă'=>'A', 'ș'=>'s', 'Ș'=>'S', 'ț'=>'t', 'Ț'=>'T'
  • KTB
    KTB over 10 years
    There is a minor Error: 'ß' can not be translated to 'Ss' but must be replaced with 'ss'. This german exclusive character is never used in an uppercase scope.
  • Koffeehaus
    Koffeehaus almost 10 years
    I think Germans prefer to translate 'Ä'=>'AE', instead of 'Ä'=>'A'. I read somewhere that if they cannot type the two dots (like on credit cards) they put "E" after the letter, instead of just simply removing the dots. So Jäger would actually become Jaeger, instead of Jager.
  • kasimir
    kasimir almost 10 years
    Thanks for updating the list from @Lizard. Still missing some chars though, at least the Polish ones: 'Ą' => 'A', 'ą' => 'a', 'Ć' => 'C', 'ć' => 'c', 'Ę' => 'E', 'ę' => 'e', 'Ł' => 'L', 'ł' => 'l', 'Ń' => 'N', 'ń' => 'n', 'Ś' => 'S', 'ś' => 's', 'Ż' => 'Z', 'ż' => 'z', 'Ź' => 'Z', 'ź' => 'z'
  • BurninLeo
    BurninLeo almost 10 years
    Thanks a lot - added :)
  • Rafael Barros
    Rafael Barros almost 10 years
    Didn't worked here. With iconv('ISO-8859-1', 'ASCII//TRANSLIT', $val), áêìõç became 'a^e`i~oc.
  • mvds
    mvds almost 10 years
    I don't think these things are entirely related to PHP alone. Could they also depend on the locales and/or particular version of the iconv library installed?
  • Mladen B.
    Mladen B. over 9 years
    Since a lot of people have upvoted this answer, it needs to be said that the safer way is to use chr() instead of hard-coded accented characters, due to different editors the file may be opened with.
  • BurninLeo
    BurninLeo over 9 years
    Pretty nice. Who's magento?
  • ekerner
    ekerner over 9 years
    This should be in a built-in function in all web languages, for translating non valid URL characters while maintaining readable and SEO friendly URLs, since the alternative is currently to URL encode thus making the URL ugly, long, and unreadable. Of course it cant be made to efficiently support many Asian languages, but this covers most others. Worth noting that this ugly looking solution is much better than using iconv with //TRANSLIT which will leave you with many question marks and also must know the imput encoding to convert.
  • BurninLeo
    BurninLeo over 9 years
    When compared to the above postings, these characters may be added: 'Ã' => 'A', 'ã' => 'a', 'Þ' => 'B', 'Ê' => 'E', 'Ñ' => 'N', 'ð' => 'o', 'ñ' => 'n', 'ș' => 's', 'Ș' => 'S', 'ț' => 't', 'Ț' => 'T'
  • daker
    daker over 9 years
    FYI @BurninLeo The letter 'ð' should not be substituted with 'o', as it is the icelandic letter for something closer to 'd'
  • Guilherme Nascimento
    Guilherme Nascimento about 9 years
    His answer seems to me the best, maybe "merge" your suggestion to $c = mb_detect_encoding($text, mb_detect_order(), true); $val = iconv($c, 'ASCII//TRANSLIT',$val); is a good way? :) Thanks +1
  • Daniel Garcia Sanchez
    Daniel Garcia Sanchez about 9 years
    Thanks @Lizard awesome answer
  • Kwaadpepper
    Kwaadpepper over 8 years
    This is awesome, however, the lower case char are mixed with upper ones unlike uppers. eg : d => д d => Д. This is wrong, only D => Д should be in this table i think, right ?
  • Kwaadpepper
    Kwaadpepper over 8 years
    Just to mention an idea: this also allowed me to build regex matching regardless of special chars :p rss => '[rŕřŘŗŖרŔРр](?:[sșсŜšśסşСŝ][sșсŜšśסşСŝ]|[ß])'
  • Kwaadpepper
    Kwaadpepper over 8 years
    Here is a script cleaning up this answer. paste.debian.net/334940 And the full cleaned result ready to work with : paste.debian.net/334948 Note that double and triple letter index are only present on lower case to avoid multiple combination so they include lower and upper case chars
  • Javier Enríquez
    Javier Enríquez about 8 years
    Today I got this issue but this answer wasn't enough because my string had an accent in another character. So I had for example a simple 'o' and then 2 strange characters. I url encoded them and those are : "%CC%81". So I added urldecode('%CC%81') => '', to the $replace array and fixed my problem.
  • BurninLeo
    BurninLeo about 8 years
    I assume, that is the UTF-8 character ́ (COMBINING ACUTE ACCENT, see utf8-chartable.de/…) that has something like a negative margin on the left to be placed above the previous character - like this: x́ (this is an X with the character behind!) Interesting stuff :) UTF-8 knows a lot such characters - therefore, it may be sensible to preg_replace('/[^a-z0-9 ]/i', '', $s) after doing the above replacements.
  • Josh Bernfeld
    Josh Bernfeld almost 8 years
    This fixed the question marks and quotes for me setlocale(LC_ALL, "en_US.utf8"); $string = iconv('UTF-8', 'ASCII//TRANSLIT//IGNORE', $string);
  • Hitesh
    Hitesh almost 8 years
    is this possible to do using regex ?
  • Robert Sinclair
    Robert Sinclair over 7 years
    def best answer, the rest may or may not require encoding/decoding and some depend on your version of PHP
  • Terry Lin
    Terry Lin over 7 years
    Thanks. but I try your code, "olivæ" is still "olivæ" not "olivae"
  • Terry Lin
    Terry Lin over 7 years
    I use transliterator_transliterate('Any-Latin; Latin-ASCII', "A æ Übérmensch på høyeste nivå! И я люблю PHP! fi") to solve my problem
  • Matt Browne
    Matt Browne over 7 years
    Thanks for this. I wanted to do this on a Wordpress site and didn't realize Wordpress had a built-in function for it :)
  • CTala
    CTala about 7 years
    Not the most elegant solution, but a simple solution that works. Thanks !
  • moreirapontocom
    moreirapontocom almost 7 years
    Works like a charm. Thank you.
  • Rey0bs
    Rey0bs over 6 years
    Yes \Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', \Transliterator::FORWARD) will do the job
  • Rey0bs
    Rey0bs over 6 years
    If you want also to replace other caracters like 'æ', you can use \Transliterator::createFromRules(':: Any-Latin; :: Latin-ASCII; :: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', \Transliterator::FORWARD) instead
  • Trung Nguyen
    Trung Nguyen over 5 years
    Why do you convert S to Z? - Last item on Z ("S" => "Z")
  • Xavi Montero
    Xavi Montero about 5 years
    Definitely agree with going to standards instead of reinventing the wheel. ICU seems the best reference. However, the documentation at https://www.php.net/manual/en/transliterator.createfromrules.php does not talk about the "rules". Where can we find a full description of what's accepted by createFromRules()?
  • Rodrigo
    Rodrigo almost 5 years
    Isn't it better to use iconv?
  • ItalyPaleAle
    ItalyPaleAle over 4 years
    @XaviMontero check out the documentation for ICU: userguide.icu-project.org/transforms/general/rules
  • f7n
    f7n over 4 years
    and what is $str?
  • Umair Ayub
    Umair Ayub over 4 years
    above PHP example gives me ?|?|?|?|? ?|?|?|?|? ?|?|?|? ?|?|?|?|?|? ?|?|?|? ae ? ss abc ABC 123
  • Tom
    Tom about 4 years
    This doesn't work for me and just removes accented letters
  • Tom
    Tom about 4 years
    @mvds Setting locale to "en_US.utf8" helps but it's not in this answer. This is my answer: stackoverflow.com/a/60816979/1404447
  • CheddarLizzard
    CheddarLizzard about 4 years
    The solution of Terry Lin seems to work well, many thanks! transliterator_transliterate('Any-Latin; Latin-ASCII', $string)
  • Ivan
    Ivan over 3 years
    Awesome solution. Works like a charm. However, you should add "slash" too, to take care of the Norwegian oslash HTML entity as well: $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde|ring|slash);/','$1',$str);
  • Sirsemy
    Sirsemy over 2 years
    Add these for Hungarian support: 'ű'=>'u', 'Ű'=>'U', 'ő'=>'o', 'Ő'=>'O', 'ü'=>'u'
  • albanx
    albanx about 2 years
    for some reason it is adding a double quote in the replaced accent char ë => "e
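
Coming back to the regex code in the original question: without the u modifier, PCRE works on bytes, so a character class that contains UTF-8 letters matches their individual bytes rather than whole characters, and a | inside [...] is a literal pipe, not alternation. A minimal sketch of the same idea with those two points changed, assuming the source file and the input are UTF-8 and that the mbstring extension is available (ð is left out here, since the other answers map it to d rather than e):

```php
<?php
$string = "\u{00C9}ric Cantona"; // "Éric Cantona"

// mb_strtolower() lowercases multi-byte characters such as É.
$strict = mb_strtolower($string, 'UTF-8');

// The u modifier makes PCRE treat pattern and subject as UTF-8,
// so each class lists whole characters. No | inside the classes.
$patterns = [
    '/[áâàåä]/u', '/[éêèë]/u', '/[íîìï]/u', '/[óôòøõö]/u',
    '/[úûùü]/u', '/æ/u', '/ç/u', '/ß/u',
];
$replacements = ['a', 'e', 'i', 'o', 'u', 'ae', 'c', 'ss'];

$strict = preg_replace($patterns, $replacements, $strict);
echo $strict, "\n"; // prints "eric cantona"
```

This only covers the handful of letters listed in the question; the table-based and ICU-based answers above handle far more.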