Comparing UTF-8 Strings
Solution 1
IMPORTANT
This answer is meant for situations where it's not possible to run or install the 'intl' extension. It only sorts strings by replacing accented characters with their non-accented equivalents. To sort accented characters according to a specific locale, a Collator is the better approach; see the other answer to this question for more information.
Sorting by non-accented characters in PHP 5.2
You may try converting both strings to ASCII using iconv() and the //TRANSLIT option to get rid of accented characters:
$str1 = iconv('utf-8', 'ascii//TRANSLIT', $str1);
Then compare the converted strings, for example with strcmp().
See the documentation here:
http://www.php.net/manual/en/function.iconv.php
[updated, in response to @Esailija's remark] I overlooked the problem of //TRANSLIT translating accented characters in unexpected ways. This problem is mentioned in this question: php iconv translit for removing accents: not working as excepted?
To make the 'iconv()' approach work, I've added a code sample below that uses preg_replace() to strip everything except word characters and whitespace from the resulting string.
<?php
setLocale(LC_ALL, 'fr_FR');
$names = array(
'Zoey and another (word) ',
'Émilie and another word',
'Amber',
);
$converted = array();
foreach($names as $name) {
$converted[] = preg_replace('#[^\w\s]+#', '', iconv('UTF-8', 'ASCII//TRANSLIT', $name));
}
sort($converted);
echo '<pre>'; print_r($converted);
// Array
// (
// [0] => Amber
// [1] => Emilie and another word
// [2] => Zoey and another word
// )
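If iconv() transliterates unpredictably on a given server, the same idea can be sketched with an explicit replacement table and strtr(). This is only a minimal sketch: fold_accents() and compare_folded() are hypothetical helper names, the table covers just a handful of French letters, and the source file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: fold a few accented Latin letters to ASCII.
// The table is illustrative and incomplete; extend it for your own data.
function fold_accents($s) {
    static $map = array(
        'À' => 'A', 'Â' => 'A', 'Ç' => 'C',
        'È' => 'E', 'É' => 'E', 'Ê' => 'E', 'Ë' => 'E',
        'Î' => 'I', 'Ï' => 'I', 'Ô' => 'O', 'Ù' => 'U', 'Û' => 'U',
        'à' => 'a', 'â' => 'a', 'ç' => 'c',
        'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
        'î' => 'i', 'ï' => 'i', 'ô' => 'o', 'ù' => 'u', 'û' => 'u',
    );
    // strtr() with an array replaces byte sequences, so multi-byte
    // UTF-8 keys are matched as whole characters.
    return strtr($s, $map);
}

// Hypothetical comparator: compare the folded copies, leave the originals alone.
function compare_folded($a, $b) {
    return strcmp(fold_accents($a), fold_accents($b));
}

$names = array('Zoey', 'Émilie', 'Amber');
usort($names, 'compare_folded');
print_r($names); // Amber, Émilie, Zoey
```

Because the comparator folds copies of the strings, the original spelling (including the accent) is preserved in the sorted array.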
Solution 2
There is no native way to do this, but the intl PECL extension provides a Collator class: http://php.net/manual/de/class.collator.php
$c = new Collator('fr_FR');
if ($c->compare('Émily', 'Zoey') < 0) { echo 'Émily < Zoey'; }
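To sort a whole list rather than compare one pair, the collator's compare() method can be handed to usort() as the callback. A minimal sketch, assuming the intl extension is loaded; sort_names_with_collator() is a hypothetical helper name, and the class_exists() guard is only there so the script does not die on servers without intl.

```php
<?php
// Hypothetical helper: returns a copy of $names sorted by the given locale's rules.
function sort_names_with_collator(array $names, $locale) {
    $collator = new Collator($locale);
    // Collator::compare() returns a negative, zero, or positive integer,
    // which is exactly what usort() expects from its callback.
    usort($names, array($collator, 'compare'));
    return $names;
}

if (class_exists('Collator')) {
    print_r(sort_names_with_collator(array('Zoey', 'Émilie', 'Amber'), 'fr_FR'));
} else {
    echo "The intl extension is not available on this server.\n";
}
```

Unlike the iconv() approach, the strings are never modified, and the ordering follows the locale's collation rules instead of plain byte order.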
Solution 3
I recommend using the usort function, to avoid modifying the values while still comparing them correctly.
Example:
<?php
setLocale(LC_ALL, 'fr_FR');
$names = [
'Zoey and another (word) ',
'Émilie and another word',
'Amber'
];
function compare(string $a, string $b) {
$a = preg_replace('#[^\w\s]+#', '', iconv('utf-8', 'ascii//TRANSLIT', $a));
$b = preg_replace('#[^\w\s]+#', '', iconv('utf-8', 'ascii//TRANSLIT', $b));
return strcmp($a, $b);
}
usort($names, 'compare');
echo '<pre>';
print_r($names);
echo '</pre>';
with result:
Array
(
[0] => Amber
[1] => Émilie and another word
[2] => Zoey and another (word)
)
poudigne
I am a full-time C# programmer with 10 years of experience. I do game development as a hobby.
Updated on June 30, 2021
Comments
-
poudigne almost 3 years
I'm trying to compare two strings, let's say Émilie and Zoey. 'E' comes before 'Z', but by character code Z comes before É, so a normal
if ($str1 > $str2)
won't work. I tried with
if (strcmp($str1, $str2) > 0)
and it still doesn't work. So I'm looking for a native way to compare strings with UTF-8 characters.
-
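The reason both comparisons fail is that PHP compares strings byte by byte, and in UTF-8 the letter É is stored as the two bytes 0xC3 0x89, both of which are larger than the single byte 0x5A used for Z. A small sketch to illustrate this (the variable names are just for the example, and É is written as an escape sequence so the result does not depend on the file's encoding):

```php
<?php
// "\xC3\x89" is the UTF-8 encoding of É (U+00C9).
$emilie = "\xC3\x89milie";
$zoey   = 'Zoey';

echo bin2hex(substr($emilie, 0, 2)), "\n"; // c389
echo bin2hex($zoey[0]), "\n";              // 5a

// strcmp() is a byte-wise comparison, so Émilie sorts after Zoey.
var_dump(strcmp($emilie, $zoey) > 0);      // bool(true)
```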
poudigne over 11 years
Seems like a good solution, but the client's server is on PHP 5.2. If there's no native way, I'll roll back to the massive string replace solution :(
-
thaJeztah over 11 years
Yup, iconv() is native. You may need to call setlocale() first; you'll find some examples in the comments below the documentation page.
-
Fabian Schmengler over 11 years
Then it's better to convert them instead of replacing characters manually; the iconv solution seems appropriate.
-
Esailija over 11 years
@PLAudet -1 This is misleading because of the example. The result string is 'Emilie, so yes, it will appear before Z, but it will also appear before A. Please use a collator from php.net/manual/en/intl.requirements.php
-
thaJeztah over 11 years
@Esailija I stand corrected, you're right about the quote in front of Emilie; I hadn't realized that //TRANSLIT did this. I agree that the 'Collator' approach is the official way to do it, but the OP stated that he didn't have the option to use that. I added a fix to my answer by preg_replacing non-word characters from the string.
-
Esailija over 11 years
@thaJeztah Only because he thinks it requires PHP 5.3; however, it can be used with 5.2, which is why I linked the requirements page so he can read it more carefully this time ;P
-
thaJeztah over 11 years
@Esailija I've added your information to the answer fab provided so that the OP is able to decide which approach to take. My edit is currently waiting for review.
-
nickohrn almost 10 years
Thanks for this. Worked for me!
-
Nux over 2 years
This is built-in as of PHP 5.3. So you can use e.g. $collator = collator_create('pl_PL'); and then use a compare function: return collator_compare($collator, $a, $b); php.net/manual/en/collator.compare.php