UTF-8 to Unicode Code Points

php unicode utf-8

35,626

Solution 1

Converting one character set to another can be done with iconv:

http://php.net/manual/en/function.iconv.php

Note that UTF-8 is already a Unicode encoding.
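
As a rough illustration of the iconv route, the sketch below converts a UTF-8 string to UTF-32BE and unpacks each 4-byte unit as one code point. The helper name utf8ToCodePoints() is hypothetical, and the sketch assumes the iconv extension is available and the source file is saved as UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): returns the Unicode code points
// of a UTF-8 string as a list of integers.
function utf8ToCodePoints(string $utf8): array
{
    // UTF-32BE stores every code point as one big-endian 32-bit integer.
    $utf32 = iconv('UTF-8', 'UTF-32BE', $utf8);
    if ($utf32 === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    if ($utf32 === '') {
        return [];
    }
    // 'N*' reads unsigned 32-bit big-endian integers; unpack() indexes from 1,
    // so array_values() re-indexes the result from 0.
    return array_values(unpack('N*', $utf32));
}

$labels = array_map(function (int $cp): string {
    return sprintf('U+%04X', $cp);
}, utf8ToCodePoints('tchüß'));

echo implode(' ', $labels), "\n"; // U+0074 U+0063 U+0068 U+00FC U+00DF
```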

Another way is simply using htmlentities with the right character set:

http://php.net/manual/en/function.htmlentities.php
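
A small sketch of what this looks like in practice, assuming the mbstring extension is loaded and the source file is saved as UTF-8. htmlentities() only produces named entities where HTML defines them; mb_encode_numericentity() is shown next to it as one possible way to get numeric references for every non-ASCII character. The conversion map used here is an illustrative choice, not something taken from the answer.

```php
<?php
$text = 'tchüß';

// Named entities, only for characters that have one defined in HTML.
$named = htmlentities($text, ENT_QUOTES, 'UTF-8');
echo $named, "\n"; // tch&uuml;&szlig;

// Numeric character references for everything from U+0080 to U+10FFFF.
// Map layout: [first code point, last code point, offset, bit mask].
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$numeric = mb_encode_numericentity($text, $convmap, 'UTF-8');
echo $numeric, "\n"; // tch&#252;&#223;
```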

Solution 2

For a readable form I would go with JSON. JSON itself does not require non-ASCII characters to be escaped, but PHP's json_encode() escapes them by default:

echo json_encode("tchüß");

"tch\u00fc\u00df"

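To make the default behaviour and the opt-out explicit, here is a minimal sketch, assuming the source file is saved as UTF-8. JSON_UNESCAPED_UNICODE is the standard json_encode() flag that keeps non-ASCII characters as they are.

```php
<?php
$word = 'tchüß';

// Default: non-ASCII characters become \uXXXX escapes.
$escaped = json_encode($word);
echo $escaped, "\n"; // "tch\u00fc\u00df"

// With JSON_UNESCAPED_UNICODE the characters are written out as UTF-8.
$raw = json_encode($word, JSON_UNESCAPED_UNICODE);
echo $raw, "\n"; // "tchüß"

// Both forms decode back to the same string.
var_dump(json_decode($escaped) === json_decode($raw)); // bool(true)
```
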
Solution 3

With PHP 7, there is a new IntlChar::ord() to find the Unicode Code Point from a given UTF-8 character:

var_dump(sprintf('U+%04X', IntlChar::ord('ß')));

# Outputs: string(6) "U+00DF"
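
To apply this to a whole string in the style the question asks for, something like the following sketch could work. The helper name escapeNonAscii() and the \XXXX output format are assumptions made for illustration. The sketch falls back to mb_ord() (PHP 7.2+, mbstring) when the intl extension is not loaded.

```php
<?php
// Hypothetical helper: keeps ASCII characters as they are and writes every
// other character as a backslash followed by its hex code point.
function escapeNonAscii(string $utf8): string
{
    // The /u modifier makes preg_split() cut on UTF-8 character boundaries.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $out = '';
    foreach ($chars as $char) {
        // Prefer IntlChar::ord(); fall back to mb_ord() without intl.
        $codePoint = class_exists('IntlChar')
            ? IntlChar::ord($char)
            : mb_ord($char, 'UTF-8');
        $out .= $codePoint < 0x80 ? $char : sprintf('\\%04X', $codePoint);
    }
    return $out;
}

echo escapeNonAscii('tchüß'), "\n"; // tch\00FC\00DF
```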

Solution 4

I guess you're going to print out your strings on a website?

I store all my databases in UTF-8 and call htmlentities($string) before output.

Maybe you have to try htmlentities(utf8_encode($string));

Solution 5

I once created a function called _convert() which encodes safely everything to UTF-8.


Author by

Adrien Hingert

Updated on July 22, 2022

Comments

Adrien Hingert almost 2 years
Is there a function that will change UTF-8 to Unicode leaving non special characters as normal letters and numbers?

I.e., the German word "tchüß" would be rendered as something like "tch\20AC\21AC" (please note that I am making the Unicode codes up).

EDIT: I am experimenting with the following function, but although this one works well with ASCII 32-127, it seems to fail for double byte chars:
```
function strToHex ($string)
{
    $hex = '';
    for ($i = 0; $i < mb_strlen ($string, "utf-8"); $i++)
    {
        $id = ord (mb_substr ($string, $i, 1, "utf-8"));
        $hex .= ($id <= 128) ? mb_substr ($string, $i, 1, "utf-8") : "&#" . $id . ";";
    }

    return ($hex);
}
```
Any ideas?

EDIT 2: Found solution: The PHP ord() function does not work for double byte chars. Use instead: http://nl.php.net/manual/en/function.ord.php#78032
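
For illustration, a multibyte-aware replacement for ord() might look like the sketch below. This is not the code from the linked manual note. The names utf8Ord() and strToEntities() are hypothetical, and the decoder assumes well-formed UTF-8 input and does not validate continuation bytes.

```php
<?php
// Hypothetical helper: decodes the first UTF-8 sequence in $char to its code point.
function utf8Ord(string $char): int
{
    if ($char === '') {
        throw new InvalidArgumentException('Empty string');
    }
    $b0 = ord($char[0]);
    if ($b0 < 0x80) {                 // 1 byte:  0xxxxxxx
        return $b0;
    }
    if (($b0 & 0xE0) === 0xC0) {      // 2 bytes: 110xxxxx 10xxxxxx
        return (($b0 & 0x1F) << 6) | (ord($char[1]) & 0x3F);
    }
    if (($b0 & 0xF0) === 0xE0) {      // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return (($b0 & 0x0F) << 12) | ((ord($char[1]) & 0x3F) << 6)
            | (ord($char[2]) & 0x3F);
    }
    if (($b0 & 0xF8) === 0xF0) {      // 4 bytes: 11110xxx followed by three 10xxxxxx
        return (($b0 & 0x07) << 18) | ((ord($char[1]) & 0x3F) << 12)
            | ((ord($char[2]) & 0x3F) << 6) | (ord($char[3]) & 0x3F);
    }
    throw new InvalidArgumentException('Not a valid UTF-8 lead byte');
}

// The question's loop, using utf8Ord() instead of ord().
function strToEntities(string $string): string
{
    $out = '';
    $length = mb_strlen($string, 'UTF-8');
    for ($i = 0; $i < $length; $i++) {
        $char = mb_substr($string, $i, 1, 'UTF-8');
        $id = utf8Ord($char);
        $out .= ($id < 128) ? $char : '&#' . $id . ';';
    }
    return $out;
}

echo strToEntities('tchüß'), "\n"; // tch&#252;&#223;
```
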
Amit Patil over 12 years

htmlentities only converts characters for which there are entities defined in the HTML language, though, which only covers a small subset of Unicode. Unfortunately it does not create &#...; character references for other characters.
Luwe over 12 years

I'm aware, but also iconv tends to give some problems. Not all characters seem to get perfectly converted for every character set. That's why I mentioned the htmlentities function. It was also suggested in the comments on the iconv function page: nl.php.net/manual/en/function.iconv.php#81494
Adrien Hingert over 12 years

Interesting, never thought of this!
Anthony about 11 years

Brilliant! Works like a charm.. :)
eis almost 7 years

Note that you need extension=php_intl.dll enabled in php.ini for this class to be present.
eis almost 7 years

You could add the answer here, and not as a link.
William R about 6 years

JSON requires, by default, the escaping of non-ASCII characters. And you should do it every time.
Basster almost 6 years

Great solution!
Ulrich Eckhardt over 5 years

@WilliamR, why do you think so? JSON is by definition UTF-8, which is fully Unicode-capable. Escaping anything that is Unicode is not necessary.
William R over 5 years

Well, it is obvious that UTF-8 should be used for JSON. But escaping Unicode characters as ASCII ("é" becomes \u00e9) is a good way to protect your data against a wrong "charset" in the headers of an HTTP transmission, against badly written code or, even worse, against JSON placed inside a CDATA section in an ISO-Latin1 XML file.