How to convert 'u00e9' into a UTF-8 character, in MySQL or PHP?


Solution 1

There's a way: replace every uXXXX with its HTML numeric character reference, then call html_entity_decode().

E.g. echo html_entity_decode("Jalostotitl&#xe1;n"); // Jalostotitlán

Every Unicode character written in the form u1234 can be expressed in HTML as &#x1234;. The replacement is the hard part: if no other character marks the start of an escape sequence, there can be many false positives. A simple regex could be:

preg_replace('/u([\da-fA-F]{4})/', '&#x\1;', $str)
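A minimal runnable sketch of Solution 1, putting the regex and html_entity_decode() together. The helper name decode_u_escapes() is hypothetical, and the code assumes the result should be UTF-8.

```php
<?php
// Sketch of Solution 1: turn every "uXXXX" into the HTML numeric
// reference "&#xXXXX;", then let html_entity_decode() produce UTF-8.
// decode_u_escapes() is a hypothetical helper name, not a PHP built-in.
function decode_u_escapes(string $str): string
{
    // Step 1: "u00e1" -> "&#x00e1;" (same pattern as above).
    $html = preg_replace('/u([\da-fA-F]{4})/', '&#x\1;', $str);

    // Step 2: decode the numeric references into UTF-8 characters.
    return html_entity_decode($html, ENT_QUOTES, 'UTF-8');
}

echo decode_u_escapes('Jalostotitlu00e1n'), "\n"; // Jalostotitlán
```

Two caveats follow from the approach itself: html_entity_decode() also decodes any entities already present in the data (such as &amp;), and the bare u pattern produces exactly the false positives mentioned above, e.g. 'Bureau2019' becomes 'Burea’' because 'u2019' looks like an escape.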

Solution 2

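/* Replace literal \uXXXX escape sequences for most Latin-1 letters
   (U+00C0 to U+00FF) with the characters they stand for. The replacement
   characters are emitted in whatever encoding this file is saved in
   (UTF-8 if it is saved as UTF-8, Windows-1252 "ANSI" if saved that way).
   U+00D0, U+00D7, U+00DE, U+00F7 and U+00FE are not in the table.
   Declared as a plain function here; inside a class it can be public static. */

function Utf8_ansi($valor = '') {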

    $utf8_ansi2 = array(
    "\u00c0" =>"À",
    "\u00c1" =>"Á",
    "\u00c2" =>"Â",
    "\u00c3" =>"Ã",
    "\u00c4" =>"Ä",
    "\u00c5" =>"Å",
    "\u00c6" =>"Æ",
    "\u00c7" =>"Ç",
    "\u00c8" =>"È",
    "\u00c9" =>"É",
    "\u00ca" =>"Ê",
    "\u00cb" =>"Ë",
    "\u00cc" =>"Ì",
    "\u00cd" =>"Í",
    "\u00ce" =>"Î",
    "\u00cf" =>"Ï",
    "\u00d1" =>"Ñ",
    "\u00d2" =>"Ò",
    "\u00d3" =>"Ó",
    "\u00d4" =>"Ô",
    "\u00d5" =>"Õ",
    "\u00d6" =>"Ö",
    "\u00d8" =>"Ø",
    "\u00d9" =>"Ù",
    "\u00da" =>"Ú",
    "\u00db" =>"Û",
    "\u00dc" =>"Ü",
    "\u00dd" =>"Ý",
    "\u00df" =>"ß",
    "\u00e0" =>"à",
    "\u00e1" =>"á",
    "\u00e2" =>"â",
    "\u00e3" =>"ã",
    "\u00e4" =>"ä",
    "\u00e5" =>"å",
    "\u00e6" =>"æ",
    "\u00e7" =>"ç",
    "\u00e8" =>"è",
    "\u00e9" =>"é",
    "\u00ea" =>"ê",
    "\u00eb" =>"ë",
    "\u00ec" =>"ì",
    "\u00ed" =>"í",
    "\u00ee" =>"î",
    "\u00ef" =>"ï",
    "\u00f0" =>"ð",
    "\u00f1" =>"ñ",
    "\u00f2" =>"ò",
    "\u00f3" =>"ó",
    "\u00f4" =>"ô",
    "\u00f5" =>"õ",
    "\u00f6" =>"ö",
    "\u00f8" =>"ø",
    "\u00f9" =>"ù",
    "\u00fa" =>"ú",
    "\u00fb" =>"û",
    "\u00fc" =>"ü",
    "\u00fd" =>"ý",
    "\u00ff" =>"ÿ");

    return strtr($valor, $utf8_ansi2);      

}
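Rather than typing the table by hand, the same kind of map can be generated from the code points. This is a sketch, not part of the original answer: latin1_escape_map() is a hypothetical helper, it covers the full range U+00C0 to U+00FF (so it also includes U+00D7 × and U+00F7 ÷, which the hand-written table leaves out), and it assumes UTF-8 output.

```php
<?php
// Sketch: build a '\u00XX' => character table programmatically
// instead of typing it out. latin1_escape_map() is a hypothetical name.
function latin1_escape_map(int $from = 0xC0, int $to = 0xFF): array
{
    $map = [];
    for ($cp = $from; $cp <= $to; $cp++) {
        // Key: the literal six-character sequence, e.g. '\u00e9'
        // (lowercase hex, matching the hand-written table).
        $key = sprintf('\u%04x', $cp);
        // Value: the UTF-8 encoding of that code point.
        $map[$key] = html_entity_decode(sprintf('&#x%x;', $cp), ENT_QUOTES, 'UTF-8');
    }
    return $map;
}

$map = latin1_escape_map();
echo strtr('h\u00e9t', $map), "\n"; // hét
```

strtr() is case-sensitive, so, like the original table, this map will not match uppercase escapes such as \u00E9.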

Solution 3

My Twitter timeline script returns special characters like é as \u00e9, so I stripped the backslash and used @rabudde's preg_replace.

// Turn \uXXXX escapes into HTML numeric character references
$text = 'De #Haarstichting is h\u00e9t medium voor alles';
$str = str_replace('\u', 'u', $text);  // strip the backslash
$str_replaced = preg_replace('/u([\da-fA-F]{4})/', '&#x\1;', $str);

// Decode the references so the text is also correct outside a browser
echo html_entity_decode($str_replaced, ENT_QUOTES, 'UTF-8');

It works for me: it turns "De #Haarstichting is h\u00e9t medium voor alles" into "De #Haarstichting is hét medium voor alles".
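If the \u00e9 sequences come from a JSON payload (likely for a Twitter timeline script, though the answer doesn't say), the JSON parser can decode them directly, with no regex. A minimal sketch using json_decode(); decode_json_escapes() is a hypothetical helper, and it assumes the input is the body of a JSON string (no bare double quotes or stray backslashes).

```php
<?php
// Sketch: let the JSON parser decode \uXXXX escapes instead of a regex.
// Assumes $escaped is the *content* of a JSON string: any backslash
// must start a valid JSON escape and double quotes must be escaped.
function decode_json_escapes(string $escaped): string
{
    $decoded = json_decode('"' . $escaped . '"');
    if (!is_string($decoded)) {
        throw new InvalidArgumentException('input is not a valid JSON string body');
    }
    return $decoded;
}

echo decode_json_escapes('De #Haarstichting is h\u00e9t medium voor alles'), "\n";
// De #Haarstichting is hét medium voor alles
```

json_decode() also combines surrogate pairs such as \ud83d\ude00 into a single character, which a four-hex-digit regex cannot do.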


Author: carpii

Updated on September 03, 2020

Comments

  • carpii, over 3 years

    I'm doing some data cleansing on messy data that is being imported into MySQL.

    The data contains 'pseudo' Unicode characters, which are embedded in the strings as 'u00e9' etc.

    So one field might be 'Jalostotitlu00e1n'. I need to rip out that clumsy 'u00e1' and replace it with the corresponding UTF-8 character.

    I could do this in MySQL, maybe using SUBSTRING and CHAR(), but I'm preprocessing the data in PHP, so I could do it there as well.

    I already know how to configure MySQL and PHP to work with UTF-8 data; the problem is only in the source data I'm importing.

    Thanks

    • Ignacio Vazquez-Abrams, almost 13 years
      There is no such thing as "a UTF-8 character". Perhaps you meant "the UTF-8 encoding of the Unicode character with that code point".
    • Ignacio Vazquez-Abrams, almost 13 years
      @deceze: Technically that's called a "UTF-8 sequence".
  • rabudde, over 10 years
    No! Don't strip the backslash from \u, because it can serve as the identifier. Use a modified regex instead: preg_replace('/\\u([\da-fA-F]{4})/', '&#x\1;', $str)
  • Theo, over 10 years
    Right, that's what I need. Of course my stripping is wrong: it removes the only identifier I had. Thank you @rabudde, I'll test this tonight and update this answer with your preg_replace.
  • Theo, over 10 years
    Right @rabudde, now I remember why I didn't use the \\u myself: Warning: preg_replace() [function.preg-replace]: Compilation failed: PCRE does not support \L, \l, \N{name}, \U, or \u at offset 1
  • rabudde, over 10 years
    Sorry, that may be my fault. Try double-escaping it: preg_replace('/\\\\u([\da-fA-F]{4})/', '&#x\1;', $str)
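Putting rabudde's double-escaped pattern together with the html_entity_decode() step from Solution 1 gives roughly the following. This is a minimal sketch: decode_backslash_u() is a hypothetical helper name, and it assumes every escape in the data keeps its backslash, as in the Twitter example.

```php
<?php
// Sketch: keep the backslash as the marker for an escape, so "\u00e9"
// is decoded but a plain "u2019" inside ordinary text is left alone.
// In a single-quoted PHP string '\\\\' becomes '\\', which PCRE reads as
// one literal backslash. decode_backslash_u() is a hypothetical name.
function decode_backslash_u(string $str): string
{
    $html = preg_replace('/\\\\u([\da-fA-F]{4})/', '&#x\1;', $str);
    return html_entity_decode($html, ENT_QUOTES, 'UTF-8');
}

echo decode_backslash_u('De #Haarstichting is h\u00e9t medium voor alles'), "\n";
// De #Haarstichting is hét medium voor alles
```

With the single escaping from the first comment, the pattern string reaches PCRE as \u, which is what triggers the "PCRE does not support ... \u" compilation error quoted above.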