An apostrophe is rendering as â€™. What PHP function will display it as '? something_Decode?

Solution 1

You could try to use the following function:

function htmlallentities($str){
  $res = '';
  $strlen = strlen($str);
  for($i=0; $i<$strlen; $i++){
    $byte = ord($str[$i]);
    if($byte < 128) // 1-byte char
      $res .= $str[$i];
    elseif($byte < 192); // stray continuation byte (invalid as a lead byte): skipped on purpose
    elseif($byte < 224) // 2-byte char
      $res .= '&#'.((63&$byte)*64 + (63&ord($str[++$i]))).';';
    elseif($byte < 240) // 3-byte char
      $res .= '&#'.((15&$byte)*4096 + (63&ord($str[++$i]))*64 + (63&ord($str[++$i]))).';';
    elseif($byte < 248) // 4-byte char
      $res .= '&#'.((15&$byte)*262144 + (63&ord($str[++$i]))*4096 + (63&ord($str[++$i]))*64 + (63&ord($str[++$i]))).';';
  }
  return $res;
}

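Call it like this:

$str = htmlallentities($str);

This converts every multi-byte UTF-8 character into a numeric HTML entity (for example &#8217; for a curly apostrophe), so the output displays correctly whatever encoding the page is served in.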
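
If the mbstring extension is available (an assumption, since it is optional in PHP), the built-in mb_encode_numericentity() should give the same kind of output without a hand-written loop. A minimal sketch; the wrapper name toNumericEntities is made up for illustration:

```php
<?php
// Hypothetical wrapper: converts every non-ASCII character of a UTF-8 string
// into a decimal numeric entity and leaves plain ASCII untouched.
function toNumericEntities(string $utf8): string
{
    // Convmap = [first code point, last code point, offset, mask].
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $convmap, 'UTF-8');
}

echo toNumericEntities("It\u{2019}s"), "\n"; // It&#8217;s
```

Unlike htmlallentities(), this expects valid UTF-8 input and does not try to skip stray bytes itself.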

Solution 2

I was having trouble in Chrome with this.

Adding the following tag to the "head" section fixes it:

<meta http-equiv="content-type" content="text/html;charset=utf-8" />
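
The same declaration can also be sent as an HTTP header from PHP, which browsers generally give priority over the meta tag. A small sketch, assuming nothing has been output yet; the helper name utf8ContentType is hypothetical:

```php
<?php
// Hypothetical helper: builds the Content-Type header line for an HTML page.
function utf8ContentType(string $charset = 'utf-8'): string
{
    return 'Content-Type: text/html; charset=' . $charset;
}

// header() only has an effect before any output has been sent.
if (!headers_sent()) {
    header(utf8ContentType());
}
```

Either way, the declared charset has to match the bytes the script actually prints.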

Solution 3

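I battled with this for almost a day before settling on this function, which has worked reliably for me. It handles UTF-8 and single-byte Windows-1252 input, converting characters beyond the basic ASCII set into their HTML entities. It's good for cleaning up MS Word rubbish. Note that the second pass treats every leftover byte above 127 as Windows-1252, so UTF-8 characters missing from the first table (Cyrillic, for instance) will be mangled.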

function filterText($text) 
{ 

    //UTF-8 filter
    $conv = array(
          "\xC2\xA0" => '&nbsp;',
          "\xC2\xA1" => '&iexcl;',
          "\xC2\xA2" => '&cent;',
          "\xC2\xA3" => '&pound;',
          "\xC2\xA4" => '&curren;',
          "\xC2\xA5" => '&yen;',
          "\xC2\xA6" => '&brvbar;',
          "\xC2\xA7" => '&sect;',
          "\xC2\xA8" => '&uml;',
          "\xC2\xA9" => '&copy;',
          "\xC2\xAA" => '&ordf;',
          "\xC2\xAB" => '&laquo;',
          "\xC2\xAC" => '&not;',
          "\xC2\xAD" => '&shy;',
          "\xC2\xAE" => '&reg;',
          "\xC2\xAF" => '&macr;',
          "\xC2\xB0" => '&deg;',
          "\xC2\xB1" => '&plusmn;',
          "\xC2\xB2" => '&sup2;',
          "\xC2\xB3" => '&sup3;',
          "\xC2\xB4" => '&acute;',
          "\xC2\xB5" => '&micro;',
          "\xC2\xB6" => '&para;',
          "\xC2\xB7" => '&middot;',
          "\xC2\xB8" => '&cedil;',
          "\xC2\xB9" => '&sup1;',
          "\xC2\xBA" => '&ordm;',
          "\xC2\xBB" => '&raquo;',
          "\xC2\xBC" => '&frac14;',
          "\xC2\xBD" => '&frac12;',
          "\xC2\xBE" => '&frac34;',
          "\xC2\xBF" => '&iquest;',
          "\xC3\x80" => '&Agrave;',
          "\xC3\x81" => '&Aacute;',
          "\xC3\x82" => '&Acirc;',
          "\xC3\x83" => '&Atilde;',
          "\xC3\x84" => '&Auml;',
          "\xC3\x85" => '&Aring;',
          "\xC3\x86" => '&AElig;',
          "\xC3\x87" => '&Ccedil;',
          "\xC3\x88" => '&Egrave;',
          "\xC3\x89" => '&Eacute;',
          "\xC3\x8A" => '&Ecirc;',
          "\xC3\x8B" => '&Euml;',
          "\xC3\x8C" => '&Igrave;',
          "\xC3\x8D" => '&Iacute;',
          "\xC3\x8E" => '&Icirc;',
          "\xC3\x8F" => '&Iuml;',
          "\xC3\x90" => '&ETH;',
          "\xC3\x91" => '&Ntilde;',
          "\xC3\x92" => '&Ograve;',
          "\xC3\x93" => '&Oacute;',
          "\xC3\x94" => '&Ocirc;',
          "\xC3\x95" => '&Otilde;',
          "\xC3\x96" => '&Ouml;',
          "\xC3\x97" => '&times;',
          "\xC3\x98" => '&Oslash;',
          "\xC3\x99" => '&Ugrave;',
          "\xC3\x9A" => '&Uacute;',
          "\xC3\x9B" => '&Ucirc;',
          "\xC3\x9C" => '&Uuml;',
          "\xC3\x9D" => '&Yacute;',
          "\xC3\x9E" => '&THORN;',
          "\xC3\x9F" => '&szlig;',
          "\xC3\xA0" => '&agrave;',
          "\xC3\xA1" => '&aacute;',
          "\xC3\xA2" => '&acirc;',
          "\xC3\xA3" => '&atilde;',
          "\xC3\xA4" => '&auml;',
          "\xC3\xA5" => '&aring;',
          "\xC3\xA6" => '&aelig;',
          "\xC3\xA7" => '&ccedil;',
          "\xC3\xA8" => '&egrave;',
          "\xC3\xA9" => '&eacute;',
          "\xC3\xAA" => '&ecirc;',
          "\xC3\xAB" => '&euml;',
          "\xC3\xAC" => '&igrave;',
          "\xC3\xAD" => '&iacute;',
          "\xC3\xAE" => '&icirc;',
          "\xC3\xAF" => '&iuml;',
          "\xC3\xB0" => '&eth;',
          "\xC3\xB1" => '&ntilde;',
          "\xC3\xB2" => '&ograve;',
          "\xC3\xB3" => '&oacute;',
          "\xC3\xB4" => '&ocirc;',
          "\xC3\xB5" => '&otilde;',
          "\xC3\xB6" => '&ouml;',
          "\xC3\xB7" => '&divide;',
          "\xC3\xB8" => '&oslash;',
          "\xC3\xB9" => '&ugrave;',
          "\xC3\xBA" => '&uacute;',
          "\xC3\xBB" => '&ucirc;',
          "\xC3\xBC" => '&uuml;',
          "\xC3\xBD" => '&yacute;',
          "\xC3\xBE" => '&thorn;',
          "\xC3\xBF" => '&yuml;',
          // Latin Extended-A
          "\xC5\x92" => '&OElig;',
          "\xC5\x93" => '&oelig;',
          "\xC5\xA0" => '&Scaron;',
          "\xC5\xA1" => '&scaron;',
          "\xC5\xB8" => '&Yuml;',
          // Spacing Modifier Letters
          "\xCB\x86" => '&circ;',
          "\xCB\x9C" => '&tilde;',
          // General Punctuation
          "\xE2\x80\x82" => '&ensp;',
          "\xE2\x80\x83" => '&emsp;',
          "\xE2\x80\x89" => '&thinsp;',
          "\xE2\x80\x8C" => '&zwnj;',
          "\xE2\x80\x8D" => '&zwj;',
          "\xE2\x80\x8E" => '&lrm;',
          "\xE2\x80\x8F" => '&rlm;',
          "\xE2\x80\x93" => '&ndash;',
          "\xE2\x80\x94" => '&mdash;',
          "\xE2\x80\x98" => '&lsquo;',
          "\xE2\x80\x99" => '&rsquo;',
          "\xE2\x80\x9A" => '&sbquo;',
          "\xE2\x80\x9C" => '&ldquo;',
          "\xE2\x80\x9D" => '&rdquo;',
          "\xE2\x80\x9E" => '&bdquo;',
          "\xE2\x80\xA0" => '&dagger;',
          "\xE2\x80\xA1" => '&Dagger;',
          "\xE2\x80\xB0" => '&permil;',
          "\xE2\x80\xB9" => '&lsaquo;',
          "\xE2\x80\xBA" => '&rsaquo;',
          "\xE2\x82\xAC" => '&euro;',
          // Latin Extended-B
          "\xC6\x92" => '&fnof;',
          // Greek
          "\xCE\x91" => '&Alpha;',
          "\xCE\x92" => '&Beta;',
          "\xCE\x93" => '&Gamma;',
          "\xCE\x94" => '&Delta;',
          "\xCE\x95" => '&Epsilon;',
          "\xCE\x96" => '&Zeta;',
          "\xCE\x97" => '&Eta;',
          "\xCE\x98" => '&Theta;',
          "\xCE\x99" => '&Iota;',
          "\xCE\x9A" => '&Kappa;',
          "\xCE\x9B" => '&Lambda;',
          "\xCE\x9C" => '&Mu;',
          "\xCE\x9D" => '&Nu;',
          "\xCE\x9E" => '&Xi;',
          "\xCE\x9F" => '&Omicron;',
          "\xCE\xA0" => '&Pi;',
          "\xCE\xA1" => '&Rho;',
          "\xCE\xA3" => '&Sigma;',
          "\xCE\xA4" => '&Tau;',
          "\xCE\xA5" => '&Upsilon;',
          "\xCE\xA6" => '&Phi;',
          "\xCE\xA7" => '&Chi;',
          "\xCE\xA8" => '&Psi;',
          "\xCE\xA9" => '&Omega;',
          "\xCE\xB1" => '&alpha;',
          "\xCE\xB2" => '&beta;',
          "\xCE\xB3" => '&gamma;',
          "\xCE\xB4" => '&delta;',
          "\xCE\xB5" => '&epsilon;',
          "\xCE\xB6" => '&zeta;',
          "\xCE\xB7" => '&eta;',
          "\xCE\xB8" => '&theta;',
          "\xCE\xB9" => '&iota;',
          "\xCE\xBA" => '&kappa;',
          "\xCE\xBB" => '&lambda;',
          "\xCE\xBC" => '&mu;',
          "\xCE\xBD" => '&nu;',
          "\xCE\xBE" => '&xi;',
          "\xCE\xBF" => '&omicron;',
          "\xCF\x80" => '&pi;',
          "\xCF\x81" => '&rho;',
          "\xCF\x82" => '&sigmaf;',
          "\xCF\x83" => '&sigma;',
          "\xCF\x84" => '&tau;',
          "\xCF\x85" => '&upsilon;',
          "\xCF\x86" => '&phi;',
          "\xCF\x87" => '&chi;',
          "\xCF\x88" => '&psi;',
          "\xCF\x89" => '&omega;',
          "\xCF\x91" => '&thetasym;',
          "\xCF\x92" => '&upsih;',
          "\xCF\x96" => '&piv;',
          // General Punctuation
          "\xE2\x80\xA2" => '&bull;',
          "\xE2\x80\xA6" => '&hellip;',
          "\xE2\x80\xB2" => '&prime;',
          "\xE2\x80\xB3" => '&Prime;',
          "\xE2\x80\xBE" => '&oline;',
          "\xE2\x81\x84" => '&frasl;',
          // Letterlike Symbols
          "\xE2\x84\x98" => '&weierp;',
          "\xE2\x84\x91" => '&image;',
          "\xE2\x84\x9C" => '&real;',
          "\xE2\x84\xA2" => '&trade;',
          "\xE2\x84\xB5" => '&alefsym;',
          // Arrows
          "\xE2\x86\x90" => '&larr;',
          "\xE2\x86\x91" => '&uarr;',
          "\xE2\x86\x92" => '&rarr;',
          "\xE2\x86\x93" => '&darr;',
          "\xE2\x86\x94" => '&harr;',
          "\xE2\x86\xB5" => '&crarr;',
          "\xE2\x87\x90" => '&lArr;',
          "\xE2\x87\x91" => '&uArr;',
          "\xE2\x87\x92" => '&rArr;',
          "\xE2\x87\x93" => '&dArr;',
          "\xE2\x87\x94" => '&hArr;',
          // Mathematical Operators
          "\xE2\x88\x80" => '&forall;',
          "\xE2\x88\x82" => '&part;',
          "\xE2\x88\x83" => '&exist;',
          "\xE2\x88\x85" => '&empty;',
          "\xE2\x88\x87" => '&nabla;',
          "\xE2\x88\x88" => '&isin;',
          "\xE2\x88\x89" => '&notin;',
          "\xE2\x88\x8B" => '&ni;',
          "\xE2\x88\x8F" => '&prod;',
          "\xE2\x88\x91" => '&sum;',
          "\xE2\x88\x92" => '&minus;',
          "\xE2\x88\x97" => '&lowast;',
          "\xE2\x88\x9A" => '&radic;',
          "\xE2\x88\x9D" => '&prop;',
          "\xE2\x88\x9E" => '&infin;',
          "\xE2\x88\xA0" => '&ang;',
          "\xE2\x88\xA7" => '&and;',
          "\xE2\x88\xA8" => '&or;',
          "\xE2\x88\xA9" => '&cap;',
          "\xE2\x88\xAA" => '&cup;',
          "\xE2\x88\xAB" => '&int;',
          "\xE2\x88\xB4" => '&there4;',
          "\xE2\x88\xBC" => '&sim;',
          "\xE2\x89\x85" => '&cong;',
          "\xE2\x89\x88" => '&asymp;',
          "\xE2\x89\xA0" => '&ne;',
          "\xE2\x89\xA1" => '&equiv;',
          "\xE2\x89\xA4" => '&le;',
          "\xE2\x89\xA5" => '&ge;',
          "\xE2\x8A\x82" => '&sub;',
          "\xE2\x8A\x83" => '&sup;',
          "\xE2\x8A\x84" => '&nsub;',
          "\xE2\x8A\x86" => '&sube;',
          "\xE2\x8A\x87" => '&supe;',
          "\xE2\x8A\x95" => '&oplus;',
          "\xE2\x8A\x97" => '&otimes;',
          "\xE2\x8A\xA5" => '&perp;',
          "\xE2\x8B\x85" => '&sdot;',
          // Miscellaneous Technical
          "\xE2\x8C\x88" => '&lceil;',
          "\xE2\x8C\x89" => '&rceil;',
          "\xE2\x8C\x8A" => '&lfloor;',
          "\xE2\x8C\x8B" => '&rfloor;',
          "\xE2\x8C\xA9" => '&lang;',
          "\xE2\x8C\xAA" => '&rang;',
          // Geometric Shapes
          "\xE2\x97\x8A" => '&loz;',
          // Miscellaneous Symbols
          "\xE2\x99\xA0" => '&spades;',
          "\xE2\x99\xA3" => '&clubs;',
          "\xE2\x99\xA5" => '&hearts;',
          "\xE2\x99\xA6" => '&diams;'
   );

    $string = strtr($text, $conv); 

    // Now translate any remaining single-byte Windows-1252 characters...
    $conv = array(
        chr(128) => "&euro;",
        chr(130) => "&sbquo;",
        chr(131) => "&fnof;",
        chr(132) => "&bdquo;",
        chr(133) => "&hellip;",
        chr(134) => "&dagger;",
        chr(135) => "&Dagger;",
        chr(136) => "&circ;",
        chr(137) => "&permil;",
        chr(138) => "&Scaron;",
        chr(139) => "&lsaquo;",
        chr(140) => "&OElig;",
        chr(145) => "&lsquo;",
        chr(146) => "&rsquo;",
        chr(147) => "&ldquo;",
        chr(148) => "&rdquo;",
        chr(149) => "&bull;",
        chr(150) => "&ndash;",
        chr(151) => "&mdash;",
        chr(152) => "&tilde;",
        chr(153) => "&trade;",
        chr(154) => "&scaron;",
        chr(155) => "&rsaquo;",
        chr(156) => "&oelig;",
        chr(159) => "&Yuml;",
        chr(160) => "&nbsp;",
        chr(161) => "&iexcl;",
        chr(162) => "&cent;",
        chr(163) => "&pound;",
        chr(164) => "&curren;",
        chr(165) => "&yen;",
        chr(166) => "&brvbar;",
        chr(167) => "&sect;",
        chr(168) => "&uml;",
        chr(169) => "&copy;",
        chr(170) => "&ordf;",
        chr(171) => "&laquo;",
        chr(172) => "&not;",
        chr(173) => "&shy;",
        chr(174) => "&reg;",
        chr(175) => "&macr;",
        chr(176) => "&deg;",
        chr(177) => "&plusmn;",
        chr(178) => "&sup2;",
        chr(179) => "&sup3;",
        chr(180) => "&acute;",
        chr(181) => "&micro;",
        chr(182) => "&para;",
        chr(183) => "&middot;",
        chr(184) => "&cedil;",
        chr(185) => "&sup1;",
        chr(186) => "&ordm;",
        chr(187) => "&raquo;",
        chr(188) => "&frac14;",
        chr(189) => "&frac12;",
        chr(190) => "&frac34;",
        chr(191) => "&iquest;",
        chr(192) => "&Agrave;",
        chr(193) => "&Aacute;",
        chr(194) => "&Acirc;",
        chr(195) => "&Atilde;",
        chr(196) => "&Auml;",
        chr(197) => "&Aring;",
        chr(198) => "&AElig;",
        chr(199) => "&Ccedil;",
        chr(200) => "&Egrave;",
        chr(201) => "&Eacute;",
        chr(202) => "&Ecirc;",
        chr(203) => "&Euml;",
        chr(204) => "&Igrave;",
        chr(205) => "&Iacute;",
        chr(206) => "&Icirc;",
        chr(207) => "&Iuml;",
        chr(208) => "&ETH;",
        chr(209) => "&Ntilde;",
        chr(210) => "&Ograve;",
        chr(211) => "&Oacute;",
        chr(212) => "&Ocirc;",
        chr(213) => "&Otilde;",
        chr(214) => "&Ouml;",
        chr(215) => "&times;",
        chr(216) => "&Oslash;",
        chr(217) => "&Ugrave;",
        chr(218) => "&Uacute;",
        chr(219) => "&Ucirc;",
        chr(220) => "&Uuml;",
        chr(221) => "&Yacute;",
        chr(222) => "&THORN;",
        chr(223) => "&szlig;",
        chr(224) => "&agrave;",
        chr(225) => "&aacute;",
        chr(226) => "&acirc;",
        chr(227) => "&atilde;",
        chr(228) => "&auml;",
        chr(229) => "&aring;",
        chr(230) => "&aelig;",
        chr(231) => "&ccedil;",
        chr(232) => "&egrave;",
        chr(233) => "&eacute;",
        chr(234) => "&ecirc;",
        chr(235) => "&euml;",
        chr(236) => "&igrave;",
        chr(237) => "&iacute;",
        chr(238) => "&icirc;",
        chr(239) => "&iuml;",
        chr(240) => "&eth;",
        chr(241) => "&ntilde;",
        chr(242) => "&ograve;",
        chr(243) => "&oacute;",
        chr(244) => "&ocirc;",
        chr(245) => "&otilde;",
        chr(246) => "&ouml;",
        chr(247) => "&divide;",
        chr(248) => "&oslash;",
        chr(249) => "&ugrave;",
        chr(250) => "&uacute;",
        chr(251) => "&ucirc;",
        chr(252) => "&uuml;",
        chr(253) => "&yacute;",
        chr(254) => "&thorn;",
        chr(255) => "&yuml;");


return strtr($string, $conv);   


}
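
For comparison, PHP's built-in htmlentities() covers the named entities of the first table when it is told the input is UTF-8. This is a sketch of an alternative, not a drop-in replacement for filterText(): it also escapes &, < and >, and it does not deal with Windows-1252 input.

```php
<?php
// ENT_NOQUOTES leaves ' and " alone; ENT_HTML401 selects the HTML 4.01 entity table.
$flags = ENT_NOQUOTES | ENT_HTML401;

echo htmlentities("It\u{2019}s a caf\u{00E9}", $flags, 'UTF-8'), "\n";
// It&rsquo;s a caf&eacute;
```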

Solution 4

str_replace('â€™', "'", $dirty_string) might give you a quick and dirty fix, but this looks like a character encoding problem: you are probably reading the tweets in one encoding and displaying them in another.

You'd have to check your code and make sure you use the same encoding all over the place if you want to do this the "clean way".
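
To see where the three stray characters come from, the mismatch can be reproduced with mb_convert_encoding(). This is only an illustration, assuming the mbstring extension is loaded and that the wrong encoding involved is Windows-1252, which is what the "â€™" pattern suggests:

```php
<?php
// The curly apostrophe U+2019 is the three bytes E2 80 99 in UTF-8.
$apostrophe = "\u{2019}";

// Misreading those bytes as Windows-1252 turns each byte into its own character.
$garbled = mb_convert_encoding($apostrophe, 'UTF-8', 'Windows-1252');
echo $garbled, "\n"; // â€™

// Undo the damage: map the three characters back to their single bytes.
$repaired = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');
var_dump($repaired === $apostrophe); // bool(true)
```

The reverse conversion only repairs text that was garbled exactly once in this way; making the encodings agree at the source is still the cleaner fix.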

Author by

bflora2

Updated on November 14, 2020

Comments

  • bflora2 over 3 years

    I'm grabbing some tweets and printing them out on my site, and curly apostrophes are being rendered as "â€™". This is not good. What PHP function should I run the string through to get these weird characters to display as something closer to '?

    • bflora2 over 13 years
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  • bflora2 over 13 years
    Is there a PHP function I can run this string through on the fly to clean it up? I.e. UTF8_encode($string);?
  • bflora2 over 13 years
    I'm trying the str_replace approach, but for some reason the function isn't "seeing" ’ when it goes through the string looking for the text to replace. So nothing happens when I print out the result. The text stays unchanged.
  • bflora2 over 13 years
    I added that function to my page, ran the string through it and printed out the result, but it didn't change anything in the string. Is there something more I should do with this code to make it do something?
  • Floern over 13 years
    Nothing changed? I think you have to call utf8_decode() first: $str = htmlallentities(utf8_decode($str));
  • bflora2 over 13 years
    Progress! That changed the foreign characters into ⿿. :) Now what?
  • Quamis over 13 years
    Are you sure you're using UTF-8 encoding? How did you get the â€™ characters? Did you use view-source or simply copy/paste from the page? You should use view source. The character might appear differently in the page source.
  • bflora2 over 13 years
    It did not. After that last step, I checked the source and the weird square things were shown as "&#12287;". I ran a string replace for them and they're showing ' now. It's dirty, but it worked. Thanks.
  • bflora2 over 13 years
    Good point! View source showed it as &#12287;. String replace worked for replacing those. Thanks.
  • Mark over 11 years
    This was all I needed to fix it displaying in my circumstances, cheers.
