Convert UTF-8 characters to ISO-8859-1 and back in PHP


Solution 1

Have a look at iconv() or mb_convert_encoding(). Just by the way: why don't utf8_encode() and utf8_decode() work for you?

utf8_decode — Converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1

utf8_encode — Encodes an ISO-8859-1 string to UTF-8

So essentially

$utf8 = 'ÄÖÜ'; // file must be UTF-8 encoded
$iso88591_1 = utf8_decode($utf8);
$iso88591_2 = iconv('UTF-8', 'ISO-8859-1', $utf8);
$iso88591_3 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

$iso88591 = 'ÄÖÜ'; // file must be ISO-8859-1 encoded
$utf8_1 = utf8_encode($iso88591);
$utf8_2 = iconv('ISO-8859-1', 'UTF-8', $iso88591);
$utf8_3 = mb_convert_encoding($iso88591, 'UTF-8', 'ISO-8859-1');

all three variants should produce the same result, with utf8_encode()/utf8_decode() requiring no special extension, mb_convert_encoding() requiring ext/mbstring and iconv() requiring ext/iconv.
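To make the round trip easier to verify, here is a minimal sketch that writes "ÄÖÜ" as explicit bytes, so the encoding of the source file no longer matters. The helper names toLatin1() and toUtf8() are made up for this example, and it assumes ext/mbstring is available. It uses mb_convert_encoding() rather than utf8_encode()/utf8_decode(), since those two are deprecated as of PHP 8.2.

```php
<?php
// Sketch only: toLatin1() and toUtf8() are hypothetical helper names.

/** Convert a UTF-8 string to ISO-8859-1 bytes (needs ext/mbstring). */
function toLatin1(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

/** Convert ISO-8859-1 bytes back to a UTF-8 string (needs ext/mbstring). */
function toUtf8(string $latin1): string
{
    return mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
}

// "ÄÖÜ" as explicit UTF-8 bytes: two bytes per character.
$utf8 = "\xC3\x84\xC3\x96\xC3\x9C";

// After conversion each character is a single byte.
$latin1 = toLatin1($utf8);

echo bin2hex($latin1), PHP_EOL;         // c4d6dc
echo bin2hex(toUtf8($latin1)), PHP_EOL; // c384c396c39c
```

Comparing the bin2hex() output is more reliable than echoing the strings, because a browser or terminal applies its own charset when it displays them.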

Solution 2

First of all, don't use different encodings. It leads to a mess, and UTF-8 is definitely the one you should be using everywhere.

Chances are your input is not ISO-8859-1, but something else (ISO-8859-15, Windows-1252). To convert from those, use iconv or mb_convert_encoding.
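As a rough illustration of why the assumed source encoding matters, the sketch below decodes the same bytes under different assumptions. It assumes ext/mbstring; the variable names are only for this example.

```php
<?php
// Sketch: identical input bytes, different results depending on the assumed encoding.

$byte80 = "\x80"; // euro sign in Windows-1252, a C1 control character in ISO-8859-1
$byteA4 = "\xA4"; // euro sign in ISO-8859-15, generic currency sign in ISO-8859-1

$fromCp1252  = mb_convert_encoding($byte80, 'UTF-8', 'Windows-1252');
$fromLatin1  = mb_convert_encoding($byte80, 'UTF-8', 'ISO-8859-1');
$fromLatin9  = mb_convert_encoding($byteA4, 'UTF-8', 'ISO-8859-15');
$fromLatin1b = mb_convert_encoding($byteA4, 'UTF-8', 'ISO-8859-1');

echo bin2hex($fromCp1252), PHP_EOL;  // e282ac (U+20AC, the euro sign)
echo bin2hex($fromLatin1), PHP_EOL;  // c280   (U+0080, an invisible control character)
echo bin2hex($fromLatin9), PHP_EOL;  // e282ac (U+20AC, the euro sign)
echo bin2hex($fromLatin1b), PHP_EOL; // c2a4   (U+00A4, the currency sign)
```

If euro signs or curly quotes come out wrong after utf8_encode(), that is a strong hint the input was Windows-1252 or ISO-8859-15 rather than ISO-8859-1.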

Nevertheless, utf8_encode and utf8_decode should work for ISO-8859-1. It would be nice if you could post a link to a file or a uuencoded or base64 example string for which the conversion fails or yields unexpected results.
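One possible way to produce such an example string is to dump the raw bytes before anything else touches them. The helper name describeBytes() is made up for this sketch, and the UTF-8 check assumes ext/mbstring.

```php
<?php
// Sketch only: describeBytes() is a hypothetical helper for posting reproducible samples.

/** Return byte-level views of a string that survive copy and paste. */
function describeBytes(string $s): array
{
    return [
        'length_bytes' => strlen($s),                      // byte count, not character count
        'hex'          => bin2hex($s),                     // exact bytes as hex
        'base64'       => base64_encode($s),               // safe to paste into a post
        'valid_utf8'   => mb_check_encoding($s, 'UTF-8'),  // false hints at a legacy encoding
    ];
}

// "ÄÖÜ" in ISO-8859-1: three bytes that do not form valid UTF-8.
print_r(describeBytes("\xC4\xD6\xDC"));
```

Posting the hex or base64 value lets others reproduce the problem exactly, regardless of what their browser or editor does to the text.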

Solution 3

It is much better to use

$value = mb_convert_encoding($value, 'HTML-ENTITIES', 'UTF-8');

Especially when you are using an AJAX call to submit 'ISO-8859-1' characters. It works for Chinese, Japanese, Czech, German and many more languages.
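Note that the 'HTML-ENTITIES' pseudo-encoding of mbstring is deprecated as of PHP 8.2. A similar effect can be sketched with mb_encode_numericentity(), which is a different technique from the line above: it emits numeric entities only, never named ones. The helper name toNumericEntities() is made up for this example.

```php
<?php
// Sketch only: toNumericEntities() is a hypothetical helper, not the method from the answer.

/** Replace every non-ASCII character of a UTF-8 string with a numeric HTML entity. */
function toNumericEntities(string $utf8): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    // Everything from U+0080 to U+10FFFF is converted; ASCII stays untouched.
    $convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($utf8, $convmap, 'UTF-8');
}

// "Ä" (U+00C4) and the CJK character U+4E2D, written as explicit UTF-8 bytes.
echo toNumericEntities("\xC3\x84 and \xE4\xB8\xAD"), PHP_EOL; // &#196; and &#20013;
```

The result is pure ASCII, so it passes through an ISO-8859-1 page or request unchanged, but it is only meaningful where the output is interpreted as HTML.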

Solution 4

Use html_entity_decode() and htmlentities().

$html = html_entity_decode(htmlentities($html, ENT_QUOTES, 'UTF-8'), ENT_QUOTES , 'ISO-8859-1');

htmlentities() reads your input as UTF-8 and converts its special and non-ASCII characters into HTML entities; html_entity_decode() then decodes those entities into ISO-8859-1.

Solution 5

Set the meta tag in the head as

 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" /> 

Use the table at http://www.i18nqa.com/debug/utf8-debug.html to look up the garbled character sequences you want to replace.

Then use str_replace() like this:

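    // Garbled sequences (UTF-8 bytes misread as Windows-1252) and the characters they stand for.
    // The bare 'â€' must stay last, because the other quote and dash sequences start with it.
    $find    = array('â€œ', 'â€™', 'â€¦', 'â€”', 'â€“', 'â€˜', 'Ã©', 'Â', 'â€¢', 'Ëœ', 'â€');
    $replace = array('“', '’', '…', '—', '–', '‘', 'é', '', '•', '˜', '”');
    $content = str_replace($find, $replace, $content);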

It's the method I use, and it helps a lot. Thanks!
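If the garbled sequences come from UTF-8 text that was converted to UTF-8 a second time, a more general repair can be sketched by reversing that extra step instead of listing every sequence. This is an assumption about the cause, not something the answer above states. The helper name fixDoubleEncoded() is made up, and ext/mbstring is required.

```php
<?php
// Sketch only: fixDoubleEncoded() is a hypothetical helper and a heuristic, not a guaranteed fix.

/** Undo one level of "UTF-8 read as Windows-1252 and re-encoded as UTF-8". */
function fixDoubleEncoded(string $s): string
{
    // Reverse the suspected extra conversion step.
    $candidate = mb_convert_encoding($s, 'Windows-1252', 'UTF-8');

    // Accept the result only if it is valid UTF-8 ...
    if (!mb_check_encoding($candidate, 'UTF-8')) {
        return $s;
    }

    // ... and if converting it forward again reproduces the input exactly,
    // which rules out strings that lost characters in the reverse step.
    if (mb_convert_encoding($candidate, 'UTF-8', 'Windows-1252') !== $s) {
        return $s;
    }

    return $candidate;
}

// "Ã©" as UTF-8 bytes is the double-encoded form of "é".
echo bin2hex(fixDoubleEncoded("\xC3\x83\xC2\xA9")), PHP_EOL; // c3a9
// A correctly encoded "é" is left alone.
echo bin2hex(fixDoubleEncoded("\xC3\xA9")), PHP_EOL;         // c3a9
```

A string that legitimately contains a sequence such as "Ã©" would be changed as well, so this is best applied only to data known to be damaged.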


qualbeen
Author by

qualbeen

Into web-development, mainly using PHP or node.js

Updated on October 15, 2020

Comments

  • qualbeen
    qualbeen over 3 years

    Some of my scripts use different encodings, and when I try to combine them, this has become an issue.

    But I can't change the encoding they use; instead I want to change the encoding of the result from script A and use it as a parameter in script B.

    So: is there any simple way to change a string from UTF-8 to ISO-8859-1 in PHP? I have looked at utf8_encode and utf8_decode, but they don't do what I want. Why doesn't there exist any "utf2iso()" function, or similar?

    I don't think I have characters that can't be written in ISO format, so that shouldn't be a huge issue.

    • Nishan
      Nishan over 15 years
      utf8_decode should be exactly your utf2iso?!?
    • Xeoncross
      Xeoncross over 8 years
      It's worth noting that PHP continues to move to utf-8 internally so any strings you have probably are coming from outside. Set cURL, file access functions, streams, PDO/MySQL, or any other API for accessing outside data to use UTF-8 so that it will already be correct when PHP gets it.
  • qualbeen
    qualbeen over 15 years
    Thanks for a good answer, and you and the others here are right: utf8_decode() seems to get the work done. There must have been some problems with files or my browser. At least I'm no longer able to reproduce the errors... (Maybe I did something wrong with my browser-charset-settings?)
  • thicolares
    thicolares almost 12 years
    Just for the record: I faced a similar situation, but I noticed that iconv had been called twice (nested) on the same string variable. After I removed the first call, it works like a charm. (utf8_decode and mb_convert_encoding weren't used.)
  • Tyler
    Tyler over 11 years
    This advice helped me to solve a peculiar problem where a UTF-8 string ("Atlántico") was first literally encoded into ISO-8859-1 (looked like "AtlÃ¡ntico") and then these single-byte characters were reencoded back to UTF-8 (looked exactly the same "AtlÃ¡ntico" but each character was UTF-8 encoded this time). utf8_decode() helped because it decoded the UTF-8 characters into their literal ANSI substitutes which were then somehow mysteriously properly read & displayed as UTF-8 characters. Does it make sense or not? Hmm..
  • Toon Krijthe
    Toon Krijthe over 11 years
    Please try to add some explanation to the code to enhance the educational value of the post.
  • Benubird
    Benubird about 8 years
    iconv, or mb_convert_encoding? iconv requires knowing the input encoding, which might not be the case.
  • phihag
    phihag about 8 years
    @Benubird If you're guessing the encoding, you're likely to get into even worse problems (now it's not easily reproducible, since it may depend on the frequency of characters). But you're right, mb_convert_encoding definitely belongs in this answer. Added.
  • GordonM
    GordonM about 7 years
    "Avoid any encoding other than UTF8" is good advice in general but sometimes it's not possible. For example we're trying to get a 3rd party integration working where the party demands XML in Latin 1 format.
  • b4tch
    b4tch over 3 years
    For anyone else that uses this solution, be aware the function is actually mb_convert_encoding
  • Harinarayan
    Harinarayan about 2 years
    How do I convert "सूगी, पोखरी तहसील में भारत à¤" characters to human-readable Unicode? Please suggest.
  • Stefan Gehrig
    Stefan Gehrig about 2 years
    @Harinarayan If you don't have a clue what encoding might have been used for the string, you're out of luck. There's no way to determine the encoding by just looking at the string. You can only guess.