Convert UTF-8 characters to ISO-8859-1 and back in PHP

php encoding utf-8 iso-8859-1


Solution 1

Have a look at iconv() or mb_convert_encoding(). Just by the way: why don't utf8_encode() and utf8_decode() work for you?

utf8_decode — Converts a string with ISO-8859-1 characters encoded with UTF-8 to single-byte ISO-8859-1

utf8_encode — Encodes an ISO-8859-1 string to UTF-8

So essentially

$utf8 = 'ÄÖÜ'; // file must be UTF-8 encoded
$iso88591_1 = utf8_decode($utf8);
$iso88591_2 = iconv('UTF-8', 'ISO-8859-1', $utf8);
$iso88591_3 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

$iso88591 = 'ÄÖÜ'; // file must be ISO-8859-1 encoded
$utf8_1 = utf8_encode($iso88591);
$utf8_2 = iconv('ISO-8859-1', 'UTF-8', $iso88591);
$utf8_3 = mb_convert_encoding($iso88591, 'UTF-8', 'ISO-8859-1');

All three variants should give the same result. utf8_encode() and utf8_decode() require no special extension, mb_convert_encoding() requires ext/mbstring, and iconv() requires ext/iconv.
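As a rough check of the claim that the conversions agree, here is a minimal sketch. It assumes ext/mbstring and ext/iconv are both loaded, and it writes the test string as explicit bytes so the encoding of the source file does not matter. utf8_encode() and utf8_decode() are left out because they are deprecated as of PHP 8.2.

```php
<?php
// "ÄÖÜ" as explicit UTF-8 bytes (two bytes per character).
$utf8 = "\xC3\x84\xC3\x96\xC3\x9C";

// UTF-8 -> ISO-8859-1 with both extensions.
$isoIconv = iconv('UTF-8', 'ISO-8859-1', $utf8);
$isoMb    = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

// In ISO-8859-1 each of these characters is a single byte.
var_dump($isoIconv === "\xC4\xD6\xDC"); // bool(true)
var_dump($isoMb === $isoIconv);         // bool(true)

// ISO-8859-1 -> UTF-8 restores the original bytes.
$roundTrip = mb_convert_encoding($isoMb, 'UTF-8', 'ISO-8859-1');
var_dump($roundTrip === $utf8);         // bool(true)
```

Comparing byte strings (or bin2hex() output) is usually more reliable than echoing the result, because the browser or terminal applies its own charset when displaying it.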

Solution 2

First of all, don't use different encodings. It leads to a mess, and UTF-8 is definitely the one you should be using everywhere.

Chances are your input is not ISO-8859-1, but something else (ISO-8859-15, Windows-1252). To convert from those, use iconv or mb_convert_encoding.
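To illustrate why the source encoding matters, the following sketch (assuming ext/mbstring) converts the same byte under two different assumptions. Byte 0x80 is the euro sign in Windows-1252, but a control character in ISO-8859-1. The sample string is made up for the example.

```php
<?php
// Hypothetical input: "10 " followed by the single byte 0x80.
$input = "10 \x80";

// Same bytes, two different claims about what they mean.
$asLatin1 = mb_convert_encoding($input, 'UTF-8', 'ISO-8859-1');
$asCp1252 = mb_convert_encoding($input, 'UTF-8', 'Windows-1252');

var_dump(bin2hex($asLatin1)); // "313020c280"   -> U+0080, a control character
var_dump(bin2hex($asCp1252)); // "313020e282ac" -> U+20AC, the euro sign
```

If the output contains invisible or odd characters after converting from ISO-8859-1, trying Windows-1252 as the source encoding is a reasonable next step.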

Nevertheless, utf8_encode and utf8_decode should work for ISO-8859-1. It would be nice if you could post a link to a file or a uuencoded or base64 example string for which the conversion fails or yields unexpected results.

Solution 3

It is much better to use

$value = mb_convert_encoding($value, 'HTML-ENTITIES', 'UTF-8');

This is especially useful when you submit 'ISO-8859-1' characters through an AJAX call. It works for Chinese, Japanese, Czech, German and many more languages.
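Note that the 'HTML-ENTITIES' pseudo-encoding of mbstring is deprecated as of PHP 8.2. A similar effect can be sketched with mb_encode_numericentity(), which is a different technique from the line above: it emits numeric entities (&#196;) rather than named ones (&Auml;). The conversion map below is an assumption chosen to cover every code point from U+0080 upwards.

```php
<?php
// "Ä 日" as explicit UTF-8 bytes.
$utf8 = "\xC3\x84 \xE6\x97\xA5";

// Map: [first code point, last code point, offset, mask].
$convmap = [0x80, 0x10FFFF, 0, 0x1FFFFF];

// Every non-ASCII character becomes a decimal numeric entity,
// so the result is plain ASCII and survives any single-byte transport.
$ascii = mb_encode_numericentity($utf8, $convmap, 'UTF-8');
var_dump($ascii); // "&#196; &#26085;"

// The same map converts the entities back to UTF-8.
$back = mb_decode_numericentity($ascii, $convmap, 'UTF-8');
var_dump($back === $utf8); // bool(true)
```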

Solution 4

Use html_entity_decode() and htmlentities().

$html = html_entity_decode(htmlentities($html, ENT_QUOTES, 'UTF-8'), ENT_QUOTES , 'ISO-8859-1');

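htmlentities() reads your input as UTF-8 and replaces every character that has a named HTML entity with that entity; html_entity_decode() then turns the entities back into characters, this time encoded as ISO-8859-1.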
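The two steps can be looked at separately. This is a small sketch with a made-up input string, written as explicit bytes so the file encoding does not matter.

```php
<?php
// "ÄÖÜ" as explicit UTF-8 bytes.
$utf8 = "\xC3\x84\xC3\x96\xC3\x9C";

// Step 1: UTF-8 characters -> named entities (pure ASCII).
$entities = htmlentities($utf8, ENT_QUOTES, 'UTF-8');
var_dump($entities); // "&Auml;&Ouml;&Uuml;"

// Step 2: entities -> single-byte ISO-8859-1 characters.
$iso = html_entity_decode($entities, ENT_QUOTES, 'ISO-8859-1');
var_dump($iso === "\xC4\xD6\xDC"); // bool(true)
```

Characters that have no named entity pass through htmlentities() unchanged as UTF-8 bytes, so this detour is only a conversion for text that fits the entity table.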

Solution 5

Set the meta tag in the head as

 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />

Use the table at http://www.i18nqa.com/debug/utf8-debug.html to look up the garbled character sequences you want to replace.

Then use str_replace like

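    // Garbled sequences: UTF-8 bytes that were misread as Windows-1252
    $find = array('â€œ', 'â€™', 'â€¦', 'â€”', 'â€“', 'â€˜', 'Ã©', 'Â', 'â€¢', 'Ëœ', 'â€');
    // The characters they were meant to be, in the same order
    $replace = array('“', '’', '…', '—', '–', '‘', 'é', '', '•', '˜', '”');
    $content = str_replace($find, $replace, $content);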

It's the method I use, and it helps a lot. Thanks!
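Sequences like these typically appear when UTF-8 bytes are read as Windows-1252 and then converted to UTF-8 a second time. Under that assumption, the damage can often be undone in one step by converting back, instead of listing every sequence by hand. A minimal sketch, assuming ext/mbstring and a made-up sample string:

```php
<?php
// "Atlántico" as explicit UTF-8 bytes.
$original = "Atl\xC3\xA1ntico";

// Simulate the accident: the UTF-8 bytes are treated as Windows-1252
// and encoded to UTF-8 once more. The result displays as "AtlÃ¡ntico".
$garbled = mb_convert_encoding($original, 'UTF-8', 'Windows-1252');

// Undo it: map each garbled character back to its single Windows-1252 byte.
// Those bytes are the original UTF-8 sequence.
$repaired = mb_convert_encoding($garbled, 'Windows-1252', 'UTF-8');

var_dump($repaired === $original); // bool(true)
```

This only works when the whole string was double-encoded exactly once. Windows-1252 also leaves the bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so characters whose UTF-8 form contains one of them (the closing quote ” is one) may not survive the round trip. A lookup table like the one above remains a fallback for those.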


qualbeen

Into web-development, mainly using PHP or node.js

Updated on October 15, 2020

Comments

qualbeen over 3 years

Some of my scripts use different encodings, and when I try to combine them, this has become an issue.

But I can't change the encoding they use. Instead, I want to change the encoding of the result from script A and use it as a parameter in script B.

So: is there any simple way to change a string from UTF-8 to ISO-8859-1 in PHP? I have looked at utf8_encode and utf8_decode, but they don't do what I want. Why doesn't any "utf2iso()" function, or something similar, exist?

I don't think I have characters that can't be written in ISO format, so that shouldn't be a huge issue.
- Nishan over 15 years
  
  utf8_decode should be exactly your utf2iso?!?
- Xeoncross over 8 years
  
  It's worth noting that PHP continues to move to utf-8 internally so any strings you have probably are coming from outside. Set cURL, file access functions, streams, PDO/MySQL, or any other API for accessing outside data to use UTF-8 so that it will already be correct when PHP gets it.
qualbeen over 15 years

Thanks for a good answer, and you and the others here are right: utf8_decode() seems to get the work done. There must have been some problems with files or my browser. At least I'm no longer able to reproduce the errors... (Maybe I did something wrong with my browser-charset-settings?)
thicolares almost 12 years

Just for the record: I faced a similar situation, and noticed that iconv was being called twice (nested) on the same string variable. After I removed the first call, it worked like a charm. (utf8_decode and mb_convert_encoding were not used.)
Tyler over 11 years

This advice helped me solve a peculiar problem where a UTF-8 string ("Atlántico") was first literally encoded into ISO-8859-1 (it looked like "AtlÃ¡ntico") and then these single-byte characters were re-encoded to UTF-8 (it looked exactly the same, "AtlÃ¡ntico", but each character was UTF-8 encoded this time). utf8_decode() helped because it decoded the UTF-8 characters into their literal ANSI substitutes, which were then somehow mysteriously read and displayed properly as UTF-8 characters. Does it make sense or not? Hmm..
Toon Krijthe over 11 years

Please try to add some explanation to the code to enhance the educational value of the post.
Benubird about 8 years

iconv, or mb_convert_encoding? iconv requires knowing the input encoding, which might not be the case.
phihag about 8 years

@Benubird If you're guessing the encoding, you're likely to get into even worse problems (now it's not easily reproducible, since it may depend on the frequency of characters). But you're right, mb_convert_encoding definitely belongs in this answer. Added.
GordonM about 7 years

"Avoid any encoding other than UTF8" is good advice in general but sometimes it's not possible. For example we're trying to get a 3rd party integration working where the party demands XML in Latin 1 format.
b4tch over 3 years

For anyone else that uses this solution, be aware the function is actually mb_convert_encoding
Harinarayan about 2 years

How can I convert "à¤¸à¥‚à¤—à¥€, à¤ªà¥‹à¤–à¤°à¥€ à¤¤à¤¹à¤¸à¥€à¤² à¤®à¥‡à¤‚ à¤à¤¾à¤°à¤¤ à¤" into human-readable Unicode characters? Please suggest.
Stefan Gehrig about 2 years

@Harinarayan If you don't have a clue what encoding might have been used for the string, you're out of luck. There's no way to determine the encoding by just looking at the string. You can only guess.
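One thing that can be checked mechanically is whether a string is valid UTF-8 at all, which at least rules candidates in or out. A small sketch, assuming ext/mbstring; the sample strings are made up:

```php
<?php
$samples = [
    'utf8'   => "Atl\xC3\xA1ntico", // "á" as the two-byte UTF-8 sequence C3 A1
    'latin1' => "Atl\xE1ntico",     // "á" as the single ISO-8859-1 byte E1
];

$results = [];
foreach ($samples as $name => $bytes) {
    // true only if the bytes form a well-formed UTF-8 sequence
    $results[$name] = mb_check_encoding($bytes, 'UTF-8');
}

var_dump($results['utf8']);   // bool(true)
var_dump($results['latin1']); // bool(false): E1 must be followed by continuation bytes
```

The reverse test is not possible: every byte sequence is formally valid ISO-8859-1, so a failed UTF-8 check says the input is some single-byte encoding, but not which one.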