Convert ISO8859 String to UTF8? ÄÖÜ => ÃÃ why?

Solution 1

I hope this will solve your problem.

String readable = "äöüÄÖÜßáéíóúÁÉÍÓÚàèìòùÀÈÌÒÙñÑ";

try {
    String unreadable = new String(readable.getBytes("UTF-8"), "ISO-8859-15");
    // unreadable -> äöüÃÃÃÃáéíóúÃÃÃÃÃàèìòùÃÃÃÃÃñÃ
} catch (UnsupportedEncodingException e) {
    // handle error
}

And:

String unreadable = "äöüÃÃÃÃáéíóúÃÃÃÃÃàèìòùÃÃÃÃÃñÃ";

try {
    String readable = new String(unreadable.getBytes("ISO-8859-15"), "UTF-8");
    // readable -> äöüÄÖÜßáéíóúÁÉÍÓÚàèìòùÀÈÌÒÙñÑ
} catch (UnsupportedEncodingException e) {
    // ...
}
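
On Java 7 and later, the same round trip can be sketched with the `StandardCharsets` constants, which avoids the checked `UnsupportedEncodingException`. This is only an illustration under stated assumptions: the class and method names are made up, and it uses ISO-8859-1 instead of ISO-8859-15 because that is the single-byte constant `StandardCharsets` provides.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative only.
final class MojibakeRoundTrip {

    // Simulates the damage: UTF-8 bytes are misread as ISO-8859-1.
    static String garble(String readable) {
        return new String(readable.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses the damage: recover the original bytes, then decode them as UTF-8.
    static String repair(String unreadable) {
        return new String(unreadable.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        String readable = "\u00E4\u00F6\u00FC\u00C4\u00D6\u00DC\u00DF"; // the letters ä ö ü Ä Ö Ü ß
        String unreadable = garble(readable);
        System.out.println(unreadable.length());                 // 14: each letter became two chars
        System.out.println(repair(unreadable).equals(readable)); // true
    }
}
```

The repair step only works here because ISO-8859-1 maps every byte value to a character, so the wrong decoding did not lose any information.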

Solution 2

A construct such as new String("Üü?öäABC".getBytes(), "ISO-8859-15"); is almost always an error.

What you're doing here is taking a String object, getting the corresponding byte[] in the platform default encoding and re-interpreting it as ISO-8859-15 to convert it back to a String.

If the platform default encoding happens to be ISO-8859-15 (or near enough to make no difference for this particular String, for example ISO-8859-1), then it is a no-op (i.e. it has no real effect).

In all other cases it will most likely destroy the String.

If you try to "fix" a String, then you're probably too late: if you have to use a specific encoding to read data, then you should use it at the point where binary data is converted to String data. For example if you read from an InputStream, you need to pass the correct encoding to the constructor of the InputStreamReader.
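
As a minimal sketch of decoding at the source, the example below passes the charset to the `InputStreamReader` constructor. The class name, the method name and the in-memory byte array (standing in for a file known to be ISO-8859-15) are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical example class; not part of the original answer.
final class DecodeAtTheSource {

    // Reads the first line of the stream, decoding its bytes with the given charset.
    static String readFirstLine(InputStream in, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for file content: "Üü?öäABC" encoded as ISO-8859-15.
        byte[] fileBytes = { (byte) 0xDC, (byte) 0xFC, 0x3F, (byte) 0xF6, (byte) 0xE4, 0x41, 0x42, 0x43 };
        String line = readFirstLine(new ByteArrayInputStream(fileBytes), Charset.forName("ISO-8859-15"));
        // Compare against Unicode escapes so the source file encoding does not matter.
        System.out.println(line.equals("\u00DC\u00FC?\u00F6\u00E4ABC"));
    }
}
```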

Trying to fix the problem "after the fact" will be

  1. harder to do and
  2. often not even possible (because decoding a byte[] with the wrong encoding can be a destructive operation).
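
To illustrate the second point, here is a small sketch (class and method names are hypothetical) in which ISO-8859-1 bytes are wrongly decoded as UTF-8. The invalid sequences are replaced with U+FFFD, so the original bytes cannot be recovered from the resulting String.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of the original answer.
final class LossyDecodeDemo {

    // Decodes the bytes as UTF-8 and encodes the resulting String as UTF-8 again.
    static byte[] decodeAndReencodeAsUtf8(byte[] original) {
        String decoded = new String(original, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Üü" encoded as ISO-8859-1; this is not a valid UTF-8 byte sequence.
        byte[] latin1 = { (byte) 0xDC, (byte) 0xFC };
        String wrong = new String(latin1, StandardCharsets.UTF_8);
        System.out.println(wrong.contains("\uFFFD"));                               // true
        System.out.println(Arrays.equals(latin1, decodeAndReencodeAsUtf8(latin1))); // false
    }
}
```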

Solution 3

String s = new String("Üü?öäABC".getBytes(), "ISO-8859-15"); //bug

All this code does is corrupt data. It transcodes UTF-16 data to the system encoding (whatever that is), then takes those bytes, pretends they're valid ISO-8859-15 and transcodes them to UTF-16.

Then how to convert an input String like "ÃÃŒ?öÀABC" to normal? (if I know that the string is from an ISO8859 file).

The correct way to perform this operation would be like this:

byte[] iso859_15 = { (byte) 0xc3, (byte) 0xc3, (byte) 0xbc, 0x3f,
  (byte) 0xc3, (byte) 0xb6, (byte) 0xc3, (byte) 0xa4, 0x41, 0x42,
         0x43 };
String utf16 = new String(iso859_15, Charset.forName("ISO-8859-15"));

Strings in Java are always UTF-16. All other encodings must be represented using the byte type.

Now, if you use System.out to output the resultant string, that might not appear correctly, but that is a different transcoding issue. For example, the Windows console default encoding doesn't match the system encoding. The encoding used by System.out must match the encoding of the device receiving the data. You should also take care to ensure that you are reading your source files with the same encoding your editor is using.
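
One way to control the output side is to wrap the target stream in a `PrintStream` with an explicit encoding. The sketch below is only an illustration: the class and helper names are made up, and the choice of UTF-8 for the console is an assumption that has to match whatever the receiving device actually expects.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical example class; not part of the original answer.
final class ExplicitOutputEncoding {

    // Wraps a stream so that printed text is encoded with the named charset.
    static PrintStream withEncoding(OutputStream target, String charsetName) {
        try {
            return new PrintStream(target, true, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String text = "\u00DC\u00FC?\u00F6\u00E4ABC"; // Üü?öäABC

        // Assumption: the receiving terminal expects UTF-8.
        PrintStream out = withEncoding(System.out, "UTF-8");
        out.println(text);

        // The same String encoded for a device that expects ISO-8859-15.
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        PrintStream latin9 = withEncoding(captured, "ISO-8859-15");
        latin9.print(text);
        latin9.flush();
        System.out.println(captured.size()); // 8: one byte per character
    }
}
```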

To understand how treatment of character data varies between languages, read this.

Solution 4

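Here is an easy way that returns a String (I created a method to do this):

import java.io.UnsupportedEncodingException;

// The method name is a placeholder; the original post omitted it.
public static String fixEncoding(String input) {
    String output = "";
    try {
        /* From ISO-8859-1 to UTF-8: repairs UTF-8 text that was misread as ISO-8859-1 */
        output = new String(input.getBytes("ISO-8859-1"), "UTF-8");
        /* From UTF-8 to ISO-8859-1: the reverse direction. Use one or the other, not both,
           otherwise the second assignment overwrites the first. */
        // output = new String(input.getBytes("UTF-8"), "ISO-8859-1");
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    return output;
}

// Example
input = "MÃºsica";
output = "Música";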

it works!! :)

Solution 5

Java Strings are always stored internally as UTF-16 (and as UTF-8 in the class file after compilation), so you can't simply interpret a String as if it were a byte array. If you want to create a byte array from a String in a certain encoding, you must first convert it into that encoding:

byte[] b = "Üü?öäABC".getBytes("ISO-8859-15");

System.out.println(new String(b, "ISO-8859-15")); // will be ok
System.out.println(new String(b, "UTF-8")); // will look garbled
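
To make the difference between the encodings concrete, the following sketch (hypothetical class and method names) encodes the same String with several charsets and compares the byte counts. Unicode escapes are used so that the result does not depend on the encoding of the source file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class; not part of the original answer.
final class EncodedLengths {

    // Number of bytes needed to represent the text in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u00DC\u00FC?\u00F6\u00E4ABC"; // Üü?öäABC
        System.out.println(text.length());                                       // 8 UTF-16 code units
        System.out.println(encodedLength(text, Charset.forName("ISO-8859-15"))); // 8 bytes
        System.out.println(encodedLength(text, StandardCharsets.UTF_8));         // 12 bytes
        System.out.println(encodedLength(text, StandardCharsets.UTF_16BE));      // 16 bytes
    }
}
```

The four umlauts take one byte each in ISO-8859-15 but two bytes each in UTF-8, which is why bytes produced for one encoding look garbled when decoded with the other.
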

Author: Lissy

Updated on July 26, 2022

Comments

  • Lissy, almost 2 years

    What's the problem with this code? I made an ISO8859 String, so most of the ÄÖÜ come out as cryptic output. That's fine. But how do I convert them back to normal chars (UTF8 or something)?

        String s = new String("Üü?öäABC".getBytes(), "ISO-8859-15");
    
        System.out.println(s);
        //ÃÃŒ?öÀABC => ok(?)
        System.out.println(new String(s.getBytes(), "ISO-8859-15"));
        //ÃÂÃÅ?öÃâ¬ABC => ok(?)
        System.out.println(new String(s.getBytes(), "UTF-8"));
        //ÃÃŒ?öÀABC => huh?
    
  • McDowell, almost 13 years
    I should note that the byte array contains ÃÃŒ?öÀABC encoded as ISO-8859-15, which is perhaps not the String the OP wants. Üü?öäABC encoded as ISO-8859-15 would be the array { 0x22, (byte) 0xdc, (byte) 0xfc, 0x3f, (byte) 0xf6, (byte) 0xe4, 0x41, 0x42, 0x43, 0x22 }.
  • Sundhar, about 11 years
    Hi Jooce, I tried the same and it seems to be working fine. Thank you for this.