Convert byte array to understandable String

13,312

The byte array doesn't look like UTF-8. Note that \ufffd (named REPLACEMENT CHARACTER) is "used to replace an incoming character whose value is unknown or unrepresentable in Unicode."

Addendum: Here's a simple example of how this can happen. When cast to a byte, the code point for ñ is neither UTF-8 nor US-ASCII; but it is valid ISO-8859-1. In effect, you have to know what the bytes represent before you can encode them into a String.

public class Hello {

    public static void main(String[] args)
            throws java.io.UnsupportedEncodingException {
        String s = "Hola, señor!";
        System.out.println(s);
        byte[] b = new byte[s.length()];
        for (int i = 0; i < b.length; i++) {
            int cp = s.codePointAt(i);
            b[i] = (byte) cp;
            System.out.print((byte) cp + " ");
        }
        System.out.println();
        System.out.println(new String(b, "UTF-8"));
        System.out.println(new String(b, "US-ASCII"));
        System.out.println(new String(b, "ISO-8859-1"));
    }
}

Output:

Hola, señor!
72 111 108 97 44 32 115 101 -15 111 114 33 
Hola, se�or!
Hola, se�or!
Hola, señor!
Share:
13,312
Mike B
Author by

Mike B

Experienced Software Engineer with a decade of experience building software.

Updated on June 04, 2022

Comments

  • Mike B
    Mike B almost 2 years

    I have a program that handles byte arrays in Java, and now I would like to write this into a XML file. However, I am unsure as to how I can convert the following byte array into a sensible String to write to a file. Assuming that it was Unicode characters I attempted the following code:

    String temp = new String(encodedBytes, "UTF-8");
    

    Only to have the debugger show that the encodedBytes contain "\ufffd\ufffd ^\ufffd\ufffd-m\ufffd\ufffd\/ufffd \ufffd\ufffdIA\ufffd\ufffd". The String should contain a hash in alphanumerical format.

    How would I turn the above String into a sensible String for output?