Convert byte[] to String using binary encoding


Solution 1

You could use the ASCII encoding for 7-bit characters

String s = "Hello World!";
byte[] b = s.getBytes("ASCII");
System.out.println(new String(b, "ASCII"));

or ISO-8859-1 (Latin-1) for 8-bit characters, which maps every byte value to the char with the same code

String s = "Hello World! \u00ff";
byte[] b = s.getBytes("ISO-8859-1");
System.out.println(new String(b, "ISO-8859-1"));
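As a rough check of the ISO-8859-1 approach above, the sketch below round-trips all 256 byte values. The class and method names (`Latin1RoundTrip`, `toBinaryString`, `fromBinaryString`) are made up for illustration, and `StandardCharsets` assumes Java 7 or later; on older versions the charset name string works the same way.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1RoundTrip {
    // Decodes each byte to the char with the same unsigned value (0..255).
    static String toBinaryString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Inverse conversion; only lossless for strings whose chars are all <= 0xFF.
    static byte[] fromBinaryString(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array holding every possible byte value once.
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }

        String s = toBinaryString(all);

        // Every char should equal the unsigned value of the byte it came from.
        boolean sameCodes = true;
        for (int i = 0; i < 256; i++) {
            sameCodes &= (s.charAt(i) == i);
        }

        System.out.println(sameCodes && Arrays.equals(all, fromBinaryString(s))); // prints true
    }
}
```

Using the `Charset` overloads also avoids the checked `UnsupportedEncodingException` that the `String`-name overloads declare.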

EDIT: Charset.forName shows which name is canonical and which is an alias

System.out.println("ASCII => " + Charset.forName("ASCII"));
System.out.println("US-ASCII => " + Charset.forName("US-ASCII"));
System.out.println("ISO-8859-1 => " + Charset.forName("ISO-8859-1"));

prints

ASCII => US-ASCII
US-ASCII => US-ASCII
ISO-8859-1 => ISO-8859-1
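The same check can be written without relying on string concatenation, by asking the `Charset` for its canonical name directly. This is only a sketch; `CharsetNames` and `canonicalName` are hypothetical names, and the contents of `aliases()` vary between JVM implementations.

```java
import java.nio.charset.Charset;

class CharsetNames {
    // Returns the canonical name the JVM reports for a charset name or alias.
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        // The alias resolves to the canonical name.
        System.out.println(canonicalName("ASCII")); // prints US-ASCII

        // Both lookups refer to the same charset.
        System.out.println(Charset.forName("ASCII").equals(Charset.forName("US-ASCII"))); // prints true

        // The alias set is implementation-dependent, so no output is shown here.
        System.out.println(Charset.forName("US-ASCII").aliases());
    }
}
```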

Solution 2

You could skip the intermediate char array and append each converted byte directly to a StringBuilder (or a StringBuffer if you are worried about multi-threading). My example uses StringBuilder.

byte[] bytes = ...;
StringBuilder sb = new StringBuilder(bytes.length);
for (int i = 0; i < bytes.length; i++) {
  sb.append((char) (bytes[i] & 0xFF));
}

return sb.toString();

I know this doesn't answer your other question. I'm just trying to help simplify the "boilerplate" code.
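For reference, here is the loop above wrapped in a self-contained method, as a minimal sketch. `BinaryStrings` and `bytesToChars` are illustrative names, not part of any library, and the sample bytes are arbitrary.

```java
class BinaryStrings {
    // Maps each byte to the char with the same unsigned value (0..255).
    static String bytesToChars(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length);
        for (byte b : bytes) {
            // The & 0xFF mask removes sign extension, so -1 becomes 255 rather than 0xFFFF.
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = {72, 105, (byte) 0xFF};
        String s = bytesToChars(sample);
        System.out.println((int) s.charAt(2)); // prints 255
    }
}
```

The result should match what `new String(bytes, "ISO-8859-1")` produces, so the loop is mainly useful where a charset lookup is undesirable.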

Author: fernacolo

Updated on June 13, 2022

Comments

  • fernacolo
    fernacolo almost 2 years

    I want to translate each byte from a byte[] into a char, then put those chars into a String. This is the so-called "binary" encoding of some databases. So far, the best I could find is this huge boilerplate:

    byte[] bytes = ...;
    char[] chars = new char[bytes.length];
    for (int i = 0; i < bytes.length; ++i) {
        chars[i] = (char) (bytes[i] & 0xFF);
    }
    String s = new String(chars);
    

    Is there another option from Java SE or perhaps from Apache Commons? I wish I could have something like this:

    final Charset BINARY_CS = Charset.forName("BINARY");
    String s = new String(bytes, BINARY_CS);
    

    But I'm not willing to write a Charset and its codecs (yet). Is there such a ready-made binary Charset in the JRE or in Apache Commons?

  • fernacolo
    fernacolo over 12 years
    Thanks, but there will be some 8-bit characters.
  • jtahlborn
    jtahlborn over 12 years
    i believe the encoding is "US-ASCII".
  • fernacolo
    fernacolo over 12 years
    UTF-8 will translate some multi-byte sequences into single chars, so it won't work. ASCII only handles 7-bit characters, and there will be some 8-bit characters.
  • Vishy
    Vishy over 12 years
    US-ASCII is an alias for ASCII
  • ColinD
    ColinD over 12 years
    There is no good reason not to use StringBuilder instead of StringBuffer if you're using it as a local variable like in your example.
  • jtahlborn
    jtahlborn over 12 years
    other way around, "ASCII" is an alias for "US-ASCII". obviously, both will work, i'm just saying that this is the "official" name java uses.
  • ColinD
    ColinD over 12 years
    @Peter Lawrey: From the article you linked: "US-ASCII is the Internet Assigned Numbers Authority (IANA) preferred charset name for ASCII." Also, I believe the charset that's required to be present in all Java implementations is "US-ASCII".
  • jtahlborn
    jtahlborn over 12 years
    yeah, ISO-8859-1 (or some other 8bit encoding) is probably the encoding to use.
  • fernacolo
    fernacolo over 12 years
    ISO-8859-1 did the trick. Very interesting. I thought that it would map some bytes to 0x7F, because not all byte values have meaning in this encoding (according to en.wikipedia.org/wiki/ISO/IEC_8859-1).
  • Vishy
    Vishy over 12 years
    Ok, I have found how to determine which is the alias. See my edit.
  • ColinD
    ColinD over 12 years
    The standard charset identifiers are listed in the Charset Javadoc: docs.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html
  • Vishy
    Vishy over 12 years
    @fernacolo Characters without a mapping are converted to ?
  • Andrew Rasmussen
    Andrew Rasmussen over 12 years
    Why would this get a downvote? I told you that the String constructor does exactly what you want it to do. Sorry for not doing your research for you as to what charset to use...
  • fernacolo
    fernacolo over 12 years
    @PeterLawrey Yes. All bytes from 0 to 127 are converted to chars 0 to 127, and bytes -128 to -1 are converted to chars 128 to 255. And vice-versa worked too. It's a perfect binary conversion.
  • fernacolo
    fernacolo over 12 years
    I just hope this is standard and not an implementation-specific feature.
  • fernacolo
    fernacolo over 12 years
    @PeterLawrey I think your edits are trying to answer something else. It would be enough if your answer only showed String s = new String(bytes, "ISO-8859-1").
  • fernacolo
    fernacolo over 12 years
    This downvote was not from me. By the way, thanks for trying to answer.
  • Chris Aldrich
    Chris Aldrich over 12 years
    @ColinD Modified to StringBuilder. You are right there. Used to using StringBuffer as that was all we had before Java 5. Also, we have a multi-threaded app, so StringBuffer works well for us. But ++ to your point.