Converting char array into byte array and back again

79,231

Solution 1

The problem is your use of the String(byte[]) constructor, which uses the platform default encoding. That's almost never what you should be doing. If you pass in "UTF-16" as the character encoding, your tests will probably pass. Currently I suspect that passwordBytes1AsString and passwordBytes2AsString are each 16 characters long, with every other character being U+0000.
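As a minimal sketch of the suggestion above, the following packs a char[] into bytes the same way the question does and then decodes with an explicit charset. The class and method names are made up for illustration. It uses UTF_16BE rather than plain "UTF-16" as an assumption on my part: it matches ByteBuffer's default big-endian layout and does not look for a byte-order mark.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original question or answer.
final class Utf16RoundTrip {

    // Packs each char into two bytes, high byte first
    // (a ByteBuffer is big-endian unless order() is changed).
    static byte[] toUtf16Bytes(char[] chars) {
        byte[] bytes = new byte[chars.length * 2];
        ByteBuffer.wrap(bytes).asCharBuffer().put(chars);
        return bytes;
    }

    // Decodes with an explicit charset instead of the platform default.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf16Bytes("password".toCharArray());
        System.out.println(bytes.length);                           // two bytes per char
        System.out.println(decodeUtf16(bytes).equals("password"));  // compares String to String
    }
}
```

Note that the comparison is between two Strings; comparing a String with a byte[] via equals() is always false.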

Solution 2

Conversion between char and byte is character set encoding and decoding. I prefer to make that as clear as possible in code. It doesn't really mean extra code volume:

 Charset latin1Charset = Charset.forName("ISO-8859-1"); 
 charBuffer = latin1Charset.decode(ByteBuffer.wrap(byteArray)); // also decode to String
 byteBuffer = latin1Charset.encode(charBuffer);                 // also encode from String

Aside:

java.nio classes and java.io Reader/Writer classes use ByteBuffer & CharBuffer (which use byte[] and char[] as backing arrays), so it is often preferable to work with these buffer classes directly. However, you can always convert between buffers and arrays:

 byteArray = byteBuffer.array();  byteBuffer = ByteBuffer.wrap(byteArray);
 byteBuffer.get(byteArray);       byteBuffer.put(byteArray);
 charArray = charBuffer.array();  charBuffer = CharBuffer.wrap(charArray);
 charBuffer.get(charArray);       charBuffer.put(charArray);
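A self-contained sketch of the buffer/array conversions above, assuming you want plain arrays at both ends. The class and method names are hypothetical. It copies out with get() and remaining() rather than array(), because the backing array of an encoded buffer can be larger than the encoded data.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class illustrating the buffer <-> array conversions.
final class BufferArrayDemo {

    static byte[] encode(char[] chars, Charset charset) {
        ByteBuffer byteBuffer = charset.encode(CharBuffer.wrap(chars));
        byte[] bytes = new byte[byteBuffer.remaining()];
        byteBuffer.get(bytes);   // copy out only the bytes that were written
        return bytes;
    }

    static char[] decode(byte[] bytes, Charset charset) {
        CharBuffer charBuffer = charset.decode(ByteBuffer.wrap(bytes));
        char[] chars = new char[charBuffer.remaining()];
        charBuffer.get(chars);   // copy out only the chars that were decoded
        return chars;
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        byte[] bytes = encode("password".toCharArray(), latin1);
        char[] chars = decode(bytes, latin1);
        System.out.println(bytes.length);        // one byte per char in ISO-8859-1
        System.out.println(new String(chars));   // String used only to print the demo value
    }
}
```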

Solution 3

Original Answer

    public byte[] charsToBytes(char[] chars){
        Charset charset = Charset.forName("UTF-8");
        ByteBuffer byteBuffer = charset.encode(CharBuffer.wrap(chars));
        return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
    }

    public char[] bytesToChars(byte[] bytes){
        Charset charset = Charset.forName("UTF-8");
        CharBuffer charBuffer = charset.decode(ByteBuffer.wrap(bytes));
        return Arrays.copyOf(charBuffer.array(), charBuffer.limit());    
    }

Edited to use StandardCharsets

public byte[] charsToBytes(char[] chars)
{
    final ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
    return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
}

public char[] bytesToChars(byte[] bytes)
{
    final CharBuffer charBuffer = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
    return Arrays.copyOf(charBuffer.array(), charBuffer.limit());    
}

Here is a JavaDoc page for StandardCharsets. Note this on the JavaDoc page:

These charsets are guaranteed to be available on every implementation of the Java platform.
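Since the arrays hold a password, one possible refinement is to overwrite the intermediate buffers once the copy has been made, as one of the comments below suggests for sensitive data. This is only a sketch under the same assumption the methods above already make, namely that the buffers returned by encode and decode are array-backed. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical variant of the methods above that also clears the temporary buffers.
final class WipingConversion {

    static byte[] charsToBytes(char[] chars) {
        ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
        byte[] bytes = Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
        Arrays.fill(byteBuffer.array(), (byte) 0);   // the encoder's buffer still held a copy
        return bytes;
    }

    static char[] bytesToChars(byte[] bytes) {
        CharBuffer charBuffer = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
        char[] chars = Arrays.copyOf(charBuffer.array(), charBuffer.limit());
        Arrays.fill(charBuffer.array(), '\0');       // the decoder's buffer still held a copy
        return chars;
    }

    public static void main(String[] args) {
        char[] password = {'p', '\u00e4', 's', 's'};
        byte[] bytes = charsToBytes(password);
        char[] restored = bytesToChars(bytes);
        System.out.println(Arrays.equals(password, restored));

        // The caller remains responsible for clearing its own arrays when done.
        Arrays.fill(password, '\0');
        Arrays.fill(restored, '\0');
        Arrays.fill(bytes, (byte) 0);
    }
}
```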

Solution 4

What I would do is use a loop to convert to bytes and another to convert back to chars.

char[] chars = "password".toCharArray();
byte[] bytes = new byte[chars.length*2];
for(int i=0;i<chars.length;i++) {
   bytes[i*2] = (byte) (chars[i] >> 8);
   bytes[i*2+1] = (byte) chars[i];
}
char[] chars2 = new char[bytes.length/2];
for(int i=0;i<chars2.length;i++) 
   chars2[i] = (char) ((bytes[i*2] << 8) + (bytes[i*2+1] & 0xFF));
String password = new String(chars2);
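The same two loops can be wrapped in methods and checked against non-ASCII characters, which is where the & 0xFF mask on the low byte matters (without it, a negative low byte would corrupt the high byte). This is a sketch with hypothetical class and method names.

```java
import java.util.Arrays;

// Hypothetical wrapper around the two loops shown above.
final class ManualCharBytes {

    // Splits each char into two bytes, high byte first.
    static byte[] toBytes(char[] chars) {
        byte[] bytes = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            bytes[i * 2] = (byte) (chars[i] >> 8);
            bytes[i * 2 + 1] = (byte) chars[i];
        }
        return bytes;
    }

    // Rebuilds each char from its two bytes; the mask undoes sign extension of the low byte.
    static char[] toChars(byte[] bytes) {
        char[] chars = new char[bytes.length / 2];
        for (int i = 0; i < chars.length; i++) {
            chars[i] = (char) ((bytes[i * 2] << 8) + (bytes[i * 2 + 1] & 0xFF));
        }
        return chars;
    }

    public static void main(String[] args) {
        char[] original = {'p', '\u00e9', '\u20ac', '\uff41'};
        char[] restored = toChars(toBytes(original));
        System.out.println(Arrays.equals(original, restored));
    }
}
```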

Solution 5

If you want to use a ByteBuffer and CharBuffer, don't use the simple .asCharBuffer(), which just performs a UTF-16 conversion (big-endian by default; you can set the byte order with the order method), since Java Strings, and thus your char[], use this encoding internally.

Use Charset.forName(charsetName), and then its encode or decode method, or newEncoder/newDecoder.

When converting your byte[] to String, you also should indicate the encoding (and it should be the same one).
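A sketch of the newEncoder route, assuming you would rather fail than silently lose characters. Charset.encode replaces unmappable characters with the charset's replacement bytes, whereas an encoder configured with CodingErrorAction.REPORT throws instead. The class name, the method name, and the choice to wrap the checked exception are all illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing an explicit, strict CharsetEncoder.
final class StrictEncodeDemo {

    static byte[] encodeStrict(char[] chars, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            ByteBuffer byteBuffer = encoder.encode(CharBuffer.wrap(chars));
            byte[] bytes = new byte[byteBuffer.remaining()];
            byteBuffer.get(bytes);
            return bytes;
        } catch (CharacterCodingException e) {
            // Wrapped so callers do not have to handle a checked exception.
            throw new IllegalArgumentException("Input cannot be encoded as " + charset, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeStrict("password".toCharArray(), StandardCharsets.UTF_8).length);
        try {
            encodeStrict(new char[] {'\u00e9'}, StandardCharsets.US_ASCII);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```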

Author by

Scott

Updated on September 23, 2020

Comments

  • Scott
    Scott over 3 years

    I'm looking to convert a Java char array to a byte array without creating an intermediate String, as the char array contains a password. I've looked up a couple of methods, but they all seem to fail:

    char[] password = "password".toCharArray();
    
    byte[] passwordBytes1 = new byte[password.length*2];
    ByteBuffer.wrap(passwordBytes1).asCharBuffer().put(password);
    
    byte[] passwordBytes2 = new byte[password.length*2];
    for(int i=0; i<password.length; i++) {
        passwordBytes2[2*i] = (byte) ((password[i]&0xFF00)>>8); 
        passwordBytes2[2*i+1] = (byte) (password[i]&0x00FF); 
    }
    
    String passwordAsString = new String(password);
    String passwordBytes1AsString = new String(passwordBytes1);
    String passwordBytes2AsString = new String(passwordBytes2);
    
    System.out.println(passwordAsString);
    System.out.println(passwordBytes1AsString);
    System.out.println(passwordBytes2AsString);
    assertTrue(passwordAsString.equals(passwordBytes1) || passwordAsString.equals(passwordBytes2));
    

    The assertion always fails (and, critically, when the code is used in production, the password is rejected), yet the print statements print out password three times. Why are passwordBytes1AsString and passwordBytes2AsString different from passwordAsString, yet appear identical? Am I missing out a null terminator or something? What can I do to make the conversion and unconversion work?

  • Scott
    Scott about 13 years
    I just tried that (i.e. String passwordBytes1AsString = new String(passwordBytes1, "UTF-16");) and there's no change. I also tried checking the length of the strings - String.length() returns 8. Would it count U+0000 characters?
  • Jon Skeet
    Jon Skeet about 13 years
    @Scott: Try printing out the lengths of the strings, and the individual characters (as int values). That'll show you where the differences are.
  • Scott
    Scott about 13 years
    112,97,115,115,119,111,114,100 for both the original and the converted ones.
  • Scott
    Scott about 13 years
    Have just noticed that I was using the wrong parameters to equals() in the assertion. *facepalm* Your original supposition was indeed the correct one. Many thanks.
  • Prashant
    Prashant over 11 years
    Don't use String#getBytes() without specifying an encoding; that gets you into all kinds of portability trouble.
  • Cerber
    Cerber about 10 years
    not appropriate to the use case : this line was just an easy way to get char[] in this example.
  • Simon MᶜKenzie
    Simon MᶜKenzie over 8 years
    The question doesn't mention getBytes, so this isn't really relevant. Are you trying to comment on one of the other answers?
  • junqiang chen
    junqiang chen over 8 years
    I just wanted to point out the usage of String's getBytes function, and what should be taken care of when using new String(byte[]). Hope it helps.
  • Tom Blodget
    Tom Blodget almost 7 years
    Nice use of ByteBuffer. However, without it being stated otherwise, the password is Unicode, so StandardCharsets.UTF_8 would be better than corrupting the data by reducing it to ASCII.
  • Cassian
    Cassian almost 7 years
    You can use any charset you need
  • Cassian
    Cassian almost 7 years
    I have edited the post, changing from US-ASCII to UTF-8. You are right. The idea is to keep the same encoding. US-ASCII does not have as many chars as UTF-8 (for example, no letters with accents), and if you first use UTF-8 and then US-ASCII you lose some info.
  • Cassian
    Cassian almost 7 years
    After storing sensitive data in char[] or byte[] you need to clear the sensitive data as Andrii explains in usage from here stackoverflow.com/a/9670279/1582089
  • RoutesMaps.com
    RoutesMaps.com almost 6 years
    Nice example. But in my case it works with Charset charset = Charset.forName("ISO-8859-1");