Converting char array into byte array and back again
Solution 1
The problem is your use of the String(byte[]) constructor, which uses the platform default encoding. That's almost never what you should be doing: if you pass in "UTF-16" as the character encoding, your tests will probably pass. Currently I suspect that passwordBytes1AsString and passwordBytes2AsString are each 16 characters long, with every other character being U+0000.
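To check this diagnosis, here is a minimal sketch (the class and method names are hypothetical) that decodes the question's bytes with an explicit charset instead of the platform default. It assumes the bytes were produced by asCharBuffer(), which writes big-endian UTF-16. ISO-8859-1 stands in for a single-byte platform default.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class Utf16RoundTrip {
    // Encodes chars the same way the question does: through a CharBuffer view
    // over a byte[]. A ByteBuffer's initial byte order is always big-endian.
    static byte[] toUtf16Bytes(char[] chars) {
        byte[] bytes = new byte[chars.length * 2];
        ByteBuffer.wrap(bytes).asCharBuffer().put(chars);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] bytes = toUtf16Bytes("password".toCharArray());

        // Decoding with the matching charset gives back 8 chars.
        // UTF_16 without a byte-order mark is read as big-endian.
        String decoded = new String(bytes, StandardCharsets.UTF_16);
        System.out.println(decoded + " " + decoded.length()); // password 8

        // A one-byte-per-char decoding keeps every 0x00 high byte as its own
        // U+0000 char, so the result is twice as long.
        String misread = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.println(misread.length()); // 16
    }
}
```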
Solution 2
Conversion between char and byte is character-set encoding and decoding. I prefer to make that as explicit as possible in code, and it doesn't cost much extra code:
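Charset latin1Charset = Charset.forName("ISO-8859-1");
CharBuffer charBuffer = latin1Charset.decode(ByteBuffer.wrap(byteArray)); // bytes -> chars (new String(byteArray, latin1Charset) decodes straight to a String)
ByteBuffer byteBuffer = latin1Charset.encode(charBuffer); // chars -> bytes (latin1Charset.encode(someString) also accepts a String)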
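A self-contained version of these lines might look like the following. It is only a sketch: the class name and the encode/decode helpers are hypothetical, and it copies the buffer contents into fresh arrays with get() rather than relying on array().

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;

class Latin1Buffers {
    static final Charset LATIN1 = Charset.forName("ISO-8859-1");

    // bytes -> chars: every ISO-8859-1 byte maps to exactly one char (U+0000..U+00FF).
    static char[] decode(byte[] bytes) {
        CharBuffer cb = LATIN1.decode(ByteBuffer.wrap(bytes));
        char[] out = new char[cb.remaining()];
        cb.get(out);
        return out;
    }

    // chars -> bytes: chars above U+00FF cannot be represented, and
    // Charset.encode replaces them with the charset's replacement byte '?'.
    static byte[] encode(char[] chars) {
        ByteBuffer bb = LATIN1.encode(CharBuffer.wrap(chars));
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    public static void main(String[] args) {
        char[] password = "p\u00e4ssword".toCharArray(); // "pässword"
        byte[] bytes = encode(password);
        System.out.println(bytes.length);                           // 8
        System.out.println(Arrays.equals(password, decode(bytes))); // true
    }
}
```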
Aside:
java.nio classes and java.io Reader/Writer classes use ByteBuffer and CharBuffer (which use byte[] and char[] as backing arrays), so it is often preferable to work with these buffer classes directly. However, you can always convert between buffers and arrays:
byteArray = byteBuffer.array(); byteBuffer = ByteBuffer.wrap(byteArray);
byteBuffer.get(byteArray); byteBuffer.put(byteArray);
charArray = charBuffer.array(); charBuffer = CharBuffer.wrap(charArray);
charBuffer.get(charArray); charBuffer.put(charArray);
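The sharing behaviour behind these one-liners can be seen in a small demo. This is a sketch with a hypothetical helper, contents(), which copies only the readable part of a heap buffer. It assumes the buffer is array-backed (hasArray() returns true).

```java
import java.nio.CharBuffer;
import java.util.Arrays;

class BackingArrays {
    // Copies only the readable region (position..limit) of a heap buffer
    // without moving its position. Assumes buffer.hasArray() is true.
    static char[] contents(CharBuffer buffer) {
        int start = buffer.arrayOffset() + buffer.position();
        return Arrays.copyOfRange(buffer.array(), start, start + buffer.remaining());
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b', 'c'};

        // wrap() does not copy: the buffer and the array share storage.
        CharBuffer wrapped = CharBuffer.wrap(chars);
        wrapped.put(0, 'x');
        System.out.println(chars[0]);                 // x
        System.out.println(wrapped.array() == chars); // true

        // array() returns the whole backing array, including unused capacity.
        CharBuffer allocated = CharBuffer.allocate(16);
        allocated.put("hi");
        allocated.flip();
        System.out.println(allocated.array().length);        // 16
        System.out.println(new String(contents(allocated))); // hi
    }
}
```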
Solution 3
Original Answer
public byte[] charsToBytes(char[] chars){
    Charset charset = Charset.forName("UTF-8");
    ByteBuffer byteBuffer = charset.encode(CharBuffer.wrap(chars));
    return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
}
public char[] bytesToChars(byte[] bytes){
    Charset charset = Charset.forName("UTF-8");
    CharBuffer charBuffer = charset.decode(ByteBuffer.wrap(bytes));
    return Arrays.copyOf(charBuffer.array(), charBuffer.limit());
}
Edited to use StandardCharsets
public byte[] charsToBytes(char[] chars)
{
    final ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
    return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
}
public char[] bytesToChars(byte[] bytes)
{
    final CharBuffer charBuffer = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
    return Arrays.copyOf(charBuffer.array(), charBuffer.limit());
}
The JavaDoc for StandardCharsets states:
These charsets are guaranteed to be available on every implementation of the Java platform.
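Since the question is about passwords, one possible refinement (a sketch, not part of the original answer; the class name is hypothetical) is to zero the intermediate buffers that Charset.encode and Charset.decode allocate, so that only the returned copies still hold the secret. Like the original answer, it assumes those buffers are array-backed. It is best-effort only: the JVM may hold other copies you cannot reach.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class WipingConversions {
    // Same conversion as charsToBytes above, then zeroes the encoder's
    // intermediate buffer so only the returned copy holds the bytes.
    // Assumes the buffer is array-backed, as the original answer does.
    static byte[] charsToBytes(char[] chars) {
        ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
        byte[] result = Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
        Arrays.fill(byteBuffer.array(), (byte) 0);
        return result;
    }

    // Same idea for decoding: copy the valid chars, then wipe the buffer.
    static char[] bytesToChars(byte[] bytes) {
        CharBuffer charBuffer = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
        char[] result = Arrays.copyOf(charBuffer.array(), charBuffer.limit());
        Arrays.fill(charBuffer.array(), '\0');
        return result;
    }

    public static void main(String[] args) {
        char[] password = "password".toCharArray();
        byte[] bytes = charsToBytes(password);
        char[] back = bytesToChars(bytes);
        System.out.println(Arrays.equals(password, back)); // true

        // The caller still has to clear its own arrays when done.
        Arrays.fill(password, '\0');
        Arrays.fill(bytes, (byte) 0);
        Arrays.fill(back, '\0');
    }
}
```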
Solution 4
What I would do is use one loop to convert to bytes and another to convert back to chars:
char[] chars = "password".toCharArray();
byte[] bytes = new byte[chars.length*2];
for(int i=0;i<chars.length;i++) {
    bytes[i*2] = (byte) (chars[i] >> 8);
    bytes[i*2+1] = (byte) chars[i];
}
char[] chars2 = new char[bytes.length/2];
for(int i=0;i<chars2.length;i++)
    chars2[i] = (char) ((bytes[i*2] << 8) + (bytes[i*2+1] & 0xFF));
String password = new String(chars2);
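The loop writes the high byte of each char first, which is the same byte layout as UTF-16BE. The sketch below (hypothetical class and method names) wraps the two loops as methods and compares them with String.getBytes(StandardCharsets.UTF_16BE). The String is created only for that comparison. For well-formed text the two agree; getBytes would replace an unpaired surrogate, while the loop copies it unchanged.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ManualUtf16 {
    // High byte first, then low byte: the same layout as UTF-16BE.
    static byte[] toBytes(char[] chars) {
        byte[] bytes = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++) {
            bytes[i * 2] = (byte) (chars[i] >> 8);
            bytes[i * 2 + 1] = (byte) chars[i];
        }
        return bytes;
    }

    static char[] toChars(byte[] bytes) {
        char[] chars = new char[bytes.length / 2];
        for (int i = 0; i < chars.length; i++) {
            // Mask the low byte so its sign extension cannot disturb the high byte.
            chars[i] = (char) ((bytes[i * 2] << 8) | (bytes[i * 2 + 1] & 0xFF));
        }
        return chars;
    }

    public static void main(String[] args) {
        char[] chars = "p\u00e4ssword".toCharArray();
        byte[] manual = toBytes(chars);
        byte[] library = new String(chars).getBytes(StandardCharsets.UTF_16BE);
        System.out.println(Arrays.equals(manual, library));        // true
        System.out.println(Arrays.equals(chars, toChars(manual))); // true
    }
}
```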
Solution 5
If you want to use a ByteBuffer and CharBuffer, don't use the simple .asCharBuffer(), which just reinterprets the bytes as UTF-16 (Java's char type, and thus your char[], holds UTF-16 code units). The byte order is big-endian unless you change it with the order method before calling asCharBuffer().
Use Charset.forName(charsetName), and then its encode or decode method, or newEncoder/newDecoder.
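A sketch of the newEncoder/newDecoder route follows (class and method names are hypothetical). One reason to prefer it: a freshly created encoder or decoder reports malformed and unmappable input by throwing a CharacterCodingException, whereas Charset.encode and Charset.decode silently substitute replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class StrictEncoding {
    // A new encoder's default error action is REPORT, so encode() throws
    // on unmappable chars instead of writing a replacement byte.
    static byte[] encodeStrict(char[] chars, String charsetName) throws CharacterCodingException {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        ByteBuffer bb = encoder.encode(CharBuffer.wrap(chars));
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    // Likewise, a new decoder throws on malformed byte sequences.
    static char[] decodeStrict(byte[] bytes, String charsetName) throws CharacterCodingException {
        CharBuffer cb = Charset.forName(charsetName).newDecoder().decode(ByteBuffer.wrap(bytes));
        char[] out = new char[cb.remaining()];
        cb.get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] ok = encodeStrict("password".toCharArray(), "UTF-8");
        System.out.println(new String(decodeStrict(ok, "UTF-8"))); // password
        try {
            encodeStrict("\u20ac".toCharArray(), "ISO-8859-1"); // the euro sign is not in Latin-1
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e.getClass().getSimpleName());
        }
    }
}
```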
When converting your byte[] to a String, you should also specify the encoding, and it must be the same one that was used to produce the bytes.
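What goes wrong when the two sides disagree can be shown with a single character. In this sketch (the roundTrip helper is hypothetical), é encoded as UTF-8 is the two bytes 0xC3 0xA9, which ISO-8859-1 decodes as the two characters Ã©.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatch {
    // Encodes with one charset and decodes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String e = "\u00e9"; // é
        System.out.println(roundTrip(e, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(e));                 // true
        System.out.println(roundTrip(e, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1).equals("\u00c3\u00a9")); // true
    }
}
```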
Scott
Updated on September 23, 2020
Comments
-
Scott over 3 years
I'm looking to convert a Java char array to a byte array without creating an intermediate String, as the char array contains a password. I've looked up a couple of methods, but they all seem to fail:
char[] password = "password".toCharArray();
byte[] passwordBytes1 = new byte[password.length*2];
ByteBuffer.wrap(passwordBytes1).asCharBuffer().put(password);
byte[] passwordBytes2 = new byte[password.length*2];
for(int i=0; i<password.length; i++) {
    passwordBytes2[2*i] = (byte) ((password[i]&0xFF00)>>8);
    passwordBytes2[2*i+1] = (byte) (password[i]&0x00FF);
}
String passwordAsString = new String(password);
String passwordBytes1AsString = new String(passwordBytes1);
String passwordBytes2AsString = new String(passwordBytes2);
System.out.println(passwordAsString);
System.out.println(passwordBytes1AsString);
System.out.println(passwordBytes2AsString);
assertTrue(passwordAsString.equals(passwordBytes1) || passwordAsString.equals(passwordBytes2));
The assertion always fails (and, critically, when the code is used in production, the password is rejected), yet the print statements print out password three times. Why are passwordBytes1AsString and passwordBytes2AsString different from passwordAsString, yet appear identical? Am I missing out a null terminator or something? What can I do to make the conversion and unconversion work?
-
Scott about 13 years
I just tried that (i.e. String passwordBytes1AsString = new String(passwordBytes1, "UTF-16");) and there's no change. I also tried checking the length of the strings: String.length() returns 8. Would it count U+0000 characters?
-
Jon Skeet about 13 years
@Scott: Try printing out the lengths of the strings, and the individual characters (as int values). That'll show you where the differences are.
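This suggestion could be implemented along these lines; describe() is a hypothetical helper that returns a string's length followed by each char as an int.

```java
class DumpChars {
    // Returns the length and each char's int value, e.g. "3: 112,0,97".
    static String describe(String s) {
        StringBuilder sb = new StringBuilder().append(s.length()).append(": ");
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(',');
            sb.append((int) s.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("password"));     // 8: 112,97,115,115,119,111,114,100
        System.out.println(describe("p\u0000a"));     // 3: 112,0,97
    }
}
```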
-
Scott about 13 years
112,97,115,115,119,111,114,100 for both the original and the converted ones.
-
Scott about 13 years
Have just noticed that I was using the wrong parameters to equals() in the assertion. *facepalm* Your original supposition was indeed the correct one. Many thanks.
-
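The bug Scott found is that String.equals(Object) returns false for any argument that is not a String, so comparing a String with a byte[] can never succeed. Below is a sketch of a like-for-like comparison that avoids building Strings from the password. The class and helper names are hypothetical, and it assumes big-endian UTF-16 bytes, which both conversions in the question produce.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class CompareWithoutStrings {
    // Reads big-endian UTF-16 bytes back into a char[] via a CharBuffer view.
    static char[] fromUtf16BE(byte[] bytes) {
        char[] chars = new char[bytes.length / 2];
        ByteBuffer.wrap(bytes).asCharBuffer().get(chars);
        return chars;
    }

    public static void main(String[] args) {
        char[] password = "password".toCharArray();
        byte[] passwordBytes1 = new byte[password.length * 2];
        ByteBuffer.wrap(passwordBytes1).asCharBuffer().put(password);

        // String.equals(byte[]) is always false: the argument is not a String.
        System.out.println("password".equals(passwordBytes1)); // false
        // Compare like with like instead.
        System.out.println(Arrays.equals(password, fromUtf16BE(passwordBytes1))); // true
    }
}
```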
Prashant over 11 years
Don't use String#getBytes() without specifying an encoding; that gets you into all kinds of portability trouble.
-
Cerber about 10 years
Not appropriate to the use case: this line was just an easy way to get a char[] in this example.
-
Simon MᶜKenzie over 8 years
The question doesn't mention getBytes, so this isn't really relevant. Are you trying to comment on one of the other answers?
-
junqiang chen over 8 years
I just wanted to point out how String's getBytes function is used, and what to watch out for when using new String(byte[]). Hope it helps.
-
Tom Blodget almost 7 years
Nice use of ByteBuffer. However, unless stated otherwise, the password is Unicode, so StandardCharsets.UTF_8 would be better than corrupting the data by reducing it to ASCII.
-
Cassian almost 7 years
You can use any charset you need.
-
Cassian almost 7 years
I have edited the post, changing from US-ASCII to UTF-8. You are right. The idea is to keep the same encoding. US-ASCII does not have as many characters as UTF-8 (for example, no accented letters), so if you encode with UTF-8 and then decode with US-ASCII, you lose some information.
-
Cassian almost 7 years
After storing sensitive data in a char[] or byte[], you need to clear it, as Andrii explains here: stackoverflow.com/a/9670279/1582089
-
RoutesMaps.com almost 6 years
Nice example. But in my case it works with Charset charset = Charset.forName("ISO-8859-1");