Java Strings storing byte arrays


Solution 1

It's not a good idea to store binary data in a String object. You'd be better off using something like Base64 encoding, which is intended to make binary data into a printable string, and is completely reversible.

In fact, I just found a public domain base64 encoder for Java: http://iharder.sourceforge.net/current/java/base64/
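Since Java 8 the JDK also ships its own codec, java.util.Base64, so a third-party library is not strictly needed. Here is a minimal sketch, assuming Java 8 or later. The class and method names are illustrative, and the byte array stands in for the output of the asker's env.encrypt():

```java
import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {

    // Turns arbitrary binary data (e.g. ciphertext) into a printable ASCII string.
    public static String toText(byte[] binary) {
        return Base64.getEncoder().encodeToString(binary);
    }

    // Reverses toText(): returns exactly the bytes that were encoded.
    public static byte[] fromText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // Hypothetical "ciphertext": includes byte values that are not valid text in most charsets.
        byte[] cipherBytes = {(byte) 0x80, (byte) 0xFF, 0x00, 0x41, (byte) 0xC3, 0x28, 0x7F, (byte) 0x9D};
        String stored = toText(cipherBytes);
        byte[] restored = fromText(stored);
        System.out.println(stored);
        System.out.println(Arrays.equals(cipherBytes, restored)); // prints true
    }
}
```

The encoded string is about a third longer than the raw bytes, but it survives any String-based storage or transport unchanged.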

Solution 2

Several people have pointed out that this is not a proper use of the String(byte[]) constructor. It is important to remember that in Java a String is made up of characters, which happen to be 16 bits, and not 8 bits, as a byte is. You are also forgetting about character encoding. Remember, a character is often not a byte.

Let's break it down bit by bit:

String s = "test123";
byte[] a = s.getBytes();

At this point your byte array most likely contains 7 bytes (one per character) if your system's default character encoding is Windows-1252, ISO-8859-1, or UTF-8.

byte[] b = env.encrypt(a);

Now b contains seemingly random data that depends on your encryption, and it isn't guaranteed to have the same length as a. Many encryption engines pad the input data so that the output is a multiple of a certain block size.

String t = new String(b);

This takes your random bytes and asks Java to interpret them as character data. The resulting characters may appear as gibberish, and some byte sequences are not valid characters in a given encoding. Java dutifully does its best, typically substituting a replacement character for the invalid sequences, and creates a sequence of 16-bit chars.

byte[] c = t.getBytes();

This may or may not give you the same byte array as b, depending on the encoding. You state in the problem description that c is 16 bytes long; this is probably because the characters in t, including any replacement characters, don't convert back to the original bytes in the default character encoding.
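To see why c can come out longer than b, here is a small sketch that round-trips a few arbitrary bytes through a String. It uses UTF-8 explicitly, which is an assumption; the original code uses the platform default. With an explicit charset, bytes that are not valid UTF-8 decode to the replacement character U+FFFD, which then re-encodes as three bytes (EF BF BD). The original bytes are lost and the array grows:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyRoundTrip {

    // Mimics the question's new String(b) followed by t.getBytes(), with UTF-8 made explicit.
    public static byte[] throughString(byte[] b) {
        String t = new String(b, StandardCharsets.UTF_8);
        return t.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0x80 (a lone continuation byte) and 0xFF are never valid in UTF-8; 0x41 is 'A'.
        byte[] b = {(byte) 0x80, (byte) 0xFF, 0x41};
        byte[] c = throughString(b);
        System.out.println(b.length + " -> " + c.length); // prints "3 -> 7"
        System.out.println(Arrays.equals(b, c));          // prints false
    }
}
```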

byte[] d = env.decrypt(c);

This won't work because c is not the ciphertext you expect; it is a corrupted version of it.

Solutions:

  1. Just store the byte array directly in the database or wherever. However, you still have to deal with the character encoding problem; more on that in a moment.
  2. Take the byte array data and encode it using Base64 or as hexadecimal digits and store that string:

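    // needs: import java.nio.charset.StandardCharsets;
    // env.encrypt() is your own encryption routine, as in the question
    byte[] cipherBytes = env.encrypt(plainText.getBytes(StandardCharsets.UTF_8));
    StringBuilder cipherText = new StringBuilder(cipherBytes.length * 2);
    for (byte b : cipherBytes) {
      // %02X prints the byte's unsigned value as two uppercase hex digits
      cipherText.append(String.format("%02X", b));
    }
    return cipherText.toString();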
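The hex route is only useful if you can also go back. Here is a minimal self-contained sketch of both directions. It assumes uppercase two-digit hex, the same format the loop above produces. The class and method names are hypothetical:

```java
import java.util.Arrays;

public class HexCodec {

    // Encodes each byte as two uppercase hex digits.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    // Decodes a string produced by toHex() back into the original bytes.
    public static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            // Parse two hex digits as an int in 0..255, then narrow to a (signed) byte.
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] cipherBytes = {(byte) 0x80, (byte) 0xFF, 0x00, 0x41};
        String hex = toHex(cipherBytes);
        System.out.println(hex);                                      // prints 80FF0041
        System.out.println(Arrays.equals(cipherBytes, fromHex(hex))); // prints true
    }
}
```

Hex doubles the size of the data, while Base64 adds roughly a third. Hex is easier to read when debugging.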
    

Character encoding:

A user's password may contain non-ASCII characters, so your system is susceptible to problems because you don't specify the encoding.

Compare:

String s = "tést123";
byte[] a = s.getBytes();
byte[] b = env.encrypt(a);

with

String s = "tést123";
byte[] a = s.getBytes("UTF-8");
byte[] b = env.encrypt(a);

The byte array a won't have the same value with the UTF-8 encoding as with the system default (unless your system default is UTF-8). It doesn't matter what encoding you use as long as A) you're consistent and B) your encoding can represent all the allowable characters for your data. For example, a single-byte default such as Windows-1252 can't represent Chinese text at all. If your application is ever deployed on more than one computer, and one of those has a different system-default encoding, passwords encrypted on one system will become gibberish on the other system.
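The following sketch makes the difference concrete. It encodes the same password with two explicit charsets, chosen here only for illustration, and shows that the resulting byte arrays differ. The é is written as a Unicode escape so the result does not depend on the source file's own encoding:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodingComparison {

    public static void main(String[] args) {
        String s = "t\u00e9st123"; // "tést123"

        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);        // é becomes two bytes: C3 A9
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1); // é becomes one byte: E9

        System.out.println(utf8.length);                 // prints 8
        System.out.println(latin1.length);               // prints 7
        System.out.println(Arrays.equals(utf8, latin1)); // prints false

        // Decoding with the charset that was used for encoding restores the text.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(s)); // prints true
        // Decoding with a different charset silently produces different text: "tÃ©st123".
        System.out.println(new String(utf8, StandardCharsets.ISO_8859_1));
    }
}
```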

Moral of the story: Characters are not bytes and bytes are not characters. You have to remember which you are dealing with and how to convert back and forth between them.

Solution 3

In both cases, you are using the platform's default charset (which depends on the locale). If you're passing the string from one system to another, they may have different locales, and thus different default charsets. You need to use one well-defined charset to do what you're trying to do, e.g. ISO-8859-1, which maps each of the 256 byte values to exactly one character and back.

Better yet, don't do the conversion, and pass the byte[] array directly.
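ISO-8859-1 works here because it maps every byte value from 0x00 to 0xFF to the char with the same numeric value and back, so no byte is ever replaced. The sketch below checks all 256 values; the class name is illustrative. The resulting String is only a container for bytes and may contain control characters, so it should not be treated as readable text:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {

    // Packs bytes into a String, one char per byte: char value == unsigned byte value.
    public static String bytesToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Inverse of bytesToString(); only lossless if every char is at most U+00FF.
    public static byte[] stringToBytes(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        String packed = bytesToString(all);
        System.out.println(packed.length());                           // prints 256
        System.out.println(Arrays.equals(all, stringToBytes(packed))); // prints true
    }
}
```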

Solution 4

This is somewhat of an abuse of the String(byte[]) constructor and related methods.

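This works with some encodings and fails with others. Single-byte encodings such as ISO-8859-1, which map every byte value to a character, preserve the bytes, while multi-byte encodings such as UTF-8 do not. Presumably your platform's default encoding is one of the ones where it fails.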

You should use something like Commons Codec to convert these bytes to hex or base64.

Also, why are you encrypting passwords instead of hashing them with a salt?
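For reference, here is a minimal sketch of salted password hashing using only the JDK. PBKDF2 (the "PBKDF2WithHmacSHA256" algorithm, available in standard JDKs since Java 8) is just one common choice and an assumption, not something specified above. The iteration count and key length are illustrative values, not a recommendation. The password is taken as a char[] so it can be wiped afterwards:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.spec.InvalidKeySpecException;
import java.util.Arrays;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class SaltedHash {

    private static final int ITERATIONS = 100_000; // illustrative work factor
    private static final int KEY_BITS = 256;        // length of the derived hash in bits

    // Generates a random 16-byte salt; store it alongside the hash.
    public static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derives a hash from the password and salt with PBKDF2-HMAC-SHA256.
    public static byte[] hash(char[] password, byte[] salt)
            throws NoSuchAlgorithmException, InvalidKeySpecException {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } finally {
            spec.clearPassword(); // wipes the spec's internal copy of the password
        }
    }

    // Re-derives the hash for a login attempt; MessageDigest.isEqual does not stop at the first mismatch.
    public static boolean verify(char[] attempt, byte[] salt, byte[] expectedHash)
            throws NoSuchAlgorithmException, InvalidKeySpecException {
        return MessageDigest.isEqual(hash(attempt, salt), expectedHash);
    }

    public static void main(String[] args) throws Exception {
        char[] password = {'t', 'e', 's', 't', '1', '2', '3'};
        byte[] salt = newSalt();
        byte[] stored = hash(password, salt);
        System.out.println(verify(password, salt, stored));              // prints true
        System.out.println(verify("wrong".toCharArray(), salt, stored)); // prints false
        Arrays.fill(password, '\0'); // clear the caller's copy as well
    }
}
```

With this approach nothing is ever decrypted: the stored salt and hash are compared against a freshly derived hash at login. Both values are byte arrays, so the Base64 or hex advice above still applies if they must be stored as text.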

Solution 5

Implement a StringWrapper class whose constructor takes a String argument and converts it to a byte[]. Use the "ISO-8859-1" encoding so that each 16-bit char maps to exactly one 8-bit byte (this only works for chars up to U+00FF). You can then use encoding/decoding methods to manipulate those bytes.
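Here is a minimal sketch of such a wrapper, assuming the bytes only ever need to travel through code that accepts a String. StringWrapper is the name suggested above; its methods here are hypothetical. The constructor rejects strings containing characters above U+00FF. ISO-8859-1 cannot represent them, and String.getBytes would silently replace them with '?':

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public final class StringWrapper {

    private final byte[] bytes;

    // Converts each char to exactly one byte using ISO-8859-1.
    public StringWrapper(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                throw new IllegalArgumentException("char at index " + i + " is not representable in ISO-8859-1");
            }
        }
        this.bytes = s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Wraps raw bytes (e.g. ciphertext) so they can be passed around as a String.
    public static StringWrapper fromBytes(byte[] data) {
        return new StringWrapper(new String(data, StandardCharsets.ISO_8859_1));
    }

    // Returns a copy so callers cannot modify the wrapped bytes.
    public byte[] getBytes() {
        return Arrays.copyOf(bytes, bytes.length);
    }

    // The String form: one char per byte, suitable only as a carrier, not as readable text.
    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }
}
```

For example, `StringWrapper.fromBytes(cipherBytes).toString()` yields a String that can be handed to a String-only API. Passing that String to `new StringWrapper(...)` and calling `getBytes()` recovers the original bytes.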

Author: Jon

Updated on May 18, 2020

Comments

  • Jon

    I want to store a byte array wrapped in a String object. Here's the scenario:

    1. The user enters a password.
    2. The bytes of that password are obtained using the getBytes() String method.
    3. The bytes are encrypted using Java's crypto package.
    4. Those bytes are then converted into a String using the constructor new String(byte[]).
    5. That String is stored, or otherwise passed around (NOT changed).
    6. The bytes of that String are obtained, and they are different from the encrypted bytes.

    Here's a snippet of code that describes what I'm talking about.

    String s = "test123";
    byte[] a = s.getBytes();
    byte[] b = env.encrypt(a);
    String t = new String(b);
    byte[] c = t.getBytes();
    byte[] d = env.decrypt(c);
    

    Here env.encrypt() and env.decrypt() do the encryption and decryption. The problem I'm having is that the b array is of length 8 and the c array is of length 16. I would expect them to be equal. What's going on here? I tried to modify the code as below:

    String s = "test123";
    Charset charset = Charset.defaultCharset();
    byte[] a = s.getBytes(charset);
    byte[] b = env.encrypt(a);
    String t = new String(b, charset);
    byte[] c = t.getBytes(charset);
    byte[] d = env.decrypt(c);
    

    but that didn't help.

    Any ideas?

  • skaffman
    +1 take the password, encrypt it, convert to base64 string (suggest using Apache Commons Codec for the last bit).
  • atk
    It's also not a good idea to store secrets in String objects (the password input or the decrypted output) unless you have absolutely no choice. This is because there's no way to clear a String: once it's in memory, a String isn't overwritten until the memory is garbage collected AND the memory allocator decides to reallocate that section of memory.
  • arkon
    Care to explain why it's a bad idea to store binary data in a string obj? I'm not saying I disagree, but it's usually a good idea to substantiate your claims.
  • Jonathan
    The API documentation for the String class itself says "The String class represents character strings." Further down, it also says "A String represents a string in the UTF-16 format...". Basically, it is a smart object for storing character data, where you want a dumb object for storing binary data.
  • Cory Kendall
    Thanks for the informative answer. I'm currently blocked by a third-party's poor choice of Types; I need to pass a String through their system which I will get back later in the same JVM, and I need to use bytes. The encrypted size is very compact and I'm hoping to avoid Base64 encoding. I wish there was a Charset which said "every-bit pattern is valid".
  • Mr. Shiny and New 安宇
    @CoryKendall You're mixing concepts. Lots of character encodings have "every bit pattern is valid". But in Java Strings there is only UTF-16.
  • Cory Kendall
    Ahh, I see. So wouldn't I be able to convert any bit pattern to an ISO-8859-1 String and back into bytes without seeing changes? Which would answer this question in a couple of lines?
  • Mr. Shiny and New 安宇
    @CoryKendall The point is that there is no such thing as an ISO-8859-1 String in Java. You can have a byte array that contains such data, but not a char[] or String.