Convert String to/from byte array without encoding

Solution 1

Here is sample code that converts a String to a byte array and back to a String without calling any charset API. Each 16-bit char is stored as two bytes, high byte first.

public class Test
{

    public static void main(String[] args)
    {
        Test t = new Test();
        t.run();
    }

    // Round-trips a sample string through getBytes and getString.
    public void run()
    {
        String input = "Hèllo world";
        byte[] inputBytes = getBytes(input);
        String output = getString(inputBytes);
        System.out.println(output);
    }

    // Writes each 16-bit char as two bytes, high byte first.
    public byte[] getBytes(String str)
    {
        char[] chars = str.toCharArray();
        byte[] bytes = new byte[chars.length * 2];
        for (int i = 0; i < chars.length; i++)
        {
            bytes[i * 2] = (byte) (chars[i] >> 8);
            bytes[i * 2 + 1] = (byte) chars[i];
        }

        return bytes;
    }

    // Rebuilds each char from a pair of bytes; a trailing odd byte is ignored.
    public String getString(byte[] bytes)
    {
        char[] chars = new char[bytes.length / 2];
        for (int i = 0; i < chars.length; i++)
        {
            // Mask the low byte so a negative value does not corrupt the high byte.
            chars[i] = (char) ((bytes[i * 2] << 8) | (bytes[i * 2 + 1] & 0xFF));
        }

        return new String(chars);
    }
}
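For comparison, the two-bytes-per-char, high-byte-first layout used above is the same layout Java's built-in UTF-16BE charset produces for well-formed strings. The following is only a minimal sketch to check that; the class name Utf16Check and the helper manualBytes are hypothetical names introduced for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16Check {
    // Hypothetical helper: two bytes per char, high byte first,
    // mirroring the hand-written loop in the answer.
    static byte[] manualBytes(String str) {
        byte[] bytes = new byte[str.length() * 2];
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            bytes[i * 2] = (byte) (c >> 8);
            bytes[i * 2 + 1] = (byte) c;
        }
        return bytes;
    }

    public static void main(String[] args) {
        // \u00E8 is the character 'è', escaped to avoid source-encoding issues.
        String input = "H\u00E8llo world";
        byte[] manual = manualBytes(input);
        byte[] builtIn = input.getBytes(StandardCharsets.UTF_16BE);

        // Compare the hand-written layout with the built-in encoder.
        System.out.println(Arrays.equals(manual, builtIn));
        // Decode with the same charset to get the original string back.
        System.out.println(new String(builtIn, StandardCharsets.UTF_16BE).equals(input));
    }
}
```

Strings containing unpaired surrogates may differ, because the built-in encoder substitutes a replacement for malformed input while the manual loop copies the chars as they are.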

Solution 2

No, you aren't missing anything. There is no easy way to do that because String and char are for text. You apparently don't want to handle your data as text, which would make complete sense if it isn't text. You could do it the hard way that you propose.

An alternative is to assume a character encoding that allows arbitrary sequences of arbitrary byte values (0-255). ISO-8859-1 and IBM437 both qualify. (Windows-1252 defines only 251 code points. UTF-8 doesn't allow arbitrary byte sequences.) If you use ISO-8859-1, the resulting string will be the same as the one your hard way produces, because each byte maps to the char with the same numeric value.
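A minimal sketch of the ISO-8859-1 approach, assuming Java 7 or later for StandardCharsets. The class name Latin1RoundTrip and the two helper names are hypothetical, chosen only for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Each byte becomes the char with the same value (0-255); the high 8 bits stay zero.
    static String toLatin1String(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    // Each char in the range 0-255 becomes the byte with the same value.
    // Chars above 255 cannot be represented and are replaced by the encoder.
    static byte[] fromLatin1String(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array holding every possible byte value once.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }

        String s = toLatin1String(all);
        byte[] back = fromLatin1String(s);

        System.out.println(s.length());              // one char per byte
        System.out.println((int) s.charAt(200));     // numeric value of the char built from byte 200
        System.out.println(Arrays.equals(all, back)); // whether the round trip preserved every byte
    }
}
```

The round trip is lossless only in the direction the question describes: bytes to String and back. A String that contains chars above 255 will not survive the conversion to bytes.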

As for efficiency, the most efficient way to handle an array of bytes is to keep it as an array of bytes.

Solution 3

This will convert a byte array to a String by filling only the lower 8 bits of each char and leaving the upper 8 bits zero.

public static String stringFromBytes(byte[] byteData) {
    char[] charData = new char[byteData.length];
    for (int i = 0; i < charData.length; i++) {
        // Mask with 0xFF so negative bytes map to 128-255 instead of being sign-extended.
        charData[i] = (char) (byteData[i] & 0xFF);
    }
    return new String(charData);
}

The efficiency should be quite good. As Ben Thurley said, if performance is really such an issue, don't convert to a String in the first place; work with the byte array instead.
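The question also asks for the converse, going from a String whose chars all have a zero high byte back to bytes. One possible sketch follows; bytesFromString and the class name LowByteCodec are hypothetical names, and the code assumes, as the question states, that the high 8 bits of every char are zero (anything else is silently truncated).

```java
import java.util.Arrays;

public class LowByteCodec {
    // Byte to char: keep the byte value in the low 8 bits, high 8 bits zero.
    static String stringFromBytes(byte[] byteData) {
        char[] charData = new char[byteData.length];
        for (int i = 0; i < charData.length; i++) {
            charData[i] = (char) (byteData[i] & 0xFF);
        }
        return new String(charData);
    }

    // Hypothetical converse: keep only the low 8 bits of each char.
    // The narrowing cast discards the high byte, so this assumes it is zero.
    static byte[] bytesFromString(String s) {
        byte[] byteData = new byte[s.length()];
        for (int i = 0; i < byteData.length; i++) {
            byteData[i] = (byte) s.charAt(i);
        }
        return byteData;
    }

    public static void main(String[] args) {
        byte[] original = {0, 65, (byte) 0x80, (byte) 0xFF};
        String s = stringFromBytes(original);
        byte[] back = bytesFromString(s);
        // Reports whether every byte survived the round trip.
        System.out.println(Arrays.equals(original, back));
    }
}
```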

Author by

Admin

Updated on June 13, 2022

Comments

  • Admin
    Admin almost 2 years

    I have a byte array read over a network connection that I need to transform into a String without any encoding, that is, simply by treating each byte as the low end of a character and leaving the high end zero. I also need to do the converse where I know that the high end of the character will always be zero.

    Searching the web yields several similar questions that have all got responses indicating that the original data source must be changed. This is not an option so please don't suggest it.

    This is trivial in C but Java appears to require me to write a conversion routine of my own that is likely to be very inefficient. Is there an easy way that I have missed?