How to detect end of string in byte array to string conversion?

18,419

Solution 1

0 isn't an "end of string character". It's just a byte. Whether or not it only comes at the end of the string depends on what encoding you're using (and what the text can be). For example, if you used UTF-16, every other byte would be 0 for ASCII characters.
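
To make the UTF-16 point concrete, here is a small sketch. The class name Utf16Zeros and the helper firstZero are only illustrative, and it assumes UTF-16LE so that no byte-order mark is written. Encoding "AB" produces no zero bytes in UTF-8 but one zero after each letter in UTF-16LE, so a "stop at the first 0" rule would keep only the first byte:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16Zeros {
    // Index of the first 0 byte, or data.length if there is none
    static int firstZero(byte[] data) {
        int i = 0;
        while (i < data.length && data[i] != 0) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        byte[] utf8 = "AB".getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = "AB".getBytes(StandardCharsets.UTF_16LE);

        System.out.println(Arrays.toString(utf8));  // [65, 66]
        System.out.println(Arrays.toString(utf16)); // [65, 0, 66, 0]

        // Stopping at the first 0 keeps only the first byte of the UTF-16LE data
        System.out.println(firstZero(utf16)); // 1
    }
}
```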

If you're sure that the first 0 indicates the end of the string, you can use something like the code you've given, but I'd rewrite it as:

int size = 0;
while (size < data.length)
{
    if (data[size] == 0)
    {
        break;
    }
    size++;
}

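// Specify the appropriate encoding as the last argument
// (StandardCharsets from java.nio.charset avoids the checked UnsupportedEncodingException)
String myString = new String(data, 0, size, StandardCharsets.UTF_8);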

I strongly recommend that you don't just use the platform default encoding - it's not portable, and may well not allow for all Unicode characters. However, you can't just decide arbitrarily - you need to make sure that everything producing and consuming this data agrees on the encoding.
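
As a rough illustration of why both sides have to agree, the following sketch decodes the same two bytes with two different charsets (the class name is made up for the example). The bytes 0xC3 0xA9 are how UTF-8 encodes "é", but read as ISO-8859-1 they come out as two characters, "Ã©":

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    // The two bytes that UTF-8 uses to encode U+00E9 (e with acute accent)
    static final byte[] DATA = {(byte) 0xC3, (byte) 0xA9};

    public static void main(String[] args) {
        String asUtf8 = new String(DATA, StandardCharsets.UTF_8);
        String asLatin1 = new String(DATA, StandardCharsets.ISO_8859_1);

        // Compare lengths and contents rather than printing the text,
        // since console output itself depends on the platform encoding
        System.out.println(asUtf8.length());                 // 1
        System.out.println(asLatin1.length());               // 2
        System.out.println(asUtf8.equals("\u00e9"));         // true
        System.out.println(asLatin1.equals("\u00c3\u00a9")); // true
    }
}
```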

If you're in control of the protocol, it would be much better if you could introduce a length prefix before the string, to indicate how many bytes are in the encoded form. That way you'd be able to read exactly the right amount of data (without "over-reading") and you'd be able to tell if the data was truncated for some reason.
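
A minimal sketch of the length-prefix idea, assuming a 2-byte big-endian unsigned length followed by the UTF-8 bytes. The framing, the class name and the helper names are assumptions for illustration, not an existing protocol. DataOutputStream.writeShort and DataInputStream.readUnsignedShort handle the prefix, and readFully either reads exactly that many bytes or throws EOFException, which is how truncation shows up:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class LengthPrefixed {
    // Writes a 2-byte big-endian length followed by the UTF-8 bytes
    static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        if (bytes.length > 0xFFFF) {
            throw new IOException("string too long for a 2-byte length prefix");
        }
        out.writeShort(bytes.length);
        out.write(bytes);
    }

    // Reads exactly the number of bytes announced by the prefix;
    // readFully throws EOFException if the data was truncated
    static String readString(DataInputStream in) throws IOException {
        int length = in.readUnsignedShort();
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeString(new DataOutputStream(buffer), "hello");

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(readString(in)); // hello
    }
}
```

DataOutputStream.writeUTF and DataInputStream.readUTF implement a similar 2-byte-prefix scheme out of the box, but they use Java's modified UTF-8, so they only fit if the other end of the socket expects that format.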

Solution 2

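Maybe it's too late, but it may help others. The simplest thing you can do is new String(myBuffer).trim(), which removes the trailing zeros: they are decoded as null characters, and trim() strips those along with any other leading or trailing character whose code is 32 or lower.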
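
A quick sketch of what trim() does here. The class and method names are only for illustration, and the charset is made explicit instead of relying on the platform default. The trailing zeros become null characters, which trim() removes, but a leading space is removed as well:

```java
import java.nio.charset.StandardCharsets;

public class TrimDemo {
    // Decodes the buffer, then lets trim() drop everything <= ' ' at both ends
    static String decodeAndTrim(byte[] buffer) {
        return new String(buffer, StandardCharsets.UTF_8).trim();
    }

    public static void main(String[] args) {
        byte[] myBuffer = {'a', 'b', 'c', 0, 0, 0};
        // Without trim() the three zero bytes stay in the String as null characters
        System.out.println(new String(myBuffer, StandardCharsets.UTF_8).length()); // 6
        System.out.println(decodeAndTrim(myBuffer).length());                      // 3

        // Side effect: leading spaces (and tabs, newlines, ...) are removed too
        byte[] padded = {' ', 'a', 'b', 0, 0};
        System.out.println("[" + decodeAndTrim(padded) + "]"); // [ab]
    }
}
```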

Solution 3

You can always start at the end of the byte array and go backwards until you hit the first non-zero byte. Then copy everything up to that point into a new byte array and build the String from it. Hope this helps:

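    byte[] foo = {28, 6, 3, 45, 0, 0, 0, 0};
    int i = foo.length - 1;

    // Walk backwards past the trailing zeros (stops at -1 if the array is all zeros)
    while (i >= 0 && foo[i] == 0)
    {
        i--;
    }

    // Keep bytes 0..i (needs java.util.Arrays and java.nio.charset.StandardCharsets)
    byte[] bar = Arrays.copyOf(foo, i + 1);

    String myString = new String(bar, StandardCharsets.UTF_8);
    System.out.println(myString.length());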

This prints 4.

Solution 4

Strings in Java aren't terminated by a 0, as they are in some other languages such as C. A 0 byte is decoded into the so-called null character, which is allowed to appear in a String. I suggest you use a trimming scheme that either finds the first index of the array that's a 0 and uses a sub-array to construct the String (assuming everything after it is also 0), or just constructs the String and calls trim(). trim() removes leading and trailing characters with a code of 32 (space) or lower, which includes the null character.

The latter won't work if you have leading whitespace you must preserve. Using a StringBuilder and deleting characters at the end as long as they're the null character would work better in that case.
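
A minimal sketch of that StringBuilder variant, assuming UTF-8 data (the class and method names are illustrative). In UTF-8 a 0x00 byte only ever encodes the null character, so decoding first and stripping afterwards is safe. Only null characters at the very end are deleted, so leading spaces survive:

```java
import java.nio.charset.StandardCharsets;

public class StripTrailingNulls {
    // Decodes the bytes, then deletes null characters from the end only,
    // so leading whitespace and everything before the padding is preserved
    static String decodeStrippingNulls(byte[] data) {
        StringBuilder sb = new StringBuilder(new String(data, StandardCharsets.UTF_8));
        while (sb.length() > 0 && sb.charAt(sb.length() - 1) == '\0') {
            sb.deleteCharAt(sb.length() - 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = {' ', ' ', 'h', 'i', 0, 0, 0};
        System.out.println("[" + decodeStrippingNulls(data) + "]"); // [  hi]
    }
}
```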

Solution 5

Without diving into the protocol considerations the OP mentioned, how about this for stripping out the zero bytes?

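public static String bytesToString(byte[] data) {
    // Note: this skips every 0x00 byte, not only the trailing ones,
    // and maps each byte to one char (effectively ISO-8859-1 decoding)
    StringBuilder dataOut = new StringBuilder();
    for (int i = 0; i < data.length; i++) {
        if (data[i] != 0x00)
            dataOut.append((char) (data[i] & 0xFF)); // mask to avoid sign extension
    }
    return dataOut.toString();
}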
Author by

grunk

French web developer. Blog: http://blog.oroger.fr Twitter: @olivrog Gplus: gplus.to/oroger

Updated on June 26, 2022

Comments

  • grunk

    I receive a string from a socket in a byte array that looks like:

    [128,5,6,3,45,0,0,0,0,0]
    

    The size given by the network protocol is the total length of the string (including the zeros), so in my example it is 10.

    If I simply do:

    String myString = new String(myBuffer); 
    

    I get 5 incorrect characters at the end of the string. The conversion doesn't seem to detect the end-of-string character (0).

    To get the correct size and the correct string, I do this:

    int sizeLabelTmp = 0;
    // Iterate over the 10 bytes to get the real size of the string
    for(int j = 0; j<(sizeLabel); j++) {
        byte charac = datasRec[j];
        if(charac == 0)
            break;
        sizeLabelTmp ++;
    }
    // Create a temp byte array to make a correct conversion
    byte[] label    = new byte[sizeLabelTmp];
    for(int j = 0; j<(sizeLabelTmp); j++) {
        label[j] = datasRec[j];
    }
    String myString = new String(label);
    

    Is there a better way to handle the problem?

    Thanks