Char into byte? (Java)

Solution 1

As trojanfoe states, your confusion about the results of your code is partly due to sign extension. I'll try to add a more detailed explanation that may help.

char a = '\uffff';
byte b = (byte)a;  // b = 0xFF

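As you noted, this DOES result in the loss of information. This is a narrowing primitive conversion. Per the Java Language Specification, narrowing a char to a byte "simply discards all but the n lowest order bits", where n is the number of bits in the target type (8 for a byte).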
The result is: 0xFFFF -> 0xFF
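
To make the narrowing step concrete, here is a minimal sketch. The class and method names (CharToByteDemo, narrow) are made up for illustration and are not part of the original question.

```java
// Hypothetical demo class; the names are illustrative only.
class CharToByteDemo {
    // Narrowing conversion: only the low 8 bits of the 16-bit char survive.
    static byte narrow(char ch) {
        return (byte) ch;
    }

    public static void main(String[] args) {
        byte b = narrow('\uffff');
        System.out.println(b);                               // -1 (bit pattern 0xFF)
        System.out.println(Integer.toHexString(b & 0xFF));   // ff

        // A char whose two bytes differ makes the loss visible: 0x1234 becomes 0x34.
        byte b2 = narrow('\u1234');
        System.out.println(Integer.toHexString(b2 & 0xFF));  // 34
    }
}
```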

char c = (char)b;  // c = 0xFFFF

Converting a byte to a char is a special case that the Java Language Specification calls a "widening and narrowing primitive conversion". It actually performs TWO conversions. First, the byte is SIGN-extended to an int (a normal widening conversion; the new high-order bits are copied from the old sign bit). Second, the int is converted to a char with a narrowing conversion, which keeps only the low 16 bits.
The result is: 0xFF -> 0xFFFFFFFF -> 0xFFFF
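
The two steps can be written out by hand to check that they match the single cast. This is only a sketch; ByteToCharDemo and widenThenNarrow are hypothetical names chosen for the example.

```java
// Hypothetical demo class; the names are illustrative only.
class ByteToCharDemo {
    // Spells out what the single cast (char) b does in two explicit steps.
    static char widenThenNarrow(byte b) {
        int widened = b;        // step 1: sign-extended, 0xFF becomes 0xFFFFFFFF
        return (char) widened;  // step 2: narrowing keeps the low 16 bits, 0xFFFF
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        System.out.println(Integer.toHexString(b));                   // ffffffff
        System.out.println(Integer.toHexString(widenThenNarrow(b)));  // ffff
        System.out.println((char) b == widenThenNarrow(b));           // true
    }
}
```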

int d = (int)c;  // d = 0x0000FFFF

Converting a char to an int is considered a widening conversion. When a char type is widened to an integral type, it is ZERO-extended (the new high order bits are set to 0).
The result is: 0xFFFF -> 0x0000FFFF. When printed, this will give you 65535.
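
For contrast, the sketch below widens the same 16-bit pattern from a char (unsigned, zero-extended) and from a short (signed, sign-extended). The class and method names are hypothetical and exist only for this example.

```java
// Hypothetical demo class; the names are illustrative only.
class ZeroVsSignExtensionDemo {
    // char is unsigned, so widening fills the new high bits with 0.
    static int widenChar(char c) {
        return c;
    }

    // short is signed, so widening copies the sign bit into the new high bits.
    static int widenShort(short s) {
        return s;
    }

    public static void main(String[] args) {
        char c = '\uffff';
        short s = (short) 0xFFFF;           // same 16 bits as c
        System.out.println(widenChar(c));   // 65535
        System.out.println(widenShort(s));  // -1
    }
}
```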

The three links I provided are the official Java Language Specification sections on primitive type conversions. I HIGHLY recommend you take a look. They are not terribly verbose (and in this case relatively straightforward), and they detail exactly what Java does behind the scenes with type conversions. This is a common area of misunderstanding for many developers. Post a comment if you are still confused about any step.

Solution 2

It's sign extension. Try \u1234 instead of \uffff and see what happens.
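
A quick way to run that experiment is sketched below. RoundTripDemo and roundTrip are hypothetical names; the method repeats the char -> byte -> char -> int chain from the question.

```java
// Hypothetical demo class; the names are illustrative only.
class RoundTripDemo {
    // Same cast chain as in the question: char -> byte -> char -> int.
    static int roundTrip(char a) {
        byte b = (byte) a;  // keeps only the low 8 bits
        char c = (char) b;  // sign-extends to int, then keeps the low 16 bits
        return c;           // zero-extends to int
    }

    public static void main(String[] args) {
        // Low byte 0xFF has its sign bit set, so sign extension rebuilds 0xFFFF by coincidence.
        System.out.println(roundTrip('\uffff'));  // 65535
        // Low byte 0x34 has a clear sign bit, so the high byte 0x12 is simply gone.
        System.out.println(roundTrip('\u1234'));  // 52
        // Low byte 0x80 has its sign bit set, so the high byte comes back as 0xFF, not 0x12.
        System.out.println(roundTrip('\u1280'));  // 65408
    }
}
```

Only a value whose high byte is 0xFF and whose low byte has its top bit set (or whose high byte is 0x00 and whose low byte has its top bit clear) survives the round trip unchanged.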

Solution 3

Java's byte is signed, which is counterintuitive. In almost all situations where a byte is used, programmers want an unsigned byte instead, so casting a byte directly to an int is very likely a bug.

This does the intended conversion correctly in almost all programs:

int c = 0xff & b;

In practice, the choice of a signed byte looks like a design mistake.
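
A small sketch comparing the plain cast with the masked version follows. UnsignedByteDemo and toUnsigned are hypothetical names; Byte.toUnsignedInt is a standard library method (Java 8 and later) that gives the same result as the mask.

```java
// Hypothetical demo class; the names are illustrative only.
class UnsignedByteDemo {
    // Masking with 0xff clears the sign-extended high bits, leaving 0..255.
    static int toUnsigned(byte b) {
        return 0xff & b;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;
        System.out.println((int) b);                 // -1  (sign-extended)
        System.out.println(toUnsigned(b));           // 255 (masked)
        System.out.println(Byte.toUnsignedInt(b));   // 255 (library equivalent, Java 8+)
    }
}
```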

Author by

wakachamo

Updated on July 09, 2020

Comments

  • wakachamo
    wakachamo almost 4 years

    How come this happens:

    char a = '\uffff'; //Highest value that char can take - 65535
    byte b = (byte)a; //Casting a 16-bit value into 8-bit data type...! Isn't data lost here?
    char c = (char)b; //Let's get the value back
    int d = (int)c;
    System.out.println(d); //65535... how?
    

    Basically, I saw that a char is 16-bit. Therefore, if you cast it into a byte, how come no data is lost? (Value is the same after casting into an int)

    Thanks in advance for answering this little ignorant question of mine. :P

    EDIT: Woah, found out that my original output actually did as expected, but I just updated the code above. Basically, a character is cast into a byte and then cast back into a char, and its original, 2-byte value is retained. How does this happen?