Char into byte? (Java)
Solution 1
As trojanfoe states, your confusion on the results of your code is partly due to sign-extension. I'll try to add a more detailed explanation that may help with your confusion.
char a = '\uffff';
byte b = (byte)a; // b = 0xFF
As you noted, this DOES result in the loss of information. This is considered a narrowing conversion. Converting a char to a byte "simply discards all but the n lowest order bits".
The result is: 0xFFFF -> 0xFF
char c = (char)b; // c = 0xFFFF
Converting a byte to a char is considered a special conversion. It actually performs TWO conversions. First, the byte is SIGN-extended (the new high order bits are copied from the old sign bit) to an int (a normal widening conversion). Second, the int is converted to a char with a narrowing conversion.
The result is: 0xFF -> 0xFFFFFFFF -> 0xFFFF
int d = (int)c; // d = 0x0000FFFF
Converting a char to an int is considered a widening conversion. When a char type is widened to an integral type, it is ZERO-extended (the new high order bits are set to 0).
The result is: 0xFFFF -> 0x0000FFFF
. When printed, this will give you 65535.
The three links I provided are the official Java Language Specification details on primitive type conversions. I HIGHLY recommend you take a look. They are not terribly verbose (and in this case relatively straightforward). It details exactly what java will do behind the scenes with type conversions. This is a common area of misunderstanding for many developers. Post a comment if you are still confused with any step.
Solution 2
It's sign extension. Try \u1234
instead of \uffff
and see what happens.
Solution 3
java byte
is signed. it's counter intuitive. in almost all situations where a byte is used, programmers would want an unsigned byte instead. it's extremely likely a bug if a byte is cast to int directly.
This does the intended conversion correctly in almost all programs:
int c = 0xff & b ;
Empirically, the choice of signed byte is a mistake.
wakachamo
Updated on July 09, 2020Comments
-
wakachamo almost 4 years
How come this happens:
char a = '\uffff'; //Highest value that char can take - 65535 byte b = (byte)a; //Casting a 16-bit value into 8-bit data type...! Isn't data lost here? char c = (char)b; //Let's get the value back int d = (int)c; System.out.println(d); //65535... how?
Basically, I saw that a
char
is 16-bit. Therefore, if you cast it into abyte
, how come no data is lost? (Value is the same after casting into an int)Thanks in advance for answering this little ignorant question of mine. :P
EDIT: Woah, found out that my original output actually did as expected, but I just updated the code above. Basically, a character is cast into a byte and then cast back into a char, and its original, 2-byte value is retained. How does this happen?