Java - stream of bytes vs. stream of characters?
Solution 1
In Java, a byte is not the same thing as a char. Therefore a byte stream is different from a character stream. Bytes are intended for arbitrary binary data; characters are specifically for data representing the building blocks of strings.
but if a char is only 1 byte in width
Except that it's not.
As per the JLS §4.2.1, a char is a number in the range:
from '\u0000' to '\uffff' inclusive, that is, from 0 to 65535
But a byte is a number in the range:
from -128 to 127, inclusive
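The ranges quoted above can be checked against the constants on the wrapper classes. This is a minimal sketch; the class name CharVsByteRange and its helper methods are made up for illustration.

```java
class CharVsByteRange {
    // Widening a char to int keeps its unsigned value.
    static int charMin() { return Character.MIN_VALUE; }
    static int charMax() { return Character.MAX_VALUE; }

    // Widening a byte to int keeps its signed value.
    static int byteMin() { return Byte.MIN_VALUE; }
    static int byteMax() { return Byte.MAX_VALUE; }

    public static void main(String[] args) {
        System.out.println("char: " + charMin() + " to " + charMax()); // char: 0 to 65535
        System.out.println("byte: " + byteMin() + " to " + byteMax()); // byte: -128 to 127
    }
}
```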
Solution 2
A stream of bytes is just raw bytes, like what you would see when you open a file in a hex editor.
A character is different from a plain byte. ASCII encoding uses exactly 1 byte per character, but that is not true for many other encodings. For example, UTF-8 may use from 1 to 4 bytes to encode a single character. A stream of characters is designed to abstract away the underlying encoding and produce char values in a single encoding (in Java, char and String use UTF-16).
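To make the variable-width point concrete, the sketch below counts how many bytes a few sample strings occupy in UTF-8 and UTF-16. The class name EncodedLengths and the sample strings are assumptions chosen for illustration; the strings are written with Unicode escapes so the result does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

class EncodedLengths {
    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes when encoded as UTF-16 (big-endian, no byte order mark).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, plain ASCII
            "\u00e9",       // U+00E9, e with acute accent
            "\u20ac",       // U+20AC, euro sign
            "\uD83D\uDE00"  // U+1F600, outside the BMP, so two Java chars
        };
        for (String s : samples) {
            System.out.println(s.length() + " char(s), "
                    + utf8Length(s) + " UTF-8 byte(s), "
                    + utf16Length(s) + " UTF-16 byte(s)");
        }
    }
}
```

The last sample also shows that a single Java char is not always a whole character: code points above U+FFFF take a surrogate pair of two char values.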
As a rule of thumb:
When you are dealing with text, use a character stream to decode the bytes into characters with the appropriate encoding.
When you are dealing with binary data, or a mix of binary data and text, use a byte stream, since a character stream doesn't make sense there. If a sequence of bytes represents a String in a certain encoding, you can always pick those bytes out and use the String(byte[] bytes, Charset charset) constructor to get back the String.
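Both routes from the rule of thumb can be sketched side by side: decoding a byte array in one step with the String constructor, and decoding through a character stream (an InputStreamReader) wrapped around a byte stream. The class name DecodeBytes and its two helper methods are hypothetical, and the three input bytes are simply the UTF-8 encoding of the euro sign.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeBytes {
    // Decode a complete byte array in one step.
    static String viaConstructor(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Decode incrementally: the Reader turns bytes from the underlying
    // InputStream into chars using the given charset.
    static String viaReader(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] euroUtf8 = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC};

        // Correct charset: three bytes become one char.
        System.out.println(viaConstructor(euroUtf8, StandardCharsets.UTF_8).length()); // 1
        System.out.println(viaReader(euroUtf8, StandardCharsets.UTF_8).length());      // 1

        // Wrong charset: each byte is treated as its own character.
        System.out.println(viaConstructor(euroUtf8, StandardCharsets.ISO_8859_1).length()); // 3
    }
}
```

The same bytes produce different text depending on the charset, which is why a character stream needs to know the encoding while a byte stream does not.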
Solution 3
They are different. char is a 2-byte datatype in Java; byte is a 1-byte datatype.
Edit: char is also an unsigned type, while byte is not.
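The size and signedness difference shows up when either type is widened to int. The following is a small sketch with a hypothetical class name, WidthAndSign; Character.BYTES and Byte.BYTES require Java 8 or later.

```java
class WidthAndSign {
    // byte is signed, so widening to int sign-extends.
    static int byteToInt(byte b) { return b; }

    // Masking with 0xFF recovers the unsigned value 0..255 of the same bits.
    static int byteToUnsignedInt(byte b) { return b & 0xFF; }

    // char is unsigned, so widening to int zero-extends.
    static int charToInt(char c) { return c; }

    public static void main(String[] args) {
        System.out.println(Byte.BYTES);                       // 1
        System.out.println(Character.BYTES);                  // 2
        System.out.println(byteToInt((byte) 0xFF));           // -1
        System.out.println(byteToUnsignedInt((byte) 0xFF));   // 255
        System.out.println(charToInt('\uffff'));              // 65535
    }
}
```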
IAmYourFaja
Updated on June 03, 2022
Comments
-
IAmYourFaja almost 2 years
Title is pretty self-explanatory. In a lot of the JRE javadocs I see the phrases "stream of bytes" and "stream of characters" all over the place.
But aren't they the same thing? Or are they slightly different (e.g. interpreted differently) in Java-land? Thanks in advance.
-
IAmYourFaja about 11 years
Thanks @Matt Ball - I understand they are different as far as types go (byte, char, etc.), but if a char is only 1 byte in width, then what's different about storing an input stream as a byte array vs char array? That was at the root of my question.
-
Matt Ball about 11 years
Who says a char is only 1 byte in width? docs.oracle.com/javase/7/docs/api/java/lang/Character.html
-
Matt Ball about 11 years
"All world will burn when bytes will stop being 8 bits." Hardly. en.wikipedia.org/wiki/Byte#History