Java - stream of bytes vs. stream of characters?

11,300

Solution 1

In Java, a byte is not the same thing as a char. Therefore a byte stream is different from a character stream. Bytes are intended for arbitrary binary data; characters are specifically for data representing the building blocks of strings.

but if a char is only 1 byte in width

Except that it's not. As per JLS §4.2.1, a char is a number in the range:

from '\u0000' to '\uffff' inclusive, that is, from 0 to 65535

But a byte is a number in the range:

from -128 to 127, inclusive
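The two ranges can be checked directly against the constants in the standard wrapper classes. This is a minimal sketch; the class name CharByteRanges and its helper methods are made up for illustration, while Character.MIN_VALUE, Character.MAX_VALUE, Byte.MIN_VALUE and Byte.MAX_VALUE are the real JDK constants.

```java
public class CharByteRanges {
    // char range, widened to int so the values print as numbers rather than characters
    static int charMin() { return Character.MIN_VALUE; }  // '\u0000' == 0
    static int charMax() { return Character.MAX_VALUE; }  // '\uffff' == 65535

    // byte range: signed, so it includes negative values
    static int byteMin() { return Byte.MIN_VALUE; }       // -128
    static int byteMax() { return Byte.MAX_VALUE; }       // 127

    public static void main(String[] args) {
        System.out.println("char: " + charMin() + " to " + charMax()); // char: 0 to 65535
        System.out.println("byte: " + byteMin() + " to " + byteMax()); // byte: -128 to 127
    }
}
```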

Solution 2

A byte stream is just plain bytes, exactly what you would see if you opened the file in a hex editor.

A character is different from a plain byte. ASCII encoding uses exactly 1 byte per character, but that is not true of many other encodings; UTF-8, for example, uses 1 to 4 bytes to encode a single character. A character stream is designed to abstract away the underlying encoding and give you chars in a single, fixed encoding (in Java, char and String use UTF-16, where one char is a 16-bit code unit and characters outside the Basic Multilingual Plane take two chars).
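To make the variable-width point concrete, here is a small sketch that encodes a few sample characters to UTF-8 and compares the byte count with the number of Java chars. The class name EncodedLengths and the chosen sample characters are illustrative assumptions; String.getBytes(Charset) and StandardCharsets.UTF_8 are standard JDK APIs.

```java
import java.nio.charset.StandardCharsets;

public class EncodedLengths {
    // Number of bytes needed to encode s in UTF-8
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+0041 'A', U+00E9 'é', U+20AC '€', and U+1F600 (an emoji, stored as a surrogate pair)
        String[] samples = { "A", "\u00e9", "\u20ac", "\ud83d\ude00" };
        for (String s : samples) {
            // codePointAt(0) reassembles the surrogate pair into a single code point
            System.out.printf("U+%04X: %d UTF-8 byte(s), %d Java char(s)%n",
                    s.codePointAt(0), utf8Bytes(s), s.length());
        }
    }
}
```

The byte counts come out as 1, 2, 3 and 4, while the char count is 1 for the first three and 2 for U+1F600, which is why neither "one byte per character" nor "one char per character" holds in general.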

As a rule of thumb:

  • When you are dealing with text, you must use a character stream so that the bytes are decoded into characters with the appropriate encoding.

  • When you are dealing with binary data, or a mix of binary and text, you must use a byte stream, since decoding arbitrary binary data as characters makes no sense. If a sequence of bytes represents a String in a certain encoding, you can always pick those bytes out and use the String(byte[] bytes, Charset charset) constructor to get the String back.
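Both rules of thumb can be sketched in a few lines. The text case wraps a byte stream in an InputStreamReader with an explicit charset; the mixed case keeps working with raw bytes and decodes only the text slice with new String(byte[], Charset). The class name TextVsBinary, its methods, and the "one length byte followed by that many UTF-8 bytes" record layout are all hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TextVsBinary {
    // Text: decode the whole byte stream into chars using an explicit charset
    static String readAllText(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Mixed binary + text (hypothetical layout): 1 length byte, then that many UTF-8 bytes,
    // then arbitrary binary data that must NOT be decoded as text
    static String readLengthPrefixed(byte[] record) {
        int len = record[0] & 0xFF;  // treat the length byte as unsigned
        byte[] textBytes = Arrays.copyOfRange(record, 1, 1 + len);
        return new String(textBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // 5 bytes for 4 chars
        System.out.println(readAllText(new ByteArrayInputStream(utf8)));

        // 0xC3 0xA9 is the UTF-8 encoding of U+00E9; the trailing 0x7F stands in for binary data
        byte[] record = { 5, 'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9, 0x7F };
        System.out.println(readLengthPrefixed(record));
    }
}
```

Passing the charset explicitly, rather than relying on the platform default, is a deliberate choice here: the same bytes can decode to different text under a different default encoding.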

Solution 3

They are different. char is a 2-byte (16-bit) data type in Java; byte is a 1-byte (8-bit) data type.

Edit: char is also an unsigned type, while byte is signed.
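The size and signedness difference shows up as soon as the same bit pattern is cast to each type. This is a short sketch; the class name SizeAndSign and its helper methods are illustrative, while Character.SIZE, Byte.SIZE and the masking idiom are standard Java.

```java
public class SizeAndSign {
    // Bits per value: 16 for char, 8 for byte
    static int charBits() { return Character.SIZE; }
    static int byteBits() { return Byte.SIZE; }

    // All bits set: byte (signed) reads it as -1, char (unsigned) as 65535
    static int allOnesAsByte() { return (byte) 0xFF; }
    static int allOnesAsChar() { return (char) 0xFFFF; }

    // Recover the unsigned 0..255 value of a byte (Byte.toUnsignedInt does the same since Java 8)
    static int unsigned(byte b) { return b & 0xFF; }

    public static void main(String[] args) {
        System.out.println(charBits() + " vs " + byteBits());           // 16 vs 8
        System.out.println(allOnesAsByte() + " vs " + allOnesAsChar()); // -1 vs 65535
        System.out.println(unsigned((byte) 200));                       // 200
    }
}
```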

Author: IAmYourFaja

Updated on June 03, 2022

Comments

  • IAmYourFaja
    IAmYourFaja almost 2 years

    Title is pretty self-explanatory. In a lot of the JRE javadocs I see the phrases "stream of bytes" and "stream of characters" all over the place.

    But aren't they the same thing? Or are they slightly different (e.g. interpreted differently) in Java-land? Thanks in advance.

  • IAmYourFaja
    IAmYourFaja about 11 years
    Thanks @Matt Ball - I understand they are different as far as types go (byte, char, etc.), but if a char is only 1 byte in width, then what's different about storing an input stream as a byte array vs char array? That was at the root of my question.
  • Matt Ball
    Matt Ball about 11 years
    Who says a char is only 1 byte in width? docs.oracle.com/javase/7/docs/api/java/lang/Character.html
  • Matt Ball
    Matt Ball about 11 years
    "All world will burn when bytes will stop being 8 bits." Hardly. en.wikipedia.org/wiki/Byte#History