Representing char as a byte in Java

Solution 1

To convert characters to bytes, you need to specify a character encoding. Some character encodings use one byte per character, while others use two or more bytes. In fact, for many languages, there are far too many characters to encode with a single byte.
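As a rough illustration of how the byte count depends on the encoding, here is a small sketch. The class name EncodingSizes and the helper byteCount are made up for this example; the charsets come from StandardCharsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingSizes {
    /** Returns how many bytes the text occupies when encoded with the given charset. */
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        String euro = "\u20ac";   // EURO SIGN

        System.out.println(byteCount("A", StandardCharsets.US_ASCII));      // 1
        System.out.println(byteCount(eAcute, StandardCharsets.ISO_8859_1)); // 1
        System.out.println(byteCount(eAcute, StandardCharsets.UTF_8));      // 2
        System.out.println(byteCount(euro, StandardCharsets.UTF_8));        // 3
        System.out.println(byteCount(euro, StandardCharsets.UTF_16BE));     // 2
    }
}
```

UTF_16BE is used rather than UTF_16 here because the latter prepends a byte-order mark when encoding, which would inflate the count.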

In Java, the simplest way to convert from characters to bytes is with the String class's getBytes(Charset) method. (The StandardCharsets class defines some common encodings.) However, this method silently substitutes the charset's default replacement bytes (typically ?) for any character that cannot be mapped under the specified encoding. If you need more control, you can configure a CharsetEncoder to report an error in this case or to use a different replacement.
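A minimal sketch of the stricter route, assuming you want an exception rather than a silent substitution. The class StrictEncoding and the method encodeStrict are hypothetical names; CharsetEncoder, CodingErrorAction and CharacterCodingException are the standard java.nio.charset types.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

final class StrictEncoding {
    /**
     * Encodes the text with the given charset and throws
     * CharacterCodingException if any character cannot be represented.
     */
    static byte[] encodeStrict(String text, Charset charset) throws CharacterCodingException {
        ByteBuffer buffer = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(text));
        // Copy only the encoded bytes; the backing array may be larger.
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }
}
```

For example, encodeStrict("\u20ac", StandardCharsets.ISO_8859_1) should fail with an exception, because ISO-8859-1 has no euro sign, whereas getBytes would quietly hand back a replacement byte.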

Solution 2

A char is indeed 16 bits in Java (and is also the only unsigned primitive type).

If you are sure your characters are all ASCII, then you can simply cast each char to a byte (since ASCII uses only the lower 7 bits of the char).
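If "sure" is only an assumption, a guarded cast costs little. This is a sketch with a made-up class AsciiCast and helper asciiToByte; it rejects anything above 0x7F instead of silently truncating it.

```java
final class AsciiCast {
    /** Casts an ASCII char to a byte, rejecting anything outside 0x00..0x7F. */
    static byte asciiToByte(char c) {
        if (c > 0x7F) {
            throw new IllegalArgumentException(
                    String.format("Not an ASCII character: U+%04X", (int) c));
        }
        return (byte) c;
    }
}
```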

If you do not need to modify the characters or understand their meaning within a String, you can just store each char in two bytes, like:

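char[] c = ...;
byte[] b = new byte[c.length * 2];
for (int i = 0; i < c.length; i++) {
    // Parenthesize before casting: a cast binds tighter than >>.
    b[2 * i] = (byte) ((c[i] & 0xFF00) >> 8); // high byte
    b[2 * i + 1] = (byte) (c[i] & 0x00FF);    // low byte
}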

(You could replace the 2* with a left shift, i << 1, if speed matters, although the JIT compiler will usually make that optimization for you.)
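The same high-byte-first layout can also be produced with java.nio.ByteBuffer, which is big-endian by default. The following is only a sketch: the class CharPacking and its two methods are hypothetical names, and toChars is included to show that the packing is reversible.

```java
import java.nio.ByteBuffer;

final class CharPacking {
    /** Packs each char into two bytes, high byte first (ByteBuffer's default order). */
    static byte[] toBytes(char[] chars) {
        ByteBuffer buffer = ByteBuffer.allocate(chars.length * 2);
        for (char c : chars) {
            buffer.putChar(c);
        }
        return buffer.array();
    }

    /** Reverses toBytes; assumes an even number of bytes. */
    static char[] toChars(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        char[] chars = new char[bytes.length / 2];
        for (int i = 0; i < chars.length; i++) {
            chars[i] = buffer.getChar();
        }
        return chars;
    }
}
```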

Note, however, that some actual (displayed) characters (or, more accurately, Unicode code points) are written as two consecutive chars, known as a surrogate pair. So cutting between two chars does not guarantee that you are cutting between actual characters.
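To make that concrete, here is a small sketch that checks whether a cut position would land inside such a two-char sequence. The class SurrogateSplit and the method splitsSurrogatePair are illustrative names; Character.isHighSurrogate and Character.isLowSurrogate are standard.

```java
final class SurrogateSplit {
    /**
     * Returns true if cutting the text just before the given index would
     * separate the two chars that together encode one code point.
     */
    static boolean splitsSurrogatePair(CharSequence text, int index) {
        return index > 0
                && index < text.length()
                && Character.isHighSurrogate(text.charAt(index - 1))
                && Character.isLowSurrogate(text.charAt(index));
    }

    public static void main(String[] args) {
        // "a", then U+1F600 (stored as two chars), then "b"
        String text = "a\uD83D\uDE00b";
        System.out.println(text.length());                           // 4 chars
        System.out.println(text.codePointCount(0, text.length()));   // 3 code points
        System.out.println(splitsSurrogatePair(text, 2));            // true
    }
}
```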

If you need to decode, encode, or otherwise manipulate your char array in a String-aware manner, you should instead convert your char array or String with a real character encoding, using the String methods, the java.nio.charset classes, or the java.io readers and writers, which handle characters properly.
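A round trip through a real encoding might look like the sketch below. UTF-8 is chosen here purely as an assumption, since the interface's expected encoding is not stated; the class RoundTrip and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class RoundTrip {
    /** Encodes the text as UTF-8 bytes. */
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes UTF-8 bytes back into a String. */
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because the whole String is encoded at once, a code point stored as two chars is kept together: it becomes a single four-byte UTF-8 sequence and decodes back to the same two chars.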

Solution 3

To extend what others are saying, if you have a char that you need as a byte array, then you first create a String containing that char and then get the byte array from the String:

import java.nio.charset.StandardCharsets;

private byte[] charToBytes(final char x) {
  // The Charset overload of getBytes throws no checked exception,
  // so there is no need for a try/catch that returns null.
  return String.valueOf(x).getBytes(StandardCharsets.ISO_8859_1);
}

Of course, use the appropriate character set. It would be much more efficient to work with Strings from the start than to take one char at a time, convert it to a String, and then convert that to a byte array.

Author by

jbu

Updated on July 28, 2022

Comments

  • jbu
    jbu almost 2 years

    I must convert a char into a byte or a byte array. In other languages I know that a char is just a single byte. However, looking at the Java Character class, its min value is \u0000 and its max value is \uFFFF. This makes it seem like a char is 2 bytes long.

    Will I be able to store it as a byte or do I need to store it as two bytes?

    Before anyone asks, I will say that I'm trying to do this because I'm working under an interface that expects my results to be a byte array. So I have to convert my char to one.

    Please let me know and help me understand this.

    Thanks, jbu