What does it mean for a char to be signed?


Solution 1

It won't make a difference for strings. But in C you can use a char to do math, and then it will make a difference.

In fact, in constrained-memory environments, such as embedded 8-bit applications, a char is often used to do math, and then it makes a big difference. This is because C has no separate byte type by default; char fills that role.

Solution 2

In terms of the values they represent:

unsigned char:

  • spans the value range 0..255 (00000000..11111111)
  • values overflow around low edge as:

    0 - 1 = 255 (00000000 - 00000001 = 11111111)

  • values overflow around high edge as:

    255 + 1 = 0 (11111111 + 00000001 = 00000000)

  • bitwise right shift operator (>>) does a logical shift:

    10000000 >> 1 = 01000000 (128 / 2 = 64)

signed char:

  • spans the value range -128..127 (10000000..01111111)
  • values overflow around low edge as:

    -128 - 1 = 127 (10000000 - 00000001 = 01111111)

  • values overflow around high edge as:

    127 + 1 = -128 (01111111 + 00000001 = 10000000)

  • bitwise right shift operator (>>) does an arithmetic shift:

    10000000 >> 1 = 11000000 (-128 / 2 = -64)

I included the binary representations to show that the value-wrapping behaviour is pure, consistent binary arithmetic and has nothing to do with a char being signed/unsigned (except for right shifts).


Solution 3

#include <stdio.h>

int main(int argc, char** argv)
{
    char a = 'A';
    char b = 0xFF;           /* -1 if char is signed, 255 if unsigned */
    signed char sa = 'A';
    signed char sb = 0xFF;   /* implementation-defined conversion; -1 on two's complement */
    unsigned char ua = 'A';
    unsigned char ub = 0xFF; /* always 255 */
    printf("a > b: %s\n", a > b ? "true" : "false");
    printf("sa > sb: %s\n", sa > sb ? "true" : "false");
    printf("ua > ub: %s\n", ua > ub ? "true" : "false");
    return 0;
}


Output on a platform where plain char is signed (so a and b hold 65 and -1):

[root]# ./a.out
a > b: true
sa > sb: true
ua > ub: false

This matters when sorting strings: bytes above 0x7F compare as negative values if char is signed.

Solution 4

There are a couple of differences. Most importantly, if you overflow the valid range of a char by assigning it a too-big or too-small integer, and char is signed, the resulting value is implementation-defined, or (in C) a signal could even be raised, as for all signed types. Contrast that with assigning something too big or small to an unsigned char: the value wraps around, and you get precisely defined semantics. For example, assigning -1 to an unsigned char gives you UCHAR_MAX. So whenever you have a byte, i.e. a number from 0 to 2^CHAR_BIT - 1, you should really use unsigned char to store it.

The sign also makes a difference when passing to vararg functions:

char c = getSomeCharacter(); // returns 0..255
printf("%d\n", c);

Assume the value assigned to c is too big for char to represent, and the machine uses two's complement. Many implementations behave such that, when you assign a too-big value to a char, the bit pattern is left unchanged. If an int can represent all values of char (which it can on most implementations), then the char is promoted to int before being passed to printf. So the value passed would be negative, and promoting to int retains that sign: you will get a negative result. However, if char is unsigned, then the value is non-negative, and promoting to int yields a positive int. Using unsigned char gives you precisely defined behavior both for the assignment to the variable and for passing it to printf, which will then print something positive.

Note that char, unsigned char, and signed char are each at least 8 bits wide. There is no requirement that char be exactly 8 bits wide; that is true for most systems, but some use 32-bit chars. A byte in C and C++ is defined to have the size of char, so a byte in C is also not always exactly 8 bits.

Another difference is that in C, an unsigned char must have no padding bits. That is, if you find CHAR_BIT is 8, then an unsigned char's values must range over 0 .. 2^CHAR_BIT - 1. The same is true for char if it's unsigned. For signed char, you can't assume anything about the range of values: even if you know how your compiler implements signedness (two's complement or one of the other options), there may be unused padding bits in it. In C++, none of the three character types have padding bits.

Solution 5

"What does it mean for a char to be signed?"

Traditionally, the ASCII character set consists of 7-bit character encodings. (As opposed to the 8-bit EBCDIC.)

When the C language was designed and implemented this was a significant issue. (For various reasons like data transmission over serial modem devices.) The extra bit has uses like parity.

A "signed character" happens to be perfect for this representation, since the 7-bit range 0..127 fits exactly in its non-negative half.

Binary data, OTOH, is simply taking the value of each 8-bit "chunk" of data, thus no sign is needed.

dsimcha

Updated on July 09, 2022

Comments

  • dsimcha
    dsimcha almost 2 years

    Given that signed and unsigned ints use the same registers, etc., and just interpret bit patterns differently, and C chars are basically just 8-bit ints, what's the difference between signed and unsigned chars in C? I understand that the signedness of char is implementation defined, and I simply can't understand how it could ever make a difference, at least when char is used to hold strings instead of to do math.