Type conversion - unsigned to signed int/char

Solution 1

This is because of the various implicit type conversion rules in C. There are two of them that a C programmer must know: the usual arithmetic conversions and the integer promotions (the latter are part of the former).

In the char case you have the types (signed char) == (unsigned char). These are both small integer types. Other such small integer types are bool and short. The integer promotion rules state that whenever a small integer type is an operand of an operation, it is promoted to int, which is signed, provided int can represent every value of the original type (otherwise it is promoted to unsigned int). On typical platforms, where int is wider than char, the promotion to int happens whether the original type was signed or unsigned.

In the case of the signed char, the sign will be preserved and it will be promoted to an int containing the value -5. In the case of the unsigned char, it contains a value which is 251 (0xFB ). It will be promoted to an int containing that same value. You end up with

if( (int)-5 == (int)251 )
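A minimal sketch of the promotion just described, assuming an 8-bit char and an int that is wider than char. The function names are made up for illustration and do not come from the question.

```c
#include <assert.h>
#include <limits.h>

/* Assumption for this sketch: 8-bit char, so (unsigned char)-5 is 251. */
static_assert(CHAR_BIT == 8, "sketch assumes an 8-bit char");

/* Value of a signed char holding -5 once it is widened to int. */
int promoted_signed(void)
{
    signed char a = -5;
    return a;                 /* widened to int, value stays -5 */
}

/* Value of an unsigned char initialised from -5 once it is widened to int. */
int promoted_unsigned(void)
{
    unsigned char b = (unsigned char)-5;   /* stored as 256 - 5 = 251 */
    return b;                              /* widened to int, value stays 251 */
}

/* The comparison from the question: both operands are promoted to int first. */
int chars_compare_equal(void)
{
    signed char a = -5;
    unsigned char b = (unsigned char)-5;
    return a == b;            /* (int)-5 == (int)251, which is false */
}
```

Calling chars_compare_equal() from main should return 0 on such a platform, matching the "char is DIFF" output.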

In the integer case you have the types (signed int) == (unsigned int). They are not small integer types, so the integer promotions do not apply. Instead, they are balanced by the usual arithmetic conversions, which state that if two operands have the same "rank" (size) but different signedness, the signed operand is converted to the same type as the unsigned one. You end up with

if( (unsigned int)-5 == (unsigned int)-5)
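The int case can be sketched the same way. The helper names are hypothetical; the only fact relied on is that conversion to unsigned int is done modulo UINT_MAX + 1.

```c
#include <assert.h>
#include <limits.h>

/* The value c holds after the usual arithmetic conversions turn it
 * into an unsigned int. */
unsigned int converted_value(void)
{
    int c = -5;
    unsigned int u = (unsigned int)c;
    assert(u == UINT_MAX - 4u);   /* follows from the modulo conversion rule */
    return u;
}

/* The comparison from the question: c is converted to unsigned int,
 * so both operands hold UINT_MAX - 4. */
int ints_compare_equal(void)
{
    int c = -5;
    unsigned int d = (unsigned int)-5;
    return c == d;                /* true; compilers may warn about the mix */
}
```

With warnings enabled (for example -Wextra in GCC), the c == d line is the one that gets flagged as a signed/unsigned comparison.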

Solution 2

Cool question!

The int comparison works because c is converted to unsigned int before the comparison, after which both operands hold the same value (and, on a two's complement machine, exactly the same bits). But what about the chars?

Ah, C implicitly promotes chars to ints on various occasions. This is one of them. Your code says if(a==b), but what the compiler actually turns that to is:

if((int)a==(int)b) 

(int)a is -5, but (int)b is 251. Those are definitely not the same.

EDIT: As @Carbonic-Acid pointed out, (int)b is 251 only if a char is 8 bits long. With a wider char, (int)b is a different, larger positive value: (1 << CHAR_BIT) - 5, provided int can represent it.

REDIT: There's a whole bunch of comments discussing the nature of the answer if a byte is not 8 bits long. The only difference in this case is that (int)b is not 251 but a different positive number, which isn't -5. This is not really relevant to the question which is still very cool.
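For readers who care about the non-8-bit case, here is a hedged sketch that computes the value from the platform's own limits instead of hard-coding 251. It assumes int can represent every unsigned char value; the function names are illustrative.

```c
#include <assert.h>
#include <limits.h>

/* Value of (int)(unsigned char)-5 on this platform. */
int promoted_b(void)
{
    /* Assumption: unsigned char promotes to int, not to unsigned int. */
    static_assert(UCHAR_MAX <= INT_MAX, "sketch assumes unsigned char fits in int");
    unsigned char b = (unsigned char)-5;
    return b;
}

/* The same value derived from the width of char:
 * UCHAR_MAX is 2^CHAR_BIT - 1, so UCHAR_MAX - 4 is 2^CHAR_BIT - 5. */
int expected_b(void)
{
    return (int)(UCHAR_MAX - 4);
}
```

Whatever CHAR_BIT is, promoted_b() is positive and therefore can never equal -5.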

Solution 3

Welcome to integer promotion. If I may quote from the website:

If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
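One way to see the quoted rule in action is to ask the compiler for the type of a promoted expression. The sketch below uses C11 _Generic; the macro and function names are hypothetical, and it assumes int can hold every unsigned char value.

```c
#include <assert.h>
#include <limits.h>

/* Compile-time type probes built on C11 _Generic (illustrative names). */
#define IS_INT(x)          _Generic((x), int: 1, default: 0)
#define IS_UNSIGNED_INT(x) _Generic((x), unsigned int: 1, default: 0)

/* Assumption: int can represent all unsigned char values. */
static_assert(UCHAR_MAX <= INT_MAX, "unsigned char would promote to unsigned int here");

int signed_char_promotes_to_int(signed char a)
{
    return IS_INT(+a);               /* unary + applies the integer promotions */
}

int unsigned_char_promotes_to_int(unsigned char b)
{
    return IS_INT(+b);
}

int mixed_sum_is_unsigned(int c, unsigned int d)
{
    return IS_UNSIGNED_INT(c + d);   /* usual arithmetic conversions */
}
```

All three functions should return 1 on a mainstream platform: both char types promote to int, while int mixed with unsigned int becomes unsigned int.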

C can be really confusing when you do comparisons such as these. I recently puzzled some of my non-C programming friends with the following teaser:

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *string = "One looooooooooong string";

    /* strlen returns size_t, which is printed with %zu */
    printf("%zu\n", strlen(string));

    if (strlen(string) < -1) printf("This cannot be happening :(");

    return 0;
}

Which indeed does print This cannot be happening :( and seemingly demonstrates that 25 is smaller than -1!

What happens underneath, however, is that -1 is converted to the unsigned type size_t before the comparison, which yields SIZE_MAX: 4294967295 on a system with a 32-bit size_t. And naturally 25 is smaller than 4294967295.

If we however explicitly cast the size_t type returned by strlen as a signed integer:

if ((int)(strlen(string)) < -1)

Then it will compare 25 against -1 and all will be well with the world.
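Casting to int works for short strings but would misbehave for lengths above INT_MAX. As a rough alternative, the comparison can be done by value in a helper. Both function names below are hypothetical, and the sketch assumes size_t is no wider than unsigned long long.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The surprising comparison: -1 is converted to size_t, i.e. SIZE_MAX. */
int naive_less_than_minus_one(const char *s)
{
    assert(s != NULL);
    return strlen(s) < -1;        /* true for any realistic string */
}

/* Compares a length with a signed bound by value, not by bit pattern. */
int length_less_than(size_t len, long long bound)
{
    if (bound <= 0)
        return 0;                 /* no length is below zero or a negative bound */
    return len < (unsigned long long)bound;
}
```

With this helper, length_less_than(strlen(string), -1) is 0, which is the answer most people expect.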

A good compiler should warn you about the comparison between an unsigned and signed integer and yet it is still so easy to miss (especially if you don't enable warnings).

This is especially confusing for Java programmers as all primitive types there are signed. Here's what James Gosling (one of the creators of Java) had to say on the subject:

Gosling: For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.

Solution 4

The hex representation of -5 is:

  • 8-bit, two's complement signed char: 0xfb
  • 32-bit, two's complement signed int: 0xfffffffb

When you convert a signed number to an unsigned number of the same width, or vice versa, the compiler (on a two's complement machine) does ... precisely nothing to the bits. What is there to do? Converting to an unsigned type is always well-defined: the value is reduced modulo 2^N. Converting to a signed type either preserves the value or, if the value doesn't fit, is implementation-defined, and the most efficient implementation-defined behaviour is to do nothing.

So, the hex representation of (unsigned <type>)-5 is:

  • 8-bit, unsigned char: 0xfb
  • 32-bit, unsigned int: 0xfffffffb

Look familiar? They're bit-for-bit the same as the signed versions.
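The bit patterns listed above can be checked with a small sketch. It uses the fixed-width types from <stdint.h>, which are two's complement by definition; the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Raw bits of an 8-bit signed value, read back as an unsigned number. */
uint8_t bits_of_i8(int8_t v)
{
    uint8_t raw;
    memcpy(&raw, &v, sizeof raw);   /* copy the object representation */
    return raw;
}

/* Raw bits of a 32-bit signed value, read back as an unsigned number. */
uint32_t bits_of_i32(int32_t v)
{
    uint32_t raw;
    memcpy(&raw, &v, sizeof raw);
    return raw;
}

/* Checks that converting -5 to unsigned gives the same bits as storing -5. */
void check_same_bits(void)
{
    assert((uint8_t)-5  == bits_of_i8(-5));    /* both 0xfb */
    assert((uint32_t)-5 == bits_of_i32(-5));   /* both 0xfffffffb */
}
```

The memcpy reads the stored bytes, while the casts perform a value conversion; on these types the two happen to agree.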

When you write if (a == b), where a and b are of type char, what the compiler is actually required to read is if ((int)a == (int)b). (This is that "integer promotion" that everyone else is banging on about.)

So, what happens when we convert char to int?

  • 8-bit signed char to 32-bit signed int: 0xfb -> 0xfffffffb
    • Well, that makes sense because it matches the representations of -5 above!
    • It's called a "sign-extend", because it copies the top bit of the byte, the "sign-bit", leftwards into the new, wider value.
  • 8-bit unsigned char to 32-bit signed int: 0xfb -> 0x000000fb
    • This time it does a "zero-extend" because the source type is unsigned, so there is no sign-bit to copy.

So, a == b really does 0xfffffffb == 0x000000fb => no match!

And, c == d really does 0xfffffffb == 0xfffffffb => match!
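The sign-extend and zero-extend steps can be reproduced with fixed-width types. This is only a sketch with made-up function names; int32_t stands in for a 32-bit int.

```c
#include <assert.h>
#include <stdint.h>

/* Widening a signed 8-bit value sign-extends: 0xfb -> 0xfffffffb. */
uint32_t widen_signed(int8_t a)
{
    int32_t wide = a;
    assert(wide == a);           /* widening never changes the value */
    return (uint32_t)wide;       /* view the resulting bits as unsigned */
}

/* Widening an unsigned 8-bit value zero-extends: 0xfb -> 0x000000fb. */
uint32_t widen_unsigned(uint8_t b)
{
    int32_t wide = b;
    assert(wide == b);
    return (uint32_t)wide;
}
```

widen_signed(-5) and widen_unsigned(0xfb) start from the same byte but end up as different 32-bit patterns, which is why a == b fails.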

Solution 5

My point is: didn't you get a warning at compile time "comparing signed and unsigned expression"?

The compiler is trying to inform you that it is entitled to do crazy stuff! :) I would add that crazy stuff will happen when using big values, close to the capacity of the primitive type. And

 unsigned int d = -5;

definitely assigns a big value to d. It is equivalent (and guaranteed to be, since conversion to an unsigned type is done modulo UINT_MAX + 1) to:

 unsigned int d = UINT_MAX - 4; // since (unsigned int)-1 is UINT_MAX
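The equivalence follows from the conversion rule quoted in the comments (C11 6.3.1.3/2): add one more than the maximum value until the result is in range. A hand-rolled sketch of that rule, assuming long long is wider than unsigned int (convert_by_rule is a hypothetical name):

```c
#include <assert.h>
#include <limits.h>

/* Applies the conversion rule by hand: add UINT_MAX + 1 until in range. */
unsigned int convert_by_rule(int value)
{
    /* Assumption: long long can hold UINT_MAX + 1. */
    static_assert(LLONG_MAX > UINT_MAX, "sketch needs a wider helper type");
    long long v = value;
    const long long modulus = (long long)UINT_MAX + 1;
    while (v < 0)
        v += modulus;            /* -5 + (UINT_MAX + 1) = UINT_MAX - 4 */
    return (unsigned int)v;      /* v now fits, so the value is preserved */
}
```

For -5 this gives UINT_MAX - 4, the same value the compiler produces for (unsigned int)-5.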

Edit:

However, it is interesting to notice that only the second comparison gives a warning (check the code). This means that the compiler, applying the conversion rules, is confident that there won't be errors in the comparison between unsigned char and char (for the comparison they are converted to a type that can safely represent all their possible values). And it is right on this point. It then informs you that this won't be the case for unsigned int and int: during the comparison one of the two is converted to a type that cannot fully represent it.

For completeness, I also checked it for short: the compiler behaves the same way as for chars and, as expected, there are no errors at runtime.
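A sketch of the short case, assuming int is wider than short so that both operands are promoted to int (the function name is illustrative):

```c
#include <assert.h>
#include <limits.h>

/* Same experiment as with char, using short and unsigned short. */
int shorts_compare_equal(void)
{
    /* Assumption: unsigned short promotes to int, not to unsigned int. */
    static_assert(USHRT_MAX <= INT_MAX, "sketch assumes int is wider than short");
    short s = -5;
    unsigned short us = (unsigned short)-5;   /* USHRT_MAX - 4 */
    return s == us;   /* promoted to int: -5 versus USHRT_MAX - 4, so false */
}
```

As with char, the promoted int can represent both original values exactly, so the comparison is made on the true values and reports them as different.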

Related to this topic, I recently asked this question (though it is C++ oriented).

Author by

user2522685

Updated on July 15, 2022

Comments

  • user2522685
    user2522685 almost 2 years

    I tried to execute the program below:

    #include <stdio.h>
    
    int main() {
        signed char a = -5;
        unsigned char b = -5;
        int c = -5;
        unsigned int d = -5;
    
        if (a == b)
            printf("\r\n char is SAME!!!");
        else
            printf("\r\n char is DIFF!!!");
    
        if (c == d)
            printf("\r\n int is SAME!!!");
        else
            printf("\r\n int is DIFF!!!");
    
        return 0;
    }
    

    For this program, I am getting the output:

    char is DIFF!!! int is SAME!!!

    Why are we getting different outputs for both?
    Should the output be as below ?

    char is SAME!!! int is SAME!!!

    A codepad link.

  • Admin
    Admin about 11 years
    (int)b is 251 - if char is 8 bits long, which it does not need to be. Better say (1 << CHAR_BIT) - 5.
  • zmbq
    zmbq about 11 years
    True, but I'd rather not confuse OP. I'll add a small remark.
  • Admin
    Admin about 11 years
    Sorry to say that, but your correction made it worse. "only if a char is one byte" - it always is, it's required by the C standard. "Only if char is 8 bits" and "if int is 32 bits" would be a better choice.
  • zmbq
    zmbq about 11 years
    Where, in the last 40 years, have you seen a byte with more than 8 bits?
  • user2522685
    user2522685 about 11 years
    But why has char been promoted to int for conversion? Logical operators do not require only int as operands. So why this promotion to int??
  • Lundin
    Lundin about 11 years
    @user2522685 Because the C language demands it, and the C language is not rational, consistent nor logical.
  • Admin
    Admin about 11 years
    @zmbq No-one cares about portability, folks? "If a char is two bytes" - it never is.
  • Antonio
    Antonio about 11 years
    Downvote because...? It would be constructive to know the reason.
  • Christoph
    Christoph about 11 years
    @zmbq: DSPs need not have 8 bits per byte, Unisys is still in the mainframe business, there are some weird Forth processors out there (which need not come with a C compiler, though) - if you look hard enough, you can still find such systems produced today
  • Christoph
    Christoph about 11 years
    also note that the answer is misleading - the int comparison does not work because the variables contain the same bits, but because their values compare equal after conversion; the C language mostly does not care about representation - (unsigned)-1 == UINT_MAX holds true even if sign-magnitude representation is used, where, in contrast to two's complement, the conversion is not a noop
  • Christoph
    Christoph about 11 years
    slight inaccuracy: it's also possible to promote to unsigned int if int isn't large enough to represent all values of the type of lesser conversion rank; for example, assume int and short are both 16-bit types; then, conversion of unsigned short to int cannot preserve values in general, so we go with unsigned int instead
  • Lundin
    Lundin about 11 years
    Because you didn't answer the question? Why it doesn't work with char but with int?
  • Antonio
    Antonio about 11 years
    @Lundin My motivation is that this code triggers a case in which the behaviour is undefined. Different compilers will give (are entitled to give) different results. You can try to guess why this particular compiler gave this result, but I do not think there is any point in this: undefined behaviours should be avoided and that is it.
  • Antonio
    Antonio about 11 years
    @Lundin And, by the way, it does work (it has correct behaviour) with chars, and not with int :)
  • torek
    torek almost 11 years
    @zmbq: char was 10 bits on the BBN "C machines" (so-called because, supposedly, they were designed to run C code! with 10-bit char!?!). Those date back to the early 1980s, which would be only about 30 years ago. Meanwhile (same time frame) the Univac 11xx had 9-bit bytes, packed 2 to an 18-bit int and 4 to a 36-bit long, but at least those machines were designed in the 1960s, so they had an excuse. :D
  • torek
    torek almost 11 years
    Technically that should be: if char is an unconventional 15 bits (not 32) and int is 32 bits then (int)(unsigned char)-5 is 32763 (not -32764). Unsigned arithmetic is always well-defined, although it depends on the bit-size. For k bits you simply compute (value) mod 2^k. Signed potentially gets messy, but if the machine is two's complement (most are) it's not bad.
  • noufal
    noufal over 10 years
    Does it promote to int if I explicitly cast the variables like if((unsigned char) 128 == (signed char) -128)?
  • Lundin
    Lundin over 10 years
    @noufal Yes. There is no way to avoid the integer promotion through code. The only thing you can do is to cast the result of the operation to the intended type, which happens to be 100% safe. However, the compiler can optimize away the promotion as long as it doesn't change the outcome of the result. Which in turn means that if you have unexpected side-effects like silent signedness changes, they will be present in the optimized code as well.
  • supercat
    supercat about 10 years
    @Lundin: Is there any guarantee that (unsigned short)((unsigned short)65535u * (unsigned short)65535u) would yield 1 rather than launching the nuclear warheads? Would there be any way to compute the lower 16 bits of the product of two 16-bit numbers that would be efficient on a 16-bit machine but guaranteed-correct on a 32-bit machine?
  • supercat
    supercat about 10 years
    Gosling's rationale fails to note that unsigned Byte types in Pascal posed no difficulty whatsoever. The only unsigned types which are at all problematic are those which are larger than the default integer size.
  • Lundin
    Lundin almost 10 years
    Converting a signed integer type to an unsigned one is well-defined behavior as per C11 6.3.1.3/2 "Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type."
  • Lundin
    Lundin almost 10 years
    Converting an unsigned integer to a signed one is either well-defined or impl. defined, C11 6.3.1.3/1 and 6.3.1.3/3 respectively: When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged. /--/ "Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised."
  • ams
    ams almost 10 years
    @lundin interesting. I can't quite work out what it means to repeatedly add or subtract one more than the maximum value of the type, although I'd imagine the result is what I showed above.
  • Lundin
    Lundin almost 10 years
    Here is an explanation. They even use -5 as example, by coincidence :)
  • user13107
    user13107 almost 7 years
    I don't get why (int)(strlen(string)) < -1 works. You only changed the left side of the comparison; the right side will still be 4294967295 according to the previous explanation. (cc @Xolve )
  • Fabio Pozzi
    Fabio Pozzi almost 7 years
    @user13107 strlen(string) returns a size_t result that cannot be represented as an int. Thus all operands of the comparison are promoted to unsigned int.