Does cast between signed and unsigned int maintain exact bit pattern of variable in memory?


Solution 1

In general, casting in C is specified in terms of values, not bit patterns: the value is preserved (if possible), but the bit pattern is not necessarily so. In the case of two's complement representations without padding, which is mandatory for the fixed-width integer types, this distinction does not matter and the cast will indeed be a no-op.

But even if the conversion from signed to unsigned had changed the bit pattern, converting back would have restored the original value, with the caveat that out-of-range unsigned-to-signed conversion is implementation-defined and may raise a signal on overflow.

For full portability (which will probably be overkill), you'll need to use type punning instead of conversion. This can be done in one of two ways:

Via pointer casts, i.e.

uint32_t u = *(uint32_t*)&x;

which you should be careful with, as it may violate effective typing rules (but is fine for signed/unsigned variants of integer types), or via unions, i.e.

uint32_t u = ((union { int32_t i; uint32_t u; }){ .i = x }).u;

which can also be used e.g. to convert from double to uint64_t, which you may not do with pointer casts if you want to avoid undefined behaviour.
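As a sketch of that double-to-uint64_t punning, memcpy is the other standards-sanctioned option besides a union (the helper name double_bits is made up for illustration, and it assumes double and uint64_t have the same size):

```c
#include <stdint.h>
#include <string.h>

/* Copy the object representation of a double into an integer.
   Assumes sizeof(double) == sizeof(uint64_t). */
static uint64_t double_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}
```

On an IEEE 754 platform, double_bits(1.0) yields 0x3ff0000000000000; compilers typically optimize the memcpy away entirely.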

Solution 2

Casts are used in C to mean both "type conversion" and "type disambiguation". If you have something like

(float) 3

Then it's a type conversion, and the actual bits change. If you say

(float) 3.0

it's a type disambiguation.

Assuming a 2's complement representation (see comments below), when you cast an int to unsigned int, the bit pattern is not changed, only its semantic meaning; if you cast it back, the result will always be correct. It falls under type disambiguation because no bits are changed, only the way the computer interprets them.

Note that, in theory, 2's complement may not be used, and signed and unsigned types can then have very different representations; in that case the actual bit pattern can change.

However, from C11 (the current C standard), you are actually guaranteed that sizeof(int) == sizeof(unsigned int):

(§6.2.5/6) For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements [...]

I would say that in practice, you can assume it is safe.

Solution 3

This should always be safe, because the intXX_t types are guaranteed to use a two's complement representation if they exist:

(§7.20.1.1, Exact-width integer types) The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two's complement representation. Thus, int8_t denotes such a signed integer type with a width of exactly 8 bits.

Theoretically, the back-conversion from uint32_t to int32_t is implementation-defined, as are all out-of-range unsigned-to-signed conversions. But I can't imagine a platform doing anything other than what you'd expect.

If you want to be really sure, you could still do that conversion manually. You'd just have to test the value against INT32_MAX and then do a little bit of math. Even if you do that systematically, a decent compiler should be able to detect it and optimize it out.
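A sketch of that manual conversion (the helper name u32_to_i32 is invented here): it maps values above INT32_MAX to their two's complement interpretation using only well-defined arithmetic, instead of relying on the implementation-defined cast.

```c
#include <stdint.h>

/* Reinterpret a uint32_t as the int32_t with the same two's
   complement bit pattern, without an out-of-range cast. */
static int32_t u32_to_i32(uint32_t u)
{
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;                        /* in range: exact */
    /* out of range: subtract 2^32, computed without overflow */
    return (int32_t)(u - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
}
```

For example, u32_to_i32(UINT32_MAX) yields -1 and u32_to_i32(0x80000000u) yields INT32_MIN, matching the usual cast on two's complement machines.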

Author: Flash

Updated on October 22, 2020

Comments

  • Flash
    Flash over 3 years

    I want to pass a 32-bit signed integer x through a socket. In order that the receiver knows which byte order to expect, I am calling htonl(x) before sending. htonl expects a uint32_t though and I want to be sure of what happens when I cast my int32_t to a uint32_t.

    int32_t x = something;
    uint32_t u = (uint32_t) x;
    

    Is it always the case that the bytes in x and u each will be exactly the same? What about casting back:

    uint32_t u = something;
    int32_t x = (int32_t) u;
    

    I realise that negative values cast to large unsigned values but that doesn't matter since I'm just casting back on the other end. However if the cast messes with the actual bytes then I can't be sure casting back will return the same value.

  • Christoph
    Christoph over 10 years
    this is incorrect - signed to unsigned conversion may change bit patterns - it just so happens that it does not in case of two's complement representation with identical padding
  • Flash
    Flash over 10 years
    Great, this is what I wanted to know. So it will always work provided signed ints are represented using two's complement and this is pretty much always the case. Is that correct?
  • unwind
    unwind over 10 years
    The initial example is wrong, since 3.0 is a double, the expression (float) 3.0 most certainly is a type conversion, too.
  • Christoph
    Christoph over 10 years
    @Andrew: correct - and even on non-two's complement hardware, the compiler would have to either fake it for the fixed-width integer types or not provide them at all; raising a signal on overflow should also not be a problem in practice
  • Filipe Gonçalves
    Filipe Gonçalves over 10 years
@unwind According to Expert C Programming: Deep C Secrets (page 223), it's a type disambiguation because the compiler can plant the correct bits in the first place. It's like 3.0f
  • Christoph
    Christoph over 10 years
well, I can imagine that a platform that traps integer overflow might be convinced to do so on type conversion (eg the compiler might generate a useless add 0 instruction to trigger it); I'd be very surprised if there's a compiler that actually does so by default (or at all); also, I'd rather go for type punning than checks against INT32_MAX - the fixed-width integers do not come with trap representations, so as far as the C standard is concerned, it's as safe as it gets and actually captures the programmer's intent
  • Filipe Gonçalves
    Filipe Gonçalves over 10 years
    Had I used double x = 3.0; float y; y = (float) x;, then it would certainly be a type conversion
  • Zulan
    Zulan about 8 years
    Does the pointer casting example not violate the strict aliasing rule? int32_t and uint32_t are incompatible. Are they not?
  • Christoph
    Christoph about 8 years
    @Zulan: signed and unsigned versions of a type may alias (cf C11 section 6.5 §7)
  • Cecil Ward
    Cecil Ward over 7 years
    Which dialect of C or C++, or which standard, is required to be able to use the syntax used in the union example?
  • Christoph
    Christoph over 7 years
    @CecilWard: C99 due to the use of compound literals; you could use a temporary variable instead to get C90 and C++ compatibility
  • Eric
    Eric over 6 years
That book doesn't seem correct to me, or at least its distinction is not meaningful. The compiler can plant the correct bits for (float) 3 as well, as it performs constant folding.
  • polynomial_donut
    polynomial_donut about 6 years
    "which you should be careful with as it may violate effective typing rules" can you elaborate on this?