Is signed integer overflow still undefined behavior in C++?

Solution 1

is still overflow of these types an undefined behavior?

Yes. Per Paragraph 5/4 of the C++11 Standard (regarding any expression in general):

If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined. [...]

The fact that a two's complement representation is used for those signed types does not mean that arithmetic modulo 2^n is used when evaluating expressions of those types.

Concerning unsigned arithmetic, on the other hand, the Standard explicitly specifies that (Paragraph 3.9.1/4):

Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer

This means that the result of an unsigned arithmetic operation is always "mathematically defined", and the result is always within the representable range; therefore, 5/4 does not apply. Footnote 46 explains this:

46) This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.

Solution 2

Just because a type is defined to use two's complement representation, it doesn't follow that arithmetic overflow in that type is defined.

The undefined behaviour of signed arithmetic overflow is used to enable optimisations: for example, the compiler can assume that if a > b then a + 1 > b as well. This doesn't hold in unsigned arithmetic, where the second check would need to be carried out because a + 1 might wrap around to 0. Also, some platforms can generate a trap signal on arithmetic overflow (see e.g. http://www.gnu.org/software/libc/manual/html_node/Program-Error-Signals.html); the standard continues to allow this to happen.

Solution 3

I would bet so.

From the standard documentation (pg.4 and 5):

1.3.24 undefined behavior

behavior for which this International Standard imposes no requirements

[ Note: Undefined behavior may be expected when this International Standard omits any explicit definition of behavior or when a program uses an erroneous construct or erroneous data. Permissible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message). Many erroneous program constructs do not engender undefined behavior; they are required to be diagnosed. — end note ]


Author by Archie

Updated on February 18, 2020

Comments

  • Archie
    Archie over 4 years

    As we know, signed integer overflow is undefined behavior. But there is something interesting in C++11 cstdint documentation:

    signed integer type with width of exactly 8, 16, 32 and 64 bits respectively with no padding bits and using 2's complement for negative values (provided only if the implementation directly supports the type)

    See link

    And here is my question: since the standard says explicitly that for int8_t, int16_t, int32_t and int64_t negative numbers are 2's complement, is still overflow of these types an undefined behavior?

    Edit I checked C++11 and C11 Standards and here is what I found:

    C++11, §18.4.1:

    The header defines all functions, types, and macros the same as 7.20 in the C standard.

    C11, §7.20.1.1:

    The typedef name intN_t designates a signed integer type with width N, no padding bits, and a two’s complement representation. Thus, int8_t denotes such a signed integer type with a width of exactly 8 bits.

    • Nicol Bolas
      Nicol Bolas about 11 years
      Never forget that the only primary documentation for C++ is the standard. Everything else, even a wiki like CppReference, is a secondary source. That doesn't mean it's wrong; just not entirely reliable.
    • Daniel Fischer
      Daniel Fischer about 11 years
      I would expect it to be UB, there is no exemption for these types in C, I don't see why C++ would add one.
    • ecatmur
      ecatmur about 11 years
    • YSC
      YSC over 8 years
      I'm a bit confused: where is the verb in the sentence "signed integer type with width of exactly 8, 16, 32 and 64 bits respectively with no padding bits and using 2's complement for negative values (provided only if the implementation directly supports the type)?" Is a bit missing? What does it mean?
    • L. F.
      L. F. almost 5 years
      C++11 is based on C99, not C11. But this is not important anyway.
  • Archie
    Archie about 11 years
    This paragraph would also imply that unsigned overflow is undefined, which it is not.
  • Lightness Races in Orbit
    Lightness Races in Orbit about 11 years
    @Archie: Not really, since unsigned values are defined modulo the unsigned range.
  • Andy Prowl
    Andy Prowl about 11 years
    @Archie: I tried to clarify, but basically you got the answer from LightnessRacesinOrbit
  • supercat
    supercat over 9 years
    It may be worth noting that many people "worry" more about the possibilities of traps, but compiler assumptions are actually more insidious (one of the reasons I wish there were a category between Implementation-Defined and Undefined Behavior--unlike implementation-defined behavior which requires particular implementations to do something in consistent documented fashion, I'd like an "implementation-constrained" behavior which would require implementations to specify everything that could happen as a consequence of something (the specs could explicitly include Undefined Behavior, but...
  • supercat
    supercat over 9 years
    ...implementations would be encouraged to be more specific when practical). On a hardware where two's-complement numbers would "wrap" naturally, there's no sensible reason for code which wants a wrapped-integer result to execute many instructions trying to perform without integer overflow a computation which the hardware could do in just one or two instructions.
  • Ruslan
    Ruslan over 7 years
    @supercat In fact, the code which wants a wrapped result can (on 2's complement CPUs) just cast the operands to the corresponding unsigned types and perform the operation (and then cast back, getting an implementation-defined value): this works for addition, subtraction and multiplication. The only problem is with division, modulo and such functions as abs. Where it works, it doesn't require more instructions than signed arithmetic.
  • supercat
    supercat over 7 years
    @Ruslan: In cases where code needs a precisely-wrapped result, a cast to unsigned would be ugly but not necessarily generate extra code. A bigger issue would be with code that needs to quickly identify "potentially interesting" candidates, which will spend most of its time rejecting uninteresting candidates. If one gives a compiler the freedom to arbitrarily keep or discard extra precision with signed integer values, but requires that a cast back to an integer type truncate any such precision, that will enable most of the useful optimizations that would be achieved by making overflow UB, ...
  • supercat
    supercat over 7 years
    ...but would allow code that needs precise wrapping to use one cast rather than two (e.g. (int)(x+y)>z would compare a wrapped result), and would also allow programmers to write x+y>z in cases where it would be acceptable for code to yield 0 or 1 in case of overflow provided it has no other side-effect. If either 0 or 1 would be an equally acceptable result, letting the programmer write that rather than either (long)x+y>z or (int)((unsigned)x+y)>z would allow compilers to select whichever of the latter functions was cheaper in any given context [each would be cheaper in some cases].
  • supercat
    supercat over 7 years
    @Ruslan: In any case, my biggest beef is with compilers which, given a function like "unsigned mul(unsigned short x, unsigned short y) { return x*y; }", will not generate code that works reliably for products between INT_MAX+1 and UINT_MAX. Part of the reason short unsigned values were made to promote as signed was that such constructs would pose no problem on implementations that were commonplace in 1989, and I see no reason they should be more problematic today.
  • Ruslan
    Ruslan over 7 years
    @supercat I don't quite get what you mean by "extra precision". If you cast signed to unsigned, then do add/subtract/multiply, then cast the result back to signed, then on two's-complement CPU (with no-op cast back and forth) the result would be identical to the operation on signed, but in addition will also be UB free.
  • supercat
    supercat over 7 years
    @Ruslan: Consider the statement i64 += i32a*i32b;, where each variable is of the indicated size, on the x86 platform. Depending upon register availability, it might be cheaper to use a 32x32->64 multiply instruction and add the lower and upper halves of the result to i64, or to use a 32x32->32 multiply instruction, sign-extend the result, and then perform the addition. Casting to unsigned before the multiply and then back to signed would force the compiler to use the latter approach even in cases where the former would be more efficient. If either approach would meet requirements...
  • supercat
    supercat over 7 years
    ...a programmer should not be required to choose one or the other, but should instead be able to let the compiler choose whether to keep the upper bits from the multiply.
  • Aconcagua
    Aconcagua over 5 years
    It actually does not matter if unsigned overflow is defined or not if it cannot occur due to modulo calculation...
  • Toby Speight
    Toby Speight over 5 years
    There are unsigned operations whose result isn't "mathematically defined" - particularly division by zero - so perhaps your wording isn't quite what you meant in that sentence. ITYM that when the result is mathematically defined, then it's also defined in C++.
  • milahu
    milahu almost 4 years
    The relevant paragraphs in the C++20 draft are 6.8.1.2 and 7.1.4.