Detecting signed overflow in C/C++

34,383

Solution 1

Your approach with subtraction is correct and well-defined. A compiler cannot optimize it away.

Another correct approach, if you have a larger integer type available, is to perform the arithmetic in the larger type and then check that the result fits in the smaller type when converting it back

int sum(int a, int b)
{
    long long c;
    assert(LLONG_MAX>INT_MAX);
    c = (long long)a + b;
    if (c < INT_MIN || c > INT_MAX) abort();
    return c;
}

A good compiler should convert the entire addition and if statement into an int-sized addition and a single conditional jump-on-overflow and never actually perform the larger addition.

Edit: As Stephen pointed out, I'm having trouble getting a (not-so-good) compiler, gcc, to generate the sane asm. The code it generates is not terribly slow, but certainly suboptimal. If anyone knows variants on this code that will get gcc to do the right thing, I'd love to see them.

Solution 2

No, your 2nd code isn't correct, but you are close: if you set

int half = INT_MAX/2;
int half1 = half + 1;

the result of an addition is INT_MAX. (INT_MAX is always an odd number). So this is valid input. But in your routine you will have INT_MAX - half == half1 and you would abort. A false positive.

This error can be repaired by putting < instead of <= in both checks.

But then also your code isn't optimal. The following would do:

int add(int lhs, int rhs)
{
 if (lhs >= 0) {
  if (INT_MAX - lhs < rhs) {
   /* would overflow */
   abort();
  }
 }
 else {
  if (rhs < INT_MIN - lhs) {
   /* would overflow */
   abort();
  }
 }
 return lhs + rhs;
}

To see that this is valid, you have to symbolically add lhs on both sides of the inequalities, and this gives you exactly the arithmetical conditions that your result is out of bounds.

Solution 3

For the gcc case, from gcc 5.0 Release notes we can see it now provides a __builtin_add_overflow for checking overflow in addition:

A new set of built-in functions for arithmetics with overflow checking has been added: __builtin_add_overflow, __builtin_sub_overflow and __builtin_mul_overflow and for compatibility with clang also other variants. These builtins have two integral arguments (which don't need to have the same type), the arguments are extended to infinite precision signed type, +, - or * is performed on those, and the result is stored in an integer variable pointed to by the last argument. If the stored value is equal to the infinite precision result, the built-in functions return false, otherwise true. The type of the integer variable that will hold the result can be different from the types of the first two arguments.

For example:

__builtin_add_overflow( rhs, lhs, &result )

We can see from the gcc document Built-in Functions to Perform Arithmetic with Overflow Checking that:

[...]these built-in functions have fully defined behavior for all argument values.

clang also provides a set of checked arithmetic builtins:

Clang provides a set of builtins that implement checked arithmetic for security critical applications in a manner that is fast and easily expressable in C.

in this case the builtin would be:

__builtin_sadd_overflow( rhs, lhs, &result )

Solution 4

IMHO, the eastiest way to deal with overflow sentsitive C++ code is to use SafeInt<T>. This is a cross platform C++ template hosted on code plex which provides the safety guarantees that you desire here.

I find it very intuitive to use as it provides the many of the same usage patterns as normal numerical opertations and expresses over and under flows via exceptions.

Solution 5

The fastest possible way is to use the GCC builtin:

int add(int lhs, int rhs) {
    int sum;
    if (__builtin_add_overflow(lhs, rhs, &sum))
        abort();
    return sum;
}

On x86, GCC compiles this into:

    mov %edi, %eax
    add %esi, %eax
    jo call_abort 
    ret
call_abort:
    call abort

which uses the processor's built-in overflow detection.

If you're not OK with using GCC builtins, the next fastest way is to use bit operations on the sign bits. Signed overflow in addition occurs when:

  • the two operands have the same sign, and
  • the result has a different sign than the operands.

The sign bit of ~(lhs ^ rhs) is on iff the operands have the same sign, and the sign bit of lhs ^ sum is on iff the result has a different sign than the operands. So you can do the addition in unsigned form to avoid undefined behavior, and then use the sign bit of ~(lhs ^ rhs) & (lhs ^ sum):

int add(int lhs, int rhs) {
    unsigned sum = (unsigned) lhs + (unsigned) rhs;
    if ((~(lhs ^ rhs) & (lhs ^ sum)) & 0x80000000)
        abort();
    return (int) sum;
}

This compiles into:

    lea (%rsi,%rdi), %eax
    xor %edi, %esi
    not %esi
    xor %eax, %edi
    test %edi, %esi
    js call_abort
    ret
call_abort:
    call abort

which is quite a lot faster than casting to a 64-bit type on a 32-bit machine (with gcc):

    push %ebx
    mov 12(%esp), %ecx
    mov 8(%esp), %eax
    mov %ecx, %ebx
    sar $31, %ebx
    clt
    add %ecx, %eax
    adc %ebx, %edx
    mov %eax, %ecx
    add $-2147483648, %ecx
    mov %edx, %ebx
    adc $0, %ebx
    cmp $0, %ebx
    ja call_abort
    pop %ebx
    ret
call_abort:
    call abort
Share:
34,383
Channel72
Author by

Channel72

Updated on October 16, 2021

Comments

  • Channel72
    Channel72 over 2 years

    At first glance, this question may seem like a duplicate of How to detect integer overflow?, however it is actually significantly different.

    I've found that while detecting an unsigned integer overflow is pretty trivial, detecting a signed overflow in C/C++ is actually more difficult than most people think.

    The most obvious, yet naive, way to do it would be something like:

    int add(int lhs, int rhs)
    {
     int sum = lhs + rhs;
     if ((lhs >= 0 && sum < rhs) || (lhs < 0 && sum > rhs)) {
      /* an overflow has occurred */
      abort();
     }
     return sum; 
    }
    

    The problem with this is that according to the C standard, signed integer overflow is undefined behavior. In other words, according to the standard, as soon as you even cause a signed overflow, your program is just as invalid as if you dereferenced a null pointer. So you can't cause undefined behavior, and then try to detect the overflow after the fact, as in the above post-condition check example.

    Even though the above check is likely to work on many compilers, you can't count on it. In fact, because the C standard says signed integer overflow is undefined, some compilers (like GCC) will optimize away the above check when optimization flags are set, because the compiler assumes a signed overflow is impossible. This totally breaks the attempt to check for overflow.

    So, another possible way to check for overflow would be:

    int add(int lhs, int rhs)
    {
     if (lhs >= 0 && rhs >= 0) {
      if (INT_MAX - lhs <= rhs) {
       /* overflow has occurred */
       abort();
      }
     }
     else if (lhs < 0 && rhs < 0) {
      if (lhs <= INT_MIN - rhs) {
       /* overflow has occurred */
       abort();
      }
     }
    
     return lhs + rhs;
    }
    

    This seems more promising, since we don't actually add the two integers together until we make sure in advance that performing such an add will not result in overflow. Thus, we don't cause any undefined behavior.

    However, this solution is unfortunately a lot less efficient than the initial solution, since you have to perform a subtract operation just to test if your addition operation will work. And even if you don't care about this (small) performance hit, I'm still not entirely convinced this solution is adequate. The expression lhs <= INT_MIN - rhs seems exactly like the sort of expression the compiler might optimize away, thinking that signed overflow is impossible.

    So is there a better solution here? Something that is guaranteed to 1) not cause undefined behavior, and 2) not provide the compiler with an opportunity to optimize away overflow checks? I was thinking there might be some way to do it by casting both operands to unsigned, and performing checks by rolling your own two's-complement arithmetic, but I'm not really sure how to do that.

  • Mike DeSimone
    Mike DeSimone over 13 years
    +1 This is another way of saying "If C won't define it, then you're forced into platform-specific behavior." So many things that are easily taken care of in assembly are undefined in C, creating mountains out of molehills in the name of portability.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    Remove the bitwise-and and cast from the return statement. They're incorrect as written. Conversion from larger signed integer types to smaller ones is perfectly well-defined as long as the value fits in the smaller type, and it does not need an explicit cast. Any compiler that gives a warning and suggests that you add a cast when you're just checked that the value does not overflow is a broken compiler.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    Signed overflow if perfectly detectable and avoidable in plain C. -1 for suggesting asm instead of suggesting C code that will generate the correct asm on a good compiler.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    To anyone wanting to use this, be sure you're looking at my edited version. In the original I stupidly omitted the cast to long long before the addition.
  • wheaties
    wheaties over 13 years
    @R I have to agree with Rook.
  • Matthieu M.
    Matthieu M. over 13 years
    @R. perfectly detectable yes, however for integers operations one needs to care about performance. Many libraries are quite intensive in their integer usage (compression / decompression, encoding / decoding), it would be great to be able to use safe integer there too.
  • Jonathan
    Jonathan over 13 years
    @R You are correct, I just like being explicit about my casts. I'll change it for correctness, though. For future readers, the return line read return (int32_t)(sum & 0xffffffff);.
  • Mike DeSimone
    Mike DeSimone over 13 years
    Then, by your definition gcc 4.4.4 is not a good compiler. I just tried this at -O2 and -O3 and got the same code, 12 lines of assembly (not counting assembler commands) where the actual addition is performed by leaq (%rax,%rcx), %rcx, or addl %eax, %ecx; adcl %edx, %ebx if the -m32 flag is given.
  • Mike DeSimone
    Mike DeSimone over 13 years
    Also, gcc didn't remove the if statement from the generated code.
  • Channel72
    Channel72 over 13 years
    @R, are you certain that the approach with subtraction is correct? Since signed-overflow is undefined, isn't it possible the compiler could see the expression (INT_MAX - lhs <= rhs) and optimize it away, thinking that such an expression must always be true?
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    Yes I am certain. There is no overflow in the expression INT_MAX - lhs <= rhs (assuming lhs is positive). Depending on the values of lhs and rhs, this expression can evaluate to 0 or 1. A compiler cannot optimize away the computation of expressions with nonconstant values. It could, of course, optimize this to an addition followed by an overflow check, if the machine supports such a thing.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    Note that if you write sum & 0xffffffff, sum is implicitly converted to type unsigned int (assuming 32-bit int) because 0xffffffff has type unsigned int. Then the result of the bitwise and is an unsigned int, and if sum was negative, it will be outside the range of values supported by int32_t. The conversion to int32_t then has implementation-defined behavior.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    I gave a downvote for an asm answer to a C question. As I have said, there are correct, portable ways to write the check in C which will generate the exact same asm that you would write by hand. Naturally if you use these, the performance impact will be the same, and it will be much less of an impact than the C++ safeint stuff you also recommended.
  • Stephen Canon
    Stephen Canon over 13 years
    Out of curiosity, have you been successful in getting a compiler to do this optimization? A quick test against a few compilers didn't turn up any that could do it.
  • David Thornley
    David Thornley over 13 years
    @Channel72: Also importantly, none of the subexpressions can overflow. If lhs is an int and nonnegative, INT_MAX - lhs yields a value between 0 and INT_MAX, inclusive, and that's perfectly legal, so nothing about this triggers undefined behavior.
  • David Thornley
    David Thornley over 13 years
    @Matthieu: If you are writing code that will only be used on one implementation, and that implementation guarantees that something will work, and you need good integer performance, you can certainly use implementation-specific tricks. That isn't what the OP was asking for, though.
  • Amir Zadeh
    Amir Zadeh over 13 years
    Agreed with rook. I guess overflow flags are present in many platforms.
  • ruslik
    ruslik over 13 years
    What I can't understand is why someone who is worried about int32 overflows on the 64bit machine will continue to use int32? And using this in 32bit mode will be quite inneficient.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    On x86_64, there's nothing inefficient about using 32-bit integers. Performance is identical to 64-bit ones. One motivation for using smaller-than-native-word-size types is that it's extremely efficient to handle overflow conditions or carry (for arbitrary-precision arithmetic) since the overflow/carry happens in a directly-accessible location.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    @Stephen: Oddly I can't get it to work at the moment either, but I've seen plenty of cases where gcc optimizes out unnecessary 64-bit arithmetic like this. I wonder if it's too stupid to figure out how to do it for signed values; I normally only use unsigned arithmetic for things like this.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    C distinguishes implementation-defined behavior and undefined behavior for good reasons, and even if something with UB "works" in the current version of your implementation, that doesn't mean it will continue to work in future versions. Consider gcc and signed overflow behavior...
  • Channel72
    Channel72 over 13 years
    You're missing the point here. Your second line of code sum = a + b could yield undefined behavior.
  • Nils Pipenbrinck
    Nils Pipenbrinck over 13 years
    if you cast sum, a and b to unsigned during your test-addition your code will work btw..
  • ruslik
    ruslik over 13 years
    It's undefined not because the program will crash or will behave differently. It's the exact thing the processor is doing to compute the OF flag. The standard is just trying to protect itself from non-standard cases, but it doesn't mean that you are not allowed to do this.
  • ruslik
    ruslik over 13 years
    @Nils yeah, i wanted to do that, but I thought that four (usngined int)s will make it much more unreadable. (you know, you first read it, and try it only if you liked it).
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    Since I based my -1 on a claim that we could get C code to generate the identical asm, I guess it's only fair to retract it when all the major compilers turn out to be junk in this regard..
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    -1 for misuse of volatile to mask undefined behavior. If it "works", you're still just "getting lucky".
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    Well you could make it more readable by taking out the useless int keywords. unsigned int is just an overly verbose way of writing unsigned.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    You're still storing the result in sum, which is an int. That results in either an implementation-defined result or an implementation-defined signal being raised if the value of (unsigned)lhs + (unsigned)rhs is greater than INT_MAX.
  • Chris Dodd
    Chris Dodd over 13 years
    @R: that's the whole point -- the behavior is implementation defined, rather than undefined, so the implementation must document what it does, and do it consistently. A signal can only be raised if the implementation documents it, in which case it must always be raised and you can use that behavior.
  • Jens Gustedt
    Jens Gustedt over 13 years
    @R., @Steven: no the subtraction code that the OP gave is not correct, please see my answer. I also give a code there that just does it with at most two comparisons. Perhaps the compilers will do better with that.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    +1 for nice alternate approach that's perhaps more intuitive.
  • Jens Gustedt
    Jens Gustedt over 13 years
    @R.: after looking into it more closely it is also not wonder that gcc doesn't get away with just one jump on overflow. In fact, the conditions for overflow and underflow are in general not symmetric: on nowadays architectures usually INT_MIN == (-INT_MAX) - 1. So the two bound checks must be different.
  • Jens Gustedt
    Jens Gustedt over 13 years
    @R.: after thinking it over once more, I am not even convinced that your proposal of blowing it up to a wider type is worth it. Just use the corresponding unsigned type.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    @Jens: Which unsigned approach? There are lots and I'm not sure which is most efficient for handling signed inputs.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 13 years
    @Jens: As for gcc, x86 has a perfectly good jo instruction (jump on overflow) it should be using.
  • nategoose
    nategoose over 13 years
    @R: If it doesn't work the compiler isn't implementing volatile correctly. All I was trying for was a simpler solution to a very common problem on an already answered question.
  • nategoose
    nategoose over 13 years
    Where it might fail, though, would be a system whose numeric representation wrap to lower values upon overflow for integers.
  • nategoose
    nategoose over 13 years
    That last comment should have a "didn't" or "doesn't" in it.
  • davmac
    davmac over 10 years
    @nategoose, your assertion that "if it doesn't work the compiler isn't implementing volatile correctly" is wrong. For one thing, in two's complement arithmetic, it will always be true that lhs = sum - rhs even if overflow occurred. Even if that were not the case, and although this particular example is a bit contrived, the compiler might for instance generate code which performs the addition, stores the result value, reads the value back into another register, compares the stored value with the read value and notices that they are the same and therefore assume no overflow has occurred.
  • davmac
    davmac over 10 years
    (you're also assuming that causing the overflow won't cause the comparison afterwards to go wrong or even be skipped, which "undefined behavior" allows).
  • davmac
    davmac over 10 years
    @ruslik, it's undefined which specifically means the compiler can assume it won't happen. In this case the compiler can assume that the addition won't overflow, and a really smart compiler could then recognise that the condition in the 'if' statement must then always be false and so completely remove the check; you're back to the original problem.
  • davmac
    davmac over 10 years
    I think this - result = (n1 - INT_MAX)+n2; - could overflow, if n1 was small (say 0) and n2 was negative.
  • Pacerier
    Pacerier over 10 years
    @R.. so with two choices 1) using subtration and 2) converting to a wider type, which would be more performant?
  • chux - Reinstate Monica
    chux - Reinstate Monica over 9 years
    +1 for the best answer. Minor: suggest /* overflow will occurred */ to emphasize that the whole point is to detect that overflow would have occurred if code did lhs + rhs without doing actually the sum.
  • chux - Reinstate Monica
    chux - Reinstate Monica over 9 years
    This approach does not work in the uncommon platform where sizeof(long long) == sizeof(int). C specifies only that sizeof(long long) >= sizeof(int).
  • phuclv
    phuclv almost 9 years
    the undefined behavior is in C, not after compiling to assembly
  • chux - Reinstate Monica
    chux - Reinstate Monica over 8 years
    This function appears to be very useful except for one thing: int result; __builtin_add_overflow(INT_MAX, 1, &result); does not explicitly say what is stored in result on overflow and unfortunately is quiet on specifying undefined behavior does not occur. Certainly that was the intent - no UB. Better if it specified that.
  • Shafik Yaghmour
    Shafik Yaghmour over 8 years
    @chux good point, it states here the result is always defined, I updated my answer. It would be rather ironic if that was not the case.
  • chux - Reinstate Monica
    chux - Reinstate Monica over 8 years
    Interesting your new reference does not have a (unsigned) long long *result for __builtin_(s/u)addll_overflow. Certainly these are in error. Makes one wonder as to the veracity of other aspects. IAC, good to see these __builtin_add/sub/mull_overflow(). Hope they make it to the C spec someday.
  • supercat
    supercat over 7 years
    @davmac: Hmm... maybe it's necessary to break out three cases: start with one for (n1 ^ n2) < 0, which on a two's-complement machine would imply that the values have opposite sign and may be added directly. If the values have the same sign, then the approach given above would be safe. On the other hand, I'm curious if the authors of the Standard expected that implementations for two's-complement silent-overflow hardware would jump the rails in case of overflow in a way that didn't force an immediate Abnormal Program Termination, but caused unpredictable disruption of other computations.
  • supercat
    supercat almost 7 years
    Defining "did last operation overflow" flag behavior can be very expensive, since optimal code generation often requires rearranging computations. If the expression x+y is evaluated within a loop but neither x nor y changes within that loop, it may be helpful for a compiler to evaluate that expression just once--before the loop starts. Such reordering could totally disrupt any code that tries to examine the overflow flag, however.
  • Alex Reinking
    Alex Reinking almost 5 years
    +1 this generates much better assembly than anything you could get in standard C, at least not without relying on your compiler's optimizer to figure out what you're doing. One should detect when such builtins are available and only use a standard solution when the compiler doesn't provide one.
  • rtx13
    rtx13 about 4 years
    Note that this won't work in ILP64 environments where ints are 64-bit.
  • Dyskord
    Dyskord about 3 years
    a small optimization you could do, I suppose this would depend on your hardware, I'm not sure which is better, but if you use an if and an else if with (lhs > 0 && rhs > 0) and (lhs < 0 && rhs < 0) this would allow you to skip making a subtraction in cases where the signs do not match or either value is 0, but it would require 4 comparisons in those cases and it would require an extra comparison in the case that both values are negative. Which are faster in the hardware? comparisons or arithmetic operations such as subtractions?
  • Jay-Pi
    Jay-Pi over 2 years
    You can add that with C2X making signed integer overflow defined (as all produced architectures now work on 2s complement), this can be simplified to the approach of Hacker's Delight: var sum: ST = a +% b; (+% being wrapping addition). if (((sum ^ a) & (sum ^ b)) < 0) overflow(); return sum;