At what point in the loop does integer overflow become undefined behavior?

11,610

Solution 1

If you're interested in a purely theoretical answer, the C++ standard allows undefined behaviour to "time travel":

[intro.execution]/5: A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation)

As such, if your program contains undefined behaviour, then the behaviour of your whole program is undefined.

Solution 2

First, let me correct the title of this question:

Undefined Behavior is not (specifically) of the realm of execution.

Undefined Behavior affects all steps: compiling, linking, loading and executing.

Some examples to cement this, bear in mind that no section is exhaustive:

  • the compiler can assume that portions of code that contain Undefined Behavior are never executed, and thus assume the execution paths that would lead to them are dead code. See What every C programmer should know about undefined behavior by none other than Chris Lattner.
  • the linker can assume that in the presence of multiple definitions of a weak symbol (recognized by name), all definitions are identical thanks to the One Definition Rule
  • the loader (in case you use dynamic libraries) can assume the same, thus picking the first symbol it finds; this is usually (ab)used for intercepting calls using LD_PRELOAD tricks on Unixes
  • the execution might fail (SIGSEV) should you use dangling pointers

This is what is so scary about Undefined Behavior: it is nigh impossible to predict, ahead of time, what exact behavior will occur, and this prediction has to be revisited at each update of the toolchain, underlying OS, ...


I recommend watching this video by Michael Spencer (LLVM Developer): CppCon 2016: My Little Optimizer: Undefined Behavior is Magic.

Solution 3

An aggressively optimising C or C++ compiler targeting a 16 bit int will know that the behaviour on adding 1000000000 to an int type is undefined.

It is permitted by either standard to do anything it wants which could include the deletion of the entire program, leaving int main(){}.

But what about larger ints? I don't know of a compiler that does this yet (and I'm not an expert in C and C++ compiler design by any means), but I imagine that sometime a compiler targeting a 32 bit int or higher will figure out that the loop is infinite (i doesn't change) and so a will eventually overflow. So once again, it can optimise the output to int main(){}. The point I'm trying to make here is that as compiler optimisations become progressively more aggressive, more and more undefined behaviour constructs are manifesting themselves in unexpected ways.

The fact that your loop is infinite is not in itself undefined since you are writing to standard output in the loop body.

Solution 4

Technically, under the C++ standard, if a program contains undefined behavior, the behavior of the entire program, even at compile time (before the program is even executed), is undefined.

In practice, because the compiler may assume (as part of an optimization) that the overflow will not occur, at least the behavior of the program on the third iteration of the loop (assuming a 32-bit machine) will be undefined, though it is likely that you will get correct results before the third iteration. However, since the behavior of the entire program is technically undefined, there's nothing stopping the program from generating completely incorrect output (including no output), crashing at runtime at any point during execution, or even failing to compile altogether (as undefined behavior extends to compile time).

Undefined behavior provides the compiler with more room to optimize because they eliminate certain assumptions about what the code must do. In doing so, programs that rely on assumptions involving undefined behavior are not guaranteed to work as expected. As such, you should not rely on any particular behavior that is considered undefined per the C++ standard.

Solution 5

To understand why undefined behavior can 'time travel' as @TartanLlama adequately put it, let's take a look at the 'as-if' rule:

1.9 Program execution

1 The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

With this, we could view the program as a 'black box' with an input and an output. The input could be user-input, files, and many other things. The output is the 'observable behavior' mentioned in the standard.

The standard only defines a mapping between the input and the output, nothing else. It does this by describing an 'example black box', but explicitly says any other black box with the same mapping is equally valid. This means the content of the black box is irrelevant.

With this in mind, it would not make sense to say that undefined behavior occurs at a certain moment. In the sample implementation of the black box, we could say where and when it happens, but the actual black box could be something completely different, so we can't say where and when it happens anymore. Theoretically, a compiler could for example decide to enumerate all the possible inputs, and pre-compute the resulting outputs. Then the undefined behavior would have happened during compilation.

Undefined behavior is the inexistence of a mapping between input and output. A program can have undefined behavior for some input, but defined behavior for other. Then the mapping between input and output is simply incomplete; there is input for which no mapping to output exists.
The program in the question has undefined behavior for any input, so the mapping is empty.

Share:
11,610

Related videos on Youtube

jcoder
Author by

jcoder

Updated on June 11, 2022

Comments

  • jcoder
    jcoder about 2 years

    This is an example to illustrate my question which involves some much more complicated code that I can't post here.

    #include <stdio.h>
    int main()
    {
        int a = 0;
        for (int i = 0; i < 3; i++)
        {
            printf("Hello\n");
            a = a + 1000000000;
        }
    }
    

    This program contains undefined behavior on my platform because a will overflow on the 3rd loop.

    Does that make the whole program have undefined behavior, or only after the overflow actually happens? Could the compiler potentially work out that a will overflow so it can declare the whole loop undefined and not bother to run the printfs even though they all happen before the overflow?

    (Tagged C and C++ even though are different because I'd be interested in answers for both languages if they are different.)

    • Support Ukraine
      Support Ukraine over 7 years
      Wonder if the compiler could work out that a isn't used (except for calculating itself) and simply remove a
    • TartanLlama
      TartanLlama over 7 years
      You might enjoy My Little Optimizer: Undefined Behaviour is Magic from CppCon this year. It's all about what optimizations compilers can carry out based on undefined behaviour.
    • Bo Persson
      Bo Persson over 7 years
    • Eric Lippert
      Eric Lippert over 7 years
    • Kyle Strand
      Kyle Strand over 7 years
    • usr
      usr over 7 years
      This is almost a duplicate: stackoverflow.com/questions/23153445/…
    • M.M
      M.M over 7 years
      This question covers the same issues (in C)
    • M.M
      M.M over 7 years
      A common example of UB being rejected at compile-time is the case of calling a function which was declared but not defined
    • MatthewRock
      MatthewRock over 7 years
      Undefined behaviour means it's undefined. Compiler is free to do anything. For example, it can invert its understanding of standard and treat each defined behaviour as undefined, and vice versa.
    • user541686
      user541686 over 7 years
      @jcoder: I believe the answer is wrong, but a common misconception. Take a look at mine so you see why.
    • Andrew
      Andrew over 7 years
      Of course, the UB is only exhibited if int is smaller than signed 32-bit...
    • Admin
      Admin over 7 years
      If you pass a null pointer into a func that can't receive a null eg memcpy, the compilier can remove all null checks on that pointer ANYWHERE, until it's reassigned because have you told it that it can't be null. The best answer to this question is "Don't use UB anywhere" because the behaviour can be both unexpected and random.
    • Jason S
      Jason S over 7 years
      use -fwrapv with gcc and clang, by the way. Not sure why this is still handled as undefined behavior by the C standard. It's 2016. All of today's devices use 2's complement arithmetic.
    • Shafik Yaghmour
      Shafik Yaghmour almost 6 years
      I added a answer that also covers C since no one really did this is tagged with both.
  • jimifiki
    jimifiki over 7 years
    Is it permitted by the standard to do anything it wants even before the undefined behaviour manifests? Where is this stated?
  • Support Ukraine
    Support Ukraine over 7 years
    why 16 bit? I guess OP is looking for a 32 bit signed overflow.
  • Bathsheba
    Bathsheba over 7 years
    @4386427: could have sworn the question mentioned a 16 bit int. But the fact that it doesn't makes the answer more interesting.
  • Angew is no longer proud of SO
    Angew is no longer proud of SO over 7 years
    @jimifiki In the standard. C++14 (N4140) 1.3.24 "udnefined behaviour = behavior for which this International Standard imposes no requirements." Plus a lengthy note which elaborates. But the point is that it's not the behaviour of a "statement" that is undefined, it's the behaviour of the program. That means that as long as UB is triggered by a rule in the standard (or by absence of a rule), the standard ceases to apply for the program as a whole. So any part of the program can behave however it wants.
  • jcoder
    jcoder over 7 years
    I intended my loop to terminate but it's interesting either way :)
  • jcoder
    jcoder over 7 years
    This is what worries me. In my real code, it's complex but I might have a case where it will always overflow. And I don't really care about that, but i worry that "correct" code will also be affected by this. Obviously I need to fix it, but fixing requires understanding :)
  • MSalters
    MSalters over 7 years
    @jcoder: There's one important escape here. The compiler isn't allowed to guess at input data. As long as there is at least one input for which Undefined Behavior does not occur, the compiler must ensure that this particular input still produces the right output. All the scary talk about dangerous optimizations only apply to unavoidable UB. Practically speaking, if you would have used argc as the loop count, the case argc=1 does not produce UB and the compiler would be forced to handle that.
  • Matthieu M.
    Matthieu M. over 7 years
    @jcoder: In this case, this is not dead code. The compiler, however, could be smart enough to deduce that i cannot be incremented more than N times and therefore that its value is bounded.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE over 7 years
    The first statement is wrong. If int is 16-bit, the addition will take place in long (because the literal operand has type long) where it's well-defined, then be converted by an implementation-defined conversion back to int.
  • Keith Thompson
    Keith Thompson over 7 years
    There's no reason to consider optimization when determining whether the behavior is undefined.
  • Jordan Melo
    Jordan Melo over 7 years
    The fact that the program behaves as one might assume it should does not mean the undefined behaviour "vanishes". The behaviour is still undefined and you are simply relying on luck. The very fact that the program's behaviour can change based on compiler options is a strong indicator that the behaviour is undefined.
  • Sebastian Lenartowicz
    Sebastian Lenartowicz over 7 years
    @KeithThompson: But then, the sneeze() function itself is undefined on anything of the class Demon (of which the nasal variety is a subclass), making the whole thing circular anyways.
  • usr
    usr over 7 years
    But printf might not return, so the first two rounds are defined because until the're done it is not clear there there ever will be UB. See stackoverflow.com/questions/23153445/…
  • usr
    usr over 7 years
    printf might not return, so the first two rounds are defined because until the're done it is not clear there there ever will be UB. See stackoverflow.com/questions/23153445/…
  • supercat
    supercat over 7 years
    I don't consider especially astonishing compilers which sometimes behave as though signed arithmetic is performed on types whose range extends beyond "int", especially considering that even when doing straightforward code generation on x86 there are times when doing so is more efficient than truncating intermediate results. What's more astonishing is when overflow affects other calculations, which can happen in gcc even if code stores the product of two uint16_t values into a uint32_t--an operation which should have no plausible reason to act surprising in a non-sanitizing build.
  • Crashworks
    Crashworks over 7 years
    This is why a compiler is technically within its rights to emit "nop" for the Linux kernel (because the bootstrap code relies on undefined behavior): blog.regehr.org/archives/761
  • Random832
    Random832 over 7 years
    Of course, the correct check would be if(numberOfNewChars > endOfBufferPtr - currentPtr), provided numberOfNewChars can never be negative and currentPtr always points to somewhere within the buffer you don't even need the ridiculous "wraparound" check. (I don't think the code you provided has any hope of working in a circular buffer - you've left out whatever is necessary for that in the paraphrase, so I'm ignoring that case as well)
  • user253751
    user253751 over 7 years
    @Crashworks And that is why Linux is written in, and compiled as, unportable C. (i.e. a superset of C that requires a particular compiler with particular opitions, such as -fno-strict-aliasing)
  • user253751
    user253751 over 7 years
    @usr I expect it's defined if printf does not return, but if printf is going to return, then the undefined behaviour can cause problems before printf is called. Hence, time travel. printf("Hello\n"); and then the next line compiles as undoPrintf(); launchNuclearMissiles();
  • Admin
    Admin over 7 years
    @jcoder: If f(good); does some thing X and f(bad); invokes undefined behavior, then a program that just invokes f(good); is guaranteed to do X, but f(good); f(bad); is not guaranteed to do X.
  • Matthieu M.
    Matthieu M. over 7 years
    @jcoder: I added a link to a recent video from a LLVM developer talking about undefined behavior. It gives some more examples.
  • Cort Ammon
    Cort Ammon over 7 years
    @Random832 I did leave out a ton. I've tried to quote the larger context, but since I lost my source, I've found paraphrasing the context got me in more trouble so I leave it out. I really need to find that blasted bug report so I can quote it properly. It really is a powerful example of how you can think you wrote code one way, and have it compile completely differently.
  • M.M
    M.M over 7 years
    @usr the behaviour of printf is defined by the standard to always return
  • David Schwartz
    David Schwartz over 7 years
    You are totally misreading the standard. It says the behavior when executing the program is undefined. Period. This answer is 100% wrong. The standard is very clear -- running a program with input that produces UB at any point in the naive flow of execution is undefined.
  • user541686
    user541686 over 7 years
    @DavidSchwartz: If you follow your interpretation to its logical conclusions you should realize it doesn't make logical sense. The input is not something that is fully determined when the program starts. The input to the program (even its mere presence) at any given line is allowed to depend on all of the side effects of the program until that line. Therefore, the program cannot avoid producing the side effects that come before the UB line, because that requires interaction with its environment and therefore affects whether the UB line will be reached or not in the first place.
  • David Schwartz
    David Schwartz over 7 years
    That's a non-sequiter. We are talking about what the standard guarantees, not what won't break because you can't think of any way it could break. (And I can think of ways it can break. You are just less imaginative than I am.)
  • user541686
    user541686 over 7 years
    @DavidSchwartz: It's not though. The program has no way of knowing whether the input is known ahead of time or whether the input is based on the program's preceding outputs. The input is beyond the program's control. Since in the latter case it is impossible for the program to avoid code that precedes UB, the same must hold true in the former case.
  • David Schwartz
    David Schwartz over 7 years
    That doesn't matter. Really. Again, you just lack imagination. For example, if the compiler can tell that no compliant code could tell the difference, it could move the code that's UB such that the part that's UB executes prior to the outputs that you naively expect to be "preceding".
  • user541686
    user541686 over 7 years
    @DavidSchwartz: What do you mean it doesn't matter? All you're doing is just asserting that I'm wrong with no logical reasoning behind your position. I'm logically proving to you that what you're saying requires an impossibility, and you're telling me I'm wrong because it "doesn't matter"?
  • David Schwartz
    David Schwartz over 7 years
  • David Schwartz
    David Schwartz over 7 years
    @usr Since printf will in fact return, there's no reason to think the compiler couldn't know that. And if the compiler knows printf will return, it knows there will be UB. That permits it to assume the code will never execute. So it could remove the loop. (Among other things.)
  • usr
    usr over 7 years
    I agree. I did not know printf is specified to return. Maybe its more interesting to consider an arbitrary function that might not return. That's more real-world.
  • John Dvorak
    John Dvorak over 7 years
    @Hurkyl more interestingly, if your code is if(foo) f(good); else f(bad);, a smart compiler will throw away the comparison and produce and an unconditional foo(good).
  • Graham
    Graham over 7 years
    @JordanMelo Since many of the previous answers discussed optimisation (and the OP specifically asked about that), I've mentioned a feature of optimisation which no previous answer had covered. I also pointed out that even though optimisation could remove it, reliance on optimisation to work in any particular way is again undefined. I'm certainly not recommending it! :)
  • Graham
    Graham over 7 years
    @KeithThompson Sure, but the OP specifically asked about optimisation and its effect on the undefined behaviour he would see on his platform. That specific behaviour could vanish, depending on optimisation. As I said in my answer though, the undefinedness would not.
  • user541686
    user541686 over 7 years
    @TartanLlama: Do you know of a single compiler that would e.g. optimize out statements that have side effects and which precede unconditional undefined behavior?
  • TartanLlama
    TartanLlama over 7 years
    @Mehrdad None that I can think of, hence the "purely theoretical" part of my answer. I know that LLVM's CFG simplification just returns if anything could affect the unreachability of a statement with UB. See here if you're interested.
  • user541686
    user541686 over 7 years
    @TartanLlama: So here's the crucial question. If none of them do such a thing, why don't they? Is your best justification that they just haven't gotten around to implementing it? Because, to me, this suggests that they may not be interpreting the standard the same way people here are at all.
  • TartanLlama
    TartanLlama over 7 years
    @Mehrdad Probably because it's a lot of effort to work out if a given function could affect the reachability of some UB and it likely doesn't buy you a lot of performance to do so. Deleting the rest of a basic block after some unconditional UB is easy, doing it beforehand is less so.
  • Johannes Schaub - litb
    Johannes Schaub - litb over 7 years
    As the text says, it even renders the whole program behavior undefined if just an unspecific behavior could possibly yield to undefined behavior, even if the implementation would never choose/exhibit such unspecific behavior.
  • John McGrath
    John McGrath over 7 years
    This is my biggest issue with undefined behaviour. It makes it sometimes impossible to write correct code, and when the compiler detects it, by default does not tell you it's triggered undefined behaviour. In this case the user simply wants to do arithmetic - pointer or not - and all their hard work to wrote secure code was undone. There should at least be a way to annotate a section of code to say - no fancy optimizations here. C/C++ is used in too many critical areas to allow this hazardous situation to continue in favour of optimzation
  • Jason S
    Jason S over 7 years
    and yet, here's the thing... compilers can use undefined behavior to optimize but THEY GENERALLY DON'T TELL YOU. So if we have this awesome tool that you have to avoid doing X at all costs, why can't the compiler give you a warning so you can fix it?
  • supercat
    supercat over 7 years
    @Mehrdad: On many implementations, certain library functions or volatile accesses may cause signals (or something else like them) to be raised, such that code would never reach succeeding statements. If an implementation specifies that pressing control-C while code is blocked at a getch() will raise a SIGINT, for example, then behavior should be defined if code after the getch() would invoke UB, but a SIGINT prevents execution from getting there.
  • supercat
    supercat over 7 years
    @Mehrdad: Unless the entity that controls a compiler also controls everything else about the execution environment, there may be no way for the compiler to know all of the cases where a system call or volatile access might wrest control away from the caller and not return it; if an execution environment specifies that control-C will prevent getch() from ever returning, it would be rather rude of a compiler to behave as though was going to return even in cases where the operator would have typed control-C.
  • user541686
    user541686 over 7 years
    @supercat: "Unless the entity that controls a compiler also controls everything else about the execution environment, there may be no way for the compiler to know all of the cases where a system call or volatile access might wrest control away from the caller and not return it." Excellent! Now would you kindly take a few minutes to read my (downvoted-to-hell) answer below and let me know if you agree with it? It seems that you are saying exactly the same thing I have been saying, yet no one seems to be believing me...
  • supercat
    supercat over 7 years
    @Mehrdad: Perhaps a better means of saying things would be to say that UB cannot time travel back past the last point where something could have happened in the real world that would have made behavior defined. If an implementation could determine by examining input buffers that there was no way any of the next 1000 calls to getchar() could block, and it could also determine that UB would occur after the 1000th call, it would not be required to perform any of the calls. If, however, an implementation were to specify that execution will not pass a getchar() until all preceding output had...
  • supercat
    supercat over 7 years
    ...been delivered to a 300-baud terminal, and that any control-C which is occurs before that will cause getchar() to raise a signal even if there were other characters in the buffer preceding it, then such an implementation could not move any UB past the last output preceding a getchar(). What's tricky is knowing in what case a compiler should be expected to pass through the the programmer any behavioral guarantees a library implementation might offer beyond those mandated by the Standard.
  • user541686
    user541686 over 7 years
    @supercat: Thanks! Yes, I completely agree.
  • supercat
    supercat over 7 years
    @Mehrdad: I think a fundamental problem is that C has split into two languages--one of which is suitable for systems programming and one of which eagerly and needlessly jettisons features necessary for systems programming in an effort to make the language more suitable for high-end computing [I say needless because the level of performance that could be achieved by adding directives to waive guarantees that a particular program doesn't need would exceed anything that could be achieved by a conforming compiler in the absence of such directives, but adding support for such directives...
  • supercat
    supercat over 7 years
    ...would not require breaking any existing code]. The zeal with which some compiler writers pursue UB-based optimizations had led to an apparent lack of interest in other kinds of optimizations which could be easier, safer, and more effective.
  • mlvljr
    mlvljr over 7 years
    What if the UB part is within an if(false) {} scope? Does that poison the entire program, due to compiler assuming all branches contain ~well-defined portions of logic, and thus operating on wrong assumptions?
  • bwDraco
    bwDraco over 7 years
    The standard imposes no requirements whatsoever on undefined behavior, so in theory, yes, it does poison the whole program. However, in practice, any optimizing compiler will likely just remove the dead code, so it probably would have no effect on execution. You still shouldn't rely on this behavior, though.
  • mlvljr
    mlvljr over 7 years
    Good to know, thanx :)
  • phant0m
    phant0m over 6 years
    @MSalters "As long as there is at least one input for which Undefined Behavior does not occur, the compiler must ensure that this particular input still produces the right output" Where can I read up on this?
  • supercat
    supercat almost 6 years
    The fact that executing a function would invoke UB can only affect the way a program behaves when given some particular input if at least one possible execution of the program when given that input would invoke UB. The fact that invoking a function would invoke UB does not prevent a program from having defined behavior when it is fed input that would not allow the function to be invoked.
  • Shafik Yaghmour
    Shafik Yaghmour almost 6 years
    @supercat I believe that is what my answer us saying for C at least.
  • Shafik Yaghmour
    Shafik Yaghmour almost 6 years
    I added a answer which covers C more specifically because a duplicate was for C but no answers really covered C properly.
  • supercat
    supercat almost 6 years
    I think the same applies for the quoted text re C++, since the phrase "Any such execution" refers to ways the program could execute with a particular given input. If a particular input couldn't result in a function executing, I see nothing in the quoted text to say that anything in such a function would result in UB.