Why are these constructs using pre and post-increment undefined behavior?

91,223

Solution 1

C has the concept of undefined behavior, i.e. some language constructs are syntactically valid but you can't predict the behavior when the code is run.

As far as I know, the standard doesn't explicitly say why the concept of undefined behavior exists. In my mind, it's simply because the language designers wanted some leeway in the semantics. Instead of, e.g., requiring that all implementations handle signed integer overflow in exactly the same way, which would very likely impose serious performance costs, they left the behavior undefined, so that if you write code that causes such an overflow, anything can happen.
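Because the overflowing addition itself is the undefined operation, portable code has to test for overflow before adding, not afterwards. A minimal sketch of that idea follows; the helper name checked_add is an assumption made for this illustration and is not part of the standard library.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (name invented for this sketch): stores a + b in *out
   and returns true only when the sum fits in an int. The range test runs
   first, so the overflowing addition is never executed. */
bool checked_add(int a, int b, int *out)
{
    assert(out != NULL);

    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;   /* a + b would overflow: refuse instead of invoking UB */
    }
    *out = a + b;       /* safe: the result is known to be representable */
    return true;
}
```

A caller would check the return value and read *out only when it is true.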

So, with that in mind, why are these "issues"? The language clearly says that certain things lead to undefined behavior. There is no problem, there is no "should" involved. If the undefined behavior changes when one of the involved variables is declared volatile, that doesn't prove or change anything. It is undefined; you cannot reason about the behavior.

Your most interesting-looking example, the one with

u = (u++);

is a textbook example of undefined behavior (see Wikipedia's entry on sequence points).

Solution 2

Most of the answers here quote the C standard to emphasize that the behavior of these constructs is undefined. To understand why it is undefined, let's first define a few terms in light of the C11 standard:

Sequenced: (5.1.2.3)

Given any two evaluations A and B, if A is sequenced before B, then the execution of A shall precede the execution of B.

Unsequenced:

If A is not sequenced before or after B, then A and B are unsequenced.

Evaluations can be one of two things:

  • value computations, which work out the result of an expression; and
  • side effects, which are modifications of objects.

Sequence Point:

The presence of a sequence point between the evaluation of expressions A and B implies that every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B.
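As a contrast to the undefined cases discussed in this thread, here is a small sketch of two operators that do introduce a sequence point between their operands (&& and the comma operator), so modifying the same object twice is well defined there. The function names are made up for this illustration.

```c
/* && has a sequence point after its left operand, so the first increment
   is complete before the second operand is evaluated. */
int after_logical_and(int i)
{
    int both = (i++ && i++);  /* right operand is skipped if the left yields 0 */
    (void)both;
    return i;                 /* i == 1 on entry gives 3; i == 0 gives 1 */
}

/* The comma operator also sequences its left operand before its right. */
int after_comma(int i)
{
    int last = (i++, i++);    /* value of the second i++ */
    return last;              /* i == 1 on entry gives 2 */
}
```

Code written this way is legal but hard to read; it is shown only to make the definition concrete.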

Now coming to the question, for the expressions like

int i = 1;
i = i++;

standard says that:

6.5 Expressions:

If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. [...]

Therefore, the above expression invokes UB because the two side effects on the same object i are unsequenced relative to each other. That is, the standard does not say whether the side effect of the assignment to i happens before or after the side effect of ++.
Depending on whether the assignment occurs before or after the increment, different results are produced, which is one typical symptom of undefined behavior.

Let's call the i on the left of the assignment il and the i on the right (in the expression i++) ir; the expression then becomes

il = ir++     // Note that the suffixes l and r are used only for the sake of clarity.
              // Both il and ir represent the same object.

An important point regarding Postfix ++ operator is that:

just because the ++ comes after the variable does not mean that the increment happens late. The increment can happen as early as the compiler likes as long as the compiler ensures that the original value is used.

It means the expression il = ir++ could be evaluated either as

temp = ir;      // i = 1
ir = ir + 1;    // i = 2   side effect by ++ before assignment
il = temp;      // i = 1   result is 1  

or

temp = ir;      // i = 1
il = temp;      // i = 1   side effect by assignment before ++
ir = ir + 1;    // i = 2   result is 2  

giving two different results, 1 and 2, depending on the order of the side effects of the assignment and of ++. Because the standard leaves those two side effects unsequenced, the expression invokes UB.
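The two orderings above can be written as ordinary, well-defined C by spelling out each step as its own statement. This is only a sketch of the two readings discussed here (the function names are invented); since the real expression is undefined, a compiler is not limited to these two outcomes.

```c
/* Reading 1: the side effect of ++ lands before the side effect of =. */
int increment_then_assign(int i)
{
    int temp = i;   /* value that i++ yields */
    i = i + 1;      /* side effect of ++ */
    i = temp;       /* side effect of = overwrites the increment */
    return i;
}

/* Reading 2: the side effect of = lands before the side effect of ++. */
int assign_then_increment(int i)
{
    int temp = i;   /* value that i++ yields */
    i = temp;       /* side effect of = */
    i = i + 1;      /* side effect of ++ happens last */
    return i;
}
```

Starting from i = 1, the first function returns 1 and the second returns 2, matching the two traces above.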

Solution 3

I think the relevant parts of the C99 standard are 6.5 Expressions, §2

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

and 6.5.16 Assignment operators, §4:

The order of evaluation of the operands is unspecified. If an attempt is made to modify the result of an assignment operator or to access it after the next sequence point, the behavior is undefined.

Solution 4

If you want to know exactly how you get the result you are getting, just compile and disassemble your line of code.

This is what I get on my machine, together with what I think is going on:

$ cat evil.c
void evil(){
  int i = 0;
  i+= i++ + ++i;
}
$ gcc evil.c -c -o evil.bin
$ gdb evil.bin
(gdb) disassemble evil
Dump of assembler code for function evil:
   0x00000000 <+0>:   push   %ebp
   0x00000001 <+1>:   mov    %esp,%ebp
   0x00000003 <+3>:   sub    $0x10,%esp
   0x00000006 <+6>:   movl   $0x0,-0x4(%ebp)  // i = 0   i = 0
   0x0000000d <+13>:  addl   $0x1,-0x4(%ebp)  // i++     i = 1
   0x00000011 <+17>:  mov    -0x4(%ebp),%eax  // j = i   i = 1  j = 1
   0x00000014 <+20>:  add    %eax,%eax        // j += j  i = 1  j = 2
   0x00000016 <+22>:  add    %eax,-0x4(%ebp)  // i += j  i = 3
   0x00000019 <+25>:  addl   $0x1,-0x4(%ebp)  // i++     i = 4
   0x0000001d <+29>:  leave  
   0x0000001e <+30>:  ret
End of assembler dump.

(I... suppose that the 0x00000014 instruction was some kind of compiler optimization?)
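For readers who don't read x86 assembly, the dump above can be transcribed step by step into well-defined C. This is just a sketch that mirrors this one listing (the function name and the variable j are taken from or modeled on the annotations); another compiler, version, or optimization level may emit something entirely different. One plausible reading of the instruction at +20 is that both operands of + were loaded as the same value of i, so the sum became i + i.

```c
/* Mirrors the instruction sequence of the dump, one statement per instruction. */
int evil_as_compiled(void)
{
    int i = 0;   /* movl $0x0,-0x4(%ebp)  */
    i += 1;      /* addl $0x1,-0x4(%ebp)  : one increment applied up front */
    int j = i;   /* mov  -0x4(%ebp),%eax  : load i into a register */
    j += j;      /* add  %eax,%eax        : i + i */
    i += j;      /* add  %eax,-0x4(%ebp)  : the += itself */
    i += 1;      /* addl $0x1,-0x4(%ebp)  : the other increment applied last */
    return i;
}
```

Following the statements by hand gives 4, which agrees with the annotations in the listing.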

Solution 5

The behavior can't really be explained because the code invokes both unspecified behavior and undefined behavior, so we cannot make any general predictions about it. If you read Olve Maudal's work, such as Deep C and Unspecified and Undefined, you can sometimes make good guesses in very specific cases with a specific compiler and environment, but please don't do that anywhere near production.

So, moving on to unspecified behavior: section 6.5, paragraph 3 of the draft C99 standard says (emphasis mine):

The grouping of operators and operands is indicated by the syntax. Except as specified later (for the function-call (), &&, ||, ?:, and comma operators), the order of evaluation of subexpressions and the order in which side effects take place are both unspecified.

So when we have a line like this:

i = i++ + ++i;

we do not know whether i++ or ++i will be evaluated first. This is mainly to give the compiler better options for optimization.
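Unspecified order on its own is not undefined behavior. The sketch below uses two function calls as operands of +: the calls may run in either order, but each one completes before the other starts, so the sum is always the same. All names here (record, call_log, and so on) are invented for this illustration.

```c
#include <stddef.h>

/* Hypothetical call log for this sketch: remembers the order of calls. */
char call_log[3];
size_t call_count;

void reset_log(void)
{
    call_count = 0;
    for (size_t k = 0; k < sizeof call_log; k++) {
        call_log[k] = '\0';
    }
}

int record(char tag, int value)
{
    if (call_count < 2) {
        call_log[call_count] = tag;
        call_count += 1;
    }
    return value;
}

/* Order of the two calls is unspecified: the log may read "LR" or "RL". */
int sum_unspecified_order(void)
{
    return record('L', 1) + record('R', 2);
}

/* Separate statements force the order: the log always reads "LR". */
int sum_left_then_right(void)
{
    int left = record('L', 1);
    int right = record('R', 2);
    return left + right;
}
```

Both functions return 3; only the contents of call_log can differ between compilers for the first one.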

We also have undefined behavior here, since the program modifies variables (i, u, etc.) more than once between sequence points. From section 6.5, paragraph 2 of the draft standard (emphasis mine):

Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.

It cites the following code examples as being undefined:

i = ++i + 1;
a[i++] = i; 

In all of the following examples the code attempts to modify an object more than once between two sequence points; the next sequence point is at the ; in each case:

i = i++ + ++i;
^   ^       ^

i = (i++);
^    ^

u = u++ + ++u;
^   ^       ^

u = (u++);
^    ^

v = v++ + ++v;
^   ^       ^
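Each of these statements can be made well defined by giving every modification its own statement. The sketch below shows one possible rewrite per pattern; which order was intended is an assumption, since the original expressions do not say, and the function names are invented for this illustration.

```c
/* If `i = (i++);` was only meant to increment i, say just that. */
int increment_only(int i)
{
    i++;
    return i;
}

/* One possible intent of `i = i++ + ++i;`, with an explicitly chosen order. */
int sum_with_explicit_order(int i)
{
    int old = i;        /* value that i++ would yield */
    i += 1;             /* side effect of i++ */
    i += 1;             /* side effect of ++i */
    i = old + i;        /* the assignment, now the only write in this statement */
    return i;
}

/* One possible intent of `a[i++] = i;`: store the index, then advance it. */
void store_index_then_advance(int a[], int *i)
{
    a[*i] = *i;
    *i += 1;
}
```

With this chosen order, sum_with_explicit_order(0) gives 2; a different choice of order would give a different, but equally well-defined, result.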

Unspecified behavior is defined in section 3.4.4 of the draft C99 standard as:

use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance

and undefined behavior is defined in section 3.4.3 as:

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

and notes that:

Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

Author by

PiX

Updated on July 08, 2022

Comments

  • PiX
    PiX almost 2 years
    #include <stdio.h>
    
    int main(void)
    {
       int i = 0;
       i = i++ + ++i;
       printf("%d\n", i); // 3
    
       i = 1;
       i = (i++);
       printf("%d\n", i); // 2 Should be 1, no ?
    
       volatile int u = 0;
       u = u++ + ++u;
       printf("%d\n", u); // 1
    
       u = 1;
       u = (u++);
       printf("%d\n", u); // 2 Should also be one, no ?
    
       register int v = 0;
       v = v++ + ++v;
       printf("%d\n", v); // 3 (Should be the same as u ?)
    
       int w = 0;
       printf("%d %d\n", ++w, w); // shouldn't this print 1 1
    
       int x[2] = { 5, 8 }, y = 0;
       x[y] = y ++;
       printf("%d %d\n", x[0], x[1]); // shouldn't this print 0 8? or 5 0?
    }
    
    • PiX
      PiX almost 15 years
      @Jarett, nope, just needed some pointers to "sequence points". While working I found a piece of code with i = i++, and I thought "This isn't modifying the value of i". I tested it and wondered why. Since then, I've removed this statement and replaced it with i++;
    • Brian Postow
      Brian Postow almost 14 years
      I think it's interesting that everyone ALWAYS assumes that questions like this are asked because the asker wants to USE the construct in question. My first assumption was that PiX knows that these are bad, but is curious why they behave the way they do on whatever compiler s/he was using... And yeah, what unWind said... it's undefined, it could do anything... including JCF (Jump and Catch Fire)
    • Learn OpenGL ES
      Learn OpenGL ES over 11 years
      I'm curious: Why don't compilers seem to warn on constructs such as "u = u++ + ++u;" if the result is undefined?
    • swampf0etus
      swampf0etus almost 11 years
      Why would you expect i = (i++) be equal to 1? Parentheses override the natural order of evaluation precedence, so anything within them will happen first. So i++ will happen first (naturally, i++ would happen after assignment), making it 2. 2 would then be assigned back to i. i is 2.
    • Drew McGowen
      Drew McGowen almost 11 years
      (i++) still evaluates to 1, regardless of parentheses
    • Keith Thompson
      Keith Thompson over 10 years
      Whatever i = (i++); was intended to do, there is certainly a clearer way to write it. That would be true even if it were well defined. Even in Java, which defines the behavior of i = (i++);, it's still bad code. Just write i++;
    • friendzis
      friendzis over 10 years
      Just my cents: such statements are undefined behavior, because you read and write the same variable (memory spot). This allows compiler to do magic called "optimization" i.e. clean up your own mess. Naturally this comes with some limitations. Reading from memory is SLOOOW, therefore registers are used and then synced with real memory. Side effect is that compiler is now unsure which value to use: the one from memory or the one in register that has already been modified. (I like to explain it like that, makes most sense)
    • Lightness Races in Orbit
      Lightness Races in Orbit almost 10 years
      @LearnOpenGLES: They do.
    • Joseph Mansfield
      Joseph Mansfield over 9 years
      I have written an article about identifying undefined behaviour in expressions which covers many similar examples, but is defined in terms of the C++11 sequencing rules. Perhaps some of the readers here will find it useful.
    • akhil_mittal
      akhil_mittal over 9 years
      It reminds me of the interview questions asked by many software firms in India. Though the behaviour is undefined, they still try to impose logic on the output. Similar questions also appear in many C books by Yashwant Kanetkar. This kind of question really makes me sick :(
    • Destructor
      Destructor almost 9 years
      @LearnOpenGLES: My compiler(gcc 4.8.1) warns me on constructs like u=u++ & j=i++ + ++i;
    • Johan Lundberg
      Johan Lundberg over 7 years
      Although this question is about C, it may be of interest that some aspects related to this are going to change in the next version of C++, with the voted-in C++17 evaluation order guarantees (P0145R2). More: stackoverflow.com/questions/38501587/…
    • rcgldr
      rcgldr almost 7 years
      As mentioned in some comments, C / C++ don't have explicit rules on evaluation order. Some other languages do, in which case this would not be an issue. The most unusual case is APL (A Programming Language), which evaluates expressions right to left (which allows for multiple assignments on a single line), with parentheses used to override the order of evaluation.
    • Akhilesh Dhar Dubey
      Akhilesh Dhar Dubey over 6 years
      C compiler output differs from Java compiler output: int i=5; System.out.printf(",%d,%d,%d,%d,%d",i++,i--,++i,--i,i); gcc 5.3.0: Output: 4,5,5,5,5 Java1.8 Output: 5,6,6,5,5
    • supercat
      supercat over 6 years
      @i_am_zero: The fact that the Standard does not mandate a behavior in some situation does not mean that no implementations will specify how they process code in sufficient detail that only one possible behavior would be consistent with the spec. One problem with the Standard is that it has never attempted to catalog all the cases where an implementation would have to go out of its way not to behave in predictable fashion (e.g. using memcpy in cases where the source and destination might occasionally be equal, e.g. because the cost of an occasional redundant copy would be less than...
    • supercat
      supercat over 6 years
      ...the cost of checking on every operation whether the copy was necessary). IMHO, the Standard would be better if it specified a basic execution model and then kinds of optimizations that programmers may enable. Given x=(*p)++ + (*q)++; b=*p; c=*p;, for example, it may be reasonable to say that with some optimizations enabled a compiler could at its option independently treat b and c as either holding the one plus the value that was read before the increment of *p, or as holding a value which is read from *p at any time between the increment and the assignment to b or c.
    • supercat
      supercat over 6 years
      @i_am_zero: Such rules would give compilers almost all of the useful flexibility they have under the present standard, but if combined with ways of converting indeterminate values to arbitrary values could allow some kinds of code to be written more efficiently than is currently possible.
  • Richard
    Richard almost 15 years
    @PiX: Things are undefined for a number of possible reasons. These include: there is no clear "right result", different machine architectures would strongly favour different results, existing practice is not consistent, or beyond the scope of the standard (e.g. what filenames are valid).
  • supercat
    supercat over 12 years
    Would the above imply that 'i=i=5;" would be Undefined Behavior?
  • bad_keypoints
    bad_keypoints over 11 years
    How do I get the machine code? I use Dev C++, and I played around with the 'Code Generation' option in the compiler settings, but got no extra file output or any console output.
  • badp
    badp over 11 years
    @ronnieaka gcc evil.c -c -o evil.bin and gdb evil.bin, then disassemble evil, or whatever the Windows equivalents of those are :)
  • kchoi
    kchoi over 10 years
    is -0x4(%ebp) = 4 at the end?
  • dhein
    dhein over 10 years
    @supercat as far as I know i=i=5 is also undefined behavior
  • supercat
    supercat over 10 years
    @Zaibis: The rationale I like to use for most places where the rule applies is that in theory a multi-processor platform could implement something like A=B=5; as "Write-lock A; Write-lock B; Store 5 to A; Store 5 to B; Unlock B; Unlock A;", and a statement like C=A+B; as "Read-lock A; Read-lock B; Compute A+B; Unlock A and B; Write-lock C; Store result; Unlock C;". That would ensure that if one thread did A=B=5; while another did C=A+B; the latter thread would either see both writes as having taken place or neither. Potentially a useful guarantee. If one thread did I=I=5;, however, ...
  • supercat
    supercat over 10 years
    ... and the compiler didn't notice that both writes were to the same location (if one or both lvalues involve pointers, that may be hard to determine), the generated code could deadlock. I don't think any real-world implementations implement such locking as part of their normal behavior, but it would be permissible under the standard, and if hardware could implement such behaviors cheaply it might be useful. On today's hardware such behavior would be way too expensive to implement as a default, but that doesn't mean it would always be thus.
  • dhein
    dhein over 10 years
    @supercat but wouldn't the sequence point access rule of c99 alone be enough to declare it as undefined behavior? So it doesn't matter what technically the hardware could implement?
  • supercat
    supercat over 10 years
    @Zaibis: Rules which characterize actions as Undefined Behavior aren't supposed to exist merely to allow implementations to behave in hostile fashion. They're supposed to exist to allow implementers to either do something more efficiently or more usefully than would be possible in their absence. To understand why the specs characterize something as UB, it's helpful to identify something useful the rule would allow implementations to do which they otherwise could not.
  • dhein
    dhein over 10 years
    @supercat I absolutely agree with what you say about the behavior of undefined behavior (^^). But this doesn't change the point that if something is listed in the standard as UB, you can't expect it to be well defined just because it would be easy to implement as a well-defined construct. If the standard says it is UB, then the answer to the question "is it UB?" is "Yes!", and not "It could... [...]".
  • supercat
    supercat over 10 years
    @Zaibis: The answer to almost any question of the form "Why is X in language/framework Y Undefined Behavior" is "Because that's what the standard for Y says", but that's hardly enlightening. In most cases, however, what someone asking such a question really wants to know is "Why did the makers of the standard specify that". In most cases, things are specified as UB (rather than partially-specified behaviors) to allow for the possibility of an implementation which might do something unexpected. For example, the spec could have said that p1=malloc(4); p2=malloc(5); r=p1>p2;...
  • supercat
    supercat over 10 years
    ...may result in r arbitrarily holding 1 or 0, with no guarantee that the value will relate in any way to future comparisons among the same or different operands. Such a spec (returning an arbitrary 0 or 1) would have allowed an efficient memmove to be written in portable fashion [if dest > src, apply a top-down copy, else bottom-up; if the regions don't overlap, either will work so the comparison result wouldn't matter]. I believe the standard says such comparison is UB, however; if every machine could easily--at worst--arbitrarily yield a 0 or 1, there'd be no reason not to say so.
  • Shafik Yaghmour
    Shafik Yaghmour almost 10 years
    This answer does not really address the question of Why are these constructs undefined behavior?.
  • badp
    badp almost 10 years
    @ShafikYaghmour I'm addressing the questions in the question body ("why am I not getting the results I am getting?"), see the comments in the code. Given that this is undefined behaviour, I can only show how to get the actual assembly he's compiled.
  • Shafik Yaghmour
    Shafik Yaghmour almost 10 years
    Perhaps the answer is in there but I think most would not be able to figure it out without some elaboration. Just add some explanatory text and it becomes an answer.
  • badp
    badp almost 10 years
    @ShafikYaghmour I must admit that the assembly is kinda baffling me; especially the instruction at +20. But why am I trying to make sense of it?
  • M.M
    M.M almost 10 years
    Just to confuse everyone, some such examples are now well-defined in C11, e.g. i = ++i + 1; .
  • supercat
    supercat almost 9 years
    A rather nasty gotcha with regard to Undefined Behavior is that while it used to be safe on 99.9% of compilers to use *p=(*q)++; to mean if (p!=q) *p=(*q)++; else *p= __ARBITRARY_VALUE; that is no longer the case. Hyper-modern C would require writing something like the latter formulation (though there's no standard way of indicating code doesn't care what's in *p) to achieve the level of efficiency compilers used to provide with the former (the else clause is necessary in order to let the compiler optimize out the if which some newer compilers would require).
  • Kat
    Kat over 8 years
    As an aside, it'll be easier to compile to assembly (with gcc -S evil.c), which is all that's needed here. Assembling then disassembling it is just a roundabout way of doing it.
  • Steve Summit
    Steve Summit about 8 years
    For the record, if for whatever reason you're wondering what a given construct does -- and especially if there's any suspicion that it might be undefined behavior -- the age-old advice of "just try it with your compiler and see" is potentially quite perilous. You will learn, at best, what it does under this version of your compiler, under these circumstances, today. You will not learn much if anything about what it's guaranteed to do. In general, "just try it with your compiler" leads to nonportable programs that work only with your compiler.
  • underscore_d
    underscore_d almost 8 years
    Of course it doesn't apply to different variables within one expression. It would be a total design failure if it did! All you need in the 2nd example is for both to be incremented between the statement ending and the next one beginning, and that's guaranteed, precisely because of the concept of sequence points at the centre of all this.
  • Antti Haapala -- Слава Україні
    Antti Haapala -- Слава Україні over 6 years
    I've edited the question to add the UB in evaluation of function arguments, as this question is often used as a duplicate for that. (The last example)
  • Antti Haapala -- Слава Україні
    Antti Haapala -- Слава Україні over 6 years
    Also the question is about c now, not C++
  • haccks
    haccks over 6 years
    What does this answer add to the existing answers? Also, the explanation for i=i++ is very similar to this answer.
  • alinsoar
    alinsoar over 6 years
    @haccks I did not read the other answers. I wanted to explain in my own language what I learned from the mentioned document from the official site of ISO 9899 open-std.org/jtc1/sc22/wg14/www/docs/n1188.pdf
  • supercat
    supercat over 6 years
    Reading the Standard and the published rationale, it's clear why the concept of UB exists. The Standard was never intended to fully describe everything a C implementation must do to be suitable for any particular purpose (see the discussion of the "One Program" rule), but instead relies upon implementors' judgment and desire to produce useful quality implementations. A quality implementation suitable for low-level systems programming will need to define the behavior of actions that wouldn't be needed in high-end number-crunching applications. Rather than try to complicate the Standard...
  • supercat
    supercat over 6 years
    ...by getting into extreme detail about which corner cases are or are not defined, the authors of the Standard recognized that implementors should be better placed to judge which kinds of behaviors will be needed by the kinds of programs they're expected to support. Hyper-modernist compilers pretend that making certain actions UB was intended to imply that no quality program should need them, but the Standard and rationale are inconsistent with such a supposed intent.
  • supercat
    supercat over 6 years
    @jrh: I wrote that answer before I'd realized how out of hand the hyper-modernist philosophy had gotten. What irks me is the progression from "We don't need to officially recognize this behavior because the platforms where it's needed can support it anyway" to "We can remove this behavior without providing a usable replacement because it was never recognized and thus any code needing it was broken". Many behaviors should have been deprecated long ago in favor of replacements that were in every way better, but that would have required acknowledging their legitimacy.
  • pqnet
    pqnet almost 6 years
    Undefined behavior basically allows the compiler to make more assumption about conditions which can only be verified at runtime, e.g. assume that in the expression *ptr the pointer is valid, because if it is null the program is allowed to do anything and so it is not necessary to add code to the program to check for that condition and ensure a defined behavior.
  • David R Tribble
    David R Tribble over 5 years
    At the time that C was standardized (1989), many C compilers existed, and each one played by slightly different rules. The primary goal of the ANSI (and later ISO) committee was to codify existing practice. Thus in many cases where multiple compilers disagreed on the "correct" semantic behavior for obviously ambiguous cases (mostly having to do with the evaluation order of expression operators), the committee (wisely) chose to deem such cases "undefined behavior" or "implementation-defined behavior".
  • stillanoob
    stillanoob over 5 years
    @unwind For u=1; u=u++;, is it true that what's undefined is the value of u after the second statement is executed? I mean, by the rules of sequencing of value evaluation (as opposed to the side-effect evaluation), the expression u=u++ must be guaranteed to evaluate to 1, right?
  • kavadias
    kavadias over 5 years
    This sequence int a = 10, b = 20, c = 30; printf("a=%d b=%d c=%d\n", (a = a + b + c), (b = b + b), (c = c + c)); appears to give stable behavior (right-to-left argument evaluation in gcc v7.3.0; result "a=110 b=40 c=60"). Is it because the assignments are considered as 'full-statements' and thus introduce a sequence point? Shouldn't that result in left-to-right argument/statement evaluation? Or, is it just manifestation of undefined behavior?
  • P.P
    P.P over 5 years
    @kavadias That printf statement involves undefined behaviour, for the same reason explained above. You are writing b and c in 3rd & 4th arguments respectively and reading in 2nd argument. But there's no sequence between these expressions (2nd, 3rd, & 4th args). gcc/clang has an option -Wsequence-point which can help find these, too.
  • Ilmari Karonen
    Ilmari Karonen about 5 years
    @stillanoob: No, because the behavior of any code containing that expression is undefined, meaning that it can do literally anything. It might always evaluate to 42, except on Sundays when the moon is waxing gibbous. It might get stuck in an infinite loop instead of evaluating to anything at all. It might jump to a random location in your code. It might crash the process. It might even make your computer catch fire and make demons fly out of your nose, and the C standard still wouldn't care.
  • unwind
    unwind about 5 years
    @Rajesh Because the operator + is not a sequence point. Please read the Wikipedia page.
  • Steve Summit
    Steve Summit over 4 years
    @supercat I now believe that any compiler that's "smart" enough to perform that sort of optimization must also be smart enough to peek at assert statements, so that the programmer can precede the line in question with a simple assert(p != q). (Of course, taking that course would also require rewriting <assert.h> to not delete assertions outright in non-debug versions, but rather, turn them into something like __builtin_assert_disabled() that the compiler proper can see, and then not emit code for.)
  • RobertS supports Monica Cellio
    RobertS supports Monica Cellio almost 4 years
    Not to be offensive, but IMHO this answer explains undefined behavior itself more than it explains why the code in the question actually invokes undefined behavior. Thus, I think it would be more appropriate as an answer to this question. - For the reason why (which is what the question title asks), you only link to the "Sequence point" Wikipedia page, which is IMHO a little too little for the accepted answer to that question. - Just my personal opinion.
  • RobertS supports Monica Cellio
    RobertS supports Monica Cellio almost 4 years
    what we're really saying is "add 1 to i, and assign the result back to i, and assign the result back to i". --- I think there is one "and assign the result back to i" too much.
  • Steve Summit
    Steve Summit almost 4 years
    @RobertSsupportsMonicaCellio It's admittedly a bit confusing the way it's written. Read it as "Add 1 to the value fetched from i, assign the result back to i, and assign the result back to i".
  • radioflash
    radioflash over 3 years
    @bad_keypoints I'd suggest using godbolt.org to quickly see generated assembly code for many different compilers/architectures/optimization levels. Links can also be shared easily.
  • Craig Tullis
    Craig Tullis over 3 years
    "C, of course, has a handy shortcut: i++" Haha, nice. Or worst case, i += 1.
  • Steve Summit
    Steve Summit about 3 years
    "However if you stick to one compiler, you will find the behavior persistent." Well, no, not necessarily. If you, for example, change optimization flags, the compiler may quite easily end up emitting code which makes the undefined behavior behave differently. The same goes if you make seemingly unrelated changes to nearby code.
  • Soup  Endless
    Soup Endless about 3 years
    @haccks this answer is ok besides the fact it's a copy of your answer, but I would ask instead, what all other answers are doing here and why they have so much rep while missing the main point of question, that's explaining the details of the UB in examples.
  • P.P
    P.P about 3 years
    @SoupEndless There are many answers because this is a canonical question for a number of similar (but not direct) duplicates. Without the overhead of creating different canonical posts for minor variants of the same question, often others post answers much later (often years later!) to make a question an ideal candidate for dup hammers. That's what happened here. It's pointless to repeat the same answer (especially after a few years, it's been answered!). So the latter answerers didn't really "miss the point". This is just how SO works.
  • U. Windl
    U. Windl almost 3 years
    @M.M i = ++i is significantly different (regarding the definedness) from i = i++. IMHO the first is just ++i, while the second is "++i maybe".
  • U. Windl
    U. Windl almost 3 years
    @stillanoob I think the important concept of undefinedness is that it propagates: Using an undefined statement or expression causes the surrounding statement or expression to be undefined as well. In the end it makes the whole program effect being undefined.
  • M.M
    M.M almost 3 years
    @U.Windl in C they are both undefined behaviour. There are no degrees of "definedness" of undefined behaviour. If execution reaches either statement then there is no defined behaviour for the whole program.
  • Mark Ransom
    Mark Ransom almost 2 years
    @M.M execution does not need to reach the malformed code to cause undefined behavior. For concrete examples see: Undefined behavior can result in time travel (among other things, but time travel is the funkiest).
  • M.M
    M.M almost 2 years
    @MarkRansom The time travel can't "start" unless the code would be reached, e.g. if (0) 1/0; can't time-travel to blow up the whole program.
  • Mark Ransom
    Mark Ransom almost 2 years
    @M.M did you read the whole article? The reference to "time travel" is about bad effects that occur before you reach the bad code.
  • M.M
    M.M almost 2 years
    @MarkRansom yes, I understand the subject well. If the behaviour of the program is undefined, effects can be seen at any point in the execution. If the behaviour is not undefined then there cannot be such effects. A program that would never reach an erroneous construct when executed according to the rules of execution in the standard, does not have undefined behaviour. if (0) 1/0; being a simple demonstration of this.
  • Mark Ransom
    Mark Ransom almost 2 years
    @M.M read the article again. There's an example very similar to the one you keep repeating, and it fails.
  • M.M
    M.M almost 2 years
    @MarkRansom No there isn't. The article doesn't even have any complete programs; e.g. if unwitting() is never called then the program's behaviour is not undefined, and the effects described cannot happen, time-travelling or not. In any case, comments are not the place for this sort of discussion -- if you think if (0) 1/0; can time-travel then start a new question (referencing my comment) and I will respond.
  • M.M
    M.M almost 2 years
    To put it another way - there's no time travel in the abstract machine. C is defined as an abstract machine with statements executed sequentially according to the rules of the standard. The behaviour of a program only becomes undefined if the abstract machine's execution reaches the erroneous expression. This occurrence then cancels the requirement for the real machine's output to match the abstract machine's output, meaning that the real machine is permitted to do anything at all, including display apparent time-travel effects.
  • M.M
    M.M almost 2 years
    The abstract machine must execute the code exactly as written -- the "transformations" described by the article are operations the real machine undertakes in order to produce the same observable behaviour as the abstract machine specifies, but faster. And would be non-conforming if they caused a program without UB to display effects of UB. The examples in the article are all supposed to be taken in the context of execution of a program where the abstract machine actually reaches the erroneous expression.