Is uninitialized local variable the fastest random number generator?

20,932

Solution 1

As others have noted, this is Undefined Behavior (UB).

In practice, it will (probably) actually (kind of) work. Reading from an uninitialized register on x86[-64] architectures will indeed produce garbage results, and probably won't do anything bad (as opposed to e.g. Itanium, where registers can be flagged as invalid, so that reads propagate errors like NaN).

There are two main problems though:

  1. It won't be particularly random. In this case, you're reading from the stack, so you'll get whatever was there previously. Which might be effectively random, completely structured, the password you entered ten minutes ago, or your grandmother's cookie recipe.

  2. It's Bad (capital 'B') practice to let things like this creep into your code. Technically, the compiler could insert reformat_hdd(); every time you read an undefined variable. It won't, but you shouldn't do it anyway. Don't do unsafe things. The fewer exceptions you make, the safer you are from accidental mistakes.

The more pressing issue with UB is that it makes your entire program's behavior undefined. Modern compilers can use this to elide huge swaths of your code or even go back in time. Playing with UB is like a Victorian engineer dismantling a live nuclear reactor. There are a zillion things that can go wrong, and you probably won't know half of the underlying principles or the implemented technology. It might be okay, but you still shouldn't let it happen. Look at the other nice answers for details.

Also, I'd fire you.

Solution 2

Let me say this clearly: we do not invoke undefined behavior in our programs. It is never ever a good idea, period. There are rare exceptions to this rule; for example, if you are a library implementer implementing offsetof. If your case falls under such an exception you likely know this already. In this case we know using uninitialized automatic variables is undefined behavior.

Compilers have become very aggressive with optimizations around undefined behavior, and we can find many cases where undefined behavior has led to security flaws. The most infamous case is probably the Linux kernel null pointer check removal, which I mention in my answer to C++ compilation bug?, where a compiler optimization around undefined behavior turned a finite loop into an infinite one.
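
The general shape of that null-pointer-check removal is roughly this (a simplified sketch with made-up names, not the actual kernel code): because the pointer is dereferenced before it is checked, the compiler is entitled to assume it is non-null and drop the check entirely:

struct device { int flags; };   // made-up stand-in for the real kernel structure

int get_flags(struct device *dev) {
    int f = dev->flags;   // undefined behavior if dev is null...
    if (!dev)             // ...so the compiler may assume dev is non-null and delete this check
        return -1;
    return f;
}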

We can read CERT's Dangerous Optimizations and the Loss of Causality (video) which says, amongst other things:

Increasingly, compiler writers are taking advantage of undefined behaviors in the C and C++ programming languages to improve optimizations.

Frequently, these optimizations are interfering with the ability of developers to perform cause-effect analysis on their source code, that is, analyzing the dependence of downstream results on prior results.

Consequently, these optimizations are eliminating causality in software and are increasing the probability of software faults, defects, and vulnerabilities.

Specifically with respect to indeterminate values, the C standard defect report 451: Instability of uninitialized automatic variables makes for some interesting reading. It has not been resolved yet, but it introduces the concept of wobbly values, which means that the indeterminateness of a value may propagate through the program, and the same object can appear to have different indeterminate values at different points in the program.

I don't know of any examples where this happens but at this point we can't rule it out.
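
As a purely hypothetical illustration of what a wobbly value would mean (a sketch of the concept, not an observed case): nothing requires two reads of the same uninitialized variable to agree with each other:

#include <cstdio>

int main() {
    int x;                                  // uninitialized: indeterminate value
    std::printf("first read:  %d\n", x);    // undefined behavior, and...
    std::printf("second read: %d\n", x);    // ...not required to print the same value as the first read
}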

Real examples, not the result you expect

You are unlikely to get random values. A compiler could optimize away the loop altogether. For example, with this simplified case:

void updateEffect(int arr[20]){
    for(int i = 0; i < 20; i++){
        int r;
        arr[i] = r;
    }
}

clang optimizes it away (see it live):

updateEffect(int*):                     # @updateEffect(int*)
    retq

or perhaps get all zeros, as with this modified case:

void updateEffect(int arr[20]){
    for(int i = 0; i < 20; i++){
        int r;
        arr[i] = r % 255;
    }
}

see it live:

updateEffect(int*):                     # @updateEffect(int*)
    xorps   %xmm0, %xmm0
    movups  %xmm0, 64(%rdi)
    movups  %xmm0, 48(%rdi)
    movups  %xmm0, 32(%rdi)
    movups  %xmm0, 16(%rdi)
    movups  %xmm0, (%rdi)
    retq

Both of these cases are perfectly acceptable forms of undefined behavior.

Note, if we are on an Itanium we could end up with a trap value:

[...]if the register happens to hold a special not-a-thing value, reading the register traps except for a few instructions[...]

Other important notes

It is interesting to note the variance between gcc and clang, observed in the UB Canaries project, over how willing they are to take advantage of undefined behavior with respect to uninitialized memory. The article notes (emphasis mine):

Of course we need to be completely clear with ourselves that any such expectation has nothing to do with the language standard and everything to do with what a particular compiler happens to do, either because the providers of that compiler are unwilling to exploit that UB or just because they have not gotten around to exploiting it yet. When no real guarantee from the compiler provider exists, we like to say that as-yet unexploited UBs are time bombs: they’re waiting to go off next month or next year when the compiler gets a bit more aggressive.

As Matthieu M. points out, What Every C Programmer Should Know About Undefined Behavior #2/3 is also relevant to this question. It says, amongst other things (emphasis mine):

The important and scary thing to realize is that just about any optimization based on undefined behavior can start being triggered on buggy code at any time in the future. Inlining, loop unrolling, memory promotion and other optimizations will keep getting better, and a significant part of their reason for existing is to expose secondary optimizations like the ones above.

To me, this is deeply dissatisfying, partially because the compiler inevitably ends up getting blamed, but also because it means that huge bodies of C code are land mines just waiting to explode.

For completeness' sake, I should probably mention that implementations can choose to make undefined behavior well defined; for example, gcc allows type punning through unions, while in C++ this seems like undefined behavior. If this is the case, the implementation should document it, and such code will usually not be portable.
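
For instance, a sketch of what that union-based punning looks like (this relies on gcc documenting the behavior, not on the C++ standard):

#include <cstdint>
#include <cstdio>

union Pun {
    float         f;
    std::uint32_t u;
};

int main() {
    Pun p;
    p.f = 1.0f;
    // Reading a member other than the one last written: gcc documents this as
    // supported, but the C++ standard does not, so it is not portable.
    std::printf("bit pattern of 1.0f: 0x%08x\n", static_cast<unsigned>(p.u));
}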

Solution 3

No, it's terrible.

The behaviour of using an uninitialised variable is undefined in both C and C++, and it's very unlikely that such a scheme would have desirable statistical properties.

If you want a "quick and dirty" random number generator, then rand() is your best bet. A typical implementation does little more than a multiplication, an addition, and a modulus.

The fastest generator I know of requires you to use a uint32_t as the type of the pseudo-random variable I, and use

I = 1664525 * I + 1013904223

to generate successive values. You can choose any initial value of I (called the seed) that takes your fancy. Obviously you can code that inline. The standard-guaranteed wraparound of an unsigned type acts as the modulus. (The numeric constants are hand-picked by that remarkable scientific programmer Donald Knuth.)
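
A sketch of how that might look inlined (the function name and seed value here are arbitrary choices):

#include <cstdint>

static std::uint32_t lcg_state = 1u;          // the seed: any value you fancy

inline std::uint32_t lcg_next() {
    // Unsigned wraparound supplies the modulus (mod 2^32) for free.
    lcg_state = 1664525u * lcg_state + 1013904223u;
    return lcg_state;
}

// e.g. a colour channel in [0, 255]:
// int r = lcg_next() % 256;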

Solution 4

Good question!

Undefined does not mean it's random. Think about it: the values you'd get in global uninitialized variables were left there by the system or by your own or other applications running. Depending on what your system does with no-longer-used memory and/or what kind of values the system and applications generate, you may get:

  1. Always the same value.
  2. One of a small set of values.
  3. Values in one or more small ranges.
  4. Many values divisible by 2/4/8, from pointers on 16/32/64-bit systems.
  5. ...

The values you'll get completely depend on which non-random values are left by the system and/or applications. So, indeed there will be some noise (unless your system wipes no longer used memory), but the value pool from which you'll draw will by no means be random.

Things get much worse for local variables, because these come directly from the stack of your own program. There is a very good chance that your program actually writes to these stack locations during the execution of other code. I'd rate the chances of getting lucky in this situation as very low, and every 'random' code change you make tries that luck again.

Read about randomness. As you'll see, randomness is a very specific and hard-to-obtain property. It's a common mistake to think that if you just take something that's hard to track (like your suggestion), you'll get a random value.

Solution 5

Many good answers, but allow me to add another and stress the point that in a deterministic computer, nothing is random. This is true both for the numbers produced by a pseudo-RNG and for the seemingly "random" numbers found in areas of memory reserved for C/C++ local variables on the stack.

BUT... there is a crucial difference.

The numbers generated by a good pseudorandom generator have the properties that make them statistically similar to truly random draws. For instance, the distribution is uniform. The cycle length is long: you can get millions of random numbers before the cycle repeats itself. The sequence is not autocorrelated: for instance, you will not begin to see strange patterns emerge if you take every 2nd, 3rd, or 27th number, or if you look at specific digits in the generated numbers.
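
If you want those properties, the C++ standard library already provides them; here is a minimal sketch (the choice of engine, seeding, and distribution is just one option):

#include <cstdio>
#include <random>

int main() {
    std::mt19937 gen(std::random_device{}());            // long-period engine, nondeterministically seeded
    std::uniform_int_distribution<int> channel(0, 255);  // uniform over [0, 255]

    for (int i = 0; i < 10; i++)
        std::printf("R=%d, G=%d, B=%d\n", channel(gen), channel(gen), channel(gen));
}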

In contrast, the "random" numbers left behind on the stack have none of these properties. Their values and their apparent randomness depend entirely on how the program is constructed, how it is compiled, and how it is optimized by the compiler. By way of example, here is a variation of your idea as a self-contained program:

#include <stdio.h>

void notrandom(void)
{
        int r, g, b;    /* intentionally left uninitialized */

        printf("R=%d, G=%d, B=%d", r&255, g&255, b&255);
}

int main(int argc, char *argv[])
{
        int i;
        for (i = 0; i < 10; i++)
        {
                notrandom();
                printf("\n");
        }

        return 0;
}

When I compile this code with GCC on a Linux machine and run it, it turns out to be rather unpleasantly deterministic:

R=0, G=19, B=0
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255
R=130, G=16, B=255

If you looked at the compiled code with a disassembler, you could reconstruct what was going on, in detail. The first call to notrandom() used an area of the stack that was not used by this program previously; who knows what was in there. But after that call to notrandom(), there is a call to printf() (which the GCC compiler actually optimizes to a call to putchar(), but never mind) and that overwrites the stack. So the next and subsequent times, when notrandom() is called, the stack will contain stale data from the execution of putchar(), and since putchar() is always called with the same arguments, this stale data will always be the same, too.

So there is absolutely nothing random about this behavior, nor do the numbers obtained this way have any of the desirable properties of a well-written pseudorandom number generator. In fact, in most real-life scenarios, their values will be repetitive and highly correlated.

Indeed, like others, I would also seriously consider firing someone who tried to pass off this idea as a "high-performance RNG".

Author: ggrr

Updated on July 08, 2022

Comments

  • ggrr
    ggrr almost 2 years

    I know that using an uninitialized local variable is undefined behaviour (UB), and also that the value may have trap representations, which may affect further operations. But sometimes I want the random number only for visual representation and will not use it further in other parts of the program; for example, setting something to a random color in a visual effect:

    void updateEffect(){
        for(int i=0;i<1000;i++){
            int r;
            int g;
            int b;
            star[i].setColor(r%255,g%255,b%255);
            bool isVisible;
            star[i].setVisible(isVisible);
        }
    }
    

    is it faster than

    void updateEffect(){
        for(int i=0;i<1000;i++){
            star[i].setColor(rand()%255,rand()%255,rand()%255);
            star[i].setVisible(rand()%2==0?true:false);
        }
    }
    

    and also faster than other random number generators?

  • Potatoswatter
    Potatoswatter almost 9 years
    There's no such thing as an "uninitialized register" on any architecture. The machine doesn't understand initialization or lack thereof; it's a language construct.
  • tbleher
    tbleher almost 9 years
    @Potatoswatter: Itanium registers can contain NaT (Not a Thing) which in effect is an "uninitialized register". On Itanium, reading from a register when you haven't written to it may abort your program (read more on it here: blogs.msdn.com/b/oldnewthing/archive/2004/01/19/60162.aspx). So there is a good reason why reading uninitialized values is undefined behaviour. It's also probably one reason why Itanium is not very popular :)
  • meaning-matters
    meaning-matters almost 9 years
    Very good point! It may be a pragmatic trick, but indeed one that requires luck.
  • Potatoswatter
    Potatoswatter almost 9 years
    @tbleher Initializing something to a trap representation is not the same as no initialization. Hardware just goes about its business of shuffling bits in a deterministic fashion. But, point taken.
  • usr
    usr almost 9 years
    I really object to the "it kind of works" notion. Even if it was true today, which it is not, it might change at any time due to more aggressive compilers. The compiler can replace any read with unreachable() and delete half of your program. This does happen in practice as well; this behavior completely neutralized the RNG in some Linux distro, I believe. Most answers in this question seem to assume that an uninitialized value behaves like a value at all. That's false.
  • Shafik Yaghmour
    Shafik Yaghmour almost 9 years
    Also, I'd fire you seems a rather silly thing to say, assuming good practices this should be caught at code review, discussed and should never happen again. This should definitely be caught since we are using the correct warning flags, right?
  • Shafik Yaghmour
    Shafik Yaghmour almost 9 years
    I have to agree with @usr here, you can see my live godbolt examples in my answer; at least for clang this does not kind of work.
  • usr
    usr almost 9 years
    @CaptainCodeman UB means it can do anything at all including inserting code. It literally can format your disk. In practice other nasty things happen.
  • usr
    usr almost 9 years
    You're making some strong predictions about what will happen although none of that is guaranteed due to UB. It also is not true in practice.
  • Shafik Yaghmour
    Shafik Yaghmour almost 9 years
    This is going to depend on your compiler, which is expected with undefined behavior; as we can see from my answer, clang today will not do what they want.
  • Michael
    Michael almost 9 years
    But in this case just the value is undefined, isn't it? I mean, it's not like the whole behaviour of the program is suddenly undefined just because you use an uninitialized variable somewhere.
  • Tom Tanner
    Tom Tanner almost 9 years
    @Michael Actually, it is. If a program has undefined behaviour at any point, the compiler can optimise your program in a way that affects code preceding that invoking the undefined behaviour. There are various articles and demonstrations of how mind-boggling this can get. Here's quite a good one: blogs.msdn.com/b/oldnewthing/archive/2014/06/27/10537746.aspx (which includes the bit in the standard that says all bets are off if any path in your program invokes UB)
  • Steve Jessop
    Steve Jessop almost 9 years
    "if any combination of bytes will be particularly vexatious, reading it will always yield that pattern of bytes" -- until you code to cope with that pattern, at which point it is no longer vexatious and a different pattern will be read in future.
  • supercat
    supercat almost 9 years
    @SteveJessop: Precisely. My line about development vs production was intended to convey a similar notion. Code shouldn't care about what's in uninitialized memory beyond a vague notion of "Some randomness might be nice". If program behavior is affected by the contents of one piece of uninitialized memory, the contents of pieces that are acquired in future may in turn be affected by that.
  • cmaster - reinstate monica
    cmaster - reinstate monica almost 9 years
    There is absolutely no luck involved. If the compiler does not optimize the undefined behavior away, the values that you get will be perfectly deterministic (= depend entirely on your program, its inputs, its compiler, the libraries it uses, the timing of its threads if it has threads). The problem is that you can't reason about these values since they depend on implementation details.
  • cmaster - reinstate monica
    cmaster - reinstate monica almost 9 years
    That OpenSSL used this method as an entropy input does not say that it was any good. After all, the only other entropy source they used was the PID. Not exactly a good random value. From someone who relies on such a bad entropy source, I won't expect good judgement on their other entropy source. I just hope the people who currently maintain OpenSSL are brighter.
  • Jay
    Jay almost 9 years
    The "linear congruential" generator that you present is good for simple applications, but only for non-cryptographic applications. It is possible to predict its behavior. See for example "Deciphering a linear congruential encryption" by Don Knuth himself (IEEE Transactions on Information Theory, Volume 31)
  • supercat
    supercat almost 9 years
    A lot depends on whether the purpose of a compiler is to help programmers produce executable files that meet domain requirements, or whether the purpose is to produce the most "efficient" executable whose behavior will be consistent with the minimal requirements of the C Standard, without regard for whether such behavior will serve any useful purpose. With regard to the former objective, having code use some arbitrary initial values for r,g,b, or triggering a debugger trap if practical, would be more useful than turning the code into a nop. With regard to the latter objective...
  • supercat
    supercat almost 9 years
    ...an optimal compiler should determine what inputs would cause the above method to execute, and eliminate any code which would only be relevant when such inputs are received.
  • Mike McMahon
    Mike McMahon almost 9 years
    @Jay compared to an uninitialized variable for quick and dirty? This is a much better solution.
  • supercat
    supercat almost 9 years
    @usr: Prior to roughly 2010, the effect of Undefined Behavior could be viewed as equivalent to overwriting all of RAM (including the stack) with arbitrary values and jumping to a random location, but on a Harvard-architecture machine with non-writable code store (common in the embedded-systems world) a program could offer certain guarantees about its behavior even under such circumstances. Indeed, some systems require that it be impossible for a program to trigger certain hardware actions even if a power glitch causes RAM to get trashed. Since 2010, however, Undefined Behavior in C...
  • supercat
    supercat almost 9 years
    ...goes beyond even the "arbitrarily rewrite everything in memory and jump to an arbitrary address" fault model.
  • usr
    usr almost 9 years
    @supercat what happened in 2010? More aggressive compilers?
  • supercat
    supercat almost 9 years
    @tbleher: When the Standard was written, saying "Reading an uninitialized variable may yield an arbitrary value, or may trap in a fashion which is outside the jurisdiction of this Standard; the documentation for an implementation must specify which" would have been, from a requirements standpoint, equivalent to saying that the standard imposes no requirements, save only for the fact that the lack of anything in a compiler's documentation regarding its behavior would be equivalent to specifying that the compiler may interpret the action as triggering a trap which frees the system from...
  • supercat
    supercat almost 9 years
    ...the jurisdiction of the Standard. I really doubt that the original authors of the Standard envisioned the idea, which seems to have emerged around 2010, that a compiler should try to identify inputs which would result in a program to invoke Undefined Behavior, and then eliminate any code which would be irrelevant unless such inputs were received. I'm not quite clear that anyone actually benefits from granting compilers such freedom, since imposing somewhat tighter rules on how compilers must interpret most forms of UB would make it possible for many programs to meet requirements...
  • Admin
    Admin almost 9 years
    +(int)(PI/3) for the compiler output examples; a real-life example that UB is, well, UB.
  • supercat
    supercat almost 9 years
    ...more efficiently than is possible using only actions defined by the current Standard. Many programs have two requirements: (1) When given valid input, produce strictly-defined output; (2) Don't launch nuclear missiles when given invalid input. If (with int being 32 bits) x<<y were defined as arbitrarily yielding x<<(y & 31) or y & ~31 ? 0 : x<<y, code which would be happy with either value when fed inputs that caused y to exceed 31 could let machines where one operation would be cheaper than the other use the cheaper operation. Making it UB means compilers can't get that freedom.
  • supercat
    supercat almost 9 years
    In the absence of an operating system with an interrupt-handling stack separate from the application stack, luck may well be involved, since interrupts will frequently disturb the contents of memory slightly beyond the current stack contents.
  • sfdcfox
    sfdcfox almost 9 years
    Utilizing UB effectively used to be the trademark of an excellent hacker. This tradition has gone on for probably 50 years or more now. Unfortunately, computers are now required to minimize the effects of UB because of Bad People. I really enjoyed figuring out how to do cool things with UB machine code or port reads/writes, etc. in the 90s, when the OS wasn't as capable of protecting the user from themselves.
  • Shafik Yaghmour
    Shafik Yaghmour almost 9 years
    @sfdcfox there is little out there that documents this transition, at least I have not found it and I have read a lot on this topic. It seems unlikely this genie will be put back into the bottle.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE almost 9 years
    @supercat: Your claim that something changed in or around 2010 is unsupported and wrong. There have always been lots of bad things possible from UB, and many/most of the bad things that can happen from UB due to optimizing compiles happen even if the compiler does not go out of its way to break code with UB. There are a very small limited class of UB cases the compiler can detect at compile-time, and therefore would have the freedom to "avoid breaking" (i.e. "do what the programmer expected"), but this blows up and requires very expensive runtime checks in the general case.
  • Theodoros Chatzigiannakis
    Theodoros Chatzigiannakis almost 9 years
    This answer makes it sound as if "invoking undefined behavior is bad in theory, but it won't really hurt you much in practice". That's wrong. Collecting entropy from an expression that would cause UB can (and probably will) cause all the previously collected entropy to be lost. This is a serious hazard.
  • Deduplicator
    Deduplicator almost 9 years
    ... and that's leaving out all the compiler-optimizations which would completely gut that code.
  • supercat
    supercat almost 9 years
    @R..: What is the earliest you know of that anyone has advocated the idea that compilers should endeavor to use UB to make inferences about what inputs a piece of code could receive, and eliminate dead paths based upon such inferences? I'm well aware that compilers have made certain inferences about integer arithmetic going back even before 2000, but all such inferences I know of could be characterized under a computation model I refer to as "Partially-Indeterminate Value". If int is 16 bits, then after unsigned char b=200; int i=b*b;, each use of i as an rvalue could...
  • supercat
    supercat almost 9 years
    ...independently behave as any number (representable or not) which was congruent to 40000 mod 65536. Such behavior could cause confusion for code that expected the two's-complement wrapping the Standard would generally require when casting a 32-bit int to an int16_t, but if i was used in a fashion such that any number congruent to 40000 mod 65536 would allow the program to meet its requirements, there would be no reason for the programmer to constrain the compiler by e.g. using unsigned rather than int.
  • supercat
    supercat almost 9 years
    @TheodorosChatzigiannakis: Even in cases where it is possible to read uninitialized memory without causing Undefined Behavior (and there are some), and even in cases where a compiler wouldn't be able to tell that code was reading uninitialized memory, there could be a correlation between the values that were read and the existing "entropy collection". If one has a uint64_t seed; and uint8_t arbitrary[1];, coded such that reads of the latter yield an Unspecified value that a compiler can't optimize away, the code seed += arbitrary[0]; could lose a bit of entropy if arbitrary[0] is...
  • supercat
    supercat almost 9 years
    ...always even when seed is even and odd when it is odd [e.g. because its value was copied from storage that had been used in the last computation of seed]. There are times when reading uninitialized data might provide a very feeble source of entropy to a program at a time when it would otherwise have none available (e.g. if many identical embedded processors in a network are powered on simultaneously and need to establish arbitrary unique ids, timing divergence may eventually work, but any entropy that can be used to start the process may help greatly), but such cases are rare.
  • Shafik Yaghmour
    Shafik Yaghmour almost 9 years
    @TheodorosChatzigiannakis considering my answer came almost six hours after the question was asked, I am doing pretty well. With these hot network questions it is very hard to overcome the first-mover advantage. I only added my answer because the existing answers really left out a lot of important points. Most likely the only way I will end up on top is if the OP accepts my answer.
  • supercat
    supercat almost 9 years
    @R..: I would be genuinely interested in seeing evidence that the use of UB to make inferences about program inputs was advocated prior to 2009-2010. My claim that something changed around then is predicated upon my having read articles from around that time period advocating that compilers make use of such inferences, but not having seen evidence of such things having been advocated previously. Also, I'd say that the fact that neither the C89 nor C99 standard makes any mention that the effects of Undefined Behavior can be retroactive, but the C11 standard does, is indicative of a change.
  • Caleth
    Caleth almost 9 years
    @sfdcfox if you were doing it in machine code / assembler, it wasn't undefined behavior (it may have been unconventional behavior).
  • 0xMatthewGroves
    0xMatthewGroves almost 9 years
    "I'd fire you?" What an arrogant thing to say. He was trying to come up with a creative way of solving a problem. Hopefully nobody works for you.
  • supercat
    supercat almost 9 years
    @Caleth: There are many platforms where certain kinds of pointer operations will have useful behaviors even though the Standard imposes no requirements on them. For example, void copy(char *dest, char *src) { char ch; int diff = dest-src-1; while((ch = *src++) != 0) src[diff] = ch; } Horrible Undefined Behavior, but if src and dest are known to be close enough, it can yield better code than any defined alternative.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE almost 9 years
    @FrostRocket: People who insist on doing things that are documented as wrong and unsafe for the sake of being "creative" are a liability to a project, whether it's commercial or FOSS. The remark you're replying to may have been unnecessarily inflammatory, but the sentiment that people need to stop doing this or get out is quite valid.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE almost 9 years
    @supercat: What you wrote before ("Prior to roughly 2010, the effect of Undefined Behavior could be viewed as equivalent to overwriting all of RAM ...but...a program could offer certain guarantees about its behavior even under such circumstances") has nothing to do with what you're claiming now, that DCE based on UB was introduced around 2010. Even without UB-based DCE, UB is much more dangerous than you're giving it credit for.
  • supercat
    supercat almost 9 years
    @R..: There are many circumstances where programs which receive invalid input is allowed to produce arbitrary output, subject to relatively few constraints. The way many platforms have historically treated things like integer overflow (e.g. x-y>z might arbitrarily yield 0 or 1 if x-y overflows, but wouldn't launch nukes in any case) made it possible to meet that requirement cheaply, while allowing optimizations on some platforms that would not be possible if the code were rewritten as (int)((unsigned)x-y)>z.
  • supercat
    supercat almost 9 years
    @R..: Do you suppose the pattern (x << y) | (x >> (32-y)) appeared because the authors were fools, or because until recently every compiler for a machine with 32-bit integers would evaluate it correctly? If all instances of that code which are required to work when y==0 were replaced with (x << y) | (x >> (31-y) >> 1) would any useful optimizations be possible that would not have been possible if the standard had been changed to require that x>>32 had to yield either x or 0 at the compiler's option?
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE almost 9 years
    @supercat: I'm not trying to have an argument over the merits declaring certain particular things undefined. That is a huge discussion and it's off-topic here. What I am saying is that you claim that UB was relatively safe before 2010, and that aggressive compiler optimizations using UB for DCE made it unsafe, is unsupported, and in my view, false. Non-obvious (to the compiler) cases of invalid aliasing, sequence point violations, access to uninitialized objects, etc. can all lead to things like inconsistent evaluation of expressions at different points, which is very unsafe.
  • Damian Yerrick
    Damian Yerrick almost 9 years
    @supercat Or its purpose could be C. to produce efficient executables in compliance with the Standard while helping the programmer find places where compliance may not be useful. Compilers can meet this compromise purpose by emitting more diagnostics than the Standard requires, such as GCC's -Wall -Wextra.
  • Caleth
    Caleth almost 9 years
    If you have a specific assembly in mind, then use that and don't write uncompliant C. Then everyone will know you are using a specific non-portable trick. And it's not Bad People who mean you can't use UB, it's Intel etc doing their tricks on chip.
  • Matthieu M.
    Matthieu M. almost 9 years
    This answer, even though thoroughly upvoted, is unfortunately factually incorrect. I would advise you to read Shafik's answer, as well as the What every C programmer should know about undefined behavior to get a better grasp on the subtleties of undefined behavior (the short of it: the compiler expects that undefined behavior never happens, and is allowed to delete any branch of code that would trigger it).
  • Matthieu M.
    Matthieu M. almost 9 years
    You might also want to link to What every C programmer should know about Undefined Behavior. Now, let's wish this answer takes the top...
  • imallett
    imallett almost 9 years
    I have edited this answer to make it (more) clear that UB is not something anyone except compiler writers are equipped to deal with.
  • user253751
    user253751 almost 9 years
    @R.. They can. Could they before approximately 2010?
  • Dewi Morgan
    Dewi Morgan almost 9 years
    That the values are undefined does not mean that the behavior of surrounding code is undefined. No compiler should noop that function. The two function calls, whatever inputs they're given, absolutely MUST be called; the first MUST be called with three numbers between 0 and 255, and the second MUST be called with either a true or false value. A "good optimizing compiler" could optimize the function params to arbitrary static values, getting rid of the variables completely, but that's as far as it could go (well, unless the functions themselves could be reduced to noops on certain inputs).
  • Jules
    Jules almost 9 years
    @DewiMorgan - as the functions called are of the "set this parameter" type, they almost certainly do reduce to noops when the input is the same as the current value of the parameter, which the compiler is free to assume is the case.
  • Jack Aidley
    Jack Aidley almost 9 years
    rand() is not fit for purpose and should be entirely deprecated, in my opinion. These days you can download freely licensed and vastly superior random number generators (e.g. Mersenne Twister) which are very nearly as fast, with the greatest of ease, so there's really no need to continue using the highly defective rand().
  • James Thorpe
    James Thorpe almost 9 years
    "Also, I'd fire you." Perhaps. But we're in Undefined Behaviour land here. You might end up giving him a raise :)
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE almost 9 years
    @immibis: I don't have a test case in front of me right now, but conceptually there's no reason they shouldn't be able to happen. All that should be needed to trigger inconsistent evaluations related to illegal aliasing is perturbing things enough to affect register spills, and this kind of issue is going to happen on any compiler that's not so idiotic as to load/store on every single access in the abstract machine (i.e. treat all objects as volatile).
  • supercat
    supercat almost 9 years
    @R..: If an variable is written one way and read another, in violation of the Strict Aliasing Rule (or any of the aliasing rules which some compilers could optionally enable prior to C99) a traditional behavioral model would imply that a compiler could create a separate object for each mode of accessing that variable, and arbitrarily copy the value of any such object at any time to any of the other objects associated with the same variable (in addition to copying the values at certain well-defined times). In many cases, this could be disastrous, but if an algorithm would work correctly...
  • supercat
    supercat almost 9 years
    ...for any combination of read/write visibility consistent with the model [perhaps better for some such combinations than others], coding the algorithm in a fashion that gives a compiler flexibility would (on a compiler which promised not to do anything beyond what such a model would allow) enable optimizations that would not be possible if code couldn't request any actions beyond those permitted by the Standard.
  • supercat
    supercat almost 9 years
    @immibis: Compilers have for a long time made certain inferences about the behavior of integer arithmetic in loops which might at first glance appear to violate causality, but could instead be described in terms of linear transformations. A 16-bit compiler might rewrite int i,n; long total; ... for (i=n; i>=0; i--) total+=i*3000; by assuming an execution model where integers have enough extra bits to avoid overflow and transforming the code to for (i=n*3000; i>=0; i-=3000) total+=i;, which might end up executing the loop 0 times when e.g. n is 11. On the other hand,...
  • supercat
    supercat almost 9 years
    ...there's a difference between saying that certain kinds of transforms which would preserve semantics under such an execution model may be performed even if the platform uses a different execution model, and saying that any form of UB throws all behavioral requirements out the window.
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE almost 9 years
    @supercat: My argument is not that there don't exist specific invalid programs for which you can nonetheless prove theorems about their behavior given knowledge of a specific compiler's transformations, but rather that, in general, programs with undefined behavior were unsafe even prior to 2010. In particular, whenever the same expression can evaluate differently at different points due to UB, you can end up with bypassed bounds checks, etc.
  • supercat
    supercat almost 9 years
    @R..: On 16-bit machines, under a long-standing common behavioral model, given int x=32767; int y=x+1; the value of y might sometimes behave as +32768 and sometimes as -32768; it might also sometimes behave as any other numbers congruent to 32768 mod 65536. Because bounds-checks like if (y >= 0 && y < 50) foo(y); could very well get bypassed, code needed to ensure that any values which might have overflowed were only used in situations where such inconsistent behavior couldn't pose a danger. In cases where any value that was congruent to the arithmetically-correct one mod 65536...
  • supercat
    supercat almost 9 years
    ...would suffice to meet program requirements, however, code which only needed to run on platforms which fit that behavioral model could offer more optimization opportunities to compilers which would abide by that model, than would code which used only Standard-defined behaviors.
  • Marwan Burelle
    Marwan Burelle almost 9 years
    rand() has another terrible issue: it uses a kind of lock, so when called from threads it slows down your code dramatically. At least there's a reentrant version. And if you use C++11, the random API provides everything you need.
  • user541686
    user541686 almost 9 years
    I believe the "go back in time" argument is subtly wrong -- the compiler can only "go back" in time if the code path leading to UB has no observable side effects, but in that case, you wouldn't be able to tell if it "went back" or not. If you do anything observable inside ring_bell() (e.g. writing to a file), that action must actually happen before UB is invoked. The compiler cannot assume at compile-time that it would be able to "undo" this action at run-time, because there is no guarantee it will be able to -- the call is external and, for example, may never even return.
  • Kat
    Kat almost 9 years
    Although rand() must be seeded (otherwise it won't look remotely random). Since that typically uses the time, you'd have a sys call in there as well. Still extremely acceptable performance in virtually any case, though (and seeding is a one time thing).
  • 500 - Internal Server Error
    500 - Internal Server Error almost 9 years
    Aside, and probably a stupid question with a simple answer: Why are operations that are explicitly Undefined Behavior according to the standard not simply disallowed completely?
  • Shafik Yaghmour
    Shafik Yaghmour almost 9 years
    @500-InternalServerError because they may not be easily detectable, or may not be detectable at all in the general case, and therefore there would be no way to disallow them. This is different from violations of the grammar, which can be detected. We also have ill-formed and ill-formed no diagnostic required, which in general separates poorly formed programs that could be detected in theory from those that in theory could not be reliably detected.
  • Shafik Yaghmour
    Shafik Yaghmour almost 9 years
    @500-InternalServerError the second reason is that it allows compilers to support extensions ... or support the behavior on platforms that are not problematic.
  • jcoder
    jcoder almost 9 years
    To be fair, he didn't ask if it was a good random number generator. He asked if it was fast. Well, yes, it's probably the fastest. But the results will not be very random at all.
  • Konrad Rudolph
    Konrad Rudolph almost 9 years
    “in a deterministic computer, nothing is random” — This isn’t actually true. Modern computers contain all kinds of sensors that allow you to produce true, unpredictable randomness without separate hardware generators. On a modern architecture, the values of /dev/random are often seeded from such hardware sources, and are in fact “quantum noise”, i.e. truly unpredictable in the best physical sense of the word.
  • Piotr Siupa
    Piotr Siupa almost 9 years
    Good point. A result strongly depends on where this "random" number generator is called in the code. It is unpredictable rather than random.
  • Viktor Toth
    Viktor Toth almost 9 years
    But then, that's not a deterministic computer, is it? You are now relying on environmental input. In any case, this takes us well beyond the discussion of a conventional pseudo-RNG vs. "random" bits in uninitialized memory. Also... look at the description of /dev/random to appreciate just how far out of their way the implementers went to ensure that the random numbers are cryptographically secure... precisely because the input sources are not pure, uncorrelated quantum noise but rather, potentially highly correlated sensor readings with only a small degree of randomness. It's pretty slow, too.
  • Hunsu
    Hunsu almost 9 years
    It doesn't answer the question which is 'Is uninitialized local variable the fastest random number generator?'
  • supercat
    supercat almost 9 years
    @500-InternalServerError: Consider the effect of unsigned int x,y; x=getch(); y=getch(); printf("%u",x<<y); on a 32-bit machine. Some machines would generate code that would shift x left by y times. Others would use a shift-left instruction that would compute x<<(y & 31). Some might trigger a hardware trap if y exceeds 32. If the Standard had mandated one or the other of the first two behaviors, that would require many compilers to generate extra code to handle the y>=32 case, which would slow down performance even when y was in the 0 to 31 range. Even worse, if...
  • supercat
    supercat almost 9 years
    ...e.g. the hardware could perform the "shift left N times" behavior in a single cycle, and a program needed the "shift left N times" behavior but the Standard had mandated the "shift left (N & 31)" times, a programmer would have to write y >= 32 ? 0 : x << y, which a typical compiler would then most likely render as a comparison, conditional jump, bit-mask, and shift--quite possibly taking four times as long as a "shift x left by y" instruction. The effect was that programmers who knew that their code would only have to run on a platform whose behavior matched their needs could...
  • supercat
    supercat almost 9 years
    ...take advantage of that. Someone writing code for a platform which didn't match their needs would have to write code to deal with that, but requiring that the user write code to handle the y>=32 case in situations where the platform behavior wasn't acceptable was better than requiring compilers for about half the platforms in existence to generate such code even when it would often be useless and sometimes downright counterproductive.
  • supercat
    supercat almost 9 years
    I find the linked Defect Report 451 and the committee's response interesting. Semantics where the first read of uninitialized data may read any value, but all subsequent reads must yield the same value (absent intervening writes) make it possible for programmers who know that an algorithm will work with an array holding any combination of contents to refrain from needless initialization. I'm hard-pressed to imagine any cases where weakening the semantics to the extent described would allow any optimizations in any programs written to behave meaningfully under those semantics.
  • supercat
    supercat almost 9 years
    There are many situations where code receives a number which is small enough to use as an array index, and needs to compute from that a value which can be readily verified as correct. Fetching a value from an uninitialized array, determining whether it's correct, and--if not--computing the value and storing the computed value in the array, may sometimes be more efficient than initializing the array, especially in situations where only a tiny fraction of array elements are ever used. Having the elements read indeterminate rather than unspecified, however...
  • supercat
    supercat almost 9 years
    ...would totally break such an algorithm. BTW, I'm also curious why the authors of the response think a description of padding bits as "unspecified" implies that "unspecified" and "indeterminate" are the same thing? If long is 40 bits plus 24 bits of padding at the end, given union { long l; unsigned char b[8];} u, making them unspecified would say that each run of u.l=123; printf("(%d %d)", u.b[7], u.b[7]); must output some number 0-255 twice, but each run could output a different number. Making it indeterminate would make that code UB. Is the latter even remotely reasonable?
  • Sql Surfer
    Sql Surfer almost 9 years
    6 ... You will get different "randomness" in Debug and Release. Undefined means you are doing it wrong.
  • underscore_d
    underscore_d over 8 years
    I find it very hard to believe a compiler can decide your code is doing something silly and remove it. I'd expect it only to optimise away unused code, not inadvisable code. Do you have a reproducible test case? Either way, the recommendation of UB is dangerous. Plus, GCC isn't the only competent compiler around, so it's unfair to single it out as "modern".
  • Glenn Teitelbaum
    Glenn Teitelbaum over 8 years
    For comparison, see the results in this answer: stackoverflow.com/a/31836461/2963099
  • fche
    fche about 7 years
    Right. I'd abbreviate or summarize with "undefined" != "arbitrary" != "random". All these kinds of "unknownness" have different properties.
  • Shafik Yaghmour
    Shafik Yaghmour over 6 years
    Dangerous Optimizations and the Loss of Causality now found here also see the video
  • supercat
    supercat almost 6 years
    @Mehrdad: A compiler may reorder UB across side-effects if it is allowed to assume that the code performing the side-effects will neither block perpetually nor cause an abnormal program termination in an implementation-defined manner (such as attempting to send data to a broken pipe). Unfortunately, few implementations bother to document which operations they do or do not recognize as possibly disrupting program execution, because it so seldom matters.
  • supercat
    supercat almost 6 years
    Almost any code that uses aggregates violates 6.5p7 and consequently invokes UB. The authors of the Standard presumably expected that any quality implementation should recognize at least some situations where an lvalue of one type is used to derive one of another, and thought it so obvious that the constructs "aggregate.member" and "aggregatePtr->member" should be supported by all usable-quality implementations that they shouldn't need explicit mention.
  • user541686
    user541686 almost 6 years
    @supercat: I was saying the problem is that "if" cannot be satisfied in reality, because the compiler has no idea what the system that actually ends up running the program will do as a result of that side effect (the program does not contain enough information about this! it's the outside world and unobservable by definition!) and hence no idea of how long that action will take (if finite) or whether it is even possible to undo the result (if any). So it's forced to make the side effect occur and cannot optimize UB across it.
  • supercat
    supercat almost 6 years
    @Mehrdad: Implementations are not required to specify any particular relation between anything that happens in the abstract machine and anything that happens in the real world. An implementation couldn't be useful for much of anything if it couldn't guarantee some kind of meaningful relationship, but the authors of the Standard state in the rationale that they don't think it's necessary to avoid the possibility of conforming-but-useless implementations.
  • user541686
    user541686 almost 6 years
    @supercat: We were talking about a compiler (takes in source code, spits out some translation to be run later on some implementation), not the actual abstract "implementation" itself (takes in source code, executes it in some abstract fashion). You can't call it a compiler if it's the final entity executing the code; you'd call it an "interpreter" or something else.
  • supercat
    supercat almost 6 years
    @Mehrdad: The Standard can't say anything meaningful about what happens if a compiler's output is run in an environment that doesn't meet requirements. A conforming compiler could specify that it is only suitable for use in environments that promise that every facially-valid call to fwrite will return, and that it makes no promises about what will happen if a call to fwrite fails to do so. A compiler that specifies things in such fashion would be free to reorder operations that may invoke UB across calls to fwrite(), even though fwrite() has observable side-effects.
  • user541686
    user541686 almost 6 years
    @supercat: Yes, and I don't think that contradicts anything I said. I only ever talked about program a compiler can produce; it doesn't even make sense to talk about "optimizing out" code when talking about purely the abstract machine. You need intermediate introspection. Indeed a compiler can produce output that tells the run-time system "here's what you do if you know a priori what fwrite can do, and here's what you do if not", and the implementation can use that if that's the case. But we know that's vacuous for our implementations, so we only care about the "if not" case.
  • user541686
    user541686 almost 6 years
    @supercat: As a side note, the standard doesn't have to say anything about compiler output explicitly. Whatever the compiler produces must be logically implied by the original program if executed with valid inputs, and the standard does have something to say about that, and that is very much a nontrivial constraint. (But again, note that I was talking about the compiler the whole time, not the final abstract machine: "optimizing out" has no meaning unless you can assume something concrete that you can introspect.)
  • supercat
    supercat almost 6 years
    @Mehrdad: Consider the function void test(FILE *f, unsigned x) { for (int i=0; i<10; i++) { char temp= !x; fwrite(&temp, 1, 1, f); temp = 1/x; fwrite(&temp, 1, 1, f); } }. If a compiler specifies that it is only suitable for environments where a facially-valid fwrite will always return, it could hoist the computation of 1/x outside the loop. Such an optimization would often be useful, but could cause UB to time-travel across the first fwrite call. Even if the generated code for 1/x doesn't actually trap when x==0 (e.g. a compiler could replace 1/x with x==1)...
  • supercat
    supercat almost 6 years
    ...a compiler could assume that the expression !x won't be reachable with x==0.
  • user541686
    user541686 almost 6 years
    @supercat: Yes, I believe that statement is consistent with what I just said, and that a compiler would not be able to actually make this optimization because of what I also just explained.
  • supercat
    supercat almost 6 years
    @Mehrdad: I'm not clear why you think a compiler would be unable to make such an optimization. I'm not sure any make it with fwrite, but if one replaced the fwrite with a store to a volatile object, gcc would make that optimization even though on many platforms a volatile store may have externally-observable effects.
  • user541686
    user541686 almost 6 years
    @supercat: Is GCC legally allowed to assume that a write to a volatile guaranteed to complete in finite time in whatever implementation GCC is targeting? If it is, then that may be where I'm mistaken (I've never read anything claiming this). If not, and if GCC actually does this, I would say it is non-conforming in doing so, with the reasons being everything I've stated. Could you link me to an example supporting your claim that GCC will do this? I'm having trouble reproducing it.
  • supercat
    supercat almost 6 years
    See godbolt.org/g/4EEJ57 for a simple example. The code performs the division before the loop. As to whether such behavior is "allowed", I think the Standard would view the sequencing relationships between volatile accesses and things that aren't volatile accesses as a Quality of Implementation issue, and makes no attempt to require that implementations be of sufficient quality to be usable for any particular purpose.
  • user541686
    user541686 almost 6 years
    @supercat: Interesting, thanks. Yes, I see that as an illegal optimization. I don't see how the write to the volatile zzq can be inhibited just because a user specified x = 0. What if a write to zzq were supposed to result in a CPU reboot before the instruction completed? You could probably make actual hardware (or if not, an emulator...) that did this with an x86 ISA. The only justification I see here is if x86 prohibits such a thing, which, if it does, I'm not aware of it.
  • supercat
    supercat almost 6 years
    @Mehrdad: The Standard gives implementations extremely broad latitude with regard to how they handle volatile. I think the intention is that if there would be any imaginable circumstance where it would be useful for an implementation targeted toward a particular purpose to behave in some particular fashion, an implementation targeted toward that purpose should be allowed to do so if it documents the behavior. Not unreasonable, if compiler writers recognize that the Standards' allowances for unusual behaviors in no way imply an endorsement of them.
  • Brian Vandenberg
    Brian Vandenberg almost 6 years
    Global variables are guaranteed to have a defined value, whether explicitly initialized or not. This is definitely true in C++ and in C as well.
  • Ben Voigt
    Ben Voigt about 3 years
    "That the values are undefined does not mean that the behavior of surrounding code is undefined" It absolutely does mean that in both C and C++. If you don't like that, use a different language.