Multiple nested for loops vs single for loop

c++ for-loop nested timing

11,068

Solution 1

The compiler won't "optimize" the loops away when the testX variable is used somewhere later in the code. When I just add one line to the code to output testX, the results are as follows:

single for loop: 1.218 ms
nested for loop: 1.218 ms

This pretty much shows that the compiler converts the nested loop into a single loop, whenever possible. The loop index can be used to prevent that optimisation:

Modifying the code this way

for( int i = 0; i < 27000000; i++ ){
    testX += i;
}

and

for( int x = 0; x < 300; x++ ){
    testX += x;
    for( int y = 0; y < 300; y++ ){
        testX += y;
        for( int z = 0; z < 300; z++ ){
            testX += z;
        }
    }
}

will add a little overhead to the nested loop but the execution time shows

single for loop: 1.224 ms
nested for loop: 1.226 ms

The times given here are averaged over 30.000 loop runs.

Note: The testX += x; only contributes 1 in 90000 and the testX += x; only contributes 1 in 300. Thus the two sections above remain comparable.

Nested loops are not much slower than single loops but your observation that they are faster is not true.

And: The times you show are about 40 times the times I observed. I'd suggest to carefully inspect the compiler settings since I ran the test on a medium speed hardware. Maybe the results of glfwGetTime() are questionable and this is the main reason for your question. Have you tried to use another timing scheme?

Edit: In order to prevent compiler optimisation the loop limit can be choosen to be non constant:

int lmt = rand() % 1 + 300;      // random value 300 or 301 
int big_lmt = lmt * lmt * lmt;   // random value 27000000 or 27270901

for( int i = 0; i < big_lmt; i++ ){
    testX += i;
}

for( int x = 0; x < lmt; x++ ){
    testX += x;
    for( int y = 0; y < lmt; y++ ){
        testX += y;
        for( int z = 0; z < lmt; z++ ){
            testX += z;
        }
    }
}

This avoids compiler predictability.

Results (for a lmt = 300 case to be comparable):

single for loop: 1.213 ms
nested for loop: 1.216 ms

Result:

Nested loops are not faster than single loops.

Solution 2

If you don't use your for variables (x,y,z) inside your for loop, a smart compiler could (and should) convert your second form in a single for loop without nesting. Unless you prevent such compiler optimization removing static predictability by having the user input the x,y,z values at runtime from stdin, or reading from some stream, etc..

Furthermore, if you don't do something with your testX variable (say, printing it to stdout), a smart compiler could (and should) optimize it away, i.e. remove the dead code altogether.

So what I'm saying is that the benchmark, as it is right now, is somehow ill-formed.

11,068

Author by

Jamie Syme

Updated on June 04, 2022

Comments

Jamie Syme almost 2 years
I was doing a little speed testing in c++ (MSVS) and got a strange result. I was testing the speed of using one for loop vs multiple nested for loops. Here is the code:
```
double testX = 0;
// Single loop executes in roughly 0.04 seconds
for( int i = 0; i < 27000000; i++ ){
    testX += 1;
}

// Nested loop executes in roughly 0.03 seconds
for( int x = 0; x < 300; x++ ){
    for( int y = 0; y < 300; y++ ){
        for( int z = 0; z < 300; z++ ){
            testX += 1;
        }
    }
}
```
As you can see, the speed difference is fairly obvious. I have run this many times, and those are the average times I am seeing (these are timed using glfwGetTime()).

So my question is: why? Is my test method inadequate? Am I using too few loops? I have tried searching google, and the only similar question I could find related his problem to cache coherency, but since these are empty for loops, I didn't think it would really have an effect.

Any help is appriciated :)

Edit: Thanks to the comments, I realized that using empty for loops probably wasn't the best way of testing things... So I have updated my code to do some (very) simple operations to a double. I also am compiling in release mode. However, though the two methods are a lot more similar in times, the second method is still slightly faster.

And yes, this is all the test code (minus the timing/output functions, but those aren't exactly specific to the question).