If statement vs if-else statement, which is faster?
Solution 1
TL;DR: In unoptimized code, if without else seems negligibly more efficient, but with even the most basic level of optimization enabled, the code is essentially rewritten to value = condition + 5.
I gave it a try and generated the assembly for the following code:
int ifonly(bool condition, int value)
{
    value = 5;
    if (condition) {
        value = 6;
    }
    return value;
}

int ifelse(bool condition, int value)
{
    if (condition) {
        value = 6;
    } else {
        value = 5;
    }
    return value;
}
On gcc 6.3 with optimizations disabled (-O0), the relevant difference is:
mov DWORD PTR [rbp-8], 5
cmp BYTE PTR [rbp-4], 0
je .L2
mov DWORD PTR [rbp-8], 6
.L2:
mov eax, DWORD PTR [rbp-8]
for ifonly, while ifelse has:
cmp BYTE PTR [rbp-4], 0
je .L5
mov DWORD PTR [rbp-8], 6
jmp .L6
.L5:
mov DWORD PTR [rbp-8], 5
.L6:
mov eax, DWORD PTR [rbp-8]
The latter looks slightly less efficient because it has an extra jump, but both have at least two and at most three assignments, so unless you really need to squeeze every last drop of performance (hint: unless you are working on a space shuttle you don't, and even then you probably don't), the difference won't be noticeable.
However, even with the lowest optimization level (-O1), both functions reduce to the same:
test dil, dil
setne al
movzx eax, al
add eax, 5
which is basically the equivalent of return 5 + condition; assuming condition is zero or one.
Higher optimization levels don't really change the output, except they manage to avoid the movzx
by efficiently zeroing out the EAX
register at the start.
Disclaimer: You probably shouldn't write 5 + condition yourself (even though the standard guarantees that converting true to an integer type gives 1), because your intent might not be immediately obvious to people reading your code (which may include your future self). The point of this code is to show that what the compiler produces in both cases is (practically) identical. Ciprian Tomoiaga states it quite well in the comments:
a human's job is to write code for humans and let the compiler write code for the machine.
Solution 2
The answer from CompuChip shows that for int they are both optimized to the same assembly, so it doesn't matter.
What if value is a matrix?
I will interpret this in a more general way, i.e. what if value is of a type whose construction and assignment are expensive (and moves are cheap).
Then

T value = init1;
if (condition)
    value = init2;

is sub-optimal because when condition is true, you do the unnecessary initialization to init1 and then the copy assignment.
T value;
if (condition)
    value = init2;
else
    value = init3;

This is better, but still sub-optimal if default construction is expensive and if copy assignment is more expensive than initialization.
You have the conditional operator solution which is good:
T value = condition ? init1 : init2;
Or, if you don't like the conditional operator, you can create a helper function like this:
T create(bool condition)
{
    if (condition)
        return {init1};
    else
        return {init2};
}

T value = create(condition);
Depending on what init1 and init2 are, you can also consider this:
auto final_init = condition ? init1 : init2;
T value = final_init;
But again, I must emphasize that this is relevant only when construction and assignment are really expensive for the given type. And even then, only profiling tells you for sure.
Solution 3
In pseudo-assembly language,
li #0, r0
test r1
beq L1
li #1, r0
L1:
may or may not be faster than
test r1
beq L1
li #1, r0
bra L2
L1:
li #0, r0
L2:
depending on how sophisticated the actual CPU is. Going from simplest to fanciest:
- With any CPU manufactured after roughly 1990, good performance depends on the code fitting within the instruction cache. When in doubt, therefore, minimize code size. This weighs in favor of the first example.

- With a basic "in-order, five-stage pipeline" CPU, which is still roughly what you get in many microcontrollers, there is a pipeline bubble every time a branch—conditional or unconditional—is taken, so it is also important to minimize the number of branch instructions. This also weighs in favor of the first example.

- Somewhat more sophisticated CPUs—fancy enough to do "out-of-order execution", but not fancy enough to use the best known implementations of that concept—may incur pipeline bubbles whenever they encounter write-after-write hazards. This weighs in favor of the second example, where r0 is written only once no matter what. These CPUs are usually fancy enough to process unconditional branches in the instruction fetcher, so you aren't just trading the write-after-write penalty for a branch penalty. I don't know if anyone is still making this kind of CPU anymore. However, the CPUs that do use the "best known implementations" of out-of-order execution are likely to cut corners on the less frequently used instructions, so you need to be aware that this sort of thing can happen. A real example is false data dependencies on the destination registers in popcnt and lzcnt on Sandy Bridge CPUs.

- At the highest end, the OOO engine will wind up issuing exactly the same sequence of internal operations for both code fragments—this is the hardware version of "don't worry about it, the compiler will generate the same machine code either way." However, code size still does matter, and now you also should be worrying about the predictability of the conditional branch. Branch prediction failures potentially cause a complete pipeline flush, which is catastrophic for performance; see Why is it faster to process a sorted array than an unsorted array? to understand how much difference this can make.
If the branch is highly unpredictable, and your CPU has conditional-set or conditional-move instructions, this is the time to use them:
li    #0, r0
test  r1
setne r0
or
li    #0, r0
li    #1, r2
test  r1
movne r2, r0
The conditional-set version is also more compact than any other alternative; if that instruction is available it is practically guaranteed to be the Right Thing for this scenario, even if the branch was predictable. The conditional-move version requires an additional scratch register, and always wastes one li instruction's worth of dispatch and execute resources; if the branch was in fact predictable, the branchy version may well be faster.
Solution 4
In unoptimised code, the first example always assigns the variable once and sometimes twice. The second example assigns it exactly once. The conditional is the same on both code paths, so that shouldn't matter. In optimised code, it depends on the compiler.
As always, if you are that concerned, generate the assembly and see what the compiler is actually doing.
Solution 5
What would make you think any of them, even the one-liner, is faster or slower?
unsigned int fun0 ( unsigned int condition, unsigned int value )
{
    value = 5;
    if (condition) {
        value = 6;
    }
    return(value);
}

unsigned int fun1 ( unsigned int condition, unsigned int value )
{
    if (condition) {
        value = 6;
    } else {
        value = 5;
    }
    return(value);
}

unsigned int fun2 ( unsigned int condition, unsigned int value )
{
    value = condition ? 6 : 5;
    return(value);
}
More lines of high-level code give the compiler more to work with, so if you want to make a general rule about it, give the compiler more code to work with. If the algorithm is the same, as in the cases above, then one would expect a compiler with even minimal optimization to figure that out.
00000000 <fun0>:
0: e3500000 cmp r0, #0
4: 03a00005 moveq r0, #5
8: 13a00006 movne r0, #6
c: e12fff1e bx lr
00000010 <fun1>:
10: e3500000 cmp r0, #0
14: 13a00006 movne r0, #6
18: 03a00005 moveq r0, #5
1c: e12fff1e bx lr
00000020 <fun2>:
20: e3500000 cmp r0, #0
24: 13a00006 movne r0, #6
28: 03a00005 moveq r0, #5
2c: e12fff1e bx lr
Not a big surprise: it did the first function in a different order, but the execution time is the same.
0000000000000000 <fun0>:
0: 7100001f cmp w0, #0x0
4: 1a9f07e0 cset w0, ne
8: 11001400 add w0, w0, #0x5
c: d65f03c0 ret
0000000000000010 <fun1>:
10: 7100001f cmp w0, #0x0
14: 1a9f07e0 cset w0, ne
18: 11001400 add w0, w0, #0x5
1c: d65f03c0 ret
0000000000000020 <fun2>:
20: 7100001f cmp w0, #0x0
24: 1a9f07e0 cset w0, ne
28: 11001400 add w0, w0, #0x5
2c: d65f03c0 ret
Hopefully you get the idea: you could have just tried this yourself if it wasn't obvious that the different implementations were not actually different.
As far as a matrix goes, I'm not sure how that matters;
if (condition)
{
    big blob of code a
}
else
{
    big blob of code b
}
it is just going to put the same if-then-else wrapper around the big blobs of code, be they value = 5 or something more complicated. Likewise, the comparison, even if it is a big blob of code, still has to be computed; and equal-to or not-equal-to something is often compiled with the negative: if (condition) do something is often compiled as if not condition goto.
00000000 <fun0>:
0: 0f 93 tst r15
2: 03 24 jz $+8 ;abs 0xa
4: 3f 40 06 00 mov #6, r15 ;#0x0006
8: 30 41 ret
a: 3f 40 05 00 mov #5, r15 ;#0x0005
e: 30 41 ret
00000010 <fun1>:
10: 0f 93 tst r15
12: 03 20 jnz $+8 ;abs 0x1a
14: 3f 40 05 00 mov #5, r15 ;#0x0005
18: 30 41 ret
1a: 3f 40 06 00 mov #6, r15 ;#0x0006
1e: 30 41 ret
00000020 <fun2>:
20: 0f 93 tst r15
22: 03 20 jnz $+8 ;abs 0x2a
24: 3f 40 05 00 mov #5, r15 ;#0x0005
28: 30 41 ret
2a: 3f 40 06 00 mov #6, r15 ;#0x0006
2e: 30 41 ret
We just went through this exercise with someone else recently on Stack Overflow. Interestingly, in that case this MIPS compiler not only realized the functions were the same, but had one function simply jump to the other to save on code space. It didn't do that here though:
00000000 <fun0>:
0: 0004102b sltu $2,$0,$4
4: 03e00008 jr $31
8: 24420005 addiu $2,$2,5
0000000c <fun1>:
c: 0004102b sltu $2,$0,$4
10: 03e00008 jr $31
14: 24420005 addiu $2,$2,5
00000018 <fun2>:
18: 0004102b sltu $2,$0,$4
1c: 03e00008 jr $31
20: 24420005 addiu $2,$2,5
Some more targets:
00000000 <_fun0>:
0: 1166 mov r5, -(sp)
2: 1185 mov sp, r5
4: 0bf5 0004 tst 4(r5)
8: 0304 beq 12 <_fun0+0x12>
a: 15c0 0006 mov $6, r0
e: 1585 mov (sp)+, r5
10: 0087 rts pc
12: 15c0 0005 mov $5, r0
16: 1585 mov (sp)+, r5
18: 0087 rts pc
0000001a <_fun1>:
1a: 1166 mov r5, -(sp)
1c: 1185 mov sp, r5
1e: 0bf5 0004 tst 4(r5)
22: 0204 bne 2c <_fun1+0x12>
24: 15c0 0005 mov $5, r0
28: 1585 mov (sp)+, r5
2a: 0087 rts pc
2c: 15c0 0006 mov $6, r0
30: 1585 mov (sp)+, r5
32: 0087 rts pc
00000034 <_fun2>:
34: 1166 mov r5, -(sp)
36: 1185 mov sp, r5
38: 0bf5 0004 tst 4(r5)
3c: 0204 bne 46 <_fun2+0x12>
3e: 15c0 0005 mov $5, r0
42: 1585 mov (sp)+, r5
44: 0087 rts pc
46: 15c0 0006 mov $6, r0
4a: 1585 mov (sp)+, r5
4c: 0087 rts pc
00000000 <fun0>:
0: 00a03533 snez x10,x10
4: 0515 addi x10,x10,5
6: 8082 ret
00000008 <fun1>:
8: 00a03533 snez x10,x10
c: 0515 addi x10,x10,5
e: 8082 ret
00000010 <fun2>:
10: 00a03533 snez x10,x10
14: 0515 addi x10,x10,5
16: 8082 ret
And other compilers. With this intermediate code one would expect the different targets to match as well:
define i32 @fun0(i32 %condition, i32 %value) #0 {
%1 = icmp ne i32 %condition, 0
%. = select i1 %1, i32 6, i32 5
ret i32 %.
}
; Function Attrs: norecurse nounwind readnone
define i32 @fun1(i32 %condition, i32 %value) #0 {
%1 = icmp eq i32 %condition, 0
%. = select i1 %1, i32 5, i32 6
ret i32 %.
}
; Function Attrs: norecurse nounwind readnone
define i32 @fun2(i32 %condition, i32 %value) #0 {
%1 = icmp ne i32 %condition, 0
%2 = select i1 %1, i32 6, i32 5
ret i32 %2
}
00000000 <fun0>:
0: e3a01005 mov r1, #5
4: e3500000 cmp r0, #0
8: 13a01006 movne r1, #6
c: e1a00001 mov r0, r1
10: e12fff1e bx lr
00000014 <fun1>:
14: e3a01006 mov r1, #6
18: e3500000 cmp r0, #0
1c: 03a01005 moveq r1, #5
20: e1a00001 mov r0, r1
24: e12fff1e bx lr
00000028 <fun2>:
28: e3a01005 mov r1, #5
2c: e3500000 cmp r0, #0
30: 13a01006 movne r1, #6
34: e1a00001 mov r0, r1
38: e12fff1e bx lr
fun0:
push.w r4
mov.w r1, r4
mov.w r15, r12
mov.w #6, r15
cmp.w #0, r12
jne .LBB0_2
mov.w #5, r15
.LBB0_2:
pop.w r4
ret
fun1:
push.w r4
mov.w r1, r4
mov.w r15, r12
mov.w #5, r15
cmp.w #0, r12
jeq .LBB1_2
mov.w #6, r15
.LBB1_2:
pop.w r4
ret
fun2:
push.w r4
mov.w r1, r4
mov.w r15, r12
mov.w #6, r15
cmp.w #0, r12
jne .LBB2_2
mov.w #5, r15
.LBB2_2:
pop.w r4
ret
Now technically there is a performance difference in some of these solutions. Sometimes the result-is-5 case has a jump over the result-is-6 code, or vice versa. Is a branch faster than falling through? One could argue, and execution may vary. But that is more of an if-condition vs if-not-condition issue, with the compiler doing the if-this-jump-over-else-fall-through. It is not necessarily due to the coding style but to the comparison and the if and else cases in whatever syntax.
Julien__
Updated on July 08, 2022

Comments
-
Julien__ almost 2 years
I argued with a friend the other day about those two snippets. Which is faster and why?
value = 5; if (condition) { value = 6; }
and:
if (condition) { value = 6; } else { value = 5; }
What if value is a matrix?

Note: I know that value = condition ? 6 : 5; exists and I expect it to be faster, but it wasn't an option.

Edit (requested by staff since question is on hold at the moment):
- please answer by considering either x86 assembly generated by mainstream compilers (say g++, clang++, vc, mingw) in both optimized and non optimized versions or MIPS assembly.
- when assembly differ, explain why a version is faster and when (e.g. "better because no branching and branching has following issue blahblah")
-
dtell about 7 years This is a great answer and should be accepted.
-
Ciprian Tomoiagă about 7 years I would've never thought of using the addition (<- what python does to you.)
-
Muzer about 7 years @CiprianTomoiaga and unless you're writing an optimiser, you shouldn't! In almost all cases you should let the compiler do optimisations like this, especially where they severely reduce code readability. Only if performance testing shows issues with a certain bit of code should you even start to try to optimise it, and even then keep it clean and well-commented, and only perform optimisations that make a measurable difference.
-
Matthieu M. about 7 years Expensive and not optimized out. For example, if the default constructor zeroes-out the matrix, the compiler could realize that the assignment is just about overwriting those 0s, so not zero it at all, and write directly in this memory. Of course, optimizers are finicky beasts so it's hard to predict when they kick in or not...
-
Matthieu M. about 7 years This answer fails to account for the question: what if value is a matrix? Optimizers are very good with primitive types, but the same optimizations are not necessarily available with user-defined types. -
bolov about 7 years @MatthieuM. of course. What I meant by "expensive" is "expensive to execute (by a metric, be it either CPU clocks, resource utilization etc.) even after compiler optimizations".
-
Jason C about 7 years @MatthieuM. Define "matrix". That part of the question is too unclear to answer. In any case this answer adequately describes the techniques needed to determine the results for any value type; an exhaustive list of all possible value types (and on all compilers on all platforms, if you want to be thorough) is neither possible nor useful. Just do the same analysis that was done here, but with a "matrix" or a "widget" or a double or whatever (and with your compiler on your target platform) instead. -
nrocha about 7 years Well done @CompuChip. In fact, this addition trick is very well found (from a theoretical view, at least). But what if we change the line of code value = 6 to value = -1? Would the compiler, under the same optimization level, do a return 5 + (condition * -6)? -
Matthieu M. about 7 years @JasonC: I agree that the techniques are useful; my worry is that someone might imagine that this optimization is automatic when in fact it's fairly specialized and relies on the compiler being able to optimize away a number of unnecessary allocations/writes in the general case.
-
Jason C about 7 years @MatthieuM. A mention that it is "fairly specialized and relies on the compiler being able to optimize away a number of unnecessary allocations/writes in the general case" in this answer, then, would be an excellent catch-all strategy rather than adding another example (which leaves you with the same problem, just with one more specific example). And, of course, in the bigger picture, I'm generally convinced that the people who truly need to learn more either will learn more or eventually natural selection will happen (and if neither of those happens, then it didn't matter anyways). :)
-
old_timer about 7 years If concerned about performance then they won't compile unoptimized. But certainly how "good" the optimizer is depends on the compiler/version.
-
plugwash about 7 years It seems unlikely to me that default construction would be expensive but moves cheap.
-
CompuChip about 7 years @CiprianTomoiaga another reason you shouldn't is that I think the standard does not state that casting true to an int always produces 1, just a non-zero value. The assembly code doesn't assume this either actually; it depends on the setne instruction, which is documented to set the register to 0 or 1. As I tried to stress in the answer, let the compiler do this, don't go micro-optimize on your own! -
Ciprian Tomoiagă about 7 years I wanted to reply to Muzer, but it wouldn't have added anything to the thread. However, I just want to restate that a human's job is to write code for humans and let the compiler write code for the machine. I said that from a compiler developer PoV (which I'm not, but I learnt a bit about them)
-
user2357112 about 7 years @CiprianTomoiaga: Python actually does let you do condition + 5, assuming condition is a boolean. Many Python users feel this shouldn't be allowed, but it is the way it is. -
supercat about 7 years @CompuChip: Computers often know things that humans don't, and humans often know things that computers don't. Given something like if (x == 23 && x) {...} else switch(x) {...}, moving the code from the "if" into "case 23" is apt to improve performance unless x happens to be 23 more than some fraction of the time. The compiler may have a better sense than the programmer what the break-even fraction would be, but the programmer may know more than the compiler about how often x will be 23. -
supercat about 7 years I'd reword your second point as to whether the CPU has an out-of-order engine that gets delayed by write-after-write hazards. If a CPU has an out-of-order engine that can handle such hazards without delay, there's no problem, but there's also no problem if the CPU doesn't have an out-of-order engine at all.
-
zwol about 7 years @supercat The paragraph at the end is meant to cover that case but I'll think about how to make it clearer.
-
T.C. about 7 years The value true converted to int always yields 1, period. Of course, if your condition is merely "truthy" and not the bool value true, then that's a completely different matter. -
JuSTMOnIcAjUSTmONiCAJusTMoNICa about 7 years "unless you are working on a space shuttle you don't, and even then you probably don't" -- +1, and if performance is that critical, you're likely writing it in assembly anyway....
-
supercat about 7 years I don't know what current CPUs have a cache which would cause sequentially-executed code to run faster the second time through than the first time (some flash-based ARM parts have an interface that can buffer a few rows of flash data, but can fetch code sequentially as fast as they execute it; the key to making branch-heavy code run fast on those is to copy it to RAM). CPUs without any out-of-order execution at all are far more common than those which would get delayed by write-after-write hazards.
-
TrentP about 7 years @plugwash Consider a class with a very large allocated array. Default constructor allocates and initializes the array, which is costly. The move (not copy!) constructor can just swap pointers with the source object and doesn't need to allocate or initialize the large array.
-
CompuChip about 7 years @nrocha this particular compiler (probably most others) will eventually rewrite the code to return (-condition) | 5, so if condition is 0 it gives (-0) | 5 = 5, and if condition is 1 then neg(1) = 11...11, so the or just returns 11...11, i.e. -1. Note that in the answer I provided a link where you can easily try it yourself. -
Neil about 7 years AFAIK there is no comment on which compiler/CPU architecture etc., so potentially their compiler doesn't do optimising. They could be compiling on anything from an 8-bit PIC to a 64-bit Xeon.
-
cmaster - reinstate monica about 7 years As long as the parts are simple, I would definitely prefer using the ?: operator over introducing a new function. After all, you'll likely not just pass the condition to the function, but also some constructor arguments, some of which may not even be used by create() depending on the condition. -
bolov about 7 years @cmaster I've seen (and been) on both sides of the arguments for and against ?:. Ultimately I guess it's a matter of style, and for every one who says you should use one you'll find another who says you should use the other. The argument that slightly tipped the balance for me is that in debug the branch taken can be identified quicker in an if structure than in a ?: operator. -
EvSunWoodard about 7 years @CompuChip - Obviously, this question is about C/C++, but is there a different answer if you are using a language without a compiler, like python or Javascript? In the case where it is interpreted, does the interpreter still make the best optimisations?
-
Kaz about 7 years @MatthieuM. If the compiler only has a declaration of that constructor from the class, and the zeroing out is in a separately compiled cpp, it's a very unlikely optimization. -
the_lotus about 7 years Wouldn't it also depend on whether condition is mostly true or false? I think that the CPU likes to predict the outcome of a jump before executing it.
-
Matthieu M. about 7 years @Kaz: Well, things are blurry today with LTO, but my point is more that even with full information a compiler could still fail to perform the optimization. If the computations involved are too complicated for it, it'll just bail out and "do the safe thing".
-
Tim about 7 years I think there are places where this matters more than rocket science - high-frequency forex trading could be one.
-
Quuxplusone about 7 years Unfortunately for this answer, feeding the same source code into a C compiler (or into a C++ compiler) produces vastly different output than feeding it into Glen's brain. There is no difference, no "optimization" potential, between any of the alternatives at the source code level. Just use the one that's most readable (presumably the if/else one).
-
jpaugh about 7 years @Yep. The compiler is either likely to optimize both variants to the fastest version, or likely to add extra overhead that far outweighs the difference between the two. Or both.
-
Toby Speight about 7 years Assuming that it's "not necessarily C" does seem a sensible choice, given that the question is tagged as C++ (unfortunately, it doesn't manage to declare the types of the variables involved).
-
Julien__ about 7 years This is very insightful