Does a c/c++ compiler optimize constant divisions by power-of-two value into shifts?
Solution 1
Even with g++ -O0
(yes, -O0
!), this happens. Your function compiles down to:
_Z3divm:
.LFB952:
pushq %rbp
.LCFI0:
movq %rsp, %rbp
.LCFI1:
movq %rdi, -24(%rbp)
movq $64, -8(%rbp)
movq -24(%rbp), %rax
shrq $6, %rax
leave
ret
Note the shrq $6
, which is a right shift by 6 places.
With -O1
, the unnecessary junk is removed:
_Z3divm:
.LFB1023:
movq %rdi, %rax
shrq $6, %rax
ret
Results on g++ 4.3.3, x64.
Solution 2
Most compilers will go even further than reducing division by powers of 2 into shifts - they'll often convert integer division by a constant into a series of multiplication, shift, and addition instructions to get the result instead of using the CPU's built-in divide instruction (if there even is one).
For example, MSVC converts division by 71 to the following:
// volatile int y = x / 71;
8b 0c 24 mov ecx, DWORD PTR _x$[esp+8] ; load x into ecx
b8 49 b4 c2 e6 mov eax, -423447479 ; magic happens starting here...
f7 e9 imul ecx ; edx:eax = x * 0xe6c2b449
03 d1 add edx, ecx ; edx = x + edx
c1 fa 06 sar edx, 6 ; edx >>= 6 (with sign fill)
8b c2 mov eax, edx ; eax = edx
c1 e8 1f shr eax, 31 ; eax >>= 31 (no sign fill)
03 c2 add eax, edx ; eax += edx
89 04 24 mov DWORD PTR _y$[esp+8], eax
So, you get a divide by 71 with a multiply, a couple shifts and a couple adds.
For more details on what's going on, consult Henry Warren's Hacker's Delight book or the companion webpage:
There's an online added chapter that provides some addition information about about division by constants using multiplication/shift/add with magic numbers, and a page with a little JavaScript program that'll calculate the magic numbers you need.
The companion site for the book is well worth reading (as is the book) - particularly if you're interested in bit-level micro optimizations.
Another article that I discovered just now that discusses this optimization: https://docs.microsoft.com/en-us/archive/blogs/devdev/integer-division-by-constants
Solution 3
Only when it can determine that the argument is positive. That's the case for your example, but ever since C99 specified round-towards-zero semantics for integer division, it has become harder to optimize division by powers of two into shifts, because they give different results for negative arguments.
In reaction to Michael's comment below, here is one way the division r=x/p;
of x
by a known power of two p
can indeed be translated by the compiler:
if (x<0)
x += p-1;
r = x >> (log2 p);
Since the OP was asking whether he should think about these things, one possible answer would be "only if you know the dividend's sign better than the compiler or know that it doesn't matter if the result is rounded towards 0 or -∞".
Solution 4
Yes, compilers generate the most optimal code for such simplistic calculations. However, why you are insisting specifically on "shifts" is not clear to me. The optimal code for a given platform might easily turn out to be something different from a "shift".
In general case the old and beaten-to-death idea that a "shift" is somehow the most optimal way to implement power-of-two multiplications and divisions has very little practical relevance on modern platforms. It is a good way to illustrate the concept of "optimization" to newbies, but no more than that.
Your original example is not really representative, because it uses an unsigned type, which greatly simplifies the implementation of division operation. The "round towards zero" requirement of the C and C++ languages makes it impossible to do division with a mere shift if the operand is signed.
Related videos on Youtube
porgarmingduod
Updated on May 11, 2022Comments
-
porgarmingduod about 2 years
Question says it all. Does anyone know if the following...
size_t div(size_t value) { const size_t x = 64; return value / x; }
...is optimized into?
size_t div(size_t value) { return value >> 6; }
Do compilers do this? (My interest lies in GCC). Are there situations where it does and others where it doesn't?
I would really like to know, because every time I write a division that could be optimized like this I spend some mental energy wondering about whether precious nothings of a second is wasted doing a division where a shift would suffice.
-
GManNickG about 14 yearsIf you're using a compiler within the decade (or two), yeah.
-
Brian about 14 yearsYou are wasting mental energy on silly micro-optimizations. Ignore them until you are forced to confront them (i.e. because your program is running too slowly and a profiler is pointing at them).
-
porgarmingduod about 14 yearsBut let's not forget that micro-optimizations are fun!
-
janks about 14 yearsKeep in mind that none of us know what version of gcc you're using, nor what hardware platform you're on, and what your optimization flags are. The answer will vary enormously based on those factors.
-
Nemanja Trifunovic about 14 yearsUnderstanding compiler optimizations only helps avoiding "silly micro-optimizations" in the code, so this is not a bad question at all.
-
Dennis Zickefoose about 14 yearsAt this level, knowledge of specific optimizations are much less useful than simply understanding that optimizations take place. Understanding higher level optimizations is much more informative.
-
Romain Hippeau about 14 yearsAs a note: A decent modern compiler will generate better optimized code than the average assembler programmer.
-
v.oddou over 9 yearsyou know giving him the command line flags that shows how to get the assembly output would be the answer to the question.
-
-
porgarmingduod about 14 yearsOn the assumption that
shrq
has something to do with shifting, you just greatly expanded my knowledge of assembly. From nothing to next to nothing. Thanks! -
Alex Jasmin about 14 yearsOn my PC with gcc 3.4.5 the first sample (the with a const variable)
gcc -O0
generates adivl
but I getshrq
withgcc -O2
. -
janks about 14 yearsWithout optimization I get a divide instruction, not a shift. My gcc is 4.4.1.
-
Keyur Padalia about 14 years@porgarmingduod (frak, that's hard to type): I couldn't write any of this myself either. But being able to read it is useful at times, for example, if you want to know what the compiler is up to. Just start with some toy programs, run
g++ -S
over them, study the generated.s
file, and learn! -
porgarmingduod about 14 years@Thomas: That is actually a good idea. I've never quite been able to motivate myself to learn programming assembly, however, learning how to read it, at least cursory, might be worthwhile.
-
Keyur Padalia about 14 years@janks: Strange. Even when compiling for i386, I still get a
shrl
instruction at-O0
. -
Keyur Padalia about 14 years@janks: When compiling with
gcc
(i.e. as a C program), I get adivl
instruction at-O0
. Withg++
(i.e. as a C++ program), I get ashrl
. -
janks about 14 yearsOh, I should have mentioned x86
-
Alex Jasmin about 14 years@Thomas Same thing on the five years old compiler ;-)
gcc -O0
generates adivl
andg++ -O0
generates ashrl
-
janks about 14 years@Thomas: janks@phoenix:/tmp$ uname -m && gcc --version | head -1 i686 gcc (Ubuntu 4.4.1-4ubuntu9) 4.4.1 janks@phoenix:/tmp$ gcc -O3 -S test.c && egrep shrl test.s shrl $6, %eax janks@phoenix:/tmp$ gcc -O0 -S test.c && egrep divl test.s divl -4(%ebp)
-
Keyur Padalia about 14 years@janks: Backticks around the code,
`like this`
. But multiline code in a comment won't be pretty. Anyway, like I said, I used g++ and you use gcc, and that seems to make the difference. -
GManNickG about 14 yearsFor example, I'm pretty sure for integer multiplication by 2, MSVC just adds it with itself rather than a shift or multiply.
-
Michael Burr about 14 yearsCompilers can (and do) handle that problem by adding an adjustment to the operand if it's negative before performing the shift. So, it doesn't compile down to nothing but a shift, but it's still better than a division (generally - I think that integer division is still on the order of 15 cycles, but I'm not sure).
-
v.oddou over 9 yearsCPU shifters most often use barrel roll wiring which makes them instantaneous. there is no faster than that.
-
v.oddou over 9 years@porgarmingduod:
shrq
= shift right quad word. -
AnT stands with Russia over 9 years@v.oddou: Barrel roll wiring can make them instantaneous only in cases when the amount of the shift is hardcoded. Once it becomes a run-time value (as in this case) it is no longer instantaneous.
-
AnT stands with Russia over 9 yearsThe example is not really representative because of unsigned type being used.
-
phuclv over 9 years@AndreyT you can use a multiplexer to do variable barrel shifting
-
Peter Cordes over 6 years@Thomas: did you forget
const
? Withgcc -O0
, that would force the compiler to assume thatx
could have been modified (with a debugger) between C statements. But anyway,-O0
is silly, and I'm sure even old gcc would optimize/64
into a shift at-O2
or-Os
(or even-O1
). This is kind of optimization is just a basic requirement for any optimizing compiler worthy of the name, because it's trivial and local. (For signed integers, it takes a couple extra instructions to get the semantics right for positive and negative inputs, but still easy and worth it.)