Is multiplication faster than float division?
Solution 1
Multiplication is faster than division. At university I was taught that division takes six times that of multiplication. The actual timings are architecture dependent but in general multiplication will never be slower or even as slow as division. Always optimize your code towards using multiplication if the rounding errors allow.
So in an example this would typically be slower ...
for (int i=0; i<arraySize; i++) {
a[i] = b[i] / x;
}
... than this ...
y=1/x;
for (int i=0; i<arraySize; i++) {
a[i] = b[i] * y;
}
Of course with rounding errors, you'll loose (a little) precision with the second method, but unless you are repeatedly calculating x=1/x;
that's unlikely to cause much issue.
Edit:
Just for reference. I've dug up a third party comparison of operation timings by searching on Google.
http://gmplib.org/~tege/x86-timing.pdf
Look at the numbers on MUL and DIV. This indicates differences of between 5 and 10 times depending on the processor.
Solution 2
It is quite likely that the compiler will convert a divide to a multiply in this case, if it "thinks" it's faster. Dividing by 2 in floating point may also be faster than other float divides. If the compiler doesn't convert it, it MAY be faster to use multiply, but not certain - depends on the processor itself.
The gain from manually using multiply instead of divide can be quite large in cases where the compiler can't determine that it's "safe" to do so (e.g. 0.1 can't be stored exactly as 0.1 in a floating point number, it becomes 0.10000000149011612). See below for figures on AMD processors which can be taken as representative for the class.
To tell if your compiler does this well or not, why don't you write a bit of code to experiment. Make sure you write it so that the compiler doesn't just calculate a constant value and discards all the calculation in the loop tho'.
Edit:
AMD's optimisation guide for Family 15h processors, provide figures for fdiv
and fmul
are 42 and 6 respectively. SSE versions are a little closer, 24 (single) or 27 (double) cycles for DIVPS, DIVPD DIVSS and DIVSD (divide), and 6 cycles for all forms of multiply.
From memory, Intel's figures aren't that far off.
Solution 3
Floating point multiplication usually takes fewer cycles than floating point division. But with literal operands the optimizer is well aware of this kind of micro-optimizations.
Admin
Updated on March 03, 2020Comments
-
Admin about 4 years
In C/C++, you can set up the following code:
double a, b, c; ... c = (a + b) / 2;
This does the exact same thing as:
c = (a + b) * 0.5;
I'm wondering which is better to use. Is one operation fundamentally faster than the other?
-
Thomas almost 11 years"Can't understand why this has been down voted so much" because on SO any question that even comes remotely close to undefined/unspecified/implementation dependent behaviour is likely immediately downvoted, closed, and swept under the carpet. Apparently pedantry is more relevant than realistic and pragmatic development needs. Slightly tongue-in-cheek but it sometimes feels that way.
-
Eric Postpischil almost 11 yearsThis is a situation where we cannot rely on the optimizer, if we ask about cases other than single example in the question. Division by a power of two and multiplication by the inverse are equivalent in binary floating-point, but divisions by other factors have no equivalent multiplication, because the reciprocal cannot be exactly represented. This prohibits the optimizer from making the transformation. Thus, programmers should be aware of this and should favor multiplication if they know the rounding error in the inverse is acceptable.
-
Mats Petersson almost 11 years@EricPostpischil: Does it "help" to give flags for "use fast math" in cases where we don't care too much about precision? Say, forexample in case of 3d games, you'd expect that the last couple of bits of floating point precision doesn't matter.
-
Philip Couling almost 11 years@Thomas yup. What's galling is that this isn't even ambiguous or undefined. Processor manufacturers usually publish the stats. This is where the stats come from for compilers to perform the optimizations. So here "undefined" means "I haven't read it". aaarrrrrrr. Okay. </rant>
-
Mats Petersson almost 11 yearsWhat the h*** did I get wrong now, that I get downvoted for providing more information? Please tell me so I can learn!
-
jason almost 11 yearsI didn't downvote, have no idea why someone else did, but I hope you don't mind that I corrected your statement about
0.1
. -
Stephen Canon almost 11 years
x/y —> x * (1/y)
is licensed by-ffast-math
. Specifically, this is licensed by-freciprocal-math
, which is one of the “optimizations” included in-ffast-math
. -
josesuero almost 11 yearsIt's not quite clear what you mean in the second paragraph (the parenthesis is never closed, and the sentence kind of trails off. But as has been pointed out in the comments to other answers, the compiler typically won't convert divisions to multiplies when dealing with floating-point code.
-
Damon almost 11 years@Thomas: I would guess that the downvotes come from the fact that the question is pretty trivial and has an exact duplicate that's automatically shown in the "related" column to the right (read as: no research effort), the optimization has a high likelihood of being premature (think out of order execution and memory bandwidth), and the code snippet contains implicit int-double conversions and extra operations that are unrelated to the question. (That's only a guess, of course, I'm not one of the downvoters, so can't tell)
-
Mats Petersson almost 11 years@Jalf: yes, I have clarified the second paragraph a little bit.
-
Lightness Races in Orbit almost 11 years@Damon is correct. Guessing at downvoters' motivations may be a fun game but, in this case at least, Thomas lost.
-
Mike Dunlavey almost 11 years@Thomas: I tend to go easy on newbies. We all make or at least made mistakes at one point or another, and getting scorched for it doesn't make anyone feel welcome. Helping less experienced people is a good thing, and StackOverflow is very good at that.
-
Peter Cordes about 8 yearsFor x86, see agner.org/optimize. div and mul have different ratios of throughput to latency. Both are only 1 uop, so can overlap well with other operations. Your ~6x more expensive looks about right for throughput, but the latency ratio these days for FP div is more like 4 (on Intel Haswell). Intel Skylake has a very heavily pipelined FP div unit (throughput of one
double
result per 4 cycles). (256b SIMD vectors have lower throughput).