Computational cost of trig functions
Solution 1
Modern x86 processors include trig functions in their instruction set, but they take many cycles to execute. So if you're on such a processor, and if you have no dependencies in your code (i.e. you don't need the result of one sin computation in order to start the next one), then you probably won't get much faster than using sin and cos directly, as they will be fully pipelined, achieving an effective rate of 1 per cycle.
Solution 2
You need to profile your code!
You need to profile this yourself. On my machine, trig functions take about 100 ns and divisions about 20 ns; you can convert figures like that into an answer for your case. But again, the most important thing is to profile on your own hardware. That way you get numbers that are actually right for your system.
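A minimal sketch of the kind of microbenchmark meant here, using Python's timeit module. The 100 ns / 20 ns figures above are one machine's measurements, not universal constants; the numbers this prints on your hardware are the ones that matter.

```python
# Sketch: time sin() against division on this machine. Interpreter overhead
# dominates in Python, so treat the results as relative, not absolute.
import math
import timeit

N = 100_000

sin_time = timeit.timeit("math.sin(1.2345)", globals=globals(), number=N)
div_time = timeit.timeit("1.0 / 1.2345", number=N)

print(f"sin():    {sin_time / N * 1e9:.1f} ns per call")
print(f"division: {div_time / N * 1e9:.1f} ns per op")
```

The same idea in C++ (e.g. with `std::chrono`) gets you closer to the hardware cost the thread is actually arguing about.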
Solution 3
(This was originally a comment on codekaizen's answer, but it got rather long...)
(Codekaizen): Most trig functions are implemented as lookup tables these days.
um.. Since most trig functions take a double-precision argument, looking up every possible value isn't practical. I believe most look up the integers on either side and then interpolate from there (i.e. Sin(5.279) is 27.9% of the way from Sin(5) to Sin(6)). That's less work than calculating the value directly, but still a fair amount of calculation.
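The interpolation scheme described above might look like the following sketch. It is purely illustrative: real libm implementations use range reduction plus polynomial approximation, not a one-entry-per-integer table, and the coarse table here makes the error visibly large.

```python
# Hypothetical sketch of "look up the integers on either side and interpolate".
import math

# Precompute sin at integer arguments 0..7 (covers [0, 2*pi]).
TABLE = [math.sin(i) for i in range(8)]

def sin_interp(x):
    """Linear interpolation between integer table entries, e.g.
    sin_interp(5.279) is 27.9% of the way from TABLE[5] to TABLE[6]."""
    i = int(x)        # integer part: table index
    frac = x - i      # fractional part: interpolation weight
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])

# Exact at the grid points, noticeably off in between -- which is why the
# comment thread doubts libraries really work this way.
print(sin_interp(5.279), math.sin(5.279))
```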
ack
Updated on June 06, 2022

Comments
-
ack about 2 years
Possible Duplicate:
How do Trigonometric functions work?

What actually goes into the computation of trig functions like Sin, Cos, Tan and Atan?
I think I've found an optimization in my code where I can avoid using any of these functions and base the problem around slope instead of angles. So that means a couple division operations in place of the above trig functions. But I'd like to know more about what goes into those trig functions so that I can compare my new code (from the perspective of number of basic math ops). Or maybe I've just found a more circuitous way of doing the same thing, or worse, introduced a less efficient method.
Using C++ and Python, but I imagine this is fairly language-agnostic, with math-operation cost being relative to the most primitive operations.
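The slope-instead-of-angles rewrite the question describes might look like this hypothetical sketch: deciding which of two directions is "more counter-clockwise" with a 2D cross product instead of atan. Note the assumption baked in here: the two versions only agree when both directions lie in the same half-plane (positive y in this example), because angle comparisons wrap and cross products don't.

```python
# Hypothetical example: compare two directions without computing any angles.
# Whether this actually wins depends on the real problem -- profile both.
import math

def ccw_with_atan2(ax, ay, bx, by):
    """Is direction (bx, by) counter-clockwise of (ax, ay)? Angle-based."""
    return math.atan2(by, bx) > math.atan2(ay, ax)

def ccw_with_cross(ax, ay, bx, by):
    """Same question via the 2D cross product: multiplies only, no trig.
    Valid here because both directions are assumed to have y > 0."""
    return ax * by - ay * bx > 0
```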
-
ack over 13 yearsFaster than division? I guess that's comparing two completely different things and that becomes a question of architecture specifics... If it really is a lookup table, that would imply my old method with pure trig functions was faster, no?
-
aioobe over 13 yearsI would think so, yes. A lookup table requires a memory access though, unless you get a hit on some level of the cache.
-
codekaizen over 13 yearsI've never seen a table where each distinct double value is stored.
-
ack over 13 yearsOf course that will get the job done, but it won't really tell me why it's faster.
-
Jeffrey L Whitledge over 13 years@aioobe, I'm pretty sure the lookup table for many systems is implemented on-chip, so that no memory access is required. I think an error in the lookup table was the cause of the infamous Pentium math problem some years ago.
-
aioobe over 13 years@Jeffrey, you have a link for that math-problem?
-
James Curran over 13 years@codekaizen: Neither have I. My point is that your answer implies that you have.
-
codekaizen over 13 yearsI can't see how you can logically infer that. Having a table of values doesn't imply it holds all possible values. A moment's reflection should show that a table covering every value is just unreasonable.
-
timday over 13 years@aioobe: en.wikipedia.org/wiki/Pentium_FDIV_bug : "...missing entries in the lookup table used by the divide operation..."
-
aioobe over 13 yearsOh, right, I thought you meant an error in the trig-lookup method, and that didn't sound familiar. Now I get it :-)
-
Niki over 13 yearsDo you have any source to back that up? I highly doubt that. That would mean that calculating e.g. the 2nd derivative of sin(x) numerically would be 0! The Taylor approximation converges very quickly if you map x to a value in [0..pi/4].
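To illustrate the convergence claim in this comment: on a range-reduced argument, a handful of Taylor terms already reach near-double precision. A minimal sketch, assuming the caller has already reduced x to a small range:

```python
# Taylor series sin(x) = x - x^3/3! + x^5/5! - ...
# Intended for range-reduced arguments (|x| <= pi/4), where six terms
# already give roughly 1e-11 accuracy.
import math

def sin_taylor(x, terms=6):
    result = 0.0
    term = x
    for n in range(terms):
        result += term
        # Next term: multiply by -x^2 / ((2n+2)(2n+3))
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return result
```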
-
Martin Beckett over 13 yearsIt's not that bad - firstly you only need 0-45deg because of symmetry. And you can use identities to work out sin(a+b) from sin(a) and sin(b), so you only need to store a very sparse table.
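The sparse-table-plus-identity idea in this comment can be sketched as follows: store sin and cos only at coarse grid points a, then recover sin(a + b) for the small residual b via the angle-addition identity sin(a + b) = sin(a)cos(b) + cos(a)sin(b), with short Taylor series for the tiny b. The grid spacing of 1/16 radian is an illustrative choice, not anything a real library is claimed to use.

```python
# Sparse table + angle-addition identity: 101 grid points cover [0, 2*pi).
import math

STEP = 1.0 / 16
SIN_T = [math.sin(i * STEP) for i in range(101)]
COS_T = [math.cos(i * STEP) for i in range(101)]

def sin_sparse(x):
    """sin(x) for x in [0, 2*pi) via table lookup plus identities."""
    i = int(x / STEP)             # nearest grid point at or below x
    b = x - i * STEP              # small residual, 0 <= b < 1/16
    sin_b = b - b**3 / 6          # short Taylor series suffices for tiny b
    cos_b = 1 - b * b / 2 + b**4 / 24
    return SIN_T[i] * cos_b + COS_T[i] * sin_b
```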
-
comingstorm over 13 yearsI believe the full trig functions are much slower than 1 per cycle. You may be able to do other things while waiting for the trig function to complete, but the trig operations themselves are not pipelined that way.
-
comingstorm over 13 yearsSo, you may be able to speed things up by using an approximation, or you may not -- depending on your application, and how much approximation you can take.
-
comingstorm over 13 yearsAlso, reworking your math to avoid the trig can sometimes speed up and simplify your code a great deal, completely independent of the cost of the trig operations themselves.
-
Peter Cordes almost 5 yearsC sin() doesn't usually compile to x87 fsin, with most compilers. The instruction is heavily micro-coded, like 53 to 105 uops on Skylake (agner.org/optimize), and with throughput potentially as bad as latency (not pipelined). On Nehalem (current when you posted this answer), fsin was 100 uops with latency = throughput = 40 to 100 cycles. And many math libraries already didn't use it.