What is the cost of a function call?
Solution 1
relative timings (shouldn't be off by more than a factor of 100 ;-)
- memory-access in cache = 1
- function call/return in cache = 2
- memory-access out of cache = 10 .. 300
- disk access = 1000 .. 1e8 (amortized depends upon the number of bytes transferred)
- depending mostly upon seek times
- the transfer itself can be pretty fast
- involves at least a few thousand ops, since the user/system threshold must be crossed at least twice; an I/O request must be scheduled, the result must be written back; possibly buffers are allocated...
- network calls = 1000 .. 1e9 (amortized depends upon the number of bytes transferred)
- same argument as with disk i/o
- the raw transfer speed can be quite high, but some process on the other computer must do the actual work
Solution 2
A function call is simply a shift of the frame pointer in memory onto the stack and addition of a new frame on top of that. The function parameters are shifted into local registers for use and the stack pointer is advanced to the new top of the stack for execution of the function.
In comparison with time
Function call ~ simple memory access
Function call < Disk Access
Function call < memory access on another computer
Function call < disk access on another computer
Solution 3
Compared to a simple memory access - slightly more, negligible really.
Compared to every thing else listed - orders of magnitude less.
This should hold true for just about any language on any OS.
Solution 4
In general, a function call is going to be slightly slower than memory access since it in fact has to do multiple memory accesses to perform the call. For example, multiple pushes and pops of the stack are required for most function calls using __stdcall on x86. But if your memory access is to a page that isn't even in the L2 cache, the function call can be much faster if the destination and the stack are all in the CPU's memory caches.
For everything else, a function call is many (many) magnitudes faster.
Solution 5
Hard to answer because there are a lot of factors involved.
First of all, "Simple Memory Access" isn't simple. Since at modern clock speeds, a CPU can add two numbers faster than it get a number from one side of the chip to the other (The speed of light -- It's not just a good idea, it's the LAW)
So, is the function being called inside the CPU memory cache? Is the memory access you're comparing it too?
Then we have the function call will clear the CPU instruction pipeline, which will affect speed in a non-deterministic way.
FlinkmanSV
https://bitcoinsv.academy/ https://bsvquickstart.com/ https://www.youtube.com/watch?v=3MJSEGnpgB8 https://www.reddit.com/r/bitcoincashSV/
Updated on March 05, 2020Comments
-
FlinkmanSV about 4 years
Compared to
- Simple memory access
- Disk access
- Memory access on another computer(on the same network)
- Disk access on another computer(on the same network)
in C++ on windows.
-
Mladen Janković over 15 yearsPipelines should not be flushed on modern CPU when call is made. CPU flushes its pipelines only when it wrongly predicts conditional jump (or when interrupt or similar unpredictable event occurs).
-
James Curran over 15 years"The law" : The electrical impulses which make up a value in memory, moving at the speed of light, will take several CPU cycles to move the three inches from one end of the CPU to the other.
-
Suma over 15 yearsDid you really mean "shift of the stack pointer"? "Shift of the instruction pointer" would see more appropriate to me. I would describe it as: IP is pushed onto stack IP is set to the function address SP is advanced to reserve local variables spaces (may include stack frame creation)
-
jW. over 15 yearsThank you for the clarification! I will update my answer for anyone who browses in the future
-
Martin York over 15 yearsIt's the LAW! The definition of a law is that it is meant to be interpreted. Rules on the other hand are absolute. That is why Rugby has Laws and other sports have Rules.
-
Asaf R over 15 yearsAll agreed, except from what I know memory access out of cache is about 200~300 times cache access.
-
mfx over 15 yearsChanged. The 10... range is to include first-level caches. Also, i don't want to create the impression that a disk access can be as fast as a memory access; the situation is not that bad ;-)
-
Michael Borgwardt about 14 years@James: three inches?? I'd like to know what kind of monster CPU you are working with...
-
James Curran about 14 years@Michael... OkOkOk... A modern Intel CPU is 1.95"x1.95" meaning the corner to corner distance is only 2.75" --- except the signals aren't moving in a straight line-- but regardless, it's still several cycles at the speed of light.
-
Michael Borgwardt about 14 years@James: the actual processor die is much smaller than that. The Core i7's die is 263mm^2, which is (assuming a square) about 0.64"x0.64"
-
einpoklum over 8 yearsIn your cache figures - aren't you forgetting the argument handling?