Is fastcall really faster?
Solution 1
It depends on the platform. For a Xenon PowerPC, for example, it can be an order of magnitude difference due to a load-hit-store issue with passing data on the stack. I empirically timed the overhead of a cdecl function at about 45 cycles compared to ~4 for a fastcall.
For an out-of-order x86 (Intel and AMD), the impact may be much less, because the registers are all shadowed and renamed anyway.
The answer really is that you need to benchmark it yourself on the particular platform you care about.
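As a starting point for such a benchmark, here is a minimal sketch in C. Everything in it is an assumption made for illustration: the BENCH_* macros, the function names, and the use of clock() as the timer are not from any particular codebase. The convention keywords only take effect on 32-bit x86; on other targets the macros expand to nothing and both functions are called the same way, so the two timings should then match apart from noise.

```c
#include <time.h>

/* Calling-convention keywords only mean something on 32-bit x86.
   On any other target these (hypothetical) macros expand to nothing,
   so both functions use the platform's usual convention. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define BENCH_CDECL    __cdecl
#  define BENCH_FASTCALL __fastcall
#elif defined(__GNUC__) && defined(__i386__)
#  define BENCH_CDECL    __attribute__((cdecl))
#  define BENCH_FASTCALL __attribute__((fastcall))
#else
#  define BENCH_CDECL
#  define BENCH_FASTCALL
#endif

/* Keep the compiler from inlining the callee, which would remove
   the call overhead we are trying to measure. */
#if defined(_MSC_VER)
#  define BENCH_NOINLINE __declspec(noinline)
#elif defined(__GNUC__)
#  define BENCH_NOINLINE __attribute__((noinline))
#else
#  define BENCH_NOINLINE
#endif

/* Two identical bodies that differ only in calling convention. */
BENCH_NOINLINE unsigned BENCH_CDECL add_cdecl(unsigned a, unsigned b)
{
    return a + b;
}

BENCH_NOINLINE unsigned BENCH_FASTCALL add_fastcall(unsigned a, unsigned b)
{
    return a + b;
}

static double elapsed_seconds(clock_t start, clock_t end)
{
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Time `iterations` calls through the cdecl function. The volatile
   sink makes every call's result observable, so the loop is kept. */
double time_cdecl(unsigned long iterations)
{
    volatile unsigned sink = 0;
    unsigned long i;
    clock_t start = clock();
    for (i = 0; i < iterations; ++i)
        sink = add_cdecl((unsigned)i, sink);
    return elapsed_seconds(start, clock());
}

/* Same loop, calling the fastcall function instead. */
double time_fastcall(unsigned long iterations)
{
    volatile unsigned sink = 0;
    unsigned long i;
    clock_t start = clock();
    for (i = 0; i < iterations; ++i)
        sink = add_fastcall((unsigned)i, sink);
    return elapsed_seconds(start, clock());
}
```

Call time_cdecl and time_fastcall with a large iteration count (say, a hundred million), repeat the run several times, and compare the results. clock() is coarse and measures CPU time, so for cycle-level numbers like the ones quoted above you would likely want to swap in the platform's cycle counter.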
Solution 2
Is the fastcall calling convention really faster than other calling conventions, such as cdecl?
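I believe that Microsoft's implementation of fastcall on x86 involves passing the first two parameters in registers (ECX and EDX) instead of on the stack. On x64 the keyword has no effect, because the single x64 calling convention already passes the first four arguments in registers.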
Since it typically saves at least four memory accesses, yes it is generally faster. However, if the function involved is register-starved and is thus likely to write the arguments to locals on the stack anyway, there's not likely to be a significant speedup.
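To make the register passing concrete, here is a small sketch in C. The DEMO_* macros and the sum3_* function names are made up for this example. The comments describe where the arguments go under the 32-bit x86 conventions; on any other target the macros expand to nothing and both functions use that platform's normal convention.

```c
/* Hypothetical macros: select the convention keywords on 32-bit x86
   only, and expand to nothing everywhere else. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define DEMO_CDECL    __cdecl
#  define DEMO_FASTCALL __fastcall
#elif defined(__GNUC__) && defined(__i386__)
#  define DEMO_CDECL    __attribute__((cdecl))
#  define DEMO_FASTCALL __attribute__((fastcall))
#else
#  define DEMO_CDECL
#  define DEMO_FASTCALL
#endif

/* cdecl on 32-bit x86: a, b and c are all pushed on the stack by the
   caller, and the caller removes them after the call returns. */
int DEMO_CDECL sum3_cdecl(int a, int b, int c)
{
    return a + b + c;
}

/* fastcall on 32-bit x86: a arrives in ECX, b in EDX, and only c is
   pushed on the stack; the callee cleans up the stack. */
int DEMO_FASTCALL sum3_fastcall(int a, int b, int c)
{
    return a + b + c;
}
```

Both functions return the same result for the same arguments; only the way the arguments reach them differs. Compiling for 32-bit x86 and looking at the generated assembly for a call site is a quick way to see the two stores and two loads that fastcall avoids.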
Solution 3
Calling convention (at least on x86) doesn't really make much of a difference in speed. In Windows, _stdcall was made the default because it produces tangible results for nontrivial programs in that it usually results in smaller code size when compared with _cdecl. _fastcall is not the default value because the difference it makes is far less tangible. What you make up for in argument passing via registers you lose in less efficient function bodies (as previously mentioned by Anon.). You don't gain anything by passing in registers if the called function immediately needs to spill everything out into memory for its own calculations.
However, we can spout theoretical ideas all day long; benchmark your code for the right answer. _fastcall will be faster in some cases, and slower in others.
Solution 4
On modern x86, no. Between L1 cache and inlining there's no place for fastcall.
zr.
Updated on August 19, 2020
Comments
-
zr. over 3 years
Is the fastcall calling convention really faster than other calling conventions, such as cdecl? Are there any benchmarks out there that show how performance is affected by calling convention?
-
avakar about 14 years
"How is performance affected by calling convention?" Marginally.
-
Crashworks about 14 years
Except when it's affected massively.
-
bluish about 11 years
See also bcbjournal.org/articles/vol4/0004/…
-
susmits over 10 years
Some background may be found in this article: blogs.msdn.com/b/larryosterman/archive/2005/10/10/479278.aspx. To quote: "IIRC, back in the NT4 days, the entire NT kernel was recompiled with __fastcall and it got something like a 10% overall speedup."
-
Crashworks about 14 years
If a function is inlined it is neither fastcall nor cdecl nor any other calling convention.
-
ima about 14 years
Exactly. Fetching from L1 is 1 cycle over a register; in most cases that's below the noise level, and it's hard to even benchmark it reliably. And functions where a few cycles per call make an important difference should be inlined anyway.
-
Mark Ransom over 11 years
I have to agree with this: any function that is simple enough to benefit from fastcall would benefit from inlining even more.
-
phuclv over 10 years
In x64 there is only one calling convention
-
0xC0000022L over 5 years
Except that inlining isn't always possible. Think callbacks from code implemented by two different parties ...
-
Kotauskas about 5 years
@phuclv How exactly is there one calling convention? On Windows x86_64 mingw-w64 C++11, __attribute__((fastcall)) compiles and produces a fastcall-compatible function. Besides, an architecture cannot standardize calling conventions since they are a compiler feature.
-
phuclv about 5 years
@VladislavToncharov Of course I'm specifically mentioning the calling convention on 64-bit Windows, since this question is talking about "Microsoft's implementation". The calling convention is defined by the platform, not the compiler. GCC on Windows still has to follow Windows' convention when interacting with outside components.