Which inline assembly code is correct for rdtscp?
Solution 1
Here's C++ code that returns the TSC and stores the auxiliary 32 bits into the reference parameter:
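#include <cstdint>

static inline uint64_t rdtscp( uint32_t & aux )
{
    uint64_t rax, rdx;
    // rdtscp writes EAX (low half of TSC), EDX (high half) and ECX (auxiliary value)
    asm volatile ( "rdtscp\n" : "=a" (rax), "=d" (rdx), "=c" (aux) : : );
    return (rdx << 32) + rax;
}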
It is better to do the shift and add that merge both 32-bit halves in a C++ statement rather than in the inline assembly; this allows the compiler to schedule those instructions as it sees fit.
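As a rough usage sketch, the wrapper can bracket a region of code to be timed. This assumes GCC or Clang on x86-64 and a CPU that supports rdtscp; the helper name ticks_for and its same_aux flag are made up for illustration and are not part of the answer.

```cpp
#include <cstdint>

// Same wrapper as in the answer: TSC in the return value, ECX contents in aux.
static inline uint64_t rdtscp(uint32_t &aux)
{
    uint64_t rax, rdx;
    asm volatile("rdtscp" : "=a"(rax), "=d"(rdx), "=c"(aux));
    return (rdx << 32) + rax;
}

// Hypothetical helper: elapsed TSC ticks for one call of work().
// same_aux reports whether both reads returned the same auxiliary value.
template <typename F>
static uint64_t ticks_for(F &&work, bool &same_aux)
{
    uint32_t aux_begin = 0, aux_end = 0;
    const uint64_t begin = rdtscp(aux_begin);
    work();
    const uint64_t end = rdtscp(aux_end);
    same_aux = (aux_begin == aux_end);
    return end - begin;
}
```

If same_aux comes back false, the two reads may have happened on different cores (assuming the operating system programs a per-core value into the auxiliary register), so the tick count is probably best discarded.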
Solution 2
According to this, this operation clobbers EDX and ECX. You need to mark those registers as clobbered, which is what the second version does. BTW, is this the link where you got the above code, or did you find it elsewhere? It also shows a few other variations for timings, which is pretty neat.
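To make the clobber point concrete, here is a minimal sketch of a variant that discards the ECX value and lists the register as a clobber instead. It assumes GCC or Clang extended asm on x86-64; the name rdtscp_noaux is invented for this example.

```cpp
#include <cstdint>

// Hypothetical variant: reads the TSC and ignores the auxiliary value.
static inline uint64_t rdtscp_noaux()
{
    uint32_t lo, hi;
    // EAX and EDX are outputs, so the compiler already knows they change.
    // ECX is written by rdtscp but is not an output here, so it goes in
    // the clobber list.
    asm volatile("rdtscp" : "=a"(lo), "=d"(hi) : : "rcx");
    return (static_cast<uint64_t>(hi) << 32) | lo;
}
```

In the question's second version EDX appears in the clobber list because only RAX is declared as an output there; in this sketch EDX is an output, so only the ECX register needs to be listed.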
James
Updated on June 05, 2022
Comments
-
James almost 2 years
Disclaimer: Words cannot describe how much I detest AT&T style syntax
I have a problem that I hope is caused by register clobbering. If not, I have a much bigger problem.
The first version I used was
static unsigned long long rdtscp(void)
{
    unsigned int hi, lo;
    __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi));
    return (unsigned long long)lo | ((unsigned long long)hi << 32);
}
I notice there is no 'clobbering' stuff in this version. Whether or not this is a problem I don't know... I suppose it depends if the compiler inlines the function or not. Using this version causes me problems that aren't always reproducible.
The next version I found is
static unsigned long long rdtscp(void)
{
    unsigned long long tsc;
    __asm__ __volatile__(
        "rdtscp;"
        "shl $32, %%rdx;"
        "or %%rdx, %%rax"
        : "=a"(tsc)
        :
        : "%rcx", "%rdx");
    return tsc;
}
This is reassuringly unreadable and official looking, but like I said my issue isn't always reproducible so I'm merely trying to rule out one possible cause of my problem.
The reason I believe the first version is a problem is that it is overwriting a register that previously held a function parameter.
What's correct... version 1, or version 2, or both?
-
ughoavgfhw about 11 years
Wrong instruction. It's rdtscp, not rdtsc, and any output is known to be clobbered so it doesn't need to be listed. The problem is that rdtscp also destroys ecx, which version 2 marks as clobbered but version 1 does not.
-
Michael Dorgan about 11 years
Does this not also clobber ECX? If not, I'll just delete my answer and call yours good.
-
amdn about 11 years
Yes, the "=c" specification tells the compiler that ECX will hold the output, which implies that it is clobbered.
-
Michael Dorgan about 11 years
Leaving this here for the other SO link, which may be useful, though the above answer is better.
-
James about 11 years
Thank you. The first version didn't mark ecx as being clobbered. This register initially held a parameter value, which was used in a conditional which, if it failed, called std::terminate(). When the register was clobbered, the condition was obviously checking the wrong thing!
-
FrankH. about 11 years
Sometimes it is useful to have references as to where else this type of code is used. For example, have a look at lxr.free-electrons.com/source/arch/x86/include/asm/msr.h#L205 and lxr.free-electrons.com/source/arch/x86/include/asm/msr.h#L42 (the opcode for rdtscp is the byte sequence given there).
-
nodakai over 9 years
If you don't need the value written to %ecx (which can be used to identify CPU cores), you can simply use the clobbers list:
__asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi) : : "%ecx" );
-
haelix over 5 years
Concise explanation on why we need the function's aux parameter?