Which inline assembly code is correct for rdtscp?

gcc assembly x86-64 inline-assembly

10,589

Solution 1

Here's C++ code that returns the TSC and stores the auxiliary 32 bits in the reference parameter:

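```
#include <cstdint>  // uint32_t, uint64_t

static inline uint64_t rdtscp( uint32_t & aux )
{
    uint64_t rax,rdx;
    // EAX, EDX and ECX are all outputs, so no separate clobber list is needed
    asm volatile ( "rdtscp\n" : "=a" (rax), "=d" (rdx), "=c" (aux) : : );
    return (rdx << 32) + rax;
}
```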

It is better to do the shift and add that merge the two 32-bit halves in a C++ statement rather than inside the inline assembly, because this lets the compiler schedule those instructions as it sees fit.
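To illustrate the point about merging in C++, here is a minimal sketch, assuming GCC or Clang on x86. The names `combine_halves` and `read_tscp` are hypothetical and not from the original answer. Because the merge is ordinary C++, the optimizer can see it, and it can even be checked at compile time:

```cpp
#include <cstdint>

// Hypothetical helper: merge the two 32-bit halves in plain C++,
// outside the asm statement, so the compiler is free to schedule it.
constexpr std::uint64_t combine_halves(std::uint32_t lo, std::uint32_t hi)
{
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

#if defined(__x86_64__) || defined(__i386__)
// Hypothetical wrapper: RDTSCP writes EAX (low half), EDX (high half)
// and ECX (auxiliary value), so all three are declared as outputs.
static inline std::uint64_t read_tscp(std::uint32_t& aux)
{
    std::uint32_t lo, hi;
    asm volatile("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));
    return combine_halves(lo, hi);
}
#endif

// The high half lands in bits 63..32, the low half in bits 31..0.
static_assert(combine_halves(0x00000001u, 0x00000002u) == 0x0000000200000001ULL,
              "halves merged in the wrong order");
```

A caller would use it as `std::uint32_t aux; std::uint64_t t = read_tscp(aux);`.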

Solution 2

According to this, this operation clobbers EDX and ECX. You need to mark those registers as clobbered, which is what the second one does. BTW, is this the link where you got the above code, or did you find it elsewhere? It also shows a few other variations for timing, which is pretty neat.
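To make the two options concrete, here is a minimal sketch, assuming GCC or Clang on x86 (the function names are hypothetical). ECX can either be declared as an output, as in Solution 1, or listed as a clobber when its value isn't needed. EAX and EDX are outputs in both cases, so they need no clobber entry of their own:

```cpp
#include <cstdint>

// Option 1: ECX is an output. The compiler knows the register is
// overwritten and hands its value back through 'aux'.
static inline std::uint64_t tsc_keep_aux(std::uint32_t& aux)
{
    std::uint32_t lo, hi;
    asm volatile("rdtscp" : "=a"(lo), "=d"(hi), "=c"(aux));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// Option 2: ECX is a clobber. The compiler knows the register is
// overwritten, but the value written to it is discarded.
static inline std::uint64_t tsc_drop_aux()
{
    std::uint32_t lo, hi;
    asm volatile("rdtscp" : "=a"(lo), "=d"(hi) : : "ecx");
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

Either way, the compiler will not keep a live value (such as a function parameter) in ECX across the instruction.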


Author by

James

Updated on June 05, 2022

Comments

James almost 2 years
Disclaimer: Words cannot describe how much I detest AT&T style syntax

I have a problem that I hope is caused by register clobbering. If not, I have a much bigger problem.

The first version I used was
```
static unsigned long long rdtscp(void)
{
    unsigned int hi, lo;
    __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi));
    return (unsigned long long)lo | ((unsigned long long)hi << 32);
}
```
I notice there is no 'clobbering' stuff in this version. Whether or not this is a problem I don't know... I suppose it depends if the compiler inlines the function or not. Using this version causes me problems that aren't always reproducible.

The next version I found is
```
static unsigned long long rdtscp(void)
{
    unsigned long long tsc;
    __asm__ __volatile__(
        "rdtscp;"
        "shl $32, %%rdx;"
        "or %%rdx, %%rax"
        : "=a"(tsc)
        :
        : "%rcx", "%rdx");

    return tsc;
}
```
This is reassuringly unreadable and official looking, but like I said my issue isn't always reproducible so I'm merely trying to rule out one possible cause of my problem.

The reason I believe the first version is a problem is that it is overwriting a register that previously held a function parameter.

What's correct... version 1, or version 2, or both?
ughoavgfhw about 11 years

Wrong instruction. It's rdtscp, not rdtsc, and any output is known to be clobbered so it doesn't need to be listed. The problem is that rdtscp also destroys ecx, which version 2 marks as clobbered but version 1 does not.
Michael Dorgan about 11 years

Does this not clobber ECX as well? If not, I'll just delete my answer and call yours good.
amdn about 11 years

Yes, the "=c" specification tells the compiler that ECX will hold the output, which implies that it is clobbered
Michael Dorgan about 11 years

Leaving this here for the other SO link which may be useful - though the above answer is better.
James about 11 years

Thank you. The first version didn't mark ecx as being clobbered. This register initially held a parameter value, which was used in a conditional that called std::terminate() if it failed. Once the register was clobbered, the condition was obviously checking the wrong thing!
FrankH. about 11 years

Sometimes useful to have references as to where else this type of code is used ... hence, for example, have a look at lxr.free-electrons.com/source/arch/x86/include/asm/msr.h#L205 and lxr.free-electrons.com/source/arch/x86/include/asm/msr.h#L42 (the opcode for rdtscp is the byte sequence given there).
nodakai over 9 years

If you don't need the value set in %ecx (which can be used to identify CPU cores), you can simply use the clobber list: __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi) : : "%ecx" );
haelix over 5 years

Concise explanation on why we need the function's aux parameter?
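
On the aux question: RDTSCP copies the contents of the IA32_TSC_AUX MSR into ECX, and what that value means is up to the operating system. As a rough sketch, assuming the Linux x86-64 convention of storing the CPU number in the low 12 bits and the NUMA node in the bits above (an assumption about the OS, not something the instruction guarantees), the value could be decoded like this. The helper names are hypothetical:

```cpp
#include <cstdint>

// Assumption: aux == (node << 12) | cpu, as Linux sets it up on x86-64.
// Other operating systems may store something else entirely.
constexpr std::uint32_t aux_cpu(std::uint32_t aux)  { return aux & 0xFFFu; }
constexpr std::uint32_t aux_node(std::uint32_t aux) { return aux >> 12; }

// One possible use: if aux differs between two reads, the thread was
// moved to another CPU in between, so the two timestamps came from
// different cores.
constexpr bool same_cpu(std::uint32_t aux_before, std::uint32_t aux_after)
{
    return aux_before == aux_after;
}
```

If none of that matters for the measurement, the clobber-list form from nodakai's comment avoids the parameter altogether.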