Which inline assembly code is correct for rdtscp?

Solution 1

Here's C++ code that returns the TSC and stores the auxiliary 32 bits in the reference parameter:

static inline uint64_t rdtscp( uint32_t & aux )
{
    uint64_t rax,rdx;
    asm volatile ( "rdtscp\n" : "=a" (rax), "=d" (rdx), "=c" (aux) : : );
    return (rdx << 32) + rax;
}

It is better to do the shift and add that merge the two 32-bit halves in a C++ statement rather than inside the inline assembly; this lets the compiler schedule those instructions as it sees fit.
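As a rough sketch of how this wrapper might be used (assuming x86-64 and GCC or Clang), the snippet below times a callable and also compares the two aux values. RDTSCP loads ECX from the IA32_TSC_AUX MSR, and operating systems commonly store a CPU identifier there, so a differing aux value suggests the thread moved to another core between the two reads. The `TickSample` struct and the `time_ticks` helper are hypothetical names introduced only for this illustration.

```cpp
#include <cstdint>

// Same wrapper as above, repeated so the sketch is self-contained.
// Assumes x86-64, where "=a"/"=d" bind the 64-bit variables to RAX/RDX.
static inline uint64_t rdtscp(uint32_t &aux)
{
    uint64_t rax, rdx;
    asm volatile("rdtscp" : "=a"(rax), "=d"(rdx), "=c"(aux));
    return (rdx << 32) + rax;
}

// Hypothetical result type for this illustration.
struct TickSample {
    uint64_t ticks;    // TSC ticks between the two reads
    bool same_aux;     // true if IA32_TSC_AUX was identical at both reads
};

// Hypothetical helper: run `work` once between two RDTSCP reads.
template <typename F>
static TickSample time_ticks(F &&work)
{
    uint32_t aux_begin = 0, aux_end = 0;
    const uint64_t begin = rdtscp(aux_begin);
    work();
    const uint64_t end = rdtscp(aux_end);
    return { end - begin, aux_begin == aux_end };
}
```

A caller could discard any sample where `same_aux` is false, since tick counts taken on different cores are not necessarily comparable.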

Solution 2

According to this, the instruction overwrites EDX and ECX (in addition to EAX). You need to mark those registers as clobbered, which is what the second version does. BTW, is this the link where you got the above code, or did you find it elsewhere? It also shows a few other variations for timing, which is pretty neat.
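A minimal sketch of that point, assuming x86-64 and GCC or Clang extended asm: registers that appear as outputs are already known to the compiler to be modified, so only ECX, which is written but not read back here, needs to go in the clobber list. The function name `rdtscp_noaux` is made up for this illustration.

```cpp
#include <cstdint>

// Hypothetical variant that ignores the auxiliary value.
static inline uint64_t rdtscp_noaux()
{
    uint32_t lo, hi;
    // EAX and EDX are outputs, so the compiler knows they change.
    // ECX is also written by RDTSCP but unused here, so it is listed
    // as a clobber to stop the compiler keeping a live value in it.
    asm volatile("rdtscp" : "=a"(lo), "=d"(hi) : : "ecx");
    // Merge the halves in C++ so the compiler can schedule the shift/or.
    return (static_cast<uint64_t>(hi) << 32) | lo;
}
```

Without the clobber, the compiler may leave a value such as a function argument in ECX across the asm statement, and that value is silently overwritten.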

Author by

James

Updated on June 05, 2022

Comments

  • James
    James almost 2 years

    Disclaimer: Words cannot describe how much I detest AT&T style syntax

    I have a problem that I hope is caused by register clobbering. If not, I have a much bigger problem.

    The first version I used was

    static unsigned long long rdtscp(void)
    {
        unsigned int hi, lo;
        __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi));
        return (unsigned long long)lo | ((unsigned long long)hi << 32);
    }
    

    I notice there is no 'clobbering' stuff in this version. Whether or not this is a problem I don't know... I suppose it depends if the compiler inlines the function or not. Using this version causes me problems that aren't always reproducible.

    The next version I found is

    static unsigned long long rdtscp(void)
    {
        unsigned long long tsc;
        __asm__ __volatile__(
            "rdtscp;"
            "shl $32, %%rdx;"
            "or %%rdx, %%rax"
            : "=a"(tsc)
            :
            : "%rcx", "%rdx");
    
        return tsc;
    }
    

    This is reassuringly unreadable and official looking, but like I said my issue isn't always reproducible so I'm merely trying to rule out one possible cause of my problem.

    The reason I believe the first version is a problem is that it is overwriting a register that previously held a function parameter.

    What's correct... version 1, or version 2, or both?

  • ughoavgfhw
    ughoavgfhw about 11 years
    Wrong instruction. It's rdtscp, not rdtsc, and any output is known to be clobbered so it doesn't need to be listed. The problem is that rdtscp also destroys ecx, which version 2 marks as clobbered but version 1 does not.
  • Michael Dorgan
    Michael Dorgan about 11 years
    Does this not clobber ECX as well? If not, I'll just delete my answer and call yours good.
  • amdn
    amdn about 11 years
    Yes, the "=c" specification tells the compiler that ECX will hold the output, which implies that it is clobbered
  • Michael Dorgan
    Michael Dorgan about 11 years
    Leaving this here for the other SO link which may be useful - though the above answer is better.
  • James
    James about 11 years
    Thank you. The first version didn't mark ecx as being clobbered. This register initially held a parameter value, which was used in a conditional that called std::terminate() if it failed. Once the register was clobbered, the condition was obviously checking the wrong thing!
  • FrankH.
    FrankH. about 11 years
    Sometimes useful to have references as to where else this type of code is used ... hence, for example, have a look at lxr.free-electrons.com/source/arch/x86/include/asm/msr.h#L205 and lxr.free-electrons.com/source/arch/x86/include/asm/msr.h#L42 (the opcode for rdtscp is the byte sequence given there).
  • nodakai
    nodakai over 9 years
    If you don't need the value set in %ecx (which can be used to identify CPU cores), you can simply use the clobber list: __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi) : : "%ecx" );
  • haelix
    haelix over 5 years
    Concise explanation on why we need the function's aux parameter?