Why is this inline assembly not working with a separate asm volatile statement for each instruction?

c linux gcc assembly x86-64

10,804

Solution 1

You clobber memory but don't tell GCC about it, so GCC can cache values in buf across assembly calls. If you want to use inputs and outputs, tell GCC about everything.

__asm__ (
    "movq %1, 0(%0)\n\t"
    "movq %2, 8(%0)"
    :                                /* Outputs (none) */
    : "r"(buf), "r"(rrax), "r"(rrbx) /* Inputs */
    : "memory");                     /* Clobbered */

You also generally want to let GCC handle most of the mov instructions, register selection, and so on. Even if you explicitly constrain the registers (rrax is still %rax), let the information flow through GCC or you will get unexpected results.
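As a minimal sketch of the fix above wrapped in a function: the name store_pair and its parameters are hypothetical, not from the question. It assumes GCC or Clang on x86-64, and falls back to plain C elsewhere so the file still compiles.

```c
/* Hypothetical helper: stores a into buf[0] and b into buf[1]. */
static void store_pair(long *buf, long a, long b)
{
#if defined(__x86_64__)
    __asm__ (
        "movq %1, 0(%0)\n\t"
        "movq %2, 8(%0)"
        :                            /* Outputs (none) */
        : "r"(buf), "r"(a), "r"(b)   /* Inputs: GCC picks the registers */
        : "memory");                 /* The stores change memory GCC cannot see */
#else
    /* Fallback for other targets (assumption: same observable behavior). */
    buf[0] = a;
    buf[1] = b;
#endif
}
```

Calling store_pair(buf, 0x34, 0x39) and then printing buf[0] and buf[1] with %lx should give the output the question expected, because the values reach the assembly through operands rather than through registers GCC knows nothing about.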

`volatile` is wrong.

The reason __volatile__ exists is so you can guarantee that the compiler places your code exactly where it is... which is a completely unnecessary guarantee for this code. It's necessary for implementing advanced features such as memory barriers, but almost completely worthless if you are only modifying memory and registers.

GCC already knows that it can't move this assembly after printf because the printf call accesses buf, and buf could be clobbered by the assembly. GCC already knows that it can't move the assembly before rrax = 0x34; because rrax is an input to the assembly code. So what does __volatile__ get you? Nothing.

If your code does not work without __volatile__ then there is an error in the code which should be fixed instead of just adding __volatile__ and hoping that makes everything better. The __volatile__ keyword is not magic and should not be treated as such.

Alternative fix:

Is __volatile__ necessary for your original code? No. Just mark the inputs and clobber values correctly.

/* The "S" constraint means %rsi, "b" means %rbx, and "a" means %rax.
   The inputs and clobbered values are specified.  There is no output
   so that section is blank.  */
rrsi = (long) buf;
__asm__ ("movq %%rax, 0(%%rsi)" : : "a"(rrax), "S"(rrsi) : "memory");
__asm__ ("movq %%rbx, 8(%%rsi)" : : "b"(rrbx), "S"(rrsi) : "memory");

Why __volatile__ doesn't help you here:

rrax = 0x34; /* Dead code */

GCC is well within its rights to completely delete the above line, since the code in the question above claims that it never uses rrax.

A clearer example

long global;
void store_5(void)
{
    register long rax asm ("rax");
    rax = 5;
    __asm__ __volatile__ ("movq %%rax, (global)");
}

The disassembly is more or less as you expect it at -O0,

movq $5, %rax
movq %rax, (global)

But with optimization off, you can be fairly sloppy about assembly. Let's try -O2:

movq %rax, (global)

Whoops! Where did rax = 5; go? It's dead code, since %rax is never used in the function — at least as far as GCC knows. GCC doesn't peek inside assembly. What happens when we remove __volatile__?

; empty

Well, you might think __volatile__ is doing you a service by keeping GCC from discarding your precious assembly, but it's just masking the fact that GCC thinks your assembly isn't doing anything. GCC thinks your assembly takes no inputs, produces no outputs, and clobbers no memory. You had better straighten it out:

long global;
void store_5(void)
{
    register long rax asm ("rax");
    rax = 5;
    __asm__ __volatile__ ("movq %%rax, (global)" : : : "memory");
}

Now we get the following output:

movq %rax, (global)

Better. But if you tell GCC about the inputs, it will make sure that %rax is properly initialized first:

long global;
void store_5(void)
{
    register long rax asm ("rax");
    rax = 5;
    __asm__ ("movq %%rax, (global)" : : "a"(rax) : "memory");
}

The output, with optimizations:

movl $5, %eax
movq %rax, (global)

Correct! And we don't even need to use __volatile__.
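A variation worth considering, sketched under the assumption of GCC or Clang on x86-64: instead of naming the symbol inside the template and adding a "memory" clobber, pass global as a memory output operand. GCC then knows exactly which object is written. A side benefit is that the template no longer hard-codes an absolute address, which may otherwise fail to link in position-independent executables. The local variable value is introduced only for this illustration.

```c
long global;

void store_5(void)
{
    long value = 5;
#if defined(__x86_64__)
    __asm__ ("movq %1, %0"
             : "=m"(global)    /* Output: the asm writes this memory object */
             : "r"(value));    /* Input: GCC loads 5 into a register it picks */
#else
    /* Fallback for other targets (assumption: same observable behavior). */
    global = value;
#endif
}
```

Because the store is now described as an output, GCC is free to drop the statement when global is never read afterwards, which is the behavior you want from correctly described data flow.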

Why does `volatile` exist?

The primary correct use for __volatile__ is if your assembly code does something else besides input, output, or clobbering memory. Perhaps it messes with special registers which GCC doesn't know about, or affects IO. You see it a lot in the Linux kernel, but it's misused very often in user space.
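One commonly cited case where __volatile__ is appropriate is reading the x86 timestamp counter. The statement has outputs and no inputs, so without __volatile__ GCC may assume two executions give the same result and merge them. The sketch below assumes GCC or Clang on x86-64; read_tsc is a hypothetical name, and the fallback branch is only a placeholder counter so the code compiles on other targets.

```c
#include <stdint.h>

/* Hypothetical helper: returns the current timestamp counter value. */
static inline uint64_t read_tsc(void)
{
#if defined(__x86_64__)
    uint32_t lo, hi;
    /* rdtsc writes EDX:EAX. volatile tells GCC the result can differ
       between calls even though the statement has no inputs. */
    __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    /* Placeholder for other targets: a simple increasing counter. */
    static uint64_t fake_counter = 0;
    return ++fake_counter;
#endif
}
```

Note that the operands are still fully described; __volatile__ is added on top of correct constraints, not in place of them.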

The __volatile__ keyword is very tempting because we C programmers often like to think we're almost programming in assembly language already. We're not. C compilers do a lot of data flow analysis — so you need to explain the data flow to the compiler for your assembly code. That way, the compiler can safely manipulate your chunk of assembly just like it manipulates the assembly that it generates.

If you find yourself using __volatile__ a lot, as an alternative you could write an entire function or module in an assembly file.

Solution 2

The compiler uses registers, and it may write over the values you have put in them.

In this case, the compiler probably uses the rbx register after the rrbx assignment and before the inline assembly section.

In general, you shouldn't expect registers to keep their values after and between inline assembly code sequences.
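If a value has to survive from one inline assembly statement to the next, one way to do it is to carry it in a C variable: make it an output of the first statement and an input of the second. The following is a rough sketch assuming GCC or Clang on x86-64; the function name and the arithmetic are made up for illustration.

```c
/* Hypothetical example: computes 2*x + 1 in two separate asm statements. */
static long double_then_add_one(long x)
{
    long tmp, result;
#if defined(__x86_64__)
    /* First statement: tmp = x + x. The value leaves through an output operand. */
    __asm__ ("leaq (%1,%1), %0" : "=r"(tmp) : "r"(x));
    /* Second statement: result = tmp + 1. GCC makes sure tmp is in whatever
       register it assigns to %1, even if it used that register in between. */
    __asm__ ("leaq 1(%1), %0" : "=r"(result) : "r"(tmp));
#else
    /* Fallback for other targets (assumption: same observable behavior). */
    tmp = x + x;
    result = tmp + 1;
#endif
    return result;
}
```

The two statements never name a register directly, so it does not matter what the compiler does with any particular register between them.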

Solution 3

Slightly off-topic but I'd like to follow up a bit on gcc inline assembly.

The (non-)need for __volatile__ comes from the fact that GCC optimizes inline assembly. GCC inspects the operands and clobbers declared on the assembly statement for side effects / prerequisites, and if it finds none it may choose to move the assembly instruction around or even decide to remove it. All __volatile__ does is to tell the compiler "stop caring and put this right there".

Which is usually not what you really want.

This is where the need for constraints comes in. The name is overloaded and actually used for different things in GCC inline assembly:

constraints specify input / output operands used in the asm() block
constraints specify the "clobber list", which details what "state" (registers, condition codes, memory) is affected by the asm().
constraints specify classes of operands (registers, addresses, offsets, constants, ...)
constraints declare associations / bindings between assembler entities and C/C++ variables / expressions
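The four roles listed above can be seen together in one small statement. This is a sketch only, assuming GCC or Clang on x86-64; sum3 and the operand names are hypothetical. It deliberately uses %rax as a scratch register so that there is something to put in the clobber list.

```c
/* Hypothetical example: returns a + b + c, using %rax as scratch. */
static long sum3(long a, long b, long c)
{
    long out;
#if defined(__x86_64__)
    __asm__ ("movq %[a], %%rax\n\t"
             "addq %[b], %%rax\n\t"
             "addq %[c], %%rax\n\t"
             "movq %%rax, %[out]"
             : [out] "=r"(out)                      /* output operand, bound to out */
             : [a] "r"(a), [b] "r"(b), [c] "r"(c)   /* input operands, class "r" = register */
             : "rax", "cc");                        /* clobber list: scratch register, flags */
#else
    /* Fallback for other targets (assumption: same observable behavior). */
    out = a + b + c;
#endif
    return out;
}
```

Listing "rax" as clobbered keeps GCC from placing any of the operands there, and the output is written only after all inputs have been read, so no early-clobber marker is needed in this particular arrangement.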

In many cases, developers abuse __volatile__ because they noticed their code either being moved around or even disappearing without it. If this happens, it's usually a sign that the developer has not told GCC about the side effects / prerequisites of the assembly. For example, this buggy code:

register int foo __asm__("rax") = 1234;
register int bar __asm__("rbx") = 4321;

asm("add %rax, %rbx");
printf("I'm expecting 'bar' to be 5555 it is: %d\n", bar);

It's got several bugs:

for one, it only compiles because the statement has no operand list. In an asm() with operands, register names must be written with a double %%; in this operand-free form the template is passed to the assembler as-is, so if you write %% here you get an assembler error, /tmp/ccYPmr3g.s:22: Error: bad register name '%%rax'.
second, it's not telling the compiler when and where you need/use the variables. Instead, it assumes the compiler honours asm() literally. That might be true for Microsoft Visual C++ but is not the case for gcc.

If you compile it without optimization, it creates:

0000000000400524 <main>:
[ ... ]
  400534:       b8 d2 04 00 00          mov    $0x4d2,%eax
  400539:       bb e1 10 00 00          mov    $0x10e1,%ebx
  40053e:       48 01 c3                add    %rax,%rbx
  400541:       48 89 da                mov    %rbx,%rdx
  400544:       b8 5c 06 40 00          mov    $0x40065c,%eax
  400549:       48 89 d6                mov    %rdx,%rsi
  40054c:       48 89 c7                mov    %rax,%rdi
  40054f:       b8 00 00 00 00          mov    $0x0,%eax
  400554:       e8 d7 fe ff ff          callq  400430 <printf@plt>
[...]

You can find your add instruction, and the initializations of the two registers, and it'll print the expected. If, on the other hand, you crank optimization up, something else happens:

0000000000400530 <main>:
  400530:       48 83 ec 08             sub    $0x8,%rsp
  400534:       48 01 c3                add    %rax,%rbx
  400537:       be e1 10 00 00          mov    $0x10e1,%esi
  40053c:       bf 3c 06 40 00          mov    $0x40063c,%edi
  400541:       31 c0                   xor    %eax,%eax
  400543:       e8 e8 fe ff ff          callq  400430 <printf@plt>
[ ... ]

Your initializations of both the "used" registers are no longer there. The compiler discarded them because nothing it could see was using them, and while it kept the assembly instruction it put it before any use of the two variables. It's there but it does nothing (Luckily actually ... if rax / rbx had been in use who can tell what'd have happened ...).

And the reason for that is that you haven't actually told GCC that the assembly is using these registers / these operand values. This has nothing whatsoever to do with volatile but all with the fact you're using a constraint-free asm() expression.

The way to do this correctly is via constraints, i.e. you'd use:

int foo = 1234;
int bar = 4321;

asm("add %1, %0" : "+r"(bar) : "r"(foo));
printf("I'm expecting 'bar' to be 5555 it is: %d\n", bar);

This tells the compiler that the assembly:

has one argument in a register, "+r"(...), that both needs to be initialized before the assembly statement and is modified by it; the variable bar is associated with it.
has a second argument in a register, "r"(...), that needs to be initialized before the assembly statement and is treated as read-only / not modified by the statement; the variable foo is associated with it.

Notice no register assignment is specified - the compiler chooses that depending on the variables / state of the compile. The (optimized) output of the above:

0000000000400530 <main>:
  400530:       48 83 ec 08             sub    $0x8,%rsp
  400534:       b8 d2 04 00 00          mov    $0x4d2,%eax
  400539:       be e1 10 00 00          mov    $0x10e1,%esi
  40053e:       bf 4c 06 40 00          mov    $0x40064c,%edi
  400543:       01 c6                   add    %eax,%esi
  400545:       31 c0                   xor    %eax,%eax
  400547:       e8 e4 fe ff ff          callq  400430 <printf@plt>
[ ... ]

GCC inline assembly constraints are almost always necessary in some form or the other, but there can be multiple possible ways of describing the same requirements to the compiler; instead of the above, you could also write:

asm("add %1, %0" : "=r"(bar) : "r"(foo), "0"(bar));

This tells gcc:

the statement has an output operand, the variable bar, that after the statement will be found in a register, "=r"(...)
the statement has an input operand, the variable foo, which is to be placed into a register, "r"(...)
operand zero is also an input operand and to be initialized with bar

Or, again an alternative:

asm("add %1, %0" : "+r"(bar) : "g"(foo));

which tells gcc:

the statement has an input/output operand, the variable bar, in a register (same as before)
the statement has an input operand, the variable foo, which the statement doesn't care whether it's in a register, in memory or a compile-time constant (that's the "g"(...) constraint)

The result is different from the former:

0000000000400530 <main>:
  400530:       48 83 ec 08             sub    $0x8,%rsp
  400534:       bf 4c 06 40 00          mov    $0x40064c,%edi
  400539:       31 c0                   xor    %eax,%eax
  40053b:       be e1 10 00 00          mov    $0x10e1,%esi
  400540:       81 c6 d2 04 00 00       add    $0x4d2,%esi
  400546:       e8 e5 fe ff ff          callq  400430 <printf@plt>
[ ... ]

because now GCC has actually figured out that foo is a compile-time constant and simply embedded the value in the add instruction! Isn't that neat?
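For experimenting with the three spellings discussed above, here is a sketch that wraps each one in its own function so the generated code can be compared side by side. It assumes GCC or Clang on x86-64; the function names are hypothetical, and the explicit "cc" clobber is an addition of this sketch that documents that add changes the flags.

```c
/* "+r": bar is both read and written; foo is a read-only register input. */
static int add_rw(int bar, int foo)
{
#if defined(__x86_64__)
    __asm__ ("add %1, %0" : "+r"(bar) : "r"(foo) : "cc");
#else
    bar += foo;   /* Fallback for other targets. */
#endif
    return bar;
}

/* "=r" plus "0": same meaning, with the input tied to operand zero. */
static int add_tied(int bar, int foo)
{
#if defined(__x86_64__)
    __asm__ ("add %1, %0" : "=r"(bar) : "r"(foo), "0"(bar) : "cc");
#else
    bar += foo;   /* Fallback for other targets. */
#endif
    return bar;
}

/* "g": foo may be a register, a memory operand, or an immediate. */
static int add_general(int bar, int foo)
{
#if defined(__x86_64__)
    __asm__ ("add %1, %0" : "+r"(bar) : "g"(foo) : "cc");
#else
    bar += foo;   /* Fallback for other targets. */
#endif
    return bar;
}
```

All three return the same sum; only the freedom GCC has in choosing operand locations differs, which is what shows up in the disassembly when optimization is enabled.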

Admittedly, this is complex and takes getting used to. The advantage is that letting the compiler choose which registers to use for what operands allows optimizing the code overall; if, for example, an inline assembly statement is used in a macro and/or a static inline function, the compiler can, depending on the calling context, choose different registers at different instantiations of the code. Or if a certain value is compile-time evaluatable / constant in one place but not in another, the compiler can tailor the created assembly for it.

Think of GCC inline assembly constraints as kind of "extended function prototypes" - they tell the compiler what types and locations for arguments / return values are, plus a bit more. If you don't specify these constraints, your inline assembly is creating the analogue of functions that operate on global variables/state only - which, as we probably all agree, are rarely ever doing exactly what you intended.

Author by

MetallicPriest

Updated on June 09, 2022

Comments

MetallicPriest almost 2 years

For the following code:

long buf[64];

register long rrax asm ("rax");
register long rrbx asm ("rbx");
register long rrsi asm ("rsi");

rrax = 0x34;
rrbx = 0x39;

__asm__ __volatile__ ("movq $buf,%rsi");
__asm__ __volatile__ ("movq %rax, 0(%rsi);");
__asm__ __volatile__ ("movq %rbx, 8(%rsi);");

printf( "buf[0] = %lx, buf[1] = %lx!\n", buf[0], buf[1] );

I get the following output:

buf[0] = 0, buf[1] = 346161cbc0!

while it should have been:

buf[0] = 34, buf[1] = 39!

Any ideas why it is not working properly, and how to solve it?

MetallicPriest over 12 years

But I use rsi as a pointer to buf before printf. It doesn't matter whether printf uses it or not. buf[0] and buf[1] should have the right values anyway, no? Even if I remove the rrsi from the printf, it still prints the same erroneous values.
Dietrich Epp over 12 years

@ugoren: If you make GCC assign registers using the asm keyword, it will properly spill them and reload them so they are saved across function calls.
MetallicPriest over 12 years

volatile in asm is to tell the compiler to place the code exactly where it is placed. It's not like volatile for variables.
Dietrich Epp over 12 years

@MetallicPriest: Yes, that is exactly what volatile is for, and that's why it's not necessary here. If you don't understand that, then read the GCC inline assembly HOWTO from start to finish because it does not help to skip over chunks.
ugoren over 12 years

@MetallicPriest, you're right about the specific case. I should edit my answer, but I'm having technical trouble. But the general idea is as I wrote.
ugoren over 12 years

@DietrichEpp, the compiler is in charge of saving the registers it uses. If inline assembly changes registers, it knows nothing about it and won't take care to save them.
Dietrich Epp over 12 years

@ugoren: If assembly changes registers, then it's your responsibility to put those registers in the clobbered section of the inline assembly. GCC will correctly spill across __asm__ statements. Without this, you'd never really be able to use __asm__ at all.
FrankH. over 12 years

I'd up your answer if not for the boldfaced "__volatile__ is wrong"; that's because the assembly as stated by the original poster actually did need it. It's wrong for quite a few other reasons (as you noted as well, the missing clobber). Nonetheless, separate asm() statements, if one insists on using them (rarely ever a good idea), require it to force ordering. Yes, call me nitpicky if you like ;-)
Dietrich Epp over 12 years

@FrankH.: No, the assembly as stated by the original poster did not need __volatile__, it just needed the correct input/output operands marked. The __volatile__ keyword is ill-equipped to solve single-processor data flow issues. You shouldn't use __volatile__ to replace proper input/output operands.
ugoren over 12 years

@DietrichEpp, Using the clobbered section is indeed important. But for the opposite reason. If you have inline assembly between 2 C statements, defining clobber will prevent assembly from corrupting registers used by C. But if you have 2 inline assembly sections, with a C statement in between, the second section may find registers have changed.
Dietrich Epp over 12 years

@FrankH.: I've updated the answer with a kind of "walkthrough" for why __volatile__ was never really necessary in the first place.
Dietrich Epp over 12 years

@ugoren: Yes, that is exactly the reason I gave why clobber is important (so GCC can spill registers across __asm__). I did not say that clobber would prevent C from corrupting registers used by asm. To do that, you need to use input and output operands.
ugoren over 12 years

@DietrichEpp, I think we both understand inline assembly, but maybe not what MetallicPriest is trying to do. I think he wants a register to keep its value between two assembly sections (or between assignment to a register-defined variable and an assembly section), and clobber doesn't handle this.
Dietrich Epp over 12 years

@ugoren: Yes, I completely agree. Clobber doesn't do that. You must use the input and output parameters to the assembly section to do that. Assembly sections take three kinds of operands: input, output, and clobber. You must correctly specify all three categories or you are in danger of GCC incorrectly transforming your code. Using __volatile__ as a substitute for this is the road to ruin because you will have to extend the volatile sections outwards until they span the entire function, at which point your assembly is no longer "inline assembly" but just regular assembly.
FrankH. over 12 years

@Dietrich Epp: I agree the solution is specifying proper/sufficient constraints. __volatile__ in inline assembly is one of those "you got a hammer and screws become nails" things. It does what it says not what you may mean ;-)
Seng Cheong over 8 years

@DietrichEpp You are correct, I was clearly not thinking. Removed the noise.