Calling printf in x86_64 using GNU assembler

linux gcc assembly x86-64 gnu-assembler

42,340

Solution 1

There are a number of issues with this code. The AMD64 System V ABI calling convention used by Linux requires a few things. It requires that just before a CALL that the stack be at least 16-byte (or 32-byte) aligned:

The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary.

After the C runtime calls your main function the stack is misaligned by 8 because the return pointer was placed on the stack by CALL. To realign to 16-byte boundary you can simply PUSH any general purpose register onto the stack and POP it off at the end.

The calling convention also requires that AL contain the number of vector registers used for a variable argument function:

%al is used to indicate the number of vector arguments passed to a function requiring a variable number of arguments

printf is a variable argument function, so AL needs to be set. In this case you don't pass any parameters in a vector register so you can set AL to 0.

You also dereference the $format pointer when it is already an address. So this is wrong:

mov  $format, %rbx
mov  (%rbx), %rdi

This takes the address of format and places it in RBX. Then you take the 8 bytes at that address in RBX and place them in RDI. RDI needs to be a pointer to a string of characters, not the characters themselves. The two lines could be replaced with:

lea  format(%rip), %rdi

This uses RIP Relative Addressing.

You should also NUL terminate your strings. Rather than use .ascii you can use .asciz on the x86 platform.

A working version of your program could look like:

# global data  #
    .data
format: .asciz "%d\n"
.text
    .global main
main:
  push %rbx
  lea  format(%rip), %rdi
  mov  $1, %esi           # Writing to ESI zero extends to RSI.
  xor %eax, %eax          # Zeroing EAX is efficient way to clear AL.
  call printf
  pop %rbx
  ret

Other Recommendations/Suggestions

You should also be aware from the 64-bit Linux ABI, that the calling convention also requires functions you write to honor the preservation of certain registers. The list of registers and whether they should preserved is as follows:

Any register that says Yes in the Preserved across function calls column are ones you must ensure are preserved across your function. Function main is like any other C function.

If you have strings/data that you know will be read only you can place them in the .rodata section with .section .rodata rather than .data

In 64-bit mode: if you have a destination operand that is a 32-bit register, the CPU will zero extend the register across the entire 64-bit register. This can save bytes on the instruction encoding.

It is possible your executable is being compiled as position independent code. You may receive an error similar to:

relocation R_X86_64_PC32 against symbol `printf@@GLIBC_2.2.5' can not be used when making a shared object; recompile with -fPIC

To fix this you'll have to call the external function printf this way:

call printf@plt

This calls the external library function via the Procedure Linkage Table (PLT)

Solution 2

You can look at assembly code generated from an equivalent c file.
Running gcc -o - -S -fno-asynchronous-unwind-tables test.c with test.c

#include <stdio.h>
int main() {
   return printf("%d\n", 1);
}

This output the assembly code:

        .file   "test.c"
        .section        .rodata
.LC0:
        .string "%d\n"
        .text
        .globl  main
        .type   main, @function
main:
        pushq   %rbp
        movq    %rsp, %rbp
        movl    $1, %esi
        movl    $.LC0, %edi
        movl    $0, %eax
        call    printf
        popq    %rbp
        ret
        .size   main, .-main
        .ident  "GCC: (GNU) 6.1.1 20160602"
        .section        .note.GNU-stack,"",@progbits

This give you a sample of an assembly code calling printf that you can then modify.

Comparing with your code, you should modify 2 things:

%rdi should point to the format, you should not unreferenced %rbx, this could be done with mov $format, %rdi
printf has a variable number of arguments, then you should add mov $0, %eax

Applying these modifications will give something like :

    .data
format: .ascii "%d\n"  
.text
    .global main  
main:
  mov  $format, %rdi
  mov  $1, %rsi
  mov  $0, %eax
  call printf
  ret

And then running it print :

1

42,340

Author by

L's World

Updated on July 10, 2022

Comments

L's World almost 2 years
I've written a program using AT&T syntax for use with GNU assembler:
```
            .data
format:   .ascii "%d\n"  

            .text
            .global main  
main:
            mov    $format, %rbx
            mov    (%rbx), %rdi
            mov    $1, %rsi
            call     printf
            ret
```
I use GCC to assemble and link with:

gcc -o main main.s

I run it with this command:

./main

When I run the program I get a seg fault. By using gdb, it says printf not found. I have tried ".extern printf", which does not work. Someone suggested I should store the stack pointer before calling printf and restore before RET, How do I do that?
L's World almost 8 years

I have reorganized my code and it works. I thought it was a problem of import printf, which is not necessary. Thanks so much.
Michael Petch almost 8 years

In this particular code example since there is only one call and it appears at the end one can JMP to printf rather than CALL and eliminate the stack alignment with the PUSH/POP. That was outside the scope of my answer but one can always look at literature on TAIL CALL optimizations
Michael Petch almost 8 years

Your modified code doesn't 16-byte align the stack before the call toprintf. It may work in many scenarios but not all. Pushing any 64-bit register after your function main starts and restoring it at the end would keep things aligned. The 64-bit Linux ABI requires a minimum 16-byte alignment (32-byte aligned if passing 256 bit vectors to a function). At the point just before a function call the stack needs 16(or 32) byte alignment. After the CALL instruction transfers control to a function (main is like other C function)the return address is placed on the stack misaligning it by 8.
mpromonet almost 8 years

@MichaelPetch: I tried to give a working code with minimal modification, otherwise the gcc generated assembly is better.
Michael Petch almost 8 years

Unfortunately the 16-byte alignment is lucky to be working code in this case. I would suspect that on an optimization level like -O2 or higher it would actually remove the PUSH/POP/RET and then do a tail call JMP to printf. In that case alignment is still maintained (without the extra PUSH/POP) since JMP doesn't place a return address on the stack like a CALL.
Peter Cordes almost 8 years

Michael is correct: gcc emits optimal code if you ask it to optimize (by using -O3): godbolt.org/g/sX5yCe. It uses a jmp for the tail-call so the stack alignment stays the same as on entry to main. It also uses xor to zero %al, instead of a less-efficient mov. And of course it puts the string constant in .rodata, not .data. Using compiler output as a starting-point for optimization is a good plan, but only if you start with -O2 or -O3 output! Otherwise you might do worse than the compiler.
Nick Desaulniers about 7 years

Excellent recommendations. Future travelers, see also: nickdesaulniers.github.io/blog/2014/04/18/…
Peter Cordes almost 6 years

The changelog message from my edit contains the important points. . The ABI doc link is broken. The other changes are optional, because xor %eax,%eax is the best way to set AL or RAX to zero (so it's not harmful to say that variadic functions look at %rax instead of %al), and the rest were just extra details / comments I made since an edit was needed anyway to fix the ABI link.