What is the meaning of each line of the assembly output of a C hello world?

linux gcc assembly x86

21,401

Solution 1

Here how it goes:

        .file   "test.c"

The original source file name (used by debuggers).

        .section        .rodata
.LC0:
        .string "Hello world!"

A zero-terminated string is included in the section ".rodata" ("ro" means "read-only": the application will be able to read the data, but any attempt at writing into it will trigger an exception).

        .text

Now we write things into the ".text" section, which is where code goes.

.globl main
        .type   main, @function
main:

We define a function called "main" and globally visible (other object files will be able to invoke it).

        leal    4(%esp), %ecx

We store in register %ecx the value 4+%esp (%esp is the stack pointer).

        andl    $-16, %esp

%esp is slightly modified so that it becomes a multiple of 16. For some data types (the floating-point format corresponding to C's double and long double), performance is better when the memory accesses are at addresses which are multiple of 16. This is not really needed here, but when used without the optimization flag (-O2...), the compiler tends to produce quite a lot of generic useless code (i.e. code which could be useful in some cases but not here).

        pushl   -4(%ecx)

This one is a bit weird: at that point, the word at address -4(%ecx) is the word which was on top of the stack prior to the andl. The code retrieves that word (which should be the return address, by the way) and pushes it again. This kind of emulates what would be obtained with a call from a function which had a 16-byte aligned stack. My guess is that this push is a remnant of an argument-copying sequence. Since the function has adjusted the stack pointer, it must copy the function arguments, which were accessible through the old value of the stack pointer. Here, there is no argument, except the function return address. Note that this word will not be used (yet again, this is code without optimization).

        pushl   %ebp
        movl    %esp, %ebp

This is the standard function prologue: we save %ebp (since we are about to modify it), then set %ebp to point to the stack frame. Thereafter, %ebp will be used to access the function arguments, making %esp free again. (Yes, there is no argument, so this is useless for that function.)

        pushl   %ecx

We save %ecx (we will need it at function exit, to restore %esp at the value it had before the andl).

        subl    $20, %esp

We reserve 32 bytes on the stack (remember that the stack grows "down"). That space will be used to storea the arguments to printf() (that's overkill, since there is a single argument, which will use 4 bytes [that's a pointer]).

        movl    $.LC0, (%esp)
        call    printf

We "push" the argument to printf() (i.e. we make sure that %esp points to a word which contains the argument, here $.LC0, which is the address of the constant string in the rodata section). Then we call printf().

        addl    $20, %esp

When printf() returns, we remove the space allocated for the arguments. This addl cancels what the subl above did.

        popl    %ecx

We recover %ecx (pushed above); printf() may have modified it (the call conventions describe which register can a function modify without restoring them upon exit; %ecx is one such register).

        popl    %ebp

Function epilogue: this restores %ebp (corresponding to the pushl %ebp above).

        leal    -4(%ecx), %esp

We restore %esp to its initial value. The effect of this opcode is to store in %esp the value %ecx-4. %ecx was set in the first function opcode. This cancels any alteration to %esp, including the andl.

ret

Function exit.

        .size   main, .-main

This sets the size of the main() function: at any point during assembly, "." is an alias for "the address at which we are adding things right now". If another instruction was added here, it would go at the address specified by ".". Thus, ".-main", here, is the exact size of the code of the function main(). The .size directive instructs the assembler to write that information in the object file.

        .ident  "GCC: (GNU) 4.3.0 20080428 (Red Hat 4.3.0-8)"

GCC just loves to leave traces of its action. This string ends up as a kind of comment in the object file. The linker will remove it.

        .section        .note.GNU-stack,"",@progbits

A special section where GCC writes that the code can accommodate a non-executable stack. This is the normal case. Executable stacks are needed for some special usages (not standard C). On modern processors, the kernel can make a non-executable stack (a stack which triggers an exception if someone tries to execute as code some data which is on the stack); this is viewed by some people as a "security feature" because putting code on the stack is a common way to exploit buffer overflows. With this section, the executable will be marked as "compatible with a non-executable stack" which the kernel will happily provide as such.

Solution 2

Here is some supplement to @Thomas Pornin's answer.

.LC0 local constant, e.g string literal.
.LFB0 local function beginning,
.LFE0 local function ending,

The suffix of these label is a number, and start from 0.

This is gcc assembler convention.

Solution 3

    leal    4(%esp), %ecx
    andl    $-16, %esp
    pushl   -4(%ecx)
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %ecx
    subl    $20, %esp

these instructions don't compare in your c program, they're always executed at the beginning of every function (but it depends on compiler/platform)

    movl    $.LC0, (%esp)
    call    printf

this block corresponds to your printf() call. the first instruction places on the stack its argument (a pointer to "hello world") then calls the function.

    addl    $20, %esp
    popl    %ecx
    popl    %ebp
    leal    -4(%ecx), %esp
    ret

these instructions are opposite to the first block, they're some sort of stack manipulation stuffs. always executed too

21,401

Author by

Admin

Updated on September 19, 2020

Comments

Admin over 3 years

I ran gcc -S over this:

int main()
{
printf ("Hello world!");
}

and I got this assembly code:

        .file   "test.c"
        .section        .rodata
.LC0:
        .string "Hello world!"
        .text
.globl main
        .type   main, @function
main:
        leal    4(%esp), %ecx
        andl    $-16, %esp
        pushl   -4(%ecx)
        pushl   %ebp
        movl    %esp, %ebp
        pushl   %ecx
        subl    $20, %esp
        movl    $.LC0, (%esp)
        call    printf
        addl    $20, %esp
        popl    %ecx
        popl    %ebp
        leal    -4(%ecx), %esp
        ret
        .size   main, .-main
        .ident  "GCC: (GNU) 4.3.0 20080428 (Red Hat 4.3.0-8)"
        .section        .note.GNU-stack,"",@progbits

I am curious to understand this output. Can someone share some pointers in understanding this output, or if someone could mark comments against each of these lines/group of lines explaining what it does it would be great.

Ragtime over 3 years

> subl $20, %esp > We reserve 32 bytes on the stack (remember that the stack grows "down"). Shouldn't it be 20 bytes instead of 32?
Karambit about 2 years

@Ragtime No because $20 is in written in base 16 (20 in base 16 is 32).