x86 Linux assembler get program parameters from _start

On Linux, the familiar argc and argv variables from C are always passed on the stack by the kernel, available even to assembly programs that are completely standalone and don't link with the startup code in the C library. This is documented in the i386 System V ABI, along with other details of the process startup environment (register values, stack alignment).

At the ELF entry point (a.k.a. _start) of an x86 Linux executable:

  1. ESP points to argc
  2. ESP + 4 points to argv[0], the start of the array (i.e. the value you should pass to main as char **argv is lea eax, [esp+4], not mov eax, [esp+4])
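As a minimal sketch of the two points above in the GNU assembler (AT&T) syntax used in the rest of this answer: the program below loads argc with mov and the argv pointer with lea, then exits with argc as its status so the result is visible from the shell. The file name argc-exit-x86.S is hypothetical, and the sketch assumes a toolchain with 32-bit support installed.

```asm
#include <sys/syscall.h>

    .global _start
_start:
    /* argc is the 32-bit value stored at the top of the stack */
    mov (%esp), %ebx

    /* argv is the ADDRESS of the next stack slot, so use lea, not mov */
    lea 4(%esp), %ecx           /* ecx = char **argv (not used further here) */

    /* exit_group(argc): ebx already holds the first syscall argument */
    mov $SYS_exit_group, %eax
    int $0x80
```

Built with cc -nostdlib -m32 argc-exit-x86.S -o argc-exit-x86, running ./argc-exit-x86 a b followed by echo $? should print 3, since argc counts the program name plus the two arguments.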

How a Minimal Assembly Program Obtains argc and argv

I'll show how to read argc and argv[0] in GDB.

cmdline-x86.S

#include <sys/syscall.h>

    .global _start
_start:
    /* Cause a breakpoint trap */
    int $0x03

    /* exit_group(0) */
    mov $SYS_exit_group, %eax
    mov $0, %ebx
    int $0x80

cmdline-x86.gdb

set confirm off
file cmdline-x86
run
# We'll regain control here after the breakpoint trap
printf "argc: %d\n", *(int*)$esp
printf "argv[0]: %s\n",  ((char**)($esp + 4))[0]
quit

Sample Session

$ cc -nostdlib -g3 -m32 cmdline-x86.S -o cmdline-x86
$ gdb -q -x cmdline-x86.gdb cmdline-x86
<...>  
Program received signal SIGTRAP, Trace/breakpoint trap.
_start () at cmdline-x86.S:8
8   mov $SYS_exit_group, %eax
argc: 1
argv[0]: /home/scottt/Dropbox/stackoverflow/cmdline-x86

Explanation

  • I placed a software breakpoint (int $0x03) as the first instruction at the ELF entry point (_start), so the program traps back into the debugger before it has touched the stack.
  • I then used printf in the GDB script to print
    1. argc with the expression *(int*)$esp
    2. argv[0] with the expression ((char**)($esp + 4))[0]

x86-64 version

The differences are minimal:

  • Replace ESP with RSP
  • Change the size of each stack slot (the pointer size) from 4 to 8 bytes
  • Conform to different Linux syscall calling conventions when we call exit_group(0) to properly terminate the process

cmdline.S

#include <sys/syscall.h>

    .global _start
_start:
    /* Cause a breakpoint trap */
    int $0x03

    /* exit_group(0) */
    mov $SYS_exit_group, %rax
    mov $0, %rdi
    syscall

cmdline.gdb

set confirm off
file cmdline
run
printf "argc: %d\n", *(int*)$rsp
printf "argv[0]: %s\n",  ((char**)($rsp + 8))[0]
quit
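Going one step beyond the debugger, here is a hedged sketch of a standalone x86-64 program that writes its first command-line argument to standard output using only system calls. The file name print-arg.S and the label done are made up for illustration; the stack offsets follow the layout described above (argc at RSP, argv[0] at RSP + 8, so argv[1] at RSP + 16).

```asm
#include <sys/syscall.h>

    .global _start
_start:
    /* argc is at (%rsp); with fewer than 2 entries there is no argv[1] */
    cmpq $2, (%rsp)
    jl done

    /* argv[0] is at 8(%rsp), so argv[1] is at 16(%rsp) */
    mov 16(%rsp), %rsi          /* rsi = argv[1], the buffer for write() */

    /* Compute the string length by scanning for the terminating NUL */
    xor %edx, %edx              /* rdx = length so far */
1:
    cmpb $0, (%rsi,%rdx)
    je 2f
    inc %rdx
    jmp 1b
2:
    /* write(1, argv[1], length) */
    mov $SYS_write, %eax
    mov $1, %edi
    syscall

done:
    /* exit_group(0) */
    mov $SYS_exit_group, %eax
    xor %edi, %edi
    syscall
```

Built with cc -nostdlib print-arg.S -o print-arg, running ./print-arg text should write text to the terminal without a trailing newline, because only the bytes of the argument itself are passed to write.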

How Regular C Programs Obtain argc and argv

You can disassemble _start from a regular C program to see how it obtains argc and argv from the stack and passes them as it calls __libc_start_main. Using the /bin/true program on my x86-64 machine as an example:

$ gdb -q /bin/true
Reading symbols from /usr/bin/true...Reading symbols from /usr/lib/debug/usr/bin/true.debug...done.
done.
(gdb) disassemble _start
Dump of assembler code for function _start:
   0x0000000000401580 <+0>: xor    %ebp,%ebp
   0x0000000000401582 <+2>: mov    %rdx,%r9
   0x0000000000401585 <+5>: pop    %rsi
   0x0000000000401586 <+6>: mov    %rsp,%rdx
   0x0000000000401589 <+9>: and    $0xfffffffffffffff0,%rsp
   0x000000000040158d <+13>:    push   %rax
   0x000000000040158e <+14>:    push   %rsp
   0x000000000040158f <+15>:    mov    $0x404040,%r8
   0x0000000000401596 <+22>:    mov    $0x403fb0,%rcx
   0x000000000040159d <+29>:    mov    $0x4014c0,%rdi
   0x00000000004015a4 <+36>:    callq  0x401310 <__libc_start_main@plt>
   0x00000000004015a9 <+41>:    hlt    
   0x00000000004015aa <+42>:    xchg   %ax,%ax
   0x00000000004015ac <+44>:    nopl   0x0(%rax)

The first three arguments to __libc_start_main() are:

  1. RDI: pointer to main()
  2. RSI: argc, you can see how it was the first thing popped off the stack
  3. RDX: argv, the value of RSP right after argc was popped. (ubp_av in the GLIBC source)

The x86 _start is very similar:

Dump of assembler code for function _start:
   0x0804842c <+0>: xor    %ebp,%ebp
   0x0804842e <+2>: pop    %esi
   0x0804842f <+3>: mov    %esp,%ecx
   0x08048431 <+5>: and    $0xfffffff0,%esp
   0x08048434 <+8>: push   %eax
   0x08048435 <+9>: push   %esp
   0x08048436 <+10>:    push   %edx
   0x08048437 <+11>:    push   $0x80485e0
   0x0804843c <+16>:    push   $0x8048570
   0x08048441 <+21>:    push   %ecx
   0x08048442 <+22>:    push   %esi
   0x08048443 <+23>:    push   $0x80483d0
   0x08048448 <+28>:    call   0x80483b0 <__libc_start_main@plt>
   0x0804844d <+33>:    hlt    
   0x0804844e <+34>:    xchg   %ax,%ax
End of assembler dump.
Author by

Lefsler

Updated on June 04, 2022

Comments

  • Lefsler
    Lefsler almost 2 years

    I'm trying to create a program that just writes its parameter to the screen. I have written programs that read a C function's parameters, and I have used C to pass the parameter to my asm program. Is there a way to get the program's parameter using only assembler?

    EX:

    ./Program "text"
    

    I'm using as (GNU Assembler)

    Usually I get those parameters using

    [esp+4]
    

    Because esp is the program/function call pointer, but in pure asm it doesn't get the command-line parameter.

    Is there a way to do that?

    I googled it, but I wasn't able to find much information.

  • Lefsler
    Lefsler almost 11 years
    One question: how can I write dword [ebp + 4 * ebx] in gas?
  • Peter Cordes
    Peter Cordes over 7 years
    (%ebp, %ebx, 4) actually. But you can always just assemble with nasm -felf32 and disassemble with objdump -d to see how something is written in AT&T syntax.
  • Peter Cordes
    Peter Cordes over 7 years
    Also a good idea: link to the ABI standard which specifies the initial process environment (i.e. what's in registers and memory before the first instruction of _start runs). github.com/hjl-tools/x86-psABI/wiki/X86-psABI currently links to revision 252 of the x86-64 SystemV ABI.