Running 32-bit assembly code on 64-bit Linux and a 64-bit processor: explain the anomaly


Solution 1

Remember that everything by default on a 64-bit OS tends to assume 64-bit. You need to make sure that you are (a) using the 32-bit versions of your #includes where appropriate (b) linking with 32-bit libraries and (c) building a 32-bit executable. It would probably help if you showed the contents of your makefile if you have one, or else the commands that you are using to build this example.

FWIW I changed your code slightly (_start -> main):

#include <asm/unistd.h>
#include <syscall.h>
#define STDOUT 1

    .data
hellostr:
    .ascii "hello wolrd\n" ;
helloend:

    .text
    .globl main

main:
    movl $(SYS_write) , %eax  //ssize_t write(int fd, const void *buf, size_t count);
    movl $(STDOUT) , %ebx
    movl $hellostr , %ecx
    movl $(helloend-hellostr) , %edx
    int $0x80

    movl $(SYS_exit), %eax //void _exit(int status);
    xorl %ebx, %ebx
    int $0x80

    ret

and built it like this:

$ gcc -Wall test.S -m32 -o test

verified that we have a 32-bit executable:

$ file test
test: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.4, dynamically linked (uses shared libs), not stripped

and it appears to run OK:

$ ./test
hello wolrd
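The same check that `file` performs can be done in code. Below is a minimal C sketch that reads the ELF identification bytes. It assumes glibc's `<elf.h>`, and the function names are hypothetical. The byte at index EI_CLASS is ELFCLASS32 for a 32-bit binary and ELFCLASS64 for a 64-bit one.

```c
#include <elf.h>     /* EI_NIDENT, EI_CLASS, ELFMAG, SELFMAG, ELFCLASS32/64 */
#include <stdio.h>
#include <string.h>

/* Return 32 or 64 for an ELF identification buffer, or -1 if it isn't ELF. */
static int elf_class_from_ident(const unsigned char *ident, size_t len)
{
    if (len < EI_NIDENT || memcmp(ident, ELFMAG, SELFMAG) != 0)
        return -1;
    switch (ident[EI_CLASS]) {
    case ELFCLASS32: return 32;
    case ELFCLASS64: return 64;
    default:         return -1;
    }
}

/* Read the first EI_NIDENT bytes of a file and classify them. */
static int elf_class_of_file(const char *path)
{
    unsigned char ident[EI_NIDENT];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(ident, 1, sizeof ident, f);
    fclose(f);
    return elf_class_from_ident(ident, n);
}
```

For example, `elf_class_of_file("./test")` should return 32 for the `-m32` build above and 64 for the asker's original `as`/`ld` output.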

Solution 2

As noted by Paul, if you want to build 32-bit binaries on a 64-bit system, you need to use the -m32 flag, which may not be available by default on your installation (some 64-bit Linux distros don't include 32-bit compiler/linker/lib support by default).

On the other hand, you could instead build your code as 64-bit. In that case you need to use the 64-bit calling conventions:

- the system call number goes in %rax;
- the arguments go in %rdi, %rsi, and %rdx;
- you enter the kernel with the syscall instruction instead of int $0x80.

Edit

The best place I've found for this is www.x86-64.org, specifically abi.pdf.
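As a rough illustration of that convention, here is a C sketch using GCC extended inline asm. The helper names `raw_syscall3` and `write64` are hypothetical. It assumes an x86-64 Linux target and falls back to glibc's `syscall()` elsewhere. The kernel overwrites rcx and r11 during `syscall`, so both are listed as clobbers.

```c
#define _GNU_SOURCE           /* for syscall() in the non-x86-64 fallback */
#include <sys/syscall.h>      /* SYS_write */
#include <unistd.h>

/*
 * Invoke a 3-argument Linux system call with the x86-64 `syscall` ABI:
 * number in rax, arguments in rdi, rsi, rdx; the result comes back in rax.
 * The kernel clobbers rcx (return address) and r11 (saved RFLAGS).
 */
static long raw_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__x86_64__) && !defined(__ILP32__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3)
                      : "rcx", "r11", "memory");
    return ret;                       /* raw ABI: -errno on failure */
#else
    return syscall(nr, a1, a2, a3);   /* fallback: -1 and errno on failure */
#endif
}

/* 64-bit counterpart of the question's first int $0x80: write(fd, buf, len). */
static long write64(int fd, const char *buf, unsigned long len)
{
    return raw_syscall3(SYS_write, fd, (long)buf, (long)len);
}
```

From a 64-bit build, `write64(1, "hello world\n", 12)` would print the string. Exiting works the same way with SYS_exit, which is 60 on x86-64.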

Solution 3

64-bit CPUs can run 32-bit code, but they have to use a special mode to do it. Those instructions are all valid in 64-bit mode, so nothing stopped you from building a 64-bit executable.

Your code builds and runs correctly with gcc -m32 -nostdlib hello.S. That's because -m32 defines __i386, so /usr/include/asm/unistd.h includes <asm/unistd_32.h>, which has the right constants for the int $0x80 ABI.

See also Assembling 32-bit binaries on a 64-bit system (GNU toolchain) for more about _start vs. main with/without libc and static vs. dynamic executables.

$ file a.out 
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, BuildID[sha1]=973fd6a0b7fa15b2d95420c7a96e454641c31b24, not stripped

$ strace ./a.out  > /dev/null
execve("./a.out", ["./a.out"], 0x7ffd43582110 /* 64 vars */) = 0
strace: [ Process PID=2773 runs in 32 bit mode. ]
write(1, "hello wolrd\n", 12)           = 12
exit(0)                                 = ?
+++ exited with 0 +++
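You can check which header the preprocessor picked with a small C sketch (helper names hypothetical). It reports the SYS_write and SYS_exit values fixed at compile time, plus the target ABI taken from the compiler's predefined macros.

```c
#include <sys/syscall.h>   /* SYS_write, SYS_exit (via asm/unistd.h) */

/*
 * System call numbers the headers selected for this build:
 * asm/unistd_32.h under gcc -m32 (__i386__), asm/unistd_64.h otherwise.
 */
static long compiled_sys_write(void) { return SYS_write; }
static long compiled_sys_exit(void)  { return SYS_exit; }

/* Target ABI, from the compiler's predefined macros. */
static const char *compiled_abi(void)
{
#if defined(__x86_64__) && !defined(__ILP32__)
    return "x86-64";
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}
```

Built with plain gcc on x86-64, this should give 1 and 60, matching asm/unistd_64.h. With gcc -m32 it should give 4 and 1, assuming 32-bit headers and libc are installed.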

Technically, if you'd used the right call numbers, your code would happen to work from 64-bit mode as well (see "What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?"). But int 0x80 is not recommended in 64-bit code. (Actually, it's never recommended: for efficiency, 32-bit code should call through the kernel's exported VDSO page so it can use sysenter for fast system calls on CPUs that support it.)
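On the VDSO: the kernel tells each process where it mapped the VDSO through the auxiliary vector. Here is a minimal C sketch that locates it and checks that it starts with the ELF magic bytes. It assumes glibc 2.16 or later for `getauxval()`, and the helper names are hypothetical.

```c
#include <elf.h>        /* ELFMAG, SELFMAG */
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval, AT_SYSINFO_EHDR */

/*
 * Base address of the VDSO the kernel maps into the process, or NULL
 * if the auxiliary vector has no AT_SYSINFO_EHDR entry.
 */
static const unsigned char *vdso_base(void)
{
    unsigned long addr = getauxval(AT_SYSINFO_EHDR);   /* 0 if absent */
    return (const unsigned char *)addr;
}

/* The VDSO is a small ELF shared object, so it begins with "\177ELF". */
static int vdso_looks_like_elf(void)
{
    const unsigned char *base = vdso_base();
    return base != NULL && memcmp(base, ELFMAG, SELFMAG) == 0;
}
```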


But that doesn't answer my questions. What exactly is happening in this case?

Good question.

On Linux, int $0x80 with eax=1 is sys_exit(ebx), regardless of what mode the calling process was in. The 32-bit ABI is available in 64-bit mode (unless your kernel was compiled without i386 ABI support), but don't use it. Your exit status of 1 comes from movl $(STDOUT), %ebx.

(BTW, there's a STDOUT_FILENO macro defined in unistd.h, but you can't #include <unistd.h> from a .S because it also contains C prototypes which aren't valid asm syntax.)

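Notice that __NR_exit from unistd_32.h and __NR_write from unistd_64.h are both 1. Because you built without -m32, SYS_write expanded to the 64-bit value 1. int $0x80 looks that number up in the 32-bit table, so your first int $0x80 exits your process. You're using the wrong system call numbers for the ABI you're invoking.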
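A toy C model of that lookup shows why the same register values mean different things. It is not kernel code, `describe_int80` is a hypothetical name, and only the two calls in the question are modelled. It interprets eax through the 32-bit table, with arguments in ebx, ecx, and edx.

```c
#include <stdio.h>

/*
 * Toy model of how the 32-bit int $0x80 entry reads registers:
 * eax = call number from asm/unistd_32.h, arguments in ebx, ecx, edx.
 */
static int describe_int80(unsigned eax, unsigned ebx, unsigned ecx,
                          unsigned edx, char *out, size_t outlen)
{
    switch (eax) {
    case 1:   /* __NR_exit in unistd_32.h: exit(ebx) */
        return snprintf(out, outlen, "exit(%u)", ebx);
    case 4:   /* __NR_write in unistd_32.h: write(ebx, ecx, edx) */
        return snprintf(out, outlen, "write(%u, %#x, %u)", ebx, ecx, edx);
    default:
        return snprintf(out, outlen, "unmodelled call %u", eax);
    }
}
```

Feed it the values your 64-bit build loads (eax=1, ebx=1, edx=12) and it yields exit(1), matching the status gdb reported.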


strace is decoding it incorrectly, as if you'd invoked syscall (because that's the ABI a 64-bit process is expected to use). See "What are the calling conventions for UNIX & Linux system calls on x86-64".

eax=1 with syscall means write(fd=edi, buf=rsi, len=rdx), and this is how strace is incorrectly decoding your int $0x80.

rdi and rsi are 0 (aka NULL) on entry to _start, and your code sets rdx=12 with movl $(helloend-hellostr) , %edx.

Linux initializes registers to zero in a fresh process after execve. (The ABI says their values are undefined; Linux chooses zero to avoid info leaks.) In your statically-linked executable, _start is the first user-space code that runs. (In a dynamic executable, the dynamic linker runs before _start and does leave garbage in registers.)
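A similar toy model for the 64-bit syscall convention reproduces what strace printed. It is hypothetical, not strace's actual code. With rax=1, rdi=0, rsi=0 (NULL), and rdx=12 it decodes as write(0, NULL, 12).

```c
#include <stdio.h>

/*
 * Toy model of decoding a system call under the 64-bit `syscall` ABI:
 * rax = number from asm/unistd_64.h, arguments in rdi, rsi, rdx.
 */
static int describe_syscall64(unsigned long rax, unsigned long rdi,
                              unsigned long rsi, unsigned long rdx,
                              char *out, size_t outlen)
{
    switch (rax) {
    case 1:   /* __NR_write in unistd_64.h: write(rdi, rsi, rdx) */
        if (rsi == 0)
            return snprintf(out, outlen, "write(%lu, NULL, %lu)", rdi, rdx);
        return snprintf(out, outlen, "write(%lu, %#lx, %lu)", rdi, rsi, rdx);
    case 60:  /* __NR_exit in unistd_64.h: exit(rdi) */
        return snprintf(out, outlen, "exit(%lu)", rdi);
    default:
        return snprintf(out, outlen, "unmodelled call %lu", rax);
    }
}
```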

See also the tag wiki for more asm links.

Author: claws

Updated on June 05, 2022

Comments

  • claws

    I've run into an interesting problem. I forgot I'm using a 64-bit machine and OS, and wrote 32-bit assembly code. I don't know how to write 64-bit code.

    This is the x86 32-bit assembly code for Gnu Assembler (AT&T syntax) on Linux.

    //hello.S
    #include <asm/unistd.h>
    #include <syscall.h>
    #define STDOUT 1
    
    .data
    hellostr:
        .ascii "hello wolrd\n";
    helloend:
    
    .text
    .globl _start
    
    _start:
        movl $(SYS_write) , %eax  //ssize_t write(int fd, const void *buf, size_t count);
        movl $(STDOUT) , %ebx
        movl $hellostr , %ecx
        movl $(helloend-hellostr) , %edx
        int $0x80
    
        movl $(SYS_exit), %eax //void _exit(int status);
        xorl %ebx, %ebx
        int $0x80
    
        ret
    

    Now, this code should run fine on a 32-bit processor and 32-bit OS, right? As we know, 64-bit processors are backward compatible with 32-bit processors, so that wouldn't be a problem either. The problem arises from differences in the system calls and the call mechanism between 64-bit and 32-bit OSes. I don't know why, but they changed the system call numbers between 32-bit Linux and 64-bit Linux.

    asm/unistd_32.h defines:

    #define __NR_write        4
    #define __NR_exit         1
    

    asm/unistd_64.h defines:

    #define __NR_write              1
    #define __NR_exit               60
    

    Anyway, using macros instead of hard-coded numbers paid off: it ensures the correct system call numbers.

    When I assemble, link, and run the program:

    $ cpp hello.S hello.s    # preprocess
    $ as hello.s -o hello.o  # assemble
    $ ld hello.o             # link: convert the relocatable object into an executable
    

    It's not printing "hello world".

    In gdb it shows:

    Program exited with code 01.

    I don't know how to debug in gdb. Following a tutorial, I tried to execute it instruction by instruction and check the registers at each step, but it always shows "Program exited with code 01". It would be great if someone could show me how to debug this.

    (gdb) break _start
    Note: breakpoint -10 also set at pc 0x4000b0.
    Breakpoint 8 at 0x4000b0
    (gdb) start
    Function "main" not defined.
    Make breakpoint pending on future shared library load? (y or [n]) y
    Temporary breakpoint 9 (main) pending.
    Starting program: /home/claws/helloworld 
    
    Program exited with code 01.
    (gdb) info breakpoints 
    Num     Type           Disp Enb Address            What
    8       breakpoint     keep y   0x00000000004000b0 <_start>
    9       breakpoint     del  y   <PENDING>          main
    

    I tried running strace. This is its output:

    execve("./helloworld", ["./helloworld"], [/* 39 vars */]) = 0
    write(0, NULL, 12 <unfinished ... exit status 1>
    
    1. Can you explain the parameters of the write(0, NULL, 12) system call in the strace output?
    2. What exactly is happening? Why exactly is it exiting with exit status 1?
    3. Can someone please show me how to debug this program using gdb?
    4. Why did they change the system call numbers?
    5. Please change this program appropriately so that it runs correctly on this machine.

    EDIT:

    After reading Paul R's answer, I checked my files:

    claws@claws-desktop:~$ file ./hello.o 
    ./hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
    
    claws@claws-desktop:~$ file ./hello
    ./hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, not stripped
    

    I agree with him that these should be an ELF 32-bit relocatable and executable. But that doesn't answer my questions; all of them still stand. What exactly is happening in this case? Can someone please answer my questions and provide an x86-64 version of this code?