How to invoke a system call via syscall or sysenter in inline assembly?

linux gcc x86 system-calls inline-assembly

22,403

Solution 1

First of all, you can't safely use GNU C Basic asm(""); syntax for this (without input/output/clobber constraints). You need Extended asm to tell the compiler about the registers you modify. See the inline asm section of the GNU C manual, and the inline-assembly tag wiki for links to other guides, for details on what things like "D"(1) mean as part of an asm() statement.

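You also need asm volatile, because that's not implicit for Extended asm statements with one or more output operands. Without it, GCC treats the asm as a pure computation of its outputs, so it may delete the statement if the return value is unused, or move it out of a loop.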

I'm going to show you how to execute system calls by writing a program that writes Hello World! to standard output using the write() system call. Here's the source of the program without an implementation of the actual system call:

#include <sys/types.h>

ssize_t my_write(int fd, const void *buf, size_t size);

int main(void)
{
    const char hello[] = "Hello world!\n";
    my_write(1, hello, sizeof(hello) - 1);   // - 1: don't write the trailing '\0'
    return 0;
}

You can see that I named my custom system call function my_write to avoid a name clash with the "normal" write provided by libc. The rest of this answer contains the source of my_write for i386 and amd64.

i386

System calls in i386 Linux are implemented using interrupt vector 128 (0x80), i.e. by executing int 0x80 in your assembly code, having set up the parameters beforehand, of course. It is possible to do the same via SYSENTER, but actually executing this instruction is done by the VDSO that is virtually mapped into each running process. Since SYSENTER was never meant as a direct replacement of the int 0x80 API, it's never directly executed by userland applications. Instead, when an application needs to access some kernel code, it calls the virtually mapped routine in the VDSO (that's what the call *%gs:0x10 in your code is for), which contains all the code supporting the SYSENTER instruction. There's quite a lot of it because of how the instruction actually works.

If you want to read more about this, have a look at this link. It contains a fairly brief overview of the techniques applied in the kernel and the VDSO. See also The Definitive Guide to (x86) Linux System Calls. Some system calls, like clock_gettime and gettimeofday, are so simple that the kernel can export code + data that runs in user-space, so the VDSO never needs to enter the kernel, making them much faster even than sysenter could be.
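To see the VDSO side of this, here is a minimal x86-64 sketch, assuming glibc on Linux (the helper names vdso_base and raw_clock_gettime are hypothetical). getauxval(AT_SYSINFO_EHDR) reports where the kernel mapped the VDSO image, and raw_clock_gettime forces the same clock query through a real SYSCALL, whereas glibc's clock_gettime normally takes the VDSO fast path without entering the kernel.

```c
// x86-64 Linux, glibc assumed. Hypothetical helper names.
#define _GNU_SOURCE
#include <asm/unistd.h>   // __NR_clock_gettime
#include <sys/auxv.h>     // getauxval, AT_SYSINFO_EHDR
#include <time.h>         // clock_gettime, struct timespec, CLOCK_MONOTONIC

// Base address of the VDSO the kernel mapped into this process
// (0 if the auxiliary vector has no AT_SYSINFO_EHDR entry).
static unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

// The same query, but always entering the kernel with SYSCALL.
static long raw_clock_gettime(clockid_t clk, struct timespec *ts)
{
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "0"((long)__NR_clock_gettime), "D"((long)clk), "S"(ts)
                     : "rcx", "r11", "memory");   // kernel writes *ts
    return ret;   // 0 on success, -errno on failure
}
```

Both paths read the same clock, so a raw reading taken first should never be later than a libc (VDSO) reading taken afterwards.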

It's much easier to use the slower int $0x80 to invoke the 32-bit ABI.

// i386 Linux
#include <asm/unistd.h>      // compile with -m32 for 32 bit call numbers
#include <sys/types.h>       // ssize_t, size_t
//#define __NR_write 4
ssize_t my_write(int fd, const void *buf, size_t size)
{
    ssize_t ret;
    asm volatile
    (
        "int $0x80"
        : "=a" (ret)
        : "0"(__NR_write), "b"(fd), "c"(buf), "d"(size)
        : "memory"    // the kernel dereferences pointer args
    );
    return ret;
}

As you can see, using the int 0x80 API is relatively simple. The number of the syscall goes in the eax register, while the parameters go in ebx, ecx, edx, esi, edi, and ebp, in that order. System call numbers can be found in the file /usr/include/asm/unistd_32.h.

Prototypes and descriptions of the functions are available in the 2nd section of the manual, so in this case write(2).
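One thing to keep in mind when reading section 2: the documented return values are those of the libc wrappers. A raw system call reports failure by returning -errno (a value from -4095 to -1) in EAX/RAX; the libc wrapper turns that into a -1 return plus errno. Below is a minimal sketch of doing that conversion yourself, written for x86-64 with the SYSCALL form from the amd64 section (the int $0x80 version returns errors the same way). The name my_write_errno is hypothetical.

```c
// x86-64 Linux. Hypothetical wrapper that mimics libc's error convention.
#include <asm/unistd.h>   // __NR_write
#include <errno.h>        // errno, EBADF
#include <sys/types.h>    // ssize_t, size_t

ssize_t my_write_errno(int fd, const void *buf, size_t size)
{
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "0"((long)__NR_write), "D"(fd), "S"(buf), "d"(size)
                     : "rcx", "r11", "memory");
    if ((unsigned long)ret > -4096UL) {   // -4095 .. -1 means "failed with errno = -ret"
        errno = (int)-ret;
        return -1;
    }
    return ret;                           // number of bytes written
}
```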

The kernel saves/restores all the registers (except EAX) so we can use them as input-only operands to the inline asm. See What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64

Keep in mind that the clobber list also contains "memory", which tells the compiler that the asm statement may read or write memory (here, the buffer that buf points to). (A pointer input to inline asm does not imply that the pointed-to memory is also an input. See How can I indicate that the memory *pointed* to by an inline ASM argument may be used?)
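As an alternative to the blanket "memory" clobber, the GCC manual shows how to describe the pointed-to memory as a dummy "m" input. Here is a minimal x86-64 sketch of that idea for write(), which only reads the buffer (the name my_write_noclobber is hypothetical; the same operand would work unchanged in the i386 version). It uses the manual's unsized-array form, which tells GCC the asm may read memory starting at buf, so stores into the buffer can't be sunk past the system call.

```c
// x86-64 Linux. Hypothetical variant of my_write without a "memory" clobber.
#include <asm/unistd.h>   // __NR_write
#include <sys/types.h>    // ssize_t, size_t
#include <unistd.h>       // pipe, read, close (for trying it out)

ssize_t my_write_noclobber(int fd, const void *buf, size_t size)
{
    ssize_t ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)
                     : "0"(__NR_write), "D"(fd), "S"(buf), "d"(size),
                       // Dummy input: the asm reads the array starting at buf.
                       // It is never referenced in the template.
                       "m"(*(const char (*)[])buf)
                     : "rcx", "r11");   // SYSCALL itself overwrites RCX and R11
    return ret;
}
```

Only do this for calls that just read through their pointer arguments; a call like read() writes to the buffer and would need an output ("=m") operand or the "memory" clobber instead.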

amd64

Things look different on the AMD64 architecture, which sports a new instruction called SYSCALL. It is very different from the original SYSENTER instruction, and definitely much easier to use from userland applications: it really resembles a normal CALL, and adapting the old int 0x80 code to the new SYSCALL is pretty much trivial. (Except that it uses RCX and R11, instead of the kernel stack, to save the user-space RIP and RFLAGS so the kernel knows where to return.)
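The RCX part is easy to observe. This is a minimal x86-64 sketch, assuming a stock Linux kernel (emulators or sandboxes may behave differently). It calls getpid via SYSCALL, then compares RCX with the run-time address of the instruction right after SYSCALL. The names probe_getpid and struct syscall_probe are hypothetical.

```c
// x86-64 Linux. Hypothetical probe: what does RCX hold after SYSCALL?
#include <asm/unistd.h>   // __NR_getpid
#include <unistd.h>       // getpid (for comparison)

struct syscall_probe {
    long ret;                 // value the kernel returned in RAX
    unsigned long rcx_after;  // RCX right after SYSCALL
    unsigned long next_insn;  // address of the instruction following SYSCALL
};

static struct syscall_probe probe_getpid(void)
{
    struct syscall_probe p;
    __asm__ volatile("syscall\n\t"
                     "1:\n\t"                  // local label on the next instruction
                     "lea 1b(%%rip), %2"       // its run-time (PIE-safe) address
                     : "=a"(p.ret), "=c"(p.rcx_after), "=r"(p.next_insn)
                     : "0"((long)__NR_getpid)
                     : "r11", "memory");       // R11 receives the saved RFLAGS
    return p;
}
```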

In this case, the number of the system call is still passed in the register rax, but the registers used to hold the arguments now nearly match the function calling convention: rdi, rsi, rdx, r10, r8 and r9 in that order. (syscall itself destroys rcx so r10 is used instead of rcx, letting libc wrapper functions just use mov r10, rcx / syscall.)
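The "mov r10, rcx" point can be sketched as a hand-written x86-64 function in top-level asm. This is an illustration, not glibc's actual source. A function called with the normal SysV convention already has its first four arguments in RDI, RSI, RDX and RCX, so only the fourth has to move to R10 before SYSCALL. The name my_pread64 is hypothetical, and 17 is pread64's number in the x86-64 table (asm/unistd_64.h).

```c
// x86-64 Linux, GNU toolchain. Hypothetical wrapper written in top-level asm.
#define _GNU_SOURCE
#include <stdio.h>        // tmpfile, fileno, fputs, fflush (for trying it out)
#include <string.h>       // memcmp (for trying it out)
#include <sys/types.h>    // ssize_t, size_t, off_t

ssize_t my_pread64(int fd, void *buf, size_t count, off_t offset);

__asm__(".pushsection .text\n"
        ".globl my_pread64\n"
        ".type my_pread64, @function\n"
        "my_pread64:\n"
        "    mov %rcx, %r10\n"   // 4th arg: RCX (function ABI) -> R10 (syscall ABI)
        "    mov $17, %eax\n"    // __NR_pread64 on x86-64
        "    syscall\n"          // clobbers RCX and R11; result in RAX
        "    ret\n"
        ".size my_pread64, .-my_pread64\n"
        ".popsection\n");
```

Unlike glibc's pread, this returns -errno on failure instead of setting errno.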

// x86-64 Linux
#include <asm/unistd.h>      // compile without -m32 for 64 bit call numbers
#include <sys/types.h>       // ssize_t, size_t
// #define __NR_write 1
ssize_t my_write(int fd, const void *buf, size_t size)
{
    ssize_t ret;
    asm volatile
    (
        "syscall"
        : "=a" (ret)
        //                 EDI      RSI       RDX
        : "0"(__NR_write), "D"(fd), "S"(buf), "d"(size)
        : "rcx", "r11", "memory"  // SYSCALL overwrites RCX/R11; the kernel reads *buf
    );
    return ret;
}

(See it compile on Godbolt)

Notice that practically the only things that needed changing were the register names and the actual instruction used for making the call. This is mostly thanks to the input/output lists provided by gcc's extended inline assembly syntax, which automagically generate the move instructions needed before the instruction list runs.

The "0"(callnum) matching constraint could be written as "a" because operand 0 (the "=a"(ret) output) only has one register to pick from; we know it will pick EAX. Use whichever you find clearer.
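For example, here is a generic three-argument x86-64 wrapper that spells the call number as a plain "a" input (the name raw_syscall3 is hypothetical). Taking every argument as long makes it reusable for any call whose arguments fit in RDI, RSI and RDX. The "memory" clobber matters here because calls like read() write through a pointer argument.

```c
// x86-64 Linux. Hypothetical generic wrapper using "a" instead of "0".
#include <asm/unistd.h>   // __NR_read, __NR_write
#include <unistd.h>       // pipe, close (for trying it out)

static long raw_syscall3(long nr, long a1, long a2, long a3)
{
    long ret;
    __asm__ volatile("syscall"
                     : "=a"(ret)                            // result comes back in RAX
                     : "a"(nr), "D"(a1), "S"(a2), "d"(a3)   // number in RAX, args in RDI, RSI, RDX
                     : "rcx", "r11", "memory");
    return ret;   // -errno on failure
}
```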

Note that non-Linux OSes, like macOS, use different call numbers, and even different arg-passing conventions for 32-bit code.

Solution 2

Explicit register variables

https://gcc.gnu.org/onlinedocs/gcc-8.2.0/gcc/Explicit-Register-Variables.html#Explicit-Reg-Vars

I believe this should now generally be the recommended approach over register constraints because:

it can represent all registers, including r8, r9 and r10 which are used for system call arguments: How to specify register constraints on the Intel x86_64 register r8 to r15 in GCC inline assembly?
it's the only optimal option for ISAs other than x86, like ARM, which don't have the magic register constraint names: How to specify an individual register as constraint in ARM GCC inline assembly? (the alternative is a temporary register + clobbers + an extra mov instruction)
I'll argue that this syntax is more readable than using the single letter mnemonics such as S -> rsi
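Since calls such as mmap take six arguments, here is a minimal x86-64 sketch of a six-argument wrapper built from explicit register variables (the name raw_syscall6 is hypothetical). R10, R8 and R9 have no single-letter constraints, which is exactly the case this syntax covers.

```c
// x86-64 Linux. Hypothetical six-argument wrapper using register variables.
#define _GNU_SOURCE
#include <asm/unistd.h>   // __NR_mmap, __NR_munmap
#include <sys/mman.h>     // PROT_*, MAP_* flags (for trying it out)

static long raw_syscall6(long nr, long a1, long a2, long a3,
                         long a4, long a5, long a6)
{
    // Each variable is pinned to the register the kernel reads it from.
    register long rax __asm__("rax") = nr;
    register long rdi __asm__("rdi") = a1;
    register long rsi __asm__("rsi") = a2;
    register long rdx __asm__("rdx") = a3;
    register long r10 __asm__("r10") = a4;
    register long r8  __asm__("r8")  = a5;
    register long r9  __asm__("r9")  = a6;
    __asm__ volatile("syscall"
                     : "+r"(rax)   // call number in, result out
                     : "r"(rdi), "r"(rsi), "r"(rdx), "r"(r10), "r"(r8), "r"(r9)
                     : "rcx", "r11", "memory");
    return rax;   // -errno on failure
}
```

As the GCC manual notes, these variables are only guaranteed to live in the named registers when used as asm operands, so avoid function calls between the assignments and the asm statement.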

main_reg.c

#define _XOPEN_SOURCE 700
#include <inttypes.h>
#include <sys/types.h>

ssize_t my_write(int fd, const void *buf, size_t size) {
    register int64_t rax __asm__ ("rax") = 1;   // __NR_write on x86-64
    register int rdi __asm__ ("rdi") = fd;
    register const void *rsi __asm__ ("rsi") = buf;
    register size_t rdx __asm__ ("rdx") = size;
    __asm__ __volatile__ (
        "syscall"
        : "+r" (rax)
        : "r" (rdi), "r" (rsi), "r" (rdx)
        : "rcx", "r11", "memory"
    );
    return rax;
}

void my_exit(int exit_status) {
    register int64_t rax __asm__ ("rax") = 60;  // __NR_exit on x86-64
    register int rdi __asm__ ("rdi") = exit_status;
    __asm__ __volatile__ (
        "syscall"
        : "+r" (rax)
        : "r" (rdi)
        : "rcx", "r11", "memory"
    );
}

void _start(void) {
    char msg[] = "hello world\n";
    my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
}

GitHub upstream.

Compile and run:

gcc -O3 -std=c99 -ggdb3 -ffreestanding -nostdlib -Wall -Werror \
  -pedantic -o main_reg.out main_reg.c
./main_reg.out
echo $?

Output

hello world
0

For comparison, the following constraint-based version, analogous to How to invoke a system call via syscall or sysenter in inline assembly?, produces equivalent assembly:

main_constraint.c

#define _XOPEN_SOURCE 700
#include <inttypes.h>
#include <sys/types.h>

ssize_t my_write(int fd, const void *buf, size_t size) {
    ssize_t ret;
    __asm__ __volatile__ (
        "syscall"
        : "=a" (ret)
        : "0" (1), "D" (fd), "S" (buf), "d" (size)
        : "rcx", "r11", "memory"
    );
    return ret;
}

void my_exit(int exit_status) {
    ssize_t ret;
    __asm__ __volatile__ (
        "syscall"
        : "=a" (ret)
        : "0" (60), "D" (exit_status)
        : "rcx", "r11", "memory"
    );
}

void _start(void) {
    char msg[] = "hello world\n";
    my_exit(my_write(1, msg, sizeof(msg)) != sizeof(msg));
}

GitHub upstream.

Disassembling both with:

objdump -d main_reg.out

shows that they are almost identical. Here is the main_reg.c one:

Disassembly of section .text:

0000000000001000 <my_write>:
    1000:   b8 01 00 00 00          mov    $0x1,%eax
    1005:   0f 05                   syscall 
    1007:   c3                      retq   
    1008:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
    100f:   00 

0000000000001010 <my_exit>:
    1010:   b8 3c 00 00 00          mov    $0x3c,%eax
    1015:   0f 05                   syscall 
    1017:   c3                      retq   
    1018:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
    101f:   00 

0000000000001020 <_start>:
    1020:   c6 44 24 ff 00          movb   $0x0,-0x1(%rsp)
    1025:   bf 01 00 00 00          mov    $0x1,%edi
    102a:   48 8d 74 24 f3          lea    -0xd(%rsp),%rsi
    102f:   48 b8 68 65 6c 6c 6f    movabs $0x6f77206f6c6c6568,%rax
    1036:   20 77 6f 
    1039:   48 89 44 24 f3          mov    %rax,-0xd(%rsp)
    103e:   ba 0d 00 00 00          mov    $0xd,%edx
    1043:   b8 01 00 00 00          mov    $0x1,%eax
    1048:   c7 44 24 fb 72 6c 64    movl   $0xa646c72,-0x5(%rsp)
    104f:   0a 
    1050:   0f 05                   syscall 
    1052:   31 ff                   xor    %edi,%edi
    1054:   48 83 f8 0d             cmp    $0xd,%rax
    1058:   b8 3c 00 00 00          mov    $0x3c,%eax
    105d:   40 0f 95 c7             setne  %dil
    1061:   0f 05                   syscall 
    1063:   c3                      retq

So we see that GCC inlined those tiny syscall functions as would be desired.

my_write and my_exit are the same for both, but _start in main_constraint.c is slightly different:

0000000000001020 <_start>:
    1020:   c6 44 24 ff 00          movb   $0x0,-0x1(%rsp)
    1025:   48 8d 74 24 f3          lea    -0xd(%rsp),%rsi
    102a:   ba 0d 00 00 00          mov    $0xd,%edx
    102f:   48 b8 68 65 6c 6c 6f    movabs $0x6f77206f6c6c6568,%rax
    1036:   20 77 6f 
    1039:   48 89 44 24 f3          mov    %rax,-0xd(%rsp)
    103e:   b8 01 00 00 00          mov    $0x1,%eax
    1043:   c7 44 24 fb 72 6c 64    movl   $0xa646c72,-0x5(%rsp)
    104a:   0a 
    104b:   89 c7                   mov    %eax,%edi
    104d:   0f 05                   syscall 
    104f:   31 ff                   xor    %edi,%edi
    1051:   48 83 f8 0d             cmp    $0xd,%rax
    1055:   b8 3c 00 00 00          mov    $0x3c,%eax
    105a:   40 0f 95 c7             setne  %dil
    105e:   0f 05                   syscall 
    1060:   c3                      retq

It is interesting to observe that in this case GCC found a slightly shorter equivalent encoding by picking:

    104b:   89 c7                   mov    %eax,%edi

to set the fd to 1, which equals the 1 from the syscall number, rather than a more direct:

    1025:   bf 01 00 00 00          mov    $0x1,%edi

For an in-depth discussion of the calling conventions, see also: What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64

Tested in Ubuntu 18.10, GCC 8.2.0.


Infinite

Updated on May 11, 2022

Comments

Infinite about 2 years
How can we implement the system call using sysenter/syscall directly in x86 Linux? Can anybody provide help? It would be even better if you can also show the code for amd64 platform.

I know in x86, we can use
```
__asm__(
"               movl $1, %eax  \n"
"               movl $0, %ebx \n"
"               call *%gs:0x10 \n"
);
```
to route to sysenter indirectly.

But how can we code using sysenter/syscall directly to issue a system call?

I found some material at http://damocles.blogbus.com/tag/sysenter/ but still find it difficult to figure out.
- jww over 6 years
  
  How to access the system call from user-space?, How to invoke a system call via sysenter in inline assembly?, Linux system call table or cheetsheet in assembly language, Assembly and System Calls, What does “int 0x80” mean in assembly code?, etc.
Infinite over 12 years

Thanks! It seems very unlikely that even a weird programmer would directly code sysenter to invoke a system call. We are actually working on a binary (including malware) analyzer for listing all the system calls in a target program. That is why we want to collect all the ways a system call can be issued. It seems that we can ignore this direct sysenter approach.
Calmarius almost 8 years

Why is the first input argument "0"? Shouldn't it be "a", as the system call number goes into eax/rax?
Daniel Kamil Kozar almost 8 years

@Calmarius : the 0 here means "the first output argument". AFAIR (this was a long time ago), the particular version of gcc that I used to compile this for some reason rejected the - one would think - perfectly valid "a"(__NR_write) here. gcc 6.1.1 doesn't have a problem with that, so I guess you can use it.
pts over 7 years

According to lxr.free-electrons.com/source/arch/x86/kernel/… , there is no need to specify "cc" (because eflags are saved) or "edi" or "esi" (because these registers are also saved) in the clobbers list.
Michael Petch over 6 years

@pts: On x86 targets the cc clobber doesn't do anything. You can change the flags in the inline assembly and no harm will be done to the surrounding C code. Other (non-Intel) targets require cc clobbers. For consistency, and for documentation more than anything, it's not a bad idea to list cc as a clobber if the flags do change.
Peter Cordes over 6 years

@MichaelPetch: Right, but it's not good documentation in this case because the Linux system-call ABIs preserve EFLAGS (and edi / esi). Daniel: you could also avoid the "memory" clobber with a dummy memory input (cast to a struct or array) to tell the compiler that it will read length bytes from buf, like the example in the manual gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html: "m" (*(const char (*)[length]) buf)
Peter Cordes over 6 years

@Infinite: right, the kernel developers don't want people to use sysenter directly, so that they can change its ABI (the "dance" that user-space and the kernel do to pass a return address to the kernel).
Ciro Santilli OurBigBook.com over 5 years

@PeterCordes and Daniel, I was playing around with an explicit register version stackoverflow.com/a/54956854/895245 , and noticed that "D" does not seem to extend fd to a quad. Do you think there is a missing int64_t cast here, or do you understand why the cast is not necessary? Cheers.
Peter Cordes over 5 years

@CiroSantilli新疆改造中心六四事件法轮功: Why would it? The input is an int, which means the upper 32 bits of the register are unused by the asm. It would be a missed optimization if it wasted an instruction zero or sign-extending narrow inputs without asking for it. The system-call arg is declared as int, so you can safely count on the kernel to ignore such garbage, too.
Peter Cordes over 5 years

The casts are unnecessary. For system-call args that are declared as int, the kernel will safely ignore high garbage in the register. And BTW, register asm is the only way to specify a specific one of r8..r15.
Ciro Santilli OurBigBook.com over 5 years

@PeterCordes ah, awesome; I've mentioned r8 - r10 now.
PhilipRoman over 3 years

If you're using TCC, you might want to avoid using explicit register variables - my syscalls weren't working until I changed it (but they did work with gcc).
Peter Cordes about 3 years

Also, Linux system calls specifically don't clobber RFLAGS, so a "cc" clobber is inappropriate here. (It's implicit anyway, so there's no performance to be gained by removing it.)