Writing a Linux int 80h system-call wrapper in GNU C inline assembly


You don't say so explicitly, but from your post it appears you're using gcc and its inline asm with constraints syntax (other C compilers have very different inline-asm syntax). That means you probably need to use AT&T assembler syntax rather than Intel, since that is what gcc uses by default.

With that said, let's look at your write2 function. First, you don't want to create a stack frame, as gcc will create one; if you create another in the asm code, you'll end up with two frames, and things will probably get very confused. Second, since gcc is laying out the stack frame, you can't access variables with "[ebp + offset]", as you don't know how it's being laid out.

That's what the constraints are for -- you say what kind of place you want gcc to put the value (any register, memory, a specific register) and then use "%N" in the asm code, where N is the operand's number. Finally, if you use explicit registers in the asm code, you need to list them in the 3rd section, the clobber list (after the input constraints), so gcc knows you are using them. Otherwise it might put some important value in one of those registers, and you'd clobber that value.

You also need to tell the compiler that inline asm will or might read from or write to memory pointed-to by the input operands; that is not implied.

So with all that, your write2 function looks like:

void write2(char *str, int len) {
    __asm__ volatile (
        "movl $4, %%eax;"      // SYS_write
        "movl $1, %%ebx;"      // file descriptor = stdout_fd
        "movl %0, %%ecx;"
        "movl %1, %%edx;"
        "int $0x80"
        :: "g" (str), "g" (len)       // input values we MOV from
        : "eax", "ebx", "ecx", "edx", // registers we destroy
          "memory"                    // memory has to be in sync so we can read it
     );
}

Note the AT&T syntax -- src, dest rather than dest, src and % before the register name.

Now this will work, but it's inefficient, as it contains lots of extra movs. In general, you should NEVER use mov instructions or explicit registers in asm code; you're much better off using constraints to say where you want things and letting the compiler ensure that they're there. That way, the optimizer can probably get rid of most of the movs, particularly if it inlines the function (which it will typically do if you specify -O3). Conveniently, the i386 machine model has constraints for specific registers, so you can instead do:

void write2(char *str, int len) {
    __asm__ volatile (
        "movl $4, %%eax;"
        "movl $1, %%ebx;"
        "int $0x80"
        :: "c" (str), /* c constraint tells the compiler to put str in ecx */
           "d" (len)  /* d constraint tells the compiler to put len in edx */
        : "eax", "ebx", "memory");
}

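or, shorter still, but unsafe as written because the compiler is never told that EAX is modified (the edit below fixes this):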

// UNSAFE: destroys EAX (with return value) without telling the compiler
void write2(char *str, int len) {
    __asm__ volatile ("int $0x80"
        :: "a" (4), "b" (1), "c" (str), "d" (len)
        : "memory");
}

Note also the use of volatile, which tells the compiler that this asm can't be eliminated as dead code even though its outputs (of which there are none) are not used. (An asm with no output operands is already implicitly volatile, but making it explicit doesn't hurt when the real purpose isn't to calculate something but to cause a side effect like a system call.)

edit

One final note -- this function is doing a write system call, which does return a value in eax -- either the number of bytes written or an error code. So you can get that with an output constraint:

int write2(const char *str, int len) {
    __asm__ volatile ("int $0x80"
        : "=a" (len)
        : "a" (4), "b" (1), "c" (str), "d" (len),
          "m" (*(const char (*)[])str)        // "dummy" input instead of memory clobber
    );
    return len;
}

Every system call returns its result in EAX. Values from -4095 to -1 (inclusive) are negated errno codes; every other value is a successful result. This holds for all Linux system calls.
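As a minimal sketch of that rule, the helper below turns a raw EAX value into the usual libc convention of returning -1 and setting errno. The name syscall_result is made up for illustration; it is not a libc function.

```c
#include <errno.h>

/* Hypothetical helper, not a libc function: map a raw int $0x80 return
 * value onto the usual "-1 and errno" convention. */
static long syscall_result(long raw)
{
    if (raw >= -4095 && raw <= -1) {   /* the negated-errno range */
        errno = (int)-raw;
        return -1;
    }
    return raw;                        /* byte count, fd, address, ... */
}
```

With this, something like syscall_result(write2(str, len)) would behave much like the libc write() as far as error reporting goes.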

If you're writing a generic system-call wrapper, you probably need a "memory" clobber because different system calls have different pointer operands, and might be inputs or outputs. See https://godbolt.org/z/GOXBue for an example that breaks if you leave it out, and this answer for more details about dummy memory inputs/outputs.
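A generic wrapper along those lines might look like the sketch below. The names my_syscall3 and write3 are invented for this example, and the #else branch is an assumption added only so the code still builds on targets other than i386 (where int $0x80 with these numbers is not the native ABI); it is not part of the technique itself. The system-call numbers come from <asm/unistd.h> rather than being hard-coded.

```c
#define _GNU_SOURCE        /* for syscall() in the fallback branch */
#include <asm/unistd.h>    /* __NR_write and the other system-call numbers */
#include <unistd.h>        /* syscall(), used only by the fallback branch */

/* my_syscall3 is a made-up name for this sketch, not an existing API. */
static inline long my_syscall3(long nr, long a1, long a2, long a3)
{
#if defined(__i386__)
    long ret;
    __asm__ volatile ("int $0x80"
        : "=a" (ret)                              /* result, or -errno */
        : "a" (nr), "b" (a1), "c" (a2), "d" (a3)
        : "memory");   /* the kernel may read or write any pointed-to buffer */
    return ret;
#else
    /* Assumption: on other targets fall back to libc so the sketch still
     * builds. Unlike the asm path, this returns -1 and sets errno. */
    return syscall(nr, a1, a2, a3);
#endif
}

/* Same job as write2 above, built on the generic wrapper. */
static inline long write3(const char *str, long len)
{
    return my_syscall3(__NR_write, 1, (long)str, len);
}
```

Because the wrapper cannot know which arguments are pointers, or whether the kernel reads or writes through them, the "memory" clobber is the conservative choice here; a per-call wrapper can use a narrower dummy operand as in the write2 version above.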

With this output operand, you need the explicit volatile, so that exactly one write system call happens each time the asm statement "runs" in the source. Otherwise the compiler is allowed to assume that the asm exists only to compute its output, and can merge repeated calls with the same inputs instead of writing multiple lines (or remove the statement entirely if you never check the return value).
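The same pattern should carry over to the killp function from the question. The following is only a sketch: the sig parameter is added here so the function can be tried safely with signal 0, and the #else branch is an assumption that lets it build on targets other than i386. kill takes no pointer arguments, so this version uses neither a "memory" clobber nor a dummy memory operand.

```c
#define _GNU_SOURCE
#include <asm/unistd.h>   /* __NR_kill */
#include <errno.h>        /* errno, fallback branch only */
#include <signal.h>       /* kill(), fallback branch only */

/* Returns 0 on success or a negative errno value on failure. */
static int killp(int pid, int sig)
{
#if defined(__i386__)
    int ret;
    __asm__ volatile ("int $0x80"
        : "=a" (ret)                              /* EAX is an output, so gcc knows it changes */
        : "a" (__NR_kill), "b" (pid), "c" (sig));
    return ret;
#else
    /* Assumption: use libc elsewhere, converted to the same -errno style. */
    return kill(pid, sig) == 0 ? 0 : -errno;
#endif
}
```

Calling killp(pid, 9) sends SIGKILL as in the question, while killp(pid, 0) only checks whether the target exists and can be signalled.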


Author by

RodrigoCR

Updated on June 04, 2022

Comments

  • RodrigoCR
    RodrigoCR almost 2 years

    I'm trying to use inline assembly... I read this page http://www.codeproject.com/KB/cpp/edujini_inline_asm.aspx but I can't understand the parameters passing to my function.

    I'm writing a C write example.. this is my function header:

    write2(char *str, int len){
    }
    

    And this is my assembly code:

    global write2
    write2:
        push ebp
        mov ebp, esp
        mov eax, 4      ;sys_write
        mov ebx, 1      ;stdout
        mov ecx, [ebp+8]    ;string pointer
        mov edx, [ebp+12]   ;string size
        int 0x80        ;syscall
        leave
        ret
    

    What do I have to do pass that code to the C function... I'm doing something like this:

    write2(char *str, int len){
        asm ( "movl 4, %%eax;"
              "movl 1, %%ebx;"
              "mov %1, %%ecx;"
              //"mov %2, %%edx;"
              "int 0x80;"
               :
               : "a" (str), "b" (len)
        );
    }
    

    That's because I don't have an output variable, so how do I handle that? Also, with this code:

    global main
    main:
        mov ebx, 5866       ;PID
        mov ecx, 9      ;SIGKILL
        mov eax, 37     ;sys_kill
        int 0x80        ;interruption
        ret 
    

    How can I put that code inline in my code.. so I can ask for the pid to the user.. like this.. This is my precode

    void killp(int pid){
        asm ( "mov %1, %%ebx;"
              "mov 9, %%ecx;"
              "mov 37, %%eax;"
               :
               : "a" (pid)         /* optional */
        );
    }
    
  • ughoavgfhw
    ughoavgfhw about 13 years
    You missed one thing while converting to AT&T: constants need a $ in front of them. Otherwise they are memory references, and I'm pretty sure you don't want to perform whatever interrupt happens to be at address 0x80.
  • RodrigoCR
    RodrigoCR about 13 years
    Thanks a lot for that big, helpful answer. I realized the AT&T syntax issue, so I modified my code... but it was too late for you to see :P. Although now I understand about optimizations... So, I need to put int $0x80, right?
  • RodrigoCR
    RodrigoCR about 13 years
    void write2(char *str, int len) { asm volatile ("int $0x80" :: "a" (4), "b" (1), "c" (str), "d" (len)); } Note: That is the correct answer! interchange values and using $. Thanks you both!
  • Chris Dodd
    Chris Dodd about 13 years
    Whoops -- you're right, I completely forgot the $ signs
  • Alexander
    Alexander about 12 years
    You can use intel syntax with the ".intel_syntax" directive.
  • Timothy Baldwin
    Timothy Baldwin almost 9 years
    The third example in this answer is incorrect, as the compiler is not informed that eax is changed and therefore will assume it isn't.
  • Z boson
    Z boson over 8 years
  • Peter Cordes
    Peter Cordes almost 5 years
    @Zboson: yes, but you need to tell the compiler somehow that int 0x80 modifies EAX. So if you wanted to avoid outputs, you're stuck with a clobber and a "mov $" __NR_write ", %%eax" instead of an input constraint. And for system-calls that copy to user-space memory (like read(2), not write(2)) you need a dummy memory output operand to tell the compiler about it. This needs a dummy input, or a "memory" clobber, otherwise stores to the buffer before the call are dead stores that can optimize away.
  • Peter Cordes
    Peter Cordes about 3 years
    @Alexander: Don't use .intel_syntax noprefix / .att_syntax at the top/bottom of an asm template; instead compile with -masm=intel. Although in this one case, where you don't have any operands like %0 substituting into the template (where you want the compiler to use eax or 4, not %eax or $4), it may not matter. See "How to set gcc to use intel syntax permanently?"
  • Peter Cordes
    Peter Cordes about 3 years
    @Chris: This answer could use some maintenance to better show best practices, like #include <asm/unistd.h> and using __NR_write instead of 4. And maybe not showing the "unsafe" version (that doesn't tell the compiler about EAX being modified) at all, or better integrating the buggy vs. fixed versions instead of just a tacked-on edit after presenting it like it was good.
