How to write self-modifying code in C?

41,144

Solution 1

You might want to consider writing a virtual machine in C, where you can build your own self-modifying code.

If you wish to write self-modifying executables, much depends on the operating system you are targeting. One approach is to modify the in-memory program image. First, obtain the in-memory address of your program's code bytes. Next, change the operating system's protection on that memory range so that you can write to it without triggering an Access Violation (Windows) or `SIGSEGV` (Unix-like systems). Finally, use pointers (perhaps `unsigned char *` pointers, or possibly `unsigned long *` on RISC machines, where instructions typically have a fixed width) to overwrite the opcodes of the compiled program.

A key point is that you will be modifying machine code of the target architecture. There is no canonical format for C code while it is running -- C is a specification of a textual input file to a compiler.

Solution 2

It is possible, but almost certainly not in a portable way, and you may have to contend with read-only memory segments for the running code and other obstacles put in place by your OS.

Solution 3

Sorry, I am answering a bit late, but I think I found exactly what you are looking for: https://shanetully.com/2013/12/writing-a-self-mutating-x86_64-c-program/

In this article, the author first changes a constant by overwriting the immediate operand of an instruction inside a compiled function, then runs shellcode by overwriting that function's code in memory.

Below is the first example:

#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/mman.h>

void foo(void);
int change_page_permissions_of_address(void *addr);

int main(void) {
    void *foo_addr = (void*)foo;

    // Change the permissions of the page that contains foo() to read, write, and execute
    // This assumes that foo() is fully contained by a single page
    if(change_page_permissions_of_address(foo_addr) == -1) {
        fprintf(stderr, "Error while changing page permissions of foo(): %s\n", strerror(errno));
        return 1;
    }

    // Call the unmodified foo()
    puts("Calling foo...");
    foo();

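    // Change the immediate value in the addl instruction in foo() to 42.
    // The offset 18 is specific to one particular build of foo(); check the
    // disassembly of your own binary (e.g. with objdump -d) before relying on it.
    unsigned char *instruction = (unsigned char*)foo_addr + 18;
    *instruction = 0x2A;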

    // Call the modified foo()
    puts("Calling foo...");
    foo();

    return 0;
}

void foo(void) {
    int i=0;
    i++;
    printf("i: %d\n", i);
}

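int change_page_permissions_of_address(void *addr) {
    // Round the address down to the start of its page.
    // Arithmetic on void* is a GNU extension, so go through char* instead.
    int page_size = getpagesize();
    char *page_start = (char*)addr - (unsigned long)addr % (unsigned long)page_size;

    // Make that one page readable, writable, and executable
    if(mprotect(page_start, page_size, PROT_READ | PROT_WRITE | PROT_EXEC) == -1) {
        return -1;
    }

    return 0;
}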

Solution 4

This would be a good start. Essentially Lisp functionality in C:

http://nakkaya.com/2010/08/24/a-micro-manual-for-lisp-implemented-in-c/

Solution 5

Depending on how much freedom you need, you may be able to accomplish what you want by using function pointers. Using your pseudocode as a jumping-off point, consider the case where we want to modify that variable x in different ways as the loop index i changes. We could do something like this:

#include <stdio.h>

void multiply_x (int * x, int multiplier)
{
    *x *= multiplier;
}

void add_to_x (int * x, int increment)
{
    *x += increment;
}

int main (void)
{
    int x = 0;
    int i;

    void (*fp)(int *, int);

    for (i = 1; i < 6; ++i) {
            fp = (i % 2) ? add_to_x : multiply_x;

            fp(&x, i);

            printf("%d\n", x);
    }

    return 0;
}

The output, when we compile and run the program, is:

1
2
5
20
25

Obviously, this only works if there is a finite number of things you want to do with x on each pass through the loop. To make the changes persistent (which is part of what you want from "self-modification"), make the function-pointer variable either global or static. I'm not sure I can really recommend this approach, because there are often simpler and clearer ways of accomplishing this sort of thing.

Author: AnkurVj

Updated on October 08, 2020

Comments

  • AnkurVj
    AnkurVj over 3 years

    I want to write a piece of code that changes itself continuously, even if the change is insignificant.

    For example maybe something like

    for i in 1 to  100, do 
    begin
       x := 200
       for j in 200 downto 1, do
        begin
           do something
        end
    end
    

    Suppose I want my code, after the first iteration, to change the line x := 200 to x := 199, then after the next iteration to x := 198, and so on.

    Is writing such code possible? Would I need to use inline assembly for that?

    EDIT: Here is why I want to do it in C:

    This program will be run on an experimental operating system, and I can't use (or don't know how to use) programs compiled from other languages. The real reason I need such code is that it will run on a guest operating system in a virtual machine. The hypervisor is a binary translator that translates chunks of code and performs some optimizations. It translates each chunk only once; the next time the same chunk is used in the guest, the translator reuses the previously translated result. If the code is modified on the fly, the translator notices, marks its previous translation as stale, and is forced to re-translate the same code. That is what I want to achieve: to force the translator to do many translations. Typically these chunks are the instructions between two branch instructions (such as jump instructions). I just think that self-modifying code would be a fantastic way to achieve this.

  • AnkurVj
    AnkurVj almost 13 years
    Okay. But I really need to do it in C code. Could it be done using assembly instructions written in C via inline assembly?
  • Jonathan M
    Jonathan M almost 13 years
    Well, C is a compiled language, which means you'll have to compile after each change, link (if necessary) and then execute the new executable file. C really isn't designed for on-the-fly code changes.
  • Jonathan M
    Jonathan M almost 13 years
    In your original post, you might explain a bit about why it needs to be in C, if it really, really does.
  • SimonHawkins
    SimonHawkins almost 13 years
    @AnkurVj If you have to ask this kind of question, you are probably not capable of doing it.
  • Jonathan M
    Jonathan M almost 13 years
    @Alan, on the contrary, asking such questions is how we become able to do such things.
  • AnkurVj
    AnkurVj almost 13 years
    @Alan, I have added an explanation to why I want to do this.
  • Jonathan M
    Jonathan M almost 13 years
    Sounds like they're developing an OS, so portability isn't a concern.
  • AnkurVj
    AnkurVj almost 13 years
    Will this code example really do self-modification? Shouldn't modifying code require writing to the memory locations that contain the code? I mean, this code will be compiled to something where a call is made to one of the two functions after evaluating a condition. But that is static code after all, isn't it?
  • Pillsy
    Pillsy almost 13 years
    No, you're right, it's static code, and won't serve your particular purpose (which sounds really interesting, BTW).
  • Dmitri
    Dmitri almost 13 years
    Of course, you don't need to set the function pointer in each loop iteration. You can initialize it to some function before, and change it whenever you want. You don't need to stick with one, either.. you could call a list of them sequentially. There's a lot that can be done with this if the idea is extended further.. don't give up on it too quickly..
  • bloodphp
    bloodphp about 11 years
    mprotect(2) on Linux can be used to allow writes. mprotect(..., PROT_WRITE | PROT_EXEC) The non-portable answer that you're getting at - rewriting the functions themselves - is most certainly possible on many real-world systems, but it's not based on functionality present in C.
  • Engineer
    Engineer about 7 years
    The issue here is that the code modifies itself by setting Assembly instructions, which means that thereafter it's no longer the cross-platform C that it was meant to be - loses portability. So doesn't quite answer the Q.
  • Engineer
    Engineer about 7 years
    "A key point is that you will be modifying machine code of the target architecture." Meaning that you are breaking portability of regular C code. Just a heads up for others reading this (should be obvious).
  • Basile Starynkevitch
    Basile Starynkevitch almost 6 years
    This is not really C. And is very brittle (the C compiler is allowed to compile a call or a branch in various ways, and you'll have different code size even for that jump or call)