Can a C program modify its executable file?

c metaprogramming self-modifying

12,171

Solution 1

On Windows, when a program is run, its entire *.exe file is mapped into memory using the same memory-mapped-file mechanism Windows provides to applications. This means the file isn't necessarily loaded all at once; instead, its pages are loaded on demand as they are accessed.

While the file is mapped this way, no application (including the program itself) can write to the file to change it. (On Windows the running executable also can't be deleted, although it can be renamed. On Linux and other Unix systems with inode-based filesystems it can be renamed or deleted.)

It is possible to change the bytes mapped into memory, but the OS then applies "copy-on-write" semantics: the underlying file on disk isn't changed; instead, a private copy of the affected page(s) is made in memory with your modifications. Before you are allowed to do this, though, you usually have to change the protection bits on the memory in question (e.g. with VirtualProtect).

At one time it was common for low-level assembly programs running in very constrained memory environments to use self-modifying code. Ordinary programs rarely do this anymore, because we're no longer running in such constrained environments, and because modern processors, with long pipelines and instruction caches, have to throw away already-fetched instructions when you change code underneath them, which is slow.

Solution 2

If you are using Windows, you can do the following:

Step-by-Step Example:

Call VirtualProtect() on the code pages you want to modify, with the PAGE_WRITECOPY protection.
Modify the code pages.
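Call VirtualProtect() on the modified code pages to make them executable again, with the PAGE_EXECUTE protection (or, more safely, the original protection returned by the first call, typically PAGE_EXECUTE_READ).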
Call FlushInstructionCache().

For more information, see How to Modify Executable Code in Memory (Archived: Aug. 2010)

Solution 3

It is very operating-system dependent. Some operating systems lock the file, so you could try to cheat by making a copy of it somewhere else, but then you're just running another copy of the program.

Other operating systems perform security checks on the file (e.g. code-signature checks on the iPhone), so writing to it would be a lot of work; besides, it is stored as a read-only file.

With other systems you might not even know where the file is.

Solution 4

All present answers more or less revolve around the fact that today you cannot easily do self-modifying machine code anymore. I agree that that is basically true for today's PCs.

However, if you really want to see your own self-modifying code in action, you have some options:

Try out microcontrollers; the simpler ones do not have advanced pipelining. The cheapest and quickest choice I found is an MSP430 USB-Stick.
If emulation is OK for you, you can run an emulator for an older, non-pipelined platform.
If you want self-modifying code just for the fun of it, you can have even more fun with self-destroying code (more exactly, enemy-destroying code) in Core War.
If you are willing to move from C to, say, a Lisp dialect, code that writes code is very natural there. I would suggest Scheme, which is intentionally kept small.

Solution 5

If we're talking about doing this on x86, it isn't impossible. It should be done with caution, though, because x86 instructions are variable-length: a longer replacement instruction may overwrite the following instruction(s), and a shorter one will leave residual bytes of the overwritten instruction, which should be replaced with NOP instructions.

When the x86 first gained protected mode (with the 80286), the Intel reference manuals recommended the following method for debugging access to XO (execute-only) areas:

create a new, empty selector (the "high" part of a far pointer)
set its attributes to those of the XO area
set the new selector's access properties to RO DATA (read-only data) if you only want to look at what's in it
set them to RW DATA (read/write data) if you want to modify it

So the answer to the problem is in the last step. RW access is necessary if you want to insert a breakpoint instruction, which is what debuggers do. Processors newer than the 80286 have internal debug registers that enable non-intrusive monitoring, which can trigger a breakpoint without modifying the code.

Windows made available the building blocks for doing this starting with Win16. They are probably still in place. I think Microsoft calls this class of pointer manipulation "thunking."

I once wrote a very fast 16-bit database engine in PL/M-86 for DOS. When Windows 3.1 arrived (running on 80386s) I ported it to the Win16 environment. I wanted to make use of the 32-bit memory available but there was no PL/M-32 available (or Win32 for that matter).

To solve the problem, my program used thunking in the following way:

defined 32-bit far pointers (sel_16:offs_32) using structures
allocated 32-bit data areas (i.e. larger than 64 KB) using global memory and received them as 16-bit far pointers (sel_16:offs_16)
filled in the data in the structures by copying the selector, then calculating the offset using 16-bit multiplication with 32-bit results.
loaded the pointer/structure into es:ebx using the operand-size override prefix
accessed the data using a combination of the address-size and operand-size override prefixes

Once the mechanism was bug-free, it worked without a hitch. The largest memory areas my program used were 2304*2304 double-precision matrices, which comes out to around 40 MB. Even today, I would call this a "large" block of memory: it is about 30% of a 128 MB PC100 SDRAM stick, a typical module by the late 1990s.



Joel

Updated on June 02, 2020

Comments

Joel about 4 years
I had a little too much time on my hands and started wondering if I could write a self-modifying program. To that end, I wrote a "Hello World" in C, then used a hex editor to find the location of the "Hello World" string in the compiled executable. Is it possible to modify this program to open itself and overwrite the "Hello World" string?
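```
#include <stdio.h>

const char *str = "Hello World\n";

int main(int argc, char *argv[]) {
  (void)argc;
  fputs(str, stdout);   /* printf(str) would treat str as a format string */

  /* argv[0] is usually, but not always, the path the program was started from */
  FILE *file = fopen(argv[0], "r+");
  if (file == NULL) {
    perror("fopen");    /* reports why the running executable could not be opened */
    return 1;
  }

  /* 0x1000 is the offset found with a hex editor; it only fits this particular build */
  fseek(file, 0x1000, SEEK_SET);
  fputs("Goodbyewrld\n", file);   /* same length as "Hello World\n" */
  fclose(file);

  return 0;
}
```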
This doesn't work. I'm assuming something prevents the program from opening itself, since I can split this into two separate programs (a "Hello World" and something to modify it) and that works fine.

EDIT: My understanding is that when the program is run, it's loaded completely into ram. So the executable on the hard drive is, for all intents and purposes a copy. Why would it be a problem for it to modify itself?

Is there a workaround?

Thanks
- Joel over 13 years
  
  Haha, no. Just idle curiosity.
- Admin over 13 years
  
  Cheat Engine works by finding values in programs and letting you change them. It can give you extra lives in video games, etc., but I don't think you really want Cheat Engine as an answer.
- dmckee --- ex-moderator kitten over 13 years
  
  Note that modifying the file is not necessarily (depending on the OS) the same thing as modifying the in-memory code that is actually executing. You should be clear about which one you want to achieve.
- Andy Lester over 13 years
  
  Yes, you can modify your own executable file. What you're asking about is modifying the running program.
asveikau over 13 years

You can modify the code of a program while it's running (on Windows, with WriteProcessMemory()). This is how your debugger works. That said, it's a very bad idea.
Joel over 13 years

I don't think I ever claimed it was a good one :P. But why should Windows lock it? My understanding is that when the program is run, it's loaded completely into ram. So the executable on the hard drive is, for all intents and purposes a copy. Why would it be a problem to modify this?
Winston Ewert over 13 years

@asveikau this has to do with a file on disk not in memory, but you are correct about what can be done in memory.
Winston Ewert over 13 years

@Joel, this isn't necessarily entirely true. For a sufficiently large program, Windows may swap parts of the executable out of memory. Regardless, I don't see a really good reason for the behaviour.
Ferruccio over 13 years

@Joel - when a program is run, it is not necessarily all loaded into memory. Portions may be paged in as needed. This is especially true of large programs which may swap data in and out as necessary (actually the OS does all the paging, it's transparent to the program). By locking the program file, the OS never has to swap out code because it knows where it can get a pristine copy of the code whenever it needs it.
Déjà vu over 13 years

Note that on Unix / Linux, while you can rename or even delete a running executable on disk, it is kept intact in memory until the process dies.
Chibueze Opata almost 13 years

And when necessary, a program can also write a modified copy of the file, start a new process to perform the file replacement, and then terminate itself.
Michael Chourdakis almost 5 years

The executable can be renamed while it's running.