How to convert an ELF executable to C code? The generated C code need not be human-readable


Solution 1

This will give you (almost) an assembly code translation:

objdump --disassemble <elf file>

I say "almost" because the output is a listing rather than assembler source: it carries annotations such as addresses and raw instruction bytes, so it can't be fed directly to an assembler, but it's close.

Solution 2

Write a simulator for the processor, then run the ELF file's instructions through your simulated processor. It's generally not too much of a task, because even CISC processors have relatively small, self-contained instruction sets whose individual instructions perform simple operations.

When you've got that working, look at more efficient solutions, like outputting C code to match instructions.
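As a rough illustration of both steps, here is a minimal sketch for a made-up three-instruction machine. The instruction set, the `toy_` names and the emitted `r[...]` register array are all hypothetical, invented for this example; a real processor needs a full decoder, memory, flags and control flow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical toy ISA, invented for illustration only. */
enum toy_op { TOY_LOADI, TOY_ADD, TOY_HALT };

struct toy_insn {
    enum toy_op op;
    uint8_t dst;   /* destination register, 0..3 */
    uint8_t src;   /* source register, used by TOY_ADD */
    uint32_t imm;  /* immediate value, used by TOY_LOADI */
};

/* Step 1: simulate. Executes until TOY_HALT or the end of the program
 * and returns the number of instructions executed. */
size_t toy_run(const struct toy_insn *code, size_t n, uint32_t regs[4])
{
    size_t pc = 0;
    while (pc < n) {
        const struct toy_insn *i = &code[pc];
        assert(i->dst < 4 && i->src < 4);
        switch (i->op) {
        case TOY_LOADI: regs[i->dst] = i->imm;         break;
        case TOY_ADD:   regs[i->dst] += regs[i->src];  break;
        case TOY_HALT:  return pc + 1;
        }
        pc++;
    }
    return pc;
}

/* Step 2: instead of executing an instruction, write the C statement
 * that has the same effect. Returns the snprintf result. */
int toy_emit_c(const struct toy_insn *i, char *out, size_t cap)
{
    switch (i->op) {
    case TOY_LOADI:
        return snprintf(out, cap, "r[%u] = %luu;",
                        (unsigned)i->dst, (unsigned long)i->imm);
    case TOY_ADD:
        return snprintf(out, cap, "r[%u] += r[%u];",
                        (unsigned)i->dst, (unsigned)i->src);
    case TOY_HALT:
        return snprintf(out, cap, "return;");
    }
    return -1;
}
```

Calling `toy_emit_c` on each instruction in turn and printing the results inside a function body gives C that is not human-friendly but can be compiled again, which is the kind of output the question allows.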

Author: Harry

Updated on August 01, 2022

Comments

  • Harry
    Harry almost 2 years

    I have an ELF file that I would like to decompile into C code, and make simple changes to the resulting C code and rebuild it into an ELF.

    The decompiled C code need not be fully human readable. Eg, if variables and function names come out obfuscated, it is okay.

    Which tools can I use to accomplish this on Linux?

    PS: If decompiling to C is not possible or is not easy, I'm willing to consider decompiling to assembly language, though tweaking the assembly source will be very difficult for me.

    UPDATE: You may assume that I'm using the following C program to get my a.out ELF. Now, assume further that I've lost this original C source. So, I would now like to decompile it to (a possibly obfuscated) C source in which I'm at least able to change small things like the strings "world", "Hello", and "Bye", or be able to reverse the sense of the if statement, etc.

    #include <stdio.h>
    #include <string.h>
    
    char buf[256];
    
    const char *Hello = "Hello";
    const char *Bye = "Bye";
    const char *Who = "world";
    
    char * greet(const char *greeting, const char *str) {
        strcpy(buf, greeting);
        strcat(buf, ", ");
        strcat(buf, str);
        strcat(buf, "!");
        return buf;
    }
    
    int main(int argc, char *argv[]) {
        int sayHello = 0;
    
        if(sayHello) {
            printf("%s\n", greet(Hello, Who));
        } else {
            printf("%s\n", greet(Bye, Who));
        }
        return 0;   
    }
    
    • Basile Starynkevitch
      Basile Starynkevitch almost 11 years
      You cannot do that. It is impossible. An optimized ELF image has lost information w.r.t. the original C source.
    • Basile Starynkevitch
      Basile Starynkevitch almost 11 years
      What is your ELF file and how did you get it?
    • Harry
      Harry almost 11 years
      Please see the Update above.
    • Basile Starynkevitch
      Basile Starynkevitch almost 11 years
      Compile your example with gcc -Wall -O3 -S -fverbose-asm myhello.c and look inside the produced myhello.s
    • Basile Starynkevitch
      Basile Starynkevitch almost 11 years
      Why do you ask? There might be other ways to achieve your goals (LD_PRELOAD tricks perhaps). Please explain what you really want to do.
    • Harry
      Harry almost 11 years
      What I really want to do is, given any arbitrary ELF program for which I do not have the source, I want to be able to do trivial reverse engineering tasks like, say, changing menu-option strings, window and dialog titles, etc. Now, I hope you won't ask me 'why' I want to do all of this. So far, I'm quite shocked though that the a.out doesn't even have my constant strings in it: if they aren't there, then where the heck are they?
    • Basile Starynkevitch
      Basile Starynkevitch almost 11 years
      In general, faithful decompilation is impossible: the compilation process is losing information from the source file, and in general you cannot recover it. This is why free software is so useful and important: you keep the freedom to improve it!
    • Basile Starynkevitch
      Basile Starynkevitch almost 11 years
      You could use the strings utility to find some strings in an executable, but the compiler might have removed them...
    • Harry
      Harry almost 11 years
      "... but the compiler might have removed them" But removed where, and why? I can understand it folding duplicate constant strings into a single string instance, but other than that, I doubt if it would bend over backwards to hide/obfuscate/encrypt/compress something as fundamental as a character string. I can understand it doing various stunts with code/logic, but string constant data...? I'm quite taken aback to say the least. Btw, thanks for your comments so far.
    • Basile Starynkevitch
      Basile Starynkevitch almost 11 years
      For instance, the compiler could optimize !strcmp(s,"ab") into (s[0]=='a' && s[1]=='b' && !s[2]) so the "ab" string is not in the binary. Likewise, some printf with a constant format string may be optimized.
    • Harry
      Harry almost 11 years
      As I said, it's string constant data. I may not be invoking any function calls over these constant strings, so why would the compiler break strings apart in anticipation? Plus, strcmp is a library function and not a language primitive, so I doubt if the compiler would go to such lengths, namely, of storing intelligence inside of itself as to what strcmp and other string functions of the C library do internally. Am I right?
    • Basile Starynkevitch
      Basile Starynkevitch almost 11 years
    • Harry
      Harry almost 11 years
      Not to nit-pick (in fact, thanks for this side info!), but I'm not using any strcmp but rather wholesale copy operations with string for which I cannot think of any optimizations along the lines you suggested above. Even in the gcc link you provided, I can see .string "hello world\n" clearly sitting in there. That's what I too would expect of my program.
    • n. m.
      n. m. over 7 years
      Tool requests are off-topic here.
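To make the `strcmp` comment in this thread concrete, here is a small sketch of the two forms side by side. The function names are made up for this example. Whether a particular compiler actually performs this rewrite depends on the compiler, its version and the optimization flags, so treat it as an illustration of the idea rather than a statement about any specific toolchain.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The form as written in source: compares against the literal "ab". */
bool equals_ab_lib(const char *s)
{
    return !strcmp(s, "ab");
}

/* The expanded form from the comment: no string literal is needed,
 * only three character comparisons. The && operators short-circuit,
 * so s is never read past its terminating NUL. */
bool equals_ab_inline(const char *s)
{
    return s[0] == 'a' && s[1] == 'b' && !s[2];
}

/* Self-check: both forms agree on the given input. */
void check_same(const char *s)
{
    assert(equals_ab_lib(s) == equals_ab_inline(s));
}
```

If a compiler emits the second form for the first, the bytes `ab` no longer need to be stored as a NUL-terminated string in the binary. The program in the question only copies and concatenates its strings, so this particular transformation does not apply to it.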
  • Harry
    Harry almost 11 years
    I noticed that some strings (const char *'s) -- which I know exist in the ELF -- are missing from the output. Also, how do I recompile the output of objdump to get another ELF?
  • R.. GitHub STOP HELPING ICE
    R.. GitHub STOP HELPING ICE almost 11 years
    objdump can't reverse relocations, so it's nowhere near complete for this purpose.
  • datenwolf
    datenwolf over 7 years
    @Harry: I'm pretty sure your strings are still there. You have to disassemble with -D, and most likely your strings will get misinterpreted as instructions. To reassemble into an ELF you can pass the assembly to GCC.
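One way to check whether the strings are present is to scan the file's bytes for runs of printable characters, which is roughly what the `strings` utility mentioned earlier does (by default it reports runs of at least 4 characters). The sketch below is a simplified stand-in, not the utility's actual implementation; `next_printable_run` is a name invented here, and reading the ELF file into memory is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Finds the next run of at least min_len printable bytes in data[0..n),
 * starting the search at offset `from`. Returns the offset of the run and
 * stores its length in *run_len. Returns n with *run_len == 0 if no such
 * run exists. */
size_t next_printable_run(const unsigned char *data, size_t n,
                          size_t from, size_t min_len, size_t *run_len)
{
    size_t start = from, i;

    assert(min_len > 0 && run_len != NULL);
    for (i = from; i <= n; i++) {
        if (i < n && isprint(data[i]))
            continue;               /* still inside a printable run */
        if (i - start >= min_len) { /* run ended: long enough? */
            *run_len = i - start;
            return start;
        }
        start = i + 1;              /* too short, restart after this byte */
    }
    *run_len = 0;
    return n;
}
```

To list every run, call it again with `from` set to the previous offset plus the previous length until it returns `n`. A literal such as "Hello" would then show up with its file offset, even though it sits in a data section rather than among the instructions.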