How to convert an ELF executable to C code? The generated C code need not be human-readable
Solution 1
This will give you (almost) an assembly code translation:
objdump --disassemble <elf file>
I say almost because the output contains some annotations like binary file position markers and can't serve directly as input to an assembler, but it's close.
Solution 2
You write a simulator for the processor, then you run the elf instructions through your simulated processor. It's generally not too much of a task because even CISC processors have relatively small, contained instruction sets that perform simple operations.
When you've got that working, look at more efficient solutions, like outputting C code to match instructions.
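For a rough idea of what both steps involve, here is a minimal sketch for a made-up four-instruction register machine (the `toy_insn` encoding and the `OP_*` opcodes are hypothetical, not any real CPU's). `simulate` is the fetch-decode-execute loop of the simulator; `emit_c` is the "outputting C code to match instructions" step, printing one labelled C statement per instruction so that jumps become `goto`s. Decoding real x86 or ARM machine code out of an ELF file, with flags, memory, and system calls, is far more work than this.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical toy ISA (not a real CPU): four registers, four opcodes. */
enum toy_op { OP_LOADI, OP_ADD, OP_DECJNZ, OP_HALT };

struct toy_insn {
    enum toy_op op;
    int a;        /* destination or tested register, 0..3 */
    int b;        /* source register (OP_ADD) or jump target (OP_DECJNZ) */
    int64_t imm;  /* immediate value (OP_LOADI) */
};

/* Step 1: a simulator. Fetch, decode and execute until OP_HALT,
   then return register 0. */
int64_t simulate(const struct toy_insn *prog, size_t n)
{
    int64_t r[4] = {0};
    size_t pc = 0;
    for (;;) {
        assert(pc < n);                           /* ran off the program */
        const struct toy_insn *in = &prog[pc++];  /* fetch */
        switch (in->op) {                         /* decode + execute */
        case OP_LOADI:  r[in->a] = in->imm; break;
        case OP_ADD:    r[in->a] += r[in->b]; break;
        case OP_DECJNZ: if (--r[in->a] != 0) pc = (size_t)in->b; break;
        case OP_HALT:   return r[0];
        }
    }
}

/* Step 2: emit C instead of executing. Every instruction gets a label
   so that jumps become gotos; the result is ordinary C source. */
void emit_c(const struct toy_insn *prog, size_t n, FILE *out)
{
    fprintf(out, "long long translated(void)\n{\n    long long r[4] = {0};\n");
    for (size_t i = 0; i < n; i++) {
        const struct toy_insn *in = &prog[i];
        fprintf(out, "L%zu: ", i);
        switch (in->op) {
        case OP_LOADI:
            fprintf(out, "r[%d] = %lld;\n", in->a, (long long)in->imm);
            break;
        case OP_ADD:
            fprintf(out, "r[%d] += r[%d];\n", in->a, in->b);
            break;
        case OP_DECJNZ:
            fprintf(out, "if (--r[%d] != 0) goto L%d;\n", in->a, in->b);
            break;
        case OP_HALT:
            fprintf(out, "return r[0];\n");
            break;
        }
    }
    fprintf(out, "}\n");
}
```

For a program that loads r1 = 5 and r2 = 3 and then loops `ADD r0, r1` / `DECJNZ r2` back to the add, `simulate` returns 15 (5 added three times), and `emit_c` prints a `translated()` function containing lines such as `L4: if (--r[2] != 0) goto L3;`. Labelling every address is the simplest choice; a real translator would only label jump targets.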
Harry
Updated on August 01, 2022
Comments
-
Harry almost 2 years
I have an ELF file that I would like to decompile into C code, make simple changes to the resulting C code, and rebuild it into an ELF. The decompiled C code need not be fully human-readable. E.g., if variables and function names come out obfuscated, it is okay. Which tools can I use to accomplish this on Linux?
PS: If decompiling to C is not possible or is not easy, I'm willing to consider decompiling to assembly language, though tweaking the assembly source will be very difficult for me.
UPDATE: You may assume that I'm using the following C program to get my a.out ELF. Now, assume further that I've lost this original C source. So, I would now like to decompile it to (a possibly obfuscated) C source in which I'm at least able to change small things like the strings "world", "Hello", and "Bye", or be able to reverse the sense of the if statement, etc.

    #include <stdio.h>
    #include <string.h>

    char buf[256];
    const char *Hello = "Hello";
    const char *Bye = "Bye";
    const char *Who = "world";

    char * greet(const char *greeting, const char *str) {
        strcpy(buf, greeting);
        strcat(buf, ", ");
        strcat(buf, str);
        strcat(buf, "!");
        return buf;
    }

    int main(int argc, char *argv[]) {
        int sayHello = 0;
        if(sayHello) {
            printf("%s\n", greet(Hello, Who));
        } else {
            printf("%s\n", greet(Bye, Who));
        }
        return 0;
    }

-
Basile Starynkevitch almost 11 years
You cannot do that; it is impossible. An optimized ELF image has lost information with respect to the original C source.
-
Basile Starynkevitch almost 11 years
What is your ELF file and how did you get it?
-
Harry almost 11 years
Please see the Update above.
-
Basile Starynkevitch almost 11 years
Compile your example with
gcc -Wall -O3 -S -fverbose-asm myhello.c
and look inside the produced myhello.s.
-
Basile Starynkevitch almost 11 years
Why do you ask? There might be other ways to achieve your goals (LD_PRELOAD tricks, perhaps). Please explain what you really want to do.
-
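The LD_PRELOAD suggestion above can be sketched for Harry's example without modifying the binary at all. This assumes the compiled program calls `puts`, which is what GCC usually turns `printf("%s\n", ...)` into (`nm -D a.out` shows whether `puts` is among the imported symbols). A preloaded library is searched before libc, so its `puts` definition wins; `rewrite_line` and its single "Bye, world!" mapping are illustrative only.

```c
/* shim.c: hypothetical LD_PRELOAD shim for the example program.
   Build: gcc -shared -fPIC -o shim.so shim.c
   Run:   LD_PRELOAD=./shim.so ./a.out                              */
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative mapping: swap one exact output line for another. */
static const char *rewrite_line(const char *s)
{
    assert(s != NULL);
    if (strcmp(s, "Bye, world!") == 0)
        return "Hello, world!";
    return s;
}

/* Replaces libc's puts(). The preloaded definition is found first by the
   dynamic linker, so the program's puts calls land here. Output goes
   through fputs/fputc, which are separate functions, so this does not
   call itself. */
int puts(const char *s)
{
    const char *out = rewrite_line(s);
    if (fputs(out, stdout) == EOF || fputc('\n', stdout) == EOF)
        return EOF;
    return 1; /* puts only promises a non-negative value on success */
}
```

With the shim preloaded, the example program should print Hello, world! instead of Bye, world!. The trick only reaches text that passes through a function you can intercept, and it has no effect on statically linked binaries, which do not go through the dynamic linker.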
Harry almost 11 years
What I really want to do is, given any arbitrary ELF program for which I do not have the source, to be able to do trivial reverse-engineering tasks like changing menu-option strings, window and dialog titles, etc. Now, I hope you won't ask me why I want to do all of this. So far, though, I'm quite shocked that the a.out doesn't even have my constant strings in it: if they aren't there, then where the heck are they?
-
Basile Starynkevitch almost 11 years
In general, faithful decompilation is impossible: the compilation process loses information from the source file, and in general you cannot recover it. This is why free software is so useful and important: you keep the freedom to improve it!
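A small illustration of that information loss: the three functions below are spelled differently but compute the same result for every input, so an optimizing compiler may emit identical machine code for all of them (on x86-64, GCC at -O2 typically produces a single `lea` followed by `ret` for each; `gcc -O2 -S` shows what a given version does). A decompiler looking at that code cannot know which spelling was in the source, and local variable names and comments are not in the binary at all unless it carries debug information. The function names are made up for the example.

```c
#include <assert.h>

/* Three source spellings of the same computation. With unsigned
   arithmetic all three are defined for every input and always agree,
   so the compiler may emit identical code for each. */
unsigned twice_mul(unsigned x)   { return x * 2u; }
unsigned twice_shift(unsigned x) { return x << 1; }
unsigned twice_add(unsigned x)   { return x + x; }

/* Self-check: the three spellings agree on x. */
void check_twice(unsigned x)
{
    assert(twice_mul(x) == twice_shift(x));
    assert(twice_shift(x) == twice_add(x));
}
```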
-
Basile Starynkevitch almost 11 years
You could use the strings utility to find some strings in an executable, but the compiler might have removed them...
-
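What `strings` does is simple to sketch: scan the raw bytes and report every run of at least four printable characters (four is the default minimum of GNU `strings`). The version below loads a whole file and prints each run with its decimal file offset, roughly like `strings -t d`; the function names are illustrative, and since it never calls `setlocale`, only ASCII counts as printable.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

typedef void (*run_fn)(size_t off, const unsigned char *s, size_t len);

/* Reports every run of at least min_len printable bytes (tab included,
   as GNU strings does) and returns how many runs were found. */
size_t find_strings(const unsigned char *data, size_t size, size_t min_len,
                    run_fn report)
{
    size_t count = 0, start = 0;
    assert(min_len > 0);
    for (size_t i = 0; i <= size; i++) {
        int printable = i < size && (isprint(data[i]) || data[i] == '\t');
        if (printable)
            continue;
        if (i - start >= min_len) {   /* the run is data[start, i) */
            if (report != NULL)
                report(start, data + start, i - start);
            count++;
        }
        start = i + 1;
    }
    return count;
}

/* Prints the offset (decimal) and the text, similar to `strings -t d`. */
static void print_run(size_t off, const unsigned char *s, size_t len)
{
    printf("%7zu %.*s\n", off, (int)len, (const char *)s);
}

/* Loads a whole file and prints its strings. Returns the number of
   strings found, or -1 if the file cannot be read. */
long strings_file(const char *path, size_t min_len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    long size = ftell(f);
    rewind(f);
    unsigned char *buf = size > 0 ? malloc((size_t)size) : NULL;
    size_t got = buf != NULL ? fread(buf, 1, (size_t)size, f) : 0;
    fclose(f);
    if (buf == NULL)
        return size == 0 ? 0 : -1;
    long n = (long)find_strings(buf, got, min_len, print_run);
    free(buf);
    return n;
}
```

Note that "Bye" is only three characters long, so with the default minimum of 4 it is skipped; `strings_file("a.out", 3)` (or `strings -n 3`) would likely list it. That alone can make a short literal look "missing".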
Harry almost 11 years
"... but the compiler might have removed them." But removed where, and why? I can understand it folding duplicate constant strings into a single instance, but other than that, I doubt it would bend over backwards to hide, obfuscate, encrypt, or compress something as fundamental as a character string. I can understand it doing various stunts with code and logic, but string constant data? I'm quite taken aback, to say the least. By the way, thanks for your comments so far.
-
Basile Starynkevitch almost 11 years
For instance, the compiler could optimize !strcmp(s,"ab") into (s[0]=='a' && s[1]=='b' && !s[2]), so the "ab" string is not in the binary. Likewise, some printf calls with a constant format string may be optimized.
-
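Basile's example can be checked in plain C: for any NUL-terminated `s`, the two functions below return the same value, and `&&` stops at the first mismatch, so the second form never reads past the terminator. Because the results are identical, GCC is free to substitute the second form, after which no "ab" literal needs to be stored; whether a particular GCC version does so at a given optimization level is best checked in the `gcc -O2 -S` output. The same reasoning covers printf: `printf("%s\n", s)` prints exactly what `puts(s)` prints, so GCC may emit a `puts` call and drop the "%s\n" format string. The function names are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Source form: compares against a literal "ab" that lives in .rodata. */
int is_ab_strcmp(const char *s)
{
    assert(s != NULL);
    return !strcmp(s, "ab");
}

/* Form the compiler may produce instead: no "ab" string remains, only
   the character constants 'a' and 'b' as immediate operands. */
int is_ab_inline(const char *s)
{
    assert(s != NULL);
    return s[0] == 'a' && s[1] == 'b' && !s[2];
}
```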
Harry almost 11 years
As I said, it's string constant data. I may not be invoking any function calls on these constant strings, so why would the compiler break strings apart in anticipation? Plus, strcmp is a library function and not a language primitive, so I doubt the compiler would go to such lengths, namely storing knowledge inside itself of what strcmp and the other C library string functions do internally. Am I right?
-
Basile Starynkevitch almost 11 years
GCC knows printf and strcmp. See ciselant.de/projects/gcc_printf/gcc_printf.html and gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
-
Harry almost 11 years
Not to nit-pick (in fact, thanks for this side info!), but I'm not using any strcmp, only wholesale copy operations on strings, for which I cannot think of any optimizations along the lines you suggested above. Even in the gcc link you provided, I can see .string "hello world\n" clearly sitting in there. That's what I would expect of my program too.
-
n. m. over 7 years
Tool requests are off-topic here.
-
Harry almost 11 years
I noticed that some strings (const char *'s), which I know exist in the ELF, are missing from the output. Also, how do I recompile the output of objdump to get another ELF?
-
R.. GitHub STOP HELPING ICE almost 11 years
objdump can't reverse relocations, so it's nowhere near complete for this purpose.
-
datenwolf over 7 years
@Harry: I'm pretty sure your strings are still there. You have to disassemble with -D (--disassemble-all), which also disassembles data sections such as .rodata, and there your strings will most likely be misinterpreted as instructions. To reassemble into an ELF, you can pass the assembly to GCC.
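Since the literals are still in the file (typically in .rodata), Harry's stated goal of changing "Bye" or "world" can be met by patching bytes rather than decompiling. A minimal sketch, assuming the replacement is no longer than the original: it is NUL-padded to the same length, so no offsets or relocations move. `patch_string` and `patch_file` are hypothetical helpers, not an existing tool, and `patch_file` writes a copy instead of editing the original. Two caveats: the first match might not be the literal you want (a non-stripped binary also contains the symbol name Bye, and a short literal may be the tail of a longer one if the toolchain merged them), so it is worth checking that the reported offset falls inside .rodata (`readelf -S` lists section offsets); and anything longer than the original genuinely needs a rebuild.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Overwrites the first occurrence of old_s (including its terminating NUL)
   in buf with new_s, NUL-padded to the same length, so no other byte moves.
   Returns the offset patched, or -1 if old_s is absent or new_s is longer. */
long patch_string(unsigned char *buf, size_t size,
                  const char *old_s, const char *new_s)
{
    size_t old_len = strlen(old_s), new_len = strlen(new_s);
    assert(old_len > 0);
    if (new_len > old_len)
        return -1;
    for (size_t i = 0; i + old_len < size; i++) {
        /* Comparing old_len + 1 bytes requires the NUL, so "world" does not
           match the start of "worldwide". */
        if (memcmp(buf + i, old_s, old_len + 1) == 0) {
            memset(buf + i, 0, old_len);
            memcpy(buf + i, new_s, new_len);
            return (long)i;
        }
    }
    return -1;
}

/* Hypothetical helper: copies in_path to out_path with one string patched.
   Returns the patched offset, or -1 on any failure. */
long patch_file(const char *in_path, const char *out_path,
                const char *old_s, const char *new_s)
{
    FILE *in = fopen(in_path, "rb");
    if (in == NULL)
        return -1;
    long len = fseek(in, 0, SEEK_END) == 0 ? ftell(in) : -1;
    rewind(in);
    unsigned char *buf = len > 0 ? malloc((size_t)len) : NULL;
    if (buf == NULL || fread(buf, 1, (size_t)len, in) != (size_t)len) {
        free(buf);
        fclose(in);
        return -1;
    }
    fclose(in);
    long off = patch_string(buf, (size_t)len, old_s, new_s);
    FILE *out = off >= 0 ? fopen(out_path, "wb") : NULL;
    if (out == NULL) {
        free(buf);
        return -1;
    }
    size_t written = fwrite(buf, 1, (size_t)len, out);
    int close_err = fclose(out);
    free(buf);
    return (written == (size_t)len && close_err == 0) ? off : -1;
}
```

A call such as `patch_file("a.out", "a.patched", "Bye", "Hi")` produces a copy without the execute bit, so it needs `chmod +x a.patched` before it can be run.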