How to disassemble the main function of a stripped application?


Solution 1

OK, here is a big edit of my previous answer. I think I have found a way now.

You (still :) have this specific problem:

(gdb) disas main
No symbol table is loaded.  Use the "file" command.

If you compile the code (I added a return 0 at the end) with gcc -S, you get this for main:

    pushq   %rbp
    movq    %rsp, %rbp
    movl    $.LC0, %edi
    call    puts
    movl    $0, %eax
    leave
    ret

Now, you can see that your binary gives you some info:

Stripped:

(gdb) info files
Symbols from "/home/beco/Documents/fontes/cpp/teste/stackoverflow/distrip".
Local exec file:
    `/home/beco/Documents/fontes/cpp/teste/stackoverflow/distrip', file type elf64-x86-64.
    Entry point: 0x400440
    0x0000000000400238 - 0x0000000000400254 is .interp
    ...
    0x00000000004003a8 - 0x00000000004003c0 is .rela.dyn
    0x00000000004003c0 - 0x00000000004003f0 is .rela.plt
    0x00000000004003f0 - 0x0000000000400408 is .init
    0x0000000000400408 - 0x0000000000400438 is .plt
    0x0000000000400440 - 0x0000000000400618 is .text
    ...
    0x0000000000601010 - 0x0000000000601020 is .data
    0x0000000000601020 - 0x0000000000601030 is .bss

The most important entry here is .text. It is the conventional name of the section that holds the program's code, and from its size (and the explanation of main below) you can see that it includes main. Note that the entry point, 0x400440, is exactly the start of .text. If you disassemble it, you will see a call to __libc_start_main. Most importantly, you are disassembling from a valid entry point that is real code (you are not misinterpreting DATA as CODE).
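The "it includes main" reasoning is just a range check over the rows of info files. The sketch below is illustrative C, not part of gdb: the struct and function names are hypothetical, and the table hard-codes only a few of the rows printed above. Each row is treated as a half-open range [start, end), which matches how gdb prints them (for example, the end of .rela.dyn equals the start of .rela.plt).

```c
#include <stddef.h>
#include <stdint.h>

/* One row of gdb's "info files" output: the range [start, end) and a name. */
struct section_range {
    uint64_t start;
    uint64_t end;      /* exclusive, as printed by gdb */
    const char *name;
};

/* Return the name of the section containing addr, or NULL if none does. */
static const char *section_for_address(const struct section_range *secs,
                                       size_t n, uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= secs[i].start && addr < secs[i].end)
            return secs[i].name;
    }
    return NULL;
}

/* A subset of the rows copied from the "info files" listing above. */
static const struct section_range example_sections[] = {
    {0x4003f0, 0x400408, ".init"},
    {0x400408, 0x400438, ".plt"},
    {0x400440, 0x400618, ".text"},
    {0x601010, 0x601020, ".data"},
    {0x601020, 0x601030, ".bss"},
};
```

With this table, both the entry point 0x400440 and main's address 0x400524 map to ".text", while 0x400618 (the first byte after .text) maps to nothing.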

(gdb) disas 0x0000000000400440,0x0000000000400618
Dump of assembler code from 0x400440 to 0x400618:
   0x0000000000400440:  xor    %ebp,%ebp
   0x0000000000400442:  mov    %rdx,%r9
   0x0000000000400445:  pop    %rsi
   0x0000000000400446:  mov    %rsp,%rdx
   0x0000000000400449:  and    $0xfffffffffffffff0,%rsp
   0x000000000040044d:  push   %rax
   0x000000000040044e:  push   %rsp
   0x000000000040044f:  mov    $0x400540,%r8
   0x0000000000400456:  mov    $0x400550,%rcx
   0x000000000040045d:  mov    $0x400524,%rdi
   0x0000000000400464:  callq  0x400428 <__libc_start_main@plt>
   0x0000000000400469:  hlt
   ...

   0x000000000040046c:  sub    $0x8,%rsp
   ...
   0x0000000000400482:  retq   
   0x0000000000400483:  nop
   ...
   0x0000000000400490:  push   %rbp
   ...
   0x00000000004004f2:  leaveq 
   0x00000000004004f3:  retq   
   0x00000000004004f4:  data32 data32 nopw %cs:0x0(%rax,%rax,1)
   ...
   0x000000000040051d:  leaveq 
   0x000000000040051e:  jmpq   *%rax
   ...
   0x0000000000400520:  leaveq 
   0x0000000000400521:  retq   
   0x0000000000400522:  nop
   0x0000000000400523:  nop
   0x0000000000400524:  push   %rbp
   0x0000000000400525:  mov    %rsp,%rbp
   0x0000000000400528:  mov    $0x40062c,%edi
   0x000000000040052d:  callq  0x400418 <puts@plt>
   0x0000000000400532:  mov    $0x0,%eax
   0x0000000000400537:  leaveq 
   0x0000000000400538:  retq   

The call to __libc_start_main gets a pointer to main() as its first argument. On x86-64 the first integer argument is passed in the %rdi register (not on the stack, as on 32-bit x86), so the value loaded into %rdi just before the call is your main() address:

   0x000000000040045d:  mov    $0x400524,%rdi
   0x0000000000400464:  callq  0x400428 <__libc_start_main@plt>
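The same check can be done on raw bytes, such as those shown by gdb's x/12xb 0x40045d. The sketch below is a hypothetical helper (not a gdb or glibc API) that looks for the non-PIE pattern shown above: mov $imm32,%rdi, encoded as 48 c7 c7 followed by a 4-byte little-endian immediate, directly followed by a call (opcode e8). It assumes x86-64 and this exact instruction sequence. Position-independent executables load main's address relative to %rip instead (for example through a lea or a GOT load), which this sketch does not handle.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from p. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Look for "mov $imm32,%rdi" (48 c7 c7 imm32) immediately followed by a
 * call (e8), the pattern seen in _start above. On success, store the
 * sign-extended immediate in *main_addr and return 1; otherwise return 0.
 */
static int find_main_from_start(const uint8_t *code, size_t len,
                                uint64_t *main_addr)
{
    for (size_t i = 0; i + 8 <= len; i++) {
        if (code[i] == 0x48 && code[i + 1] == 0xc7 && code[i + 2] == 0xc7 &&
            code[i + 7] == 0xe8) {
            uint32_t imm = read_le32(code + i + 3);
            uint64_t v = imm;
            if (imm & 0x80000000u)   /* REX.W C7 sign-extends its imm32 */
                v |= 0xffffffff00000000ull;
            *main_addr = v;
            return 1;
        }
    }
    return 0;
}
```

Fed the hand-assembled bytes of the two instructions above (48 c7 c7 24 05 40 00 e8 bf ff ff ff; compare them with the x/ output before relying on them), it reports 0x400524.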

Here it is 0x400524 (as we already know). Now set a breakpoint there and try this:

(gdb) break *0x400524
Breakpoint 1 at 0x400524
(gdb) run
Starting program: /home/beco/Documents/fontes/cpp/teste/stackoverflow/disassembly/d2 

Breakpoint 1, 0x0000000000400524 in main ()
(gdb) n
Single stepping until exit from function main, 
which has no line number information.
hello 1
__libc_start_main (main=<value optimized out>, argc=<value optimized out>, ubp_av=<value optimized out>, 
    init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>, 
    stack_end=0x7fffffffdc38) at libc-start.c:258
258 libc-start.c: No such file or directory.
    in libc-start.c
(gdb) n

Program exited normally.
(gdb) 

Now you can disassemble it using:

(gdb) disas 0x0000000000400524,0x0000000000400600
Dump of assembler code from 0x400524 to 0x400600:
   0x0000000000400524:  push   %rbp
   0x0000000000400525:  mov    %rsp,%rbp
   0x0000000000400528:  sub    $0x10,%rsp
   0x000000000040052c:  movl   $0x1,-0x4(%rbp)
   0x0000000000400533:  mov    $0x40064c,%eax
   0x0000000000400538:  mov    -0x4(%rbp),%edx
   0x000000000040053b:  mov    %edx,%esi
   0x000000000040053d:  mov    %rax,%rdi
   0x0000000000400540:  mov    $0x0,%eax
   0x0000000000400545:  callq  0x400418 <printf@plt>
   0x000000000040054a:  mov    $0x0,%eax
   0x000000000040054f:  leaveq 
   0x0000000000400550:  retq   
   0x0000000000400551:  nop
   0x0000000000400552:  nop
   0x0000000000400553:  nop
   0x0000000000400554:  nop
   0x0000000000400555:  nop
   ...

This is essentially the solution.

BTW, this is a different program, used to check that the method works, which is why the assembly above is a bit different. It was compiled from this C file:

#include <stdio.h>

int main(void)
{
    int i=1;
    printf("hello %d\n", i);
    return 0;
}

But if this does not work, you still have some hints:

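From now on, you should set breakpoints at the beginning of every function. A function usually starts right after the ret (or leave; ret) that ends the previous one, sometimes after a few nop padding bytes. The first entry point is the start of .text itself: that is where execution begins (_start), but it is not main.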
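One way to find such function starts mechanically is to scan the bytes of .text for the standard frame-pointer prologue push %rbp; mov %rsp,%rbp, encoded as 55 48 89 e5 (as at 0x400524 above). This is only a heuristic sketch with a hypothetical function name: it assumes unoptimized code that keeps frame pointers, optimized code often has no such prologue, and a byte-level scan can give false positives because those bytes might occur inside another instruction.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Record the address of every "push %rbp; mov %rsp,%rbp" (55 48 89 e5)
 * in a code buffer whose first byte lives at address base. Returns how
 * many were found; at most max addresses are stored in out.
 */
static size_t find_prologues(const uint8_t *code, size_t len, uint64_t base,
                             uint64_t *out, size_t max)
{
    size_t found = 0;
    for (size_t i = 0; i + 4 <= len; i++) {
        if (code[i] == 0x55 && code[i + 1] == 0x48 &&
            code[i + 2] == 0x89 && code[i + 3] == 0xe5) {
            if (found < max)
                out[found] = base + i;
            found++;
        }
    }
    return found;
}
```

On the bytes starting at 0x400520 (c9 c3 90 90 55 48 89 e5, i.e. leaveq, retq, nop, nop, push, mov), it yields a single candidate, 0x400524, which you can then try with break *0x400524.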

The problem is that a breakpoint will not always let you step through the program. Take this one at the very start of .text:

(gdb) break *0x0000000000400440
Breakpoint 2 at 0x400440
(gdb) run
Starting program: /home/beco/Documents/fontes/cpp/teste/stackoverflow/disassembly/d2 

Breakpoint 2, 0x0000000000400440 in _start ()
(gdb) n
Single stepping until exit from function _start, 
which has no line number information.
0x0000000000400428 in __libc_start_main@plt ()
(gdb) n
Single stepping until exit from function __libc_start_main@plt, 
which has no line number information.
0x0000000000400408 in ?? ()
(gdb) n
Cannot find bounds of current function

So you need to keep trying until you find your way, setting breakpoints at:

0x400440
0x40046c
0x400490
0x4004f4
0x40051e
0x400524
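Rather than typing each break command by hand, you can put them in a gdb command file and load it with gdb -x FILE (or source FILE at the gdb prompt). The small C helper below is a hypothetical convenience, not something this answer relies on: it formats one break *0x... line per address and, like snprintf, returns the total length needed so the caller can detect truncation.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write one "break *0x<addr>" line per address into buf (always
 * NUL-terminated when size > 0). Returns the number of characters the
 * full output needs; a result >= size means it was truncated.
 */
static size_t format_breakpoints(const uint64_t *addrs, size_t n,
                                 char *buf, size_t size)
{
    size_t used = 0;
    if (size > 0)
        buf[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        /* Once the buffer is full, only count the remaining length. */
        char *dst = used < size ? buf + used : NULL;
        size_t room = used < size ? size - used : 0;
        int w = snprintf(dst, room, "break *0x%" PRIx64 "\n", addrs[i]);
        if (w < 0)
            return used;
        used += (size_t)w;
    }
    return used;
}
```

For example, formatting 0x400440 and 0x400524 produces the two lines break *0x400440 and break *0x400524.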

From the other answer, we should keep this info:

In the non-stripped version of the file, we see:

(gdb) disas main
Dump of assembler code for function main:
   0x0000000000400524 <+0>: push   %rbp
   0x0000000000400525 <+1>: mov    %rsp,%rbp
   0x0000000000400528 <+4>: mov    $0x40062c,%edi
   0x000000000040052d <+9>: callq  0x400418 <puts@plt>
   0x0000000000400532 <+14>:    mov    $0x0,%eax
   0x0000000000400537 <+19>:    leaveq 
   0x0000000000400538 <+20>:    retq   
End of assembler dump.

Now we know that main occupies 0x0000000000400524 up to (but not including) 0x0000000000400539. If we use the same addresses to look at the stripped binary, we get the same result:

(gdb) disas 0x0000000000400524,0x0000000000400539
Dump of assembler code from 0x400524 to 0x400539:
   0x0000000000400524:  push   %rbp
   0x0000000000400525:  mov    %rsp,%rbp
   0x0000000000400528:  mov    $0x40062c,%edi
   0x000000000040052d:  callq  0x400418 <puts@plt>
   0x0000000000400532:  mov    $0x0,%eax
   0x0000000000400537:  leaveq 
   0x0000000000400538:  retq   
End of assembler dump.
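Comparing two dumps by eye works for a tiny main, but the idea can be automated: take a run of bytes from main in the non-stripped build and search for it in the stripped binary's code. The sketch below is a plain byte search with a hypothetical function name. It assumes both binaries were built the same way, because any embedded absolute address (such as the string address 0x40062c loaded into %edi) changes when the layout changes; more robust signature matchers typically treat such bytes as wildcards.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return the offset of the first occurrence of sig (for example, main's
 * first bytes taken from a non-stripped build) inside code, or -1 if it
 * does not occur.
 */
static long find_signature(const uint8_t *code, size_t len,
                           const uint8_t *sig, size_t sig_len)
{
    if (sig_len == 0 || sig_len > len)
        return -1;
    for (size_t i = 0; i + sig_len <= len; i++) {
        if (memcmp(code + i, sig, sig_len) == 0)
            return (long)i;
    }
    return -1;
}
```

With main's first 9 bytes (55 48 89 e5 bf 2c 06 40 00, hand-assembled from the listing) as the signature and a buffer that starts at 0x400520, the match is at offset 4, i.e. at 0x400524.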

So, unless you can get some hint about where main starts (for example from another build with symbols), another option is to use what you know about main's first instructions: disassemble at specific places and check whether they match. If you have no access at all to the code, you can still read the ELF specification to understand which sections should appear in the binary and try a calculated address. Either way, you need information about the sections in the binary!
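A natural first step into the ELF definition is the entry point, which gdb prints as "Entry point: 0x400440" and readelf -h reports as "Entry point address". It is stored in the ELF header field e_entry. The sketch below reads it with the Elf64_Ehdr type from <elf.h>; the function name is hypothetical, and it assumes a 64-bit ELF file whose byte order matches the host. It runs outside gdb, so it does not meet the question's gdb-only constraint, but it shows where that number comes from.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Read the ELF header of a 64-bit ELF file and store its entry point
 * (e_entry) in *entry. Returns 0 on success, or -1 if the file cannot be
 * read or is not a 64-bit ELF object.
 */
static int elf64_entry_point(const char *path, uint64_t *entry)
{
    Elf64_Ehdr eh;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t got = fread(&eh, 1, sizeof eh, f);
    fclose(f);
    if (got != sizeof eh || memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;
    *entry = eh.e_entry;
    return 0;
}
```

On the stripped binary from this answer it should return the same 0x400440 that info files shows; disassembling from there leads to the __libc_start_main call.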

That is hard work, my friend! Good luck!

Beco

Solution 2

How about doing info files to get the section list (with addresses), and going from there?

Example:

(gdb) info files

Symbols from "/home/bob/tmp/t".
Local exec file:
`/home/bob/tmp/t', file type elf64-x86-64.
Entry point: 0x400490
0x0000000000400270 - 0x000000000040028c is .interp
0x000000000040028c - 0x00000000004002ac is .note.ABI-tag
    ....

0x0000000000400448 - 0x0000000000400460 is .init
    ....
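The section list that info files prints comes from the ELF section headers. To show where those addresses originate, here is a sketch that walks the section headers with the <elf.h> types and prints allocated sections in a format modeled on gdb's output. Assumptions: 64-bit ELF, host byte order, no extended section numbering. The function name is hypothetical, and gdb's own listing may differ in which sections it shows.

```c
#include <elf.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Print "0x<start> - 0x<end> is <name>" for every allocated (SHF_ALLOC)
 * section of a 64-bit ELF file. Returns the number of lines printed, or
 * -1 on error.
 */
static int elf64_list_sections(const char *path, FILE *out)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    const Elf64_Shdr *strtab;
    char *names = NULL;
    int printed = -1;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto done;

    /* Read the whole section header table. */
    sh = calloc(eh.e_shnum, sizeof *sh);
    if (sh == NULL || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != eh.e_shnum)
        goto done;

    /* Section names are offsets into the section-name string table. */
    strtab = &sh[eh.e_shstrndx];
    names = malloc(strtab->sh_size + 1);
    if (names == NULL || fseek(f, (long)strtab->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strtab->sh_size, f) != strtab->sh_size)
        goto done;
    names[strtab->sh_size] = '\0';

    printed = 0;
    for (int i = 0; i < eh.e_shnum; i++) {
        if (!(sh[i].sh_flags & SHF_ALLOC) || sh[i].sh_name >= strtab->sh_size)
            continue;
        fprintf(out, "0x%016" PRIx64 " - 0x%016" PRIx64 " is %s\n",
                (uint64_t)sh[i].sh_addr,
                (uint64_t)(sh[i].sh_addr + sh[i].sh_size),
                names + sh[i].sh_name);
        printed++;
    }
done:
    free(names);
    free(sh);
    fclose(f);
    return printed;
}
```

Calling elf64_list_sections(path, stdout) on a binary like the one above should print lines in the same shape as "0x0000000000400448 - 0x0000000000400460 is .init".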

Then disassemble .init:

(gdb) disas 0x0000000000400448,0x0000000000400460
Dump of assembler code from 0x400448 to 0x400460:
   0x0000000000400448:  sub    $0x8,%rsp
   0x000000000040044c:  callq  0x4004bc
   0x0000000000400451:  callq  0x400550
   0x0000000000400456:  callq  0x400650
   0x000000000040045b:  add    $0x8,%rsp
   0x000000000040045f:  retq   

Then go ahead and disassemble the rest.
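The targets in that .init dump (callq 0x4004bc and so on) are not stored as absolute addresses: an e8 call holds a signed 32-bit displacement relative to the address of the next instruction. Knowing this lets you check gdb's decoding against the raw bytes (x/5xb on the call) and follow calls by hand. A minimal sketch with a hypothetical function name, assuming x86-64 and only the plain e8 rel32 form; indirect calls or jumps through a register are not covered.

```c
#include <stdint.h>

/*
 * Compute the target of a "callq rel32" (opcode e8) located at insn_addr.
 * The displacement is relative to the next instruction, insn_addr + 5.
 * Returns 0 if the first byte is not e8.
 */
static uint64_t call_target(const uint8_t insn[5], uint64_t insn_addr)
{
    if (insn[0] != 0xe8)
        return 0;
    uint32_t u = (uint32_t)insn[1] | (uint32_t)insn[2] << 8 |
                 (uint32_t)insn[3] << 16 | (uint32_t)insn[4] << 24;
    /* Sign-extend the 32-bit displacement. */
    int64_t rel = (u & 0x80000000u) ? (int64_t)u - 0x100000000LL : (int64_t)u;
    return insn_addr + 5 + (uint64_t)rel;
}
```

For the first call above, the bytes would be e8 6b 00 00 00 at 0x40044c, and 0x40044c + 5 + 0x6b = 0x4004bc, matching gdb's callq 0x4004bc.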

If I were you, and I had the same GCC version your executable was built with, I'd examine the sequence of functions called in a dummy non-stripped executable. The sequence of calls is probably similar in most usual cases, so that might help you grind through the startup sequence up to your main by comparison. Optimizations will probably get in the way, though.

If your binary is stripped and optimized, main might not exist as an "entity" in the binary; chances are you can't get much better than this type of procedure.

Solution 3

There's a great new free tool called unstrip from the paradyn project (full disclosure: I work on this project) that will rewrite your program binary, adding symbol information to it, and recover all (or nearly all) of the functions in stripped ELF binaries for you, with great accuracy. It won't identify the main function as "main", but it will find it, and you can apply the heuristic you already mentioned above to figure out which function is main.

http://www.paradyn.org/html/tools/unstrip.html

I'm sorry this isn't a gdb-only solution.

Author: karlphillip


Updated on July 13, 2020

Comments

  • karlphillip
    karlphillip almost 4 years

    Let's say I compiled the application below and stripped its symbols.

    #include <stdio.h>
    
    int main()
    {
        printf("Hello\n");
    }
    

    Build procedure:

    gcc -o hello hello.c
    strip --strip-unneeded hello
    

    If the application wasn't stripped, disassembling the main function would be easy. However, I have no idea how to disassemble the main function of a stripped application.

    (gdb) disas main
    No symbol table is loaded.  Use the "file" command.
    
    (gdb) info line main
    Function "main" not defined.
    

    How could I do it? Is it even possible?

    Notes: this must be done with GDB only. Forget objdump. Assume that I don't have access to the code.

    A step-by-step example would be greatly appreciated.

  • karlphillip
    karlphillip about 13 years
    I already know that, and it doesn't help me at all to disassemble the main function since the problem is locating it first.
  • Laurent G
    Laurent G about 13 years
    So your question is about locating main. Getting the instructions out of the binary is secondary. I misunderstood the question.
  • 0xC0000022L
    0xC0000022L about 13 years
    @karlphillip: it's about as far as you will get. The art of disassembly is to find out those things, even when you have no symbolic names. The file structure will on all platforms allow you to see where to start, but then it's entirely up to you to dig through the CRT code and find the main(). IDA, for example, uses signatures to automate this to a large part using a similar approach to the manual one suggested by Mat.
  • Moudis
    Moudis about 13 years
    If the binary is dynamically linked, you can still use ltrace to find __libc_start_main (calls main(), plus some setup), which will get you close.
  • karlphillip
    karlphillip about 13 years
    Indeed, if I had access to the code it would be fairly easy. All I would have to do is compile a version with symbols and either load that binary, or tell gdb where to load symbols from if I really had to debug the stripped version. The problem is that I really don't have the sources nor a non-stripped version of the application I'm interested in debugging. Thanks for your efforts. +1
  • DrBeco
    DrBeco about 13 years
    What you are looking for is a way to calculate the starting point, given an ELF. You need to take a deep look into the ELF definition and understand how far each section can shift main's offset. But you still need to know the number and size of the sections, and info files can help you a bit there. If I find an update in the next few days, I will comment here. Good luck.
  • DrBeco
    DrBeco about 13 years
    Another useful trick is stepping through the program without knowing where to set a breakpoint. You can use catch syscall write, or just catch syscall, and then run. It does not always work, because there is too little context if the catchpoint triggers too early.
  • DrBeco
    DrBeco about 13 years
    I just tested it with the infamous command gdb gdb, info files, disas 0x44f400,0x44f429, disas 0x44f4f0,0x44f500 and it worked. ;)
  • DrBeco
    DrBeco about 13 years
    @karlphillip : Just to be sure, I tested with a non-C program. I compiled a FORTRAN test and could successfully find and disassemble main. BTW, we are fellow countrymen. ;)
  • DrBeco
    DrBeco about 13 years
    Yeiy! 10 upvotes! I just earned my first "nice answer" badge. Thanks y'all. ;)
  • karlphillip
    karlphillip about 13 years
    Congratulations. Thank you and enjoy your bounty!
  • DrBeco
    DrBeco about 13 years
    Thanks for assigning it. My answer was older than your bounty, so I think I would not get the "half" bounty automatically. It is put to immediate use here!
  • Igor Skochinsky
    Igor Skochinsky about 13 years
    You don't need to guess the entrypoint address (do you think system loader does?). It's right there in the ELF header. See "Entry point address" in readelf -h output.
  • prathmesh.kallurkar
    prathmesh.kallurkar over 9 years
    Hey, I tried the unstrip command on the linux kernel binary. I used the command "unstrip -f vmlinux". However, it did not output anything. Since vmlinux is a special binary, should any option be provided to unstrip command ? Here is the binary under consideration dl.dropboxusercontent.com/u/56211033/vmlinux
  • robert
    robert over 8 years
    @DrBeco, excellent answer! By the way, how did you know in statement (gdb) disas 0x0000000000400524,0x0000000000400600 where is the end of main?
  • Ebrahim Ghasemi
    Ebrahim Ghasemi almost 7 years
    @robert I'm confused with that too.
  • DrBeco
    DrBeco almost 7 years
    From the line "0x0000000000400440 - 0x0000000000400618 is .text". But bear in mind that the end of main may be somewhere else due to obfuscation. You really should follow the assembly and keep going until it makes sense. The important point is the entry point. From there, the PC (program counter) keeps going, treating the bytes as instructions (unlike in the data segment, where the bytes are data).
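For robert's question about where main ends, a crude first guess is the first leaveq; retq pair (c9 c3) after the start address, which is how the non-stripped main above ends (at 0x400537 and 0x400538). The sketch below is a hypothetical helper that scans raw bytes, so c9 c3 occurring inside another instruction can fool it, and functions with several return paths or without a frame pointer will not end this way. Following the disassembly, as DrBeco suggests, remains the reliable method.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Naive end-of-function guess: return the address just past the first
 * "leaveq; retq" byte pair (c9 c3) in code, whose first byte lives at
 * address start, or 0 if no such pair is found. This scans bytes, not
 * decoded instructions.
 */
static uint64_t guess_function_end(const uint8_t *code, size_t len,
                                   uint64_t start)
{
    for (size_t i = 0; i + 2 <= len; i++) {
        if (code[i] == 0xc9 && code[i + 1] == 0xc3)
            return start + i + 2;
    }
    return 0;
}
```

On main's 21 bytes starting at 0x400524 (hand-assembled from the listing), it returns 0x400539, the end address used in disas 0x0000000000400524,0x0000000000400539.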