Translation of machine code into LLVM IR (disassembly/reassembly of x86_64, x86, and ARM into LLVM bitcode)


Solution 1

mcsema is a production-quality binary lifter. It takes x86 and x86-64 binaries and statically "lifts" them to LLVM IR. It's actively maintained, BSD licensed, and has extensive tests and documentation.

https://github.com/trailofbits/mcsema

Solution 2

Consider using the RevGen tool, developed within the S2E project. It converts x86 binaries to LLVM IR. The source code can be checked out from the RevGen branch of the Git repository at https://dslabgit.epfl.ch/git/s2e/s2e.git.

Solution 3

Regarding the RevGen tool mentioned by @bsa2000, the recent paper "A compiler level intermediate representation based binary analysis and rewriting system" points out some limitations of S2E and RevNIC.

I quote them here:

  1. Shortcoming of dynamic translation:

    S2E [16] and Revnic [14] present a method for dynamically translating x86 to LLVM using QEMU. Unlike our approach, these methods convert blocks of code to LLVM on the fly which limits the application of LLVM analyses to only one block at a time.

  2. Incomplete IR:

    Revnic [14] and RevGen [15] recover an IR by merging the translated blocks, but the recovered IR is incomplete and is only valid for current execution; consequently, various whole program analyses will provide incomplete information.

  3. No abstract stack or promotion of memory locations:

    Further, the translated code retains all the assumptions of the original binary about the stack layout. They do not provide any methods for obtaining an abstract stack or promoting memory locations to symbols, which are essential for the application of several source-level analyses.
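The first and third points can be illustrated with a toy sketch. It assumes a hypothetical three-register instruction set (not x86, and not how S2E or RevGen are actually implemented): each basic block is lifted to its own LLVM function, and guest registers, plus a stack-pointer slot, stay in a global memory array `@regs` instead of becoming SSA values. Because each function sees only one block and all state goes through memory, whole-program analyses have little to work with. The emitted IR uses opaque-pointer (`ptr`) syntax, which assumes LLVM 15 or later.

```python
import itertools

# Hypothetical toy register file: each guest register is one i64 slot in @regs.
# The guest stack pointer ("sp") is just another memory slot, so any stack
# layout assumptions of the original code are carried over unchanged.
REG_INDEX = {"r0": 0, "r1": 1, "r2": 2, "sp": 3}
MODULE_HEADER = "@regs = global [4 x i64] zeroinitializer"


def lift_block(label, insns):
    """Translate ONE basic block of the toy ISA into a standalone LLVM IR function.

    Every register read/write goes through memory (@regs); nothing here knows
    about any other block.
    """
    ids = itertools.count()  # unique suffixes for value names
    out = [f"define void @{label}() {{", "entry:"]

    def slot(reg):
        """Emit a GEP to the register's slot and return the pointer's name."""
        p = f"%p{next(ids)}"
        out.append(f"  {p} = getelementptr [4 x i64], ptr @regs, "
                   f"i64 0, i64 {REG_INDEX[reg]}")
        return p

    for op, dst, src in insns:
        if op == "mov":                       # mov dst, imm
            p = slot(dst)
            out.append(f"  store i64 {src}, ptr {p}")
        elif op == "add":                     # add dst, src (both registers)
            pd, ps = slot(dst), slot(src)
            a, b, s = (f"%v{next(ids)}" for _ in range(3))
            out += [f"  {a} = load i64, ptr {pd}",
                    f"  {b} = load i64, ptr {ps}",
                    f"  {s} = add i64 {a}, {b}",
                    f"  store i64 {s}, ptr {pd}"]
        else:
            raise ValueError(f"unsupported instruction: {op}")
    out += ["  ret void", "}"]
    return "\n".join(out)


block = [("mov", "r0", 40), ("mov", "r1", 2), ("add", "r0", "r1")]
print(MODULE_HEADER + "\n\n" + lift_block("bb_0", block))
```

Turning those `@regs` slots into local values would require knowing how every other block uses them, which is the whole-program view the quoted paper says block-at-a-time translation lacks.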

Solution 4

I doubt there will be a universal solution (think about indirect branches, etc.): LLVM IR is much "higher level" than any assembly language. It is, however, possible to translate on a per-basic-block basis. You might want to check the llvm-qemu and libcpu projects, among others.
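Per-basic-block translation starts by splitting code into basic blocks. The sketch below uses hypothetical toy instruction tuples (not real x86 decoding) and the standard "leaders" rule, and shows where indirect branches break a purely static approach: a `jmp_ind` gets no statically known successors, so the control-flow graph has a hole there.

```python
# Toy instruction format (hypothetical): (address, mnemonic, operand).
# "jmp"/"jcc" carry a direct target address; "jmp_ind" jumps through a register.
BRANCHES = {"jmp", "jcc", "jmp_ind", "ret"}


def split_blocks(insns):
    """Split a linear listing into basic blocks using the 'leaders' rule."""
    leaders = {insns[0][0]}                       # first instruction is a leader
    for i, (addr, op, arg) in enumerate(insns):
        if op in BRANCHES and i + 1 < len(insns):
            leaders.add(insns[i + 1][0])          # instruction after a branch
        if op in ("jmp", "jcc"):
            leaders.add(arg)                      # direct branch target
    blocks = {}
    for addr, op, arg in insns:
        if addr in leaders:
            start = addr
            blocks[start] = []
        blocks[start].append((addr, op, arg))
    return blocks


def build_cfg(insns):
    """Map each block start to its successors; None marks an unresolved indirect jump."""
    blocks = split_blocks(insns)
    starts = sorted(blocks)
    cfg = {}
    for i, start in enumerate(starts):
        nxt = starts[i + 1] if i + 1 < len(starts) else None
        _, op, arg = blocks[start][-1]
        if op == "jmp":
            cfg[start] = [arg]
        elif op == "jcc":                         # taken target + fall-through
            cfg[start] = [arg] + ([nxt] if nxt is not None else [])
        elif op == "jmp_ind":
            cfg[start] = None                     # target only known at run time
        elif op == "ret":
            cfg[start] = []                       # leaves the function
        else:
            cfg[start] = [nxt] if nxt is not None else []
    return cfg


program = [
    (0, "mov", "r0"),
    (1, "jcc", 4),
    (2, "mov", "r1"),
    (3, "jmp_ind", "r1"),
    (4, "ret", None),
]
print(build_cfg(program))  # {0: [4, 2], 2: None, 4: []}
```

Each block can be lifted on its own, but the outgoing edge of the block at address 2 cannot be filled in without extra analysis or run-time information, which is one reason dynamic tools translate blocks as they are executed.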

Solution 5

There is a new project, still in an early phase: libbeauty, https://github.com/jcdutton/libbeauty

Article about the project: "Libbeauty: Another Reverse-Engineering Tool", 24 December 2013, Michael Larabel, http://www.phoronix.com/scan.php?page=news_item&px=MTU1MTU

For now it only supports a subset of x86_64 as input. One of the project's goals is to be able to compile the generated LLVM IR back to assembly, producing a binary with the same functionality.

Author: Grzegorz Wierzowiecki

Updated on June 03, 2022

Comments

  • Grzegorz Wierzowiecki

    I would like to translate x86_64, x86, and ARM executables into LLVM IR (disassembly).

    What solution do you suggest?