Decompiling Binaries into LLVM IR Using McSema and Dyninst

Many tools operate on LLVM bitcode. To use them, either LLVM bitcode or the original source code is required (a compiler can produce LLVM bitcode from source code). Sometimes, however, only compiled binaries are available, and there is no standard, well-defined process for obtaining LLVM bitcode from a binary. McSema is a tool that translates binaries into LLVM bitcode, which makes these tools applicable to previously unavailable targets. McSema itself is open source, but it relies on proprietary third-party libraries for disassembly. This is problematic, as it prevents many users from using the software. This thesis provides an alternative implementation of McSema's proprietary component, based on the open-source Dyninst disassembler. With this new implementation, McSema can be used without proprietary software. The performance of the open-source version is demonstrated on a set of programs, and the results are compared with those of the existing components.

University

Faculty of Informatics

Date of Completion

Spring 2019

Resources

Supervisor

Petr Ročkai

Student

Lukáš Korenčik