Many tools operate on LLVM bitcode. To use them, either LLVM bitcode or the original source code is required (a compiler can produce LLVM bitcode from source code). Sometimes, however, only compiled binaries are available, and there is no standard, well-defined process for obtaining LLVM bitcode from a binary. McSema is a tool that translates binaries into LLVM bitcode, which makes these tools applicable to previously unavailable targets. McSema itself is open-source, but it relies on proprietary third-party libraries for its disassembly capabilities. This is problematic, as it prevents many users from using the software. This thesis provides an alternative implementation of the proprietary component of McSema, built on the open-source Dyninst disassembler. With this new implementation, McSema can be used without proprietary software. The performance of the open-source version is demonstrated on a set of programs, and the results are compared with those of the existing components.
Decompiling Binaries into LLVM IR Using McSema and Dyninst
University
Faculty of Informatics
Date of Completion
spring 2019
Resources
Supervisor
Petr Ročkai
Student
Lukáš Korenčik