Have you ever wanted to analyze or mutate some C or C++ code? You can do some simple pattern matching with regular expressions, but there’s always some special case or another that will break your logic. To do it right, you need to develop an entire parser, perhaps using a formal grammar and a tool like Yacc. That’s a big job, though, just to change all the floats to doubles.
[Adrian Sampson] wrote a blog entry to make you go from “mostly uninterested in compilers to excited to use LLVM to do great work.” LLVM – the Low Level Virtual Machine compiler infrastructure — provides tools for a lot of languages, including CLANG for C and C++. [Adrian] points out a few key differences between LLVM and other compilers and tools you might use for a similar purpose:
- LLVM uses a consistent intermediate representation that is human-readable
- It is extremely modular
- It is both highly hackable and an industrial-strength, well-supported compiler
He points out that compiler tools aren’t just for compiling. You can use them to analyze source code, build simulators, and inject code for security or testing, among other things (speaking of security testing, check out the use of LLVM to analyze binaries for security issues in the video after the break). The high hackability of LLVM is due to its modular nature. By default, a front end chews up the C or C++ code into the intermediate representation. Then multiple passes can modify the representation before handing it off for the next pass. The final pass does actual code generation for the target processor.