Have you ever wanted to analyze or mutate some C or C++ code? You can do some simple pattern matching with regular expressions, but there’s always some special case or another that will break your logic. To do it right, you need to develop an entire parser, perhaps using a formal grammar and a tool like Yacc. That’s a big job, though, just to change all the floats to doubles.
[Adrian Sampson] wrote a blog entry to make you go from “mostly uninterested in compilers to excited to use LLVM to do great work.” LLVM – the Low Level Virtual Machine compiler infrastructure — provides tools for a lot of languages, including CLANG for C and C++. [Adrian] points out a few key differences between LLVM and other compilers and tools you might use for a similar purpose:
- LLVM uses a consistent intermediate representation that is human-readable
- It is extremely modular
- It is both highly hackable and an industrial-strength, well-supported compiler
He points out that compiler tools aren’t just for compiling. You can use them to analyze source code, build simulators, and inject code for security or testing, among other things (speaking of security testing, check out the use of LLVM to analyze binaries for security issues in the video after the break). The high hackability of LLVM is due to its modular nature. By default, a front end chews up the C or C++ code into the intermediate representation. Then multiple passes can modify the representation before handing it off for the next pass. The final pass does actual code generation for the target processor.
It is fairly easy to add passes, and [Adrian] even provides a template and an example. Naturally, you could use a pass to do some kind of exotic or experimental optimization. But you can also use a pass to analyze or modify code for reasons other than code optimization.
We recently talked about the difficulties of building infrastructure around a custom CPU. LLVM was mentioned, and [Adrian’s] template and examples would be a big head start in creating a custom C compiler. CLANG can even generate code for the Arduino which, after all, is just C++ code.