Have you ever wanted to analyze or mutate some C or C++ code? You can do some simple pattern matching with regular expressions, but there’s always some special case or another that will break your logic. To do it right, you need to develop an entire parser, perhaps using a formal grammar and a tool like Yacc. That’s a big job, though, just to change all the floats to doubles.
[Adrian Sampson] wrote a blog entry to make you go from “mostly uninterested in compilers to excited to use LLVM to do great work.” LLVM, the Low Level Virtual Machine compiler infrastructure, provides tools for a lot of languages, including Clang for C and C++. [Adrian] points out a few key differences between LLVM and other compilers and tools you might use for a similar purpose:
- LLVM uses a consistent intermediate representation that is human-readable
- It is extremely modular
- It is both highly hackable and an industrial-strength, well-supported compiler
He points out that compiler tools aren’t just for compiling. You can use them to analyze source code, build simulators, and inject code for security or testing, among other things (speaking of security testing, check out the use of LLVM to analyze binaries for security issues in the video after the break). The high hackability of LLVM is due to its modular nature. In the usual flow, a front end chews up the C or C++ code into the intermediate representation. Then a series of passes each get a chance to modify the representation before handing it off to the next pass. The final stage does the actual code generation for the target processor.
It is fairly easy to add passes, and [Adrian] even provides a template and an example. Naturally, you could use a pass to do some kind of exotic or experimental optimization. But you can also use a pass to analyze or modify code for reasons other than code optimization.
We recently talked about the difficulties of building infrastructure around a custom CPU. LLVM was mentioned, and [Adrian’s] template and examples would be a big head start in creating a custom C compiler. Clang can even generate code for the Arduino, whose sketches are, after all, just C++ code.
I really liked the linked article. It’s gotten me curious, though most of my software work is on Windows applications so I don’t know when I’ll get to playing around with any of it.
llvm for windows: http://llvm.org/docs/GettingStartedVS.html (no personal experience) (uses visual studio!)
LLVM works fine on Windows, as does clang. Its compatibility with the Visual Studio compiler is getting better by the minute. For relatively small projects, it should work without a hassle. I would first try to build the project with clang to make sure it understands your code. Then you can use the clang API and tools to refactor, mutate or analyze your code.
Have fun!