Hack your C++ with LLVM

Have you ever wanted to analyze or mutate some C or C++ code? You can do some simple pattern matching with regular expressions, but there’s always some special case or another that will break your logic. To do it right, you need to develop an entire parser, perhaps using a formal grammar and a tool like Yacc. That’s a big job, though, just to change all the floats to doubles.

[Adrian Sampson] wrote a blog entry to make you go from “mostly uninterested in compilers to excited to use LLVM to do great work.” LLVM, a compiler infrastructure whose name originally stood for Low Level Virtual Machine, provides tools for many languages, including Clang, its front end for C and C++. [Adrian] points out a few key differences between LLVM and other compilers and tools you might use for a similar purpose:

  • LLVM uses a consistent intermediate representation that is human-readable
  • It is extremely modular
  • It is both highly hackable and an industrial-strength, well-supported compiler

He points out that compiler tools aren’t just for compiling. You can use them to analyze source code, build simulators, and inject code for security or testing, among other things (speaking of security testing, check out the use of LLVM to analyze binaries for security issues in the video after the break). LLVM’s hackability comes from its modular design. A front end such as Clang parses the C or C++ code into the intermediate representation. A series of passes then analyze or modify that representation, each handing its result to the next. Finally, a back end generates code for the target processor.
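To make the pipeline idea concrete, here is a toy model in plain C++. It does not use LLVM at all: `Instr`, `Module`, `runPipeline`, `mulByTwoToAdd`, and `codegen` are hypothetical names invented for this sketch, and its one-line-per-instruction IR is far simpler than real LLVM IR. What it shows is the shape described above: IR comes in, each pass rewrites it and hands it on, and a final step turns it into target text.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical toy IR: one instruction per entry. Real LLVM IR has SSA
// values, basic blocks, functions, and a full type system.
struct Instr {
  std::string op;    // operation, e.g. "mul" or "add"
  std::string type;  // value type, e.g. "i32"
  std::string a, b;  // operands: register names like "%x" or constants like "2"
};

using Module = std::vector<Instr>;
// A pass is anything that takes the IR and may rewrite it in place.
using Pass = std::function<void(Module&)>;

// Run each pass in order; every pass sees the output of the one before it.
Module runPipeline(Module ir, const std::vector<Pass>& passes) {
  for (const Pass& pass : passes) pass(ir);
  return ir;
}

// Example transform pass: rewrite "mul %x, 2" as "add %x, %x".
void mulByTwoToAdd(Module& ir) {
  for (Instr& i : ir) {
    if (i.op == "mul" && i.b == "2") {
      i.op = "add";
      i.b = i.a;
    }
  }
}

// Stand-in for the back end: render the final IR as target text.
std::string codegen(const Module& ir) {
  std::string out;
  for (const Instr& i : ir)
    out += i.op + " " + i.type + " " + i.a + ", " + i.b + "\n";
  return out;
}
```

For instance, running `{{"mul", "i32", "%x", "2"}}` through `runPipeline` with `mulByTwoToAdd` and then `codegen` produces the line `add i32 %x, %x`. Adding another pass means appending one more function to the vector, which is the kind of modularity described above.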

It is fairly easy to add passes, and [Adrian] even provides a template and an example. Naturally, you could use a pass to do some kind of exotic or experimental optimization. But you can also use a pass to analyze or modify code for reasons other than code optimization.
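A pass doesn’t have to optimize anything. The sketch below is again a toy in plain C++ rather than LLVM’s real API (`Instr`, `ToyFunction`, `countByType`, and `promoteFloats` are hypothetical names). It shows the two kinds of non-optimizing pass mentioned above: one that only analyzes the code, and one that modifies it, here by retyping float operations as double, the job from the start of this post. The `bool` return value mirrors the convention of `runOnFunction` in LLVM’s legacy pass interface, which returns true when the pass changed the IR.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy IR, not LLVM's API: each instruction is just an
// operation name and the type it produces.
struct Instr {
  std::string op;    // e.g. "fadd", "add"
  std::string type;  // e.g. "float", "double", "i32"
};

using ToyFunction = std::vector<Instr>;

// Analysis pass: reads the IR and reports on it, changing nothing.
std::map<std::string, int> countByType(const ToyFunction& f) {
  std::map<std::string, int> counts;
  for (const Instr& i : f) ++counts[i.type];
  return counts;
}

// Transform pass: retype every float operation as double.
// Returns true if anything changed, like runOnFunction in LLVM's
// legacy pass interface.
bool promoteFloats(ToyFunction& f) {
  bool changed = false;
  for (Instr& i : f) {
    if (i.type == "float") {
      i.type = "double";
      changed = true;
    }
  }
  return changed;
}
```

On real LLVM IR this job would be more involved: values and constants carry their types, so a pass would typically have to build new instructions and insert conversions such as `fpext` wherever promoted and unpromoted values meet.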

We recently talked about the difficulties of building infrastructure around a custom CPU. LLVM was mentioned, and [Adrian’s] template and examples would be a big head start in creating a custom C compiler. Clang can even generate code for the Arduino, whose sketches are, after all, just C++.

3 thoughts on “Hack your C++ with LLVM”

  1. I really liked the linked article. It’s gotten me curious, though most of my software work is on Windows applications so I don’t know when I’ll get to playing around with any of it.

      1. LLVM works fine on Windows, as does Clang. Its compatibility with the Visual Studio compiler is getting better by the minute. For relatively small projects, it should work without hassle. I would first try to build the project with Clang to make sure it understands your code. Then you can use the Clang API and tools to refactor, mutate, or analyze your code.
        Have fun!
