Warnings On Steroids – Static Code Analysis Tools

A little while back, we talked about using compiler warnings as a first step to make our C code less error-prone and to improve its general stability and quality. We now know that the C compiler itself can help us here, but we also saw that there’s a limit to it. While it warns us about the most obvious mistakes and suspicious code constructs, it will leave us hanging when things get a bit more complex.

But once again, that doesn’t mean compiler warnings are useless; we simply need to see them for what they are: a first step. So today we are going to take the next step and have a look at some other common static code analysis tools that can give us more insight into our code.

You may think that voluntarily choosing C as a primary language in this day and age is nostalgic or anachronistic, but preach and oxidate all you want: C won’t be going anywhere. So let’s make use of the tools we have available that help us write better code and defy the pitfalls C is infamous for. Besides, the general concept of static code analysis is universal. After all, many times a bug or other issue isn’t necessarily caused by the language, but rather by some general flaw in the code’s logic.

Compiler Warnings Recap

But let’s first take a step back again to compiler warnings. If we recall the nonnull attribute which indicates that a function’s parameter can’t and therefore won’t be NULL, we saw that the compiler’s perspective is extremely shortsighted on it:

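#include <stddef.h> // for NULL

// Declares that foo() must never be called with a NULL argument.
extern void foo(char *) __attribute__((nonnull));

void bar(void) {
    char *ptr = NULL;

    foo(NULL); // warning: literal NULL passed to a nonnull parameter
    foo(ptr);  // no warning here, although ptr is NULL as well
}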

The compiler will warn about the foo(NULL) call, as it is an obvious violation of the nonnull declaration, but it won’t realize that the second call will eventually pass NULL as a parameter as well. To be fair though, why should it? Its primary job is to generate a machine-readable executable from our source code.

Now, this example is a rather clear case, and while the compiler may not warn about it, it is still easy to spot. If you have decent code review practices in place, it should be straightforward to detect the mishap. But sometimes it’s just us by ourselves, with no other developer to review our code, and due to tiredness or other reasons, it might simply slip past our eyes. Other times, the potential issue hiding underneath is a lot less obvious, and it might take a whole series of unfortunate events for it to become an actual problem. We’d have to mentally go through every possible execution path to be sure it’s all good.

Either way, it sounds like a waste of time to use manual labor for something that practically screams for automation. So let’s have a look at a few common tools made just for that. Note that we’ll be merely scratching the surface here; consider this a brief overview of what tools are available.

Static Code Analysis Tools

Static code analysis involves inspecting our program just by analyzing its source code, without ever executing it. For example, it won’t consider the actual data that is processed in a set of functions, but instead make sure that data is passed along and handled in a safe and logical way. This is certainly a subject where throwing money at the problem will get you bigger and shinier tools, and while they have their place in the professional world, we’ll focus on the everyday hacker tinkering on their free time projects, and see what the open source community has to offer.

While the initial example was good for recalling the shortcomings of compiler warnings, the full strength of the other tools cannot be demonstrated with such a simple scenario. The best way is to see for yourself by using them on your own code, on other tools and programs you frequently compile or use, or on some random projects from GitHub and the like.

clang

Yes, let’s start with clang. But before you start to groan and think “drop the compiler warnings already and move on”, there’s more to clang than its compiler infrastructure, such as its own static code analyzer. It supports the same targets clang does, and can be invoked by preceding your usual build command with the scan-build command.

$ scan-build clang -o foo foo.c

The analyzer doesn’t necessarily require clang as compiler, so this will work as well:

$ scan-build gcc -o foo foo.c

Or you simply run it with make:

$ scan-build make
...
scan-build: n bugs found
scan-build: Run 'scan-view /tmp/scan-build-xyz' to examine bug reports.
$

While you can’t simply pass a list of source files to scan-build, but rather need to perform an actual build, this has the advantage that compilation and analysis are done at the same time. This makes the analysis part of the build process itself, instead of some tedious extra task you have to remember every time. After all, it’s up to us to actually use and act on what the tools can provide us. The less they interfere with our flow, the less reluctant we might be to eventually use them and see what they have to say.

Speaking of seeing what they have to say, if you take another look at the last output line scan-build displays, you will find a command to display the results of the analysis. Behind the scan-view command is a simple Python script that starts a local web server and opens the report overview page in your browser. You’ll get more or less the same if you just open file:///tmp/scan-build-xyz/index.html in your browser, and in case you despise anything that doesn’t run in a terminal, this works well enough in your common text mode browsers.

When running scan-build, its console output might tell you, for example, that NULL could be passed in a specific place where it shouldn’t be, but it won’t tell you under which circumstances. The great thing about the browser-based report is that you can navigate through the code and follow step by step, for each loop and condition branch, how a potential issue might turn into a bug. Keep in mind that the program is never actually run, so you might encounter some false positives that are never a valid or possible scenario in reality. Conversely, each tool has a different focus, so some issues might not even be considered.
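To make the idea of following a path a bit more concrete, here is a minimal sketch of the kind of code where such a step-by-step report helps. The names lookup_label and label_length are made up for illustration, and no actual analyzer output is shown. The lookup returns NULL for unknown ids, so without the check in label_length, NULL would reach strlen() only on the path where the id is out of range, which is the sort of path a human reviewer easily overlooks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup: returns NULL when the id is unknown. */
static const char *lookup_label(int id) {
    static const char *const labels[] = { "idle", "busy", "error" };

    if (id < 0 || id >= (int)(sizeof(labels) / sizeof(labels[0]))) {
        return NULL; /* the early-return path that is easy to forget */
    }
    return labels[id];
}

/* Returns the length of the label, or 0 if there is no such label. */
size_t label_length(int id) {
    const char *label = lookup_label(id);

    /* Without this check, strlen() would receive NULL whenever
     * lookup_label() took its early-return path. */
    if (label == NULL) {
        return 0;
    }
    return strlen(label);
}
```

Removing the NULL check and running the build through scan-build is a quick way to see for yourself how the report walks through the branches.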

Static code analysis is by no means a one-size-fits-all job, so it won’t hurt to use more than a single tool for it. Well, let’s move on to the next one then.

(sp)lint

Probably the best-known tool for static code analysis is lint, whose name has more or less become a synonym for static code analysis itself. In your average Linux distribution, you should find splint as one implementation of it. Unlike clang’s static analyzer, splint takes the source files and analyzes them without running any compilation.

$ splint foo.c
...
Finished checking --- 3 code warnings
$

splint is quite a complex tool with plenty of flags to enable and disable checks and to control its behavior. It also comes with its own source code annotations, written as specially formatted comments of the form /*@annotation@*/, that influence what is analyzed and reported. Whether you like this sort of (debatable) noise in your code is of course up to you.
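As a rough idea of what such an annotation looks like in practice, the sketch below uses splint’s null annotation on a return value. The functions find_suffix and has_suffix are invented for this example, and what splint actually reports for it depends on the flags you run it with. Since annotations are plain comments, any regular compiler simply ignores them.

```c
#include <stddef.h>
#include <string.h>

/* The null annotation tells splint that the result may be NULL,
 * so callers are expected to check it before dereferencing. */
/*@null@*/ const char *find_suffix(const char *name) {
    return strrchr(name, '.');
}

/* Returns 1 if name contains a dot followed by at least one character. */
int has_suffix(const char *name) {
    const char *dot = find_suffix(name);

    /* The NULL check comes first, so dot[1] is only read when a dot
     * was actually found. */
    return dot != NULL && dot[1] != '\0';
}
```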

You should probably be aware though that the latest release of splint is from 2007. Of course, that doesn’t mean it’s outdated; plenty of potential issues are timeless and have been around for longer than the last 11 years. Theoretically, you should also be able to use splint for code targeting, for example, AVR microcontrollers, but the emphasis might be on the “theoretically” part. It will generally take a lot of tweaking and digging through the output to get the most out of it. If you are curious and persistent enough, the splint manual is probably a good place to start.

flawfinder

As mentioned before, every tool usually has a different focus area. In the case of flawfinder, that focus is security vulnerabilities, Common Weakness Enumerations (CWE) in particular. While this offers a generally good overview of insecure C functions and practices, it mainly warns whenever a dangerous construct is detected. It doesn’t seem to check whether there is an actual problem in the code, just that there might be one in case you end up using the construct wrong.
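A typical candidate for this kind of warning is copying a string into a fixed-size buffer. The sketch below shows one bounded way of doing it; copy_name is a hypothetical helper, not something flawfinder prescribes, and the bounded call may well still show up in a report at a lower risk level.

```c
#include <stdio.h>
#include <string.h>

/* Copies src into dst (dst_size bytes), always NUL-terminating.
 * Returns 0 on success, -1 if src did not fit and was truncated. */
int copy_name(char *dst, size_t dst_size, const char *src) {
    int written;

    if (dst_size == 0) {
        return -1;
    }
    /* A plain strcpy(dst, src) is the kind of call such a tool lists,
     * whether or not src happens to fit in this particular program. */
    written = snprintf(dst, dst_size, "%s", src);
    return (written < 0 || (size_t)written >= dst_size) ? -1 : 0;
}
```

The point is less the specific function than the habit: the size of the destination is passed along and checked, instead of being assumed.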

Nevertheless, there is a reason for the word common in CWE, so even if you have made sure everything is okay with your current implementation, it doesn’t hurt to be reminded every once in a while about those common weaknesses, without proactively digging through every man page. And on a side note, the author of flawfinder has also written a book about secure programming and released it under the GNU Free Documentation License, in case you want to read up some more on that topic.

cppcheck

The last tool we’ll be mentioning is cppcheck, which, despite its misleading name, covers both C++ and C and focuses on undefined behavior. If you can afford or already possess the MISRA rule texts, you can include them as well. Some of the rules are also covered out of the box, and of course, it’s still a fully functional code analyzer even without purchasing the rule texts.

cppcheck also lets you write your own rules, reports its findings either as custom-formatted text or as XML, and offers integration with most common IDEs. And in case you want to click something every once in a while, or are otherwise somewhat put off by wading through walls of console text, it also comes with a graphical user interface as an alternative to the command line, which will show the reported issues along with the matching source code.
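Coming back to the undefined behavior cppcheck focuses on, a classic case is reading past the end of an array. The following is a minimal sketch with made-up names (SAMPLE_COUNT, sum_samples) showing the correct loop bound; whether a tool reports the off-by-one variant described in the comment depends on its version and on how much it can work out about the array.

```c
#include <stddef.h>

#define SAMPLE_COUNT 8

/* Sums a fixed-size buffer. Writing the loop condition as
 * i <= SAMPLE_COUNT would read samples[8], one element past the end,
 * which is undefined behavior. */
int sum_samples(const int samples[SAMPLE_COUNT]) {
    int sum = 0;

    for (size_t i = 0; i < SAMPLE_COUNT; i++) {
        sum += samples[i];
    }
    return sum;
}
```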

Honorable Mention

One more tool that sounds promising and might be worth looking into is frama-c.

Limitations

Clearly, no single tool can analyze and detect every possible flaw, otherwise the list would have been a lot shorter. And just as some tools will miss some issues, they can also overcompensate by enthusiastically reporting what turn out to be false positives. As mentioned before, you need to decide for yourself which warnings you consider valid and something you need to address. This may seem tedious and a waste of time, which is exactly what the tools were supposed to help you avoid. And maybe it often is, but it will also help you to better understand your own code, and see some of its implications from an angle you may never have considered. And when it does find a rare bug, it’ll pay off.

After some initial fiddling with the tools, you will also notice that some of them require a lot of tweaking to get the most out of them, as was already mentioned for splint. So it’s again up to you to weigh whether investing that time will be worth it in the long run. Unlike with compiler warnings, getting rid of each and every warning from code analysis tools might not be the most rewarding process, especially when so many are false positives. Coder’s discretion is advised.

Of course, static code analysis has by design the limitation that actual data and its meaning are neither considered nor checked. An int is an int, and as long as we don’t cause an overflow or other operations that violate the language specification or end in undefined territory, we’ll most likely be good to go from a static code analysis point of view. It won’t detect or care if the int’s value must be in a certain range in order to make sense and cause no harm in the rest of the program’s context, for instance. We’d have to actually execute our code to know what’s happening there. So with that being said, next time we will talk about assertions, and why it’s often better to go out with a bang early on.
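As a small illustration of that range problem, consider the sketch below. The setter and its 0 to 100 range are purely hypothetical; the point is that every int is acceptable as far as the type is concerned, so the meaning of the value can only be enforced by a check that runs along with the program.

```c
#include <stdbool.h>

/* Hypothetical state: a percentage that only makes sense from 0 to 100. */
static int duty_cycle = 0;

/* Stores percent if it is within range; returns false otherwise. */
bool set_duty_cycle(int percent) {
    if (percent < 0 || percent > 100) {
        return false; /* a perfectly valid int, but a meaningless value */
    }
    duty_cycle = percent;
    return true;
}

int get_duty_cycle(void) {
    return duty_cycle;
}
```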

10 thoughts on “Warnings On Steroids – Static Code Analysis Tools”

  1. Just want to point out there is no substitute for dynamic analysis tools. I’ve worked on many embedded projects targeted for architectures other than x86 and made a point of keeping the code portable enough it could still be run on x86/Linux for frequent valgrind testing. Saved a lot of headaches over the years.

    1. I completely agree with keeping it building for x86 and Linux. I’ve used valgrind and helgrind a bit but compiling with asan, tsan and ubsan in recent versions of GCC/clang have been more useful to me in recent years.

      Oh and American fuzzy lop finds all the crash bugs

  2. Static analysis provides a lot of benefits to the quality of software. It also has a lot of impact in the security of software. A great example here are things like buffer overruns, and/or data taint, which are notorious to lead to exploitable weaknesses by overwriting sensitive areas, or by reading sensitive data.

    Some examples here: http://blogs.grammatech.com/what-is-taint-checking

    As mentioned, there are many open source and commercial static analyzers and there is now a single interchange format to bind them all: https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=sarif
