Misconceptions About Loops, Or: Static Code Analysis Is Hard

When thinking about loops in programming languages, they often get simplified down to a conditions section and a body, but this belies the dizzying complexity that emerges when considering loop edge cases within the context of static analysis. A paper titled Misconceptions about Loops in C by [Martin Brain] and colleagues as presented to SOAP 2024 conference goes through a whole list of false assumptions when it comes to loops, including for languages other than C. Perhaps most interesting is the conclusion that these ‘edge cases’ are in fact a lot more common than generally assumed, courtesy of how creative languages and their users can be when writing their code, with or without dragging in the meta-language of C’s preprocessor.

Assumptions like loop equivalence can fall apart when considering the CFG ( control flow graph) interpretation versus a parse tree one where the former may e.g. merge loops. There are also doozies like assuming that the loop body will always exist, that the first instruction(s) in a loop are always the entry point, and the horrors of estimating loop exits in the context of labels, inlined functions and more. Some languages have specific loop control flow features that differ from C (e.g. Python’s for/else and Ada’s loop), all of which affect a static analysis.

Ultimately, writing a good static analysis tool is hard, and there are plenty of cases where it’s likely to trip up and give an invalid result. A language which avoids ambiguity (e.g. Ada) helps immensely here, but for other languages it helps to write your code as straightforward as possible to give the static analysis tool a fighting chance, or just get really good at recognizing confused static analysis tool noises.

(Heading image: Control flow merges can create multiple loop entry
edges (Credit: Martin Brand, et al., SOAP 2024) )

39 thoughts on “Misconceptions About Loops, Or: Static Code Analysis Is Hard

  1. all that paper showed is how poorly people write code without thinking. Far to many coders focus on just getting the job done as fast as possible, never thinking of the what if scenarios.

    1. Static analysis is there to process *any* code, regardless of whether it was written by God or by that programmer who got fired last month for sheer incompetence. Or if it was run through a deliberate obfuscation step, or modern optimisers that sacrifice semantic clarity for absolute performance (some might say they’re the same thing…)

      After all, if everyone wrote perfect code in the first place, nobody would need static analysis tools.

  2. Most of my loops fail because of using > instead of >= (or < instead of <=) or vice versa. The complicated things we think about thoroughly , the easy things we tend to ignore resulting in some silly buy nasty bugs.

  3. The paper itself is one big misconception. The following example doesn’t need a do-while. A simple while is sufficient:
    do {
    o p e n _ s o c k e t ( ) ;
    i f ( c o n n e c t ( ) == SUCCESS ) { b r e a k ; }
    c l o s e _ s o c k e t ( ) ;
    } w h i l e ( 1 ) ;
    Like so:
    w h i l e ( 1 ) {
    o p e n _ s o c k e t ( ) ;
    i f ( c o n n e c t ( ) == SUCCESS ) { b r e a k ; }
    c l o s e _ s o c k e t ( ) ;
    }

    (Such a loop without delays, timeouts and limits in retries is poorly written code. But that’s beside the point.)

        1. while(1) does the same. It always executes at least once. If you don’t want to execute you can break. Or you can create a boolean with the condition. and set that to true initially. I often see do-while used in macros as it is the only way to create multiline macros. But I avoid macros like the plague. I prefer inline functions.

        2. Normally used construction is:

          do{ … if(something) { break; } …}while(0);

          which can be rewritten(but not every time, and the code can look anti-logic):

          { … if(something) { break; } …};

          And it is used to perform something once with jumping out to error check in case of that error the while(1); is stupid.

      1. “do { } while(0)” is needed if GOTO is banned. You can break from the loop with “break” at any expression, skipping the later part. This isn’t a loop by itself, it’s smaller than a function (no epilog, no prolog, no stack saving/loading) and it can access all the “local” variable of the function it’s running into.

          1. False. The entire C language is never needed, as everything you can do in c can be done in macros and assembly.

            If you’re going to start using synctactic sugar, it’s silly to complain about the syntactic sugar that someone else uses.

    1. You could at least try reading the paper before posting. The code you quoted is given as an example of something that syntactically looks like an infinite loop, but may still terminate (through e.g. a longjmp()). Rewriting it as a while()-loop doesn’t change that.

  4. I like many got taught not to use goto’s in our code, but in later years, you realise that goto’s can make far more legible code if used correctly, and not using them creates deeply nested spaghetti that is largely unmaintainable.

    There are a lot of “theoretical” advice that goes against good practice. For example MISRA C did (or still does) mandates a single return point. However doing intial checks (e.g NULL pointer) at the top of the code and exiting on fail makes much better code than entering a giant if..else structure

    1. If I’m only allowed 1 return and don’t want nesting I often do this:

      bool success = true;

      if (success)
      {
      success = step1();
      }

      if (success)
      {
      success = step2();
      }

      if (success)
      {
      success = step3();
      }

      return success;

      The compiler will optimize it.

      1. > The compiler will optimize it.

        Since the goal isn’t for the compiler to understand your code, but to the (next) developer to understand it, when reading this misrable code, I just make my own opinion on your poor competencies. You could have written `if (!step1() || !step2() || !step3()) return false;` in a single line without creating a useless variable and keeping code readable without having to file your middle finger on the scroll wheel.

        1. “Since the goal isn’t for the compiler to understand your code, but to the (next) developer to understand it, when reading this misrable[sic] code, ”
          It’s perfectly readable.

          “I just make[sic] my own opinion on your poor competencies.”
          I have over 10 years of experience in embedded programming in C and C++. I rarely use goto or do-while. I always try to use a while instead of a do-while. I’ve written code following many different coding conventions and passing multiple static analysis tools. My code builds with 0 warnings and runs for years without a problem.

          “You could have written `if (!step1() || !step2() || !step3()) return false;` in a single line”
          How is writing everything in a single line more readable? You are contradicting yourself.
          I wrote a simple code snippet, but in practice the code blocks contain more steps and it wouldn’t work in a single line. It was intended as a replacement of nested ifs.
          Also your code always returns false. That’s literally guaranteed failure.

          “without creating a useless variable”
          Not useless. Very useful in case of debugging. You can put an error code in there. And in a release version it would be optimized away.

          “and keeping code readable without having to file your middle finger on the scroll wheel.”
          It only adds a single line of code compared to nested ifs. If this tiny code snippet requires you to scroll you need to buy a monitor with higher vertical resolution.

        2. if step1/step2/step3 are longer than 7 characters each then your proposed merger may be less readable. even if they’re brief, if they are logically separate, it can be better to spread them out over multiple lines.

          i’m a huge fan of trying to keep code concise, especially in terms of vertical space. i want to see a meaningful chunk of it in one screenful. and i generally detest variables like ‘success’ or ‘done’ that exist just to hold a boolean. really they should be expressed as control flow. but every single one of my preferences is fungible…they all have to be balanced against eachother in a specific context. i will sometimes use goto, or i will use sequences of repeated tests of a success variable.

          code is an artform and we can get advantage from using the whole palette and all of the brushes too.

        3. You function would get a better maintainability score.
          Using the count control flow metric. (Name of that metric escapes me).

          But that proves that all metrics can be gamed.

          Your counting on C behavior to determine how many of the steps are actually executed.
          Expecting subsequent coders to also know that behavior.

        4. > You could have written `if (!step1() || !step2() || !step3()) return false;`

          At that point I’d skip the double inversion and just write `return step1() && step2() && step2();`

    2. Rules exist for good reasons, and just because there exist cases where breaking them is a better solution, it doesn’t mean that the rule shouldn’t exist. They exist to give you a safe, simple answer to a complex problem, so you can allocate your finite resources to more important issues.

      If you don’t know the problems goto can cause, you’ll shoot yourself in the foot 99 times out of 100. It’s only when you have a good understanding of the advantages and disadvantages, *and* can adequately justify the decision, that breaking the rule becomes a sensible choice.

      The linux kernel deliberately favours gotos to avoid messy nests of conditionals… but the key word there is “deliberate”. If you just saw a goto in the kernel, decided “hey, if linux uses them, the rule against goto’s must be shit so I can ignore it” and tried to submit a patch… Linus would (rightly) tear you a new one. (…if it somehow got past all the reviewers who are there to protect him from dealing with such garbage.)

      Something like MISRA C has a dual purpose as well – it’s not simply a set of carved-in-stone commandments you have to follow or be sent to hell, it’s a style guide. Having a consistent style across a project that can be adhered to and validated against is “best practise” at a much higher abstraction level than just how a particular function can be made better. What the style is, or if it’s even “good” doesn’t even really matter. There’s nothing worse than a programmer in a team that decides they know better and pollutes a consistent codebase with edits that only care about themselves and not those of the team as a whole.

  5. This is why I generally avoid For loops, and use while/do with if/else break points. This way I can customize when a loop break check happens, and am required to think about how and why the loop should terminate. Edge cases, like zero or negative numbers for indexes, can be built into the loop deliberately.

  6. i guess i don’t understand the audience for this paper. static analysis *is* hard but it seems like the punchline (which they don’t really state explicitly??) is that correct static analysis has to use the control flow graph. correct analysis of a parse tree is only possible if you essentially reproduce all the characteristics of the CFG. to which i say, duh.

    i guess i’m biased coming from a compilers background. every optimizer, once it starts to consider loops, has to use a control flow graph. and it helps to be rigid — if something doesn’t meet your definition of a loop, it isn’t a loop, QED. you don’t want to start breaking your algorithms to accept malformed loops. loop-optimization algorithms are already complicated enough even with good narrow definitions. better to fail to optimize something because the programmer got too creative than to break something because you rushed in where angels fear to tread.

    i guess the problem is faced by the sort of tools that are meant to provide source-level feedback to the programmers. they will want a diagnostic message that refers to the original source, and will want validation of intuitive source-level constructs. but again i say duh: ornament your control flow graph with source information, so that you can reconstruct source-level error messages from your control flow visit. but the visit must be driven by control flow! alternatively, if it’s really what is desired, you can just eagerly and loudly reject anything that doesn’t conform to your narrow definition of a loop…if really what you’re doing isn’t comprehensive static analysis but rather enforcing a style guide.

      1. Because replies to deleted comments don’t have anywhere to be shown, they just disappear. This sucks, but it’s a limitation of our comment system.

        If you find yourself replying to obvious trolling or negative BS, just stop and hit “report comment” instead. Don’t waste your time/effort on the negative folks, and certainly don’t follow up their crap with insightful commentary. :)

  7. “but for other languages it helps to write your code as straightforward as possible to give the static analysis tool a fighting chance”

    That is a bit like: “Please adjust your driving to give self-driving cars a chance.” (my AI works fine, as long as we put all cars on rails and have no humans make decisions, buy my product!)

    Maybe that static analysis tool made some bad assumption about how to convey code.

Leave a Reply

Your email address will not be published. Required fields are marked *

Please be kind and respectful to help make the comments section excellent. (Comment Policy)

This site uses Akismet to reduce spam. Learn how your comment data is processed.