Misconceptions About Loops, Or: Static Code Analysis Is Hard

July 9, 2024

When thinking about loops in programming languages, they often get simplified down to a conditions section and a body, but this belies the dizzying complexity that emerges when considering loop edge cases within the context of static analysis. A paper titled Misconceptions about Loops in C by [Martin Brain] and colleagues as presented to SOAP 2024 conference goes through a whole list of false assumptions when it comes to loops, including for languages other than C. Perhaps most interesting is the conclusion that these ‘edge cases’ are in fact a lot more common than generally assumed, courtesy of how creative languages and their users can be when writing their code, with or without dragging in the meta-language of C’s preprocessor.

Assumptions like loop equivalence can fall apart when considering the CFG ( control flow graph) interpretation versus a parse tree one where the former may e.g. merge loops. There are also doozies like assuming that the loop body will always exist, that the first instruction(s) in a loop are always the entry point, and the horrors of estimating loop exits in the context of labels, inlined functions and more. Some languages have specific loop control flow features that differ from C (e.g. Python’s for/else and Ada’s loop), all of which affect a static analysis.

Ultimately, writing a good static analysis tool is hard, and there are plenty of cases where it’s likely to trip up and give an invalid result. A language which avoids ambiguity (e.g. Ada) helps immensely here, but for other languages it helps to write your code as straightforward as possible to give the static analysis tool a fighting chance, or just get really good at recognizing confused static analysis tool noises.

(Heading image: Control flow merges can create multiple loop entry
edges (Credit: Martin Brand, et al., SOAP 2024) )

39 thoughts on “Misconceptions About Loops, Or: Static Code Analysis Is Hard”

Mike says:

July 9, 2024 at 7:43 pm

all that paper showed is how poorly people write code without thinking. Far to many coders focus on just getting the job done as fast as possible, never thinking of the what if scenarios.

Report comment

Reply
1. pelrun says:
  
  July 10, 2024 at 5:18 am
  
  Static analysis is there to process *any* code, regardless of whether it was written by God or by that programmer who got fired last month for sheer incompetence. Or if it was run through a deliberate obfuscation step, or modern optimisers that sacrifice semantic clarity for absolute performance (some might say they’re the same thing…)
  
  After all, if everyone wrote perfect code in the first place, nobody would need static analysis tools.
  
  Report comment
  
  Reply
HaHa says:

July 9, 2024 at 8:21 pm

Example contains ‘Goto’.

That’s bad!

Rewrite using ‘ComeFrom’

Report comment

Reply
1. LordNothing says:
  
  July 9, 2024 at 10:43 pm
  
  goto is for c coders who think they have asm chops.
  
  Report comment
  
  Reply
  1. KDawg says:
    
    July 9, 2024 at 11:14 pm
    
    Having grown up in the 80’s using goto outside of being 9 years old was taboo
    
    Report comment
    
    Reply
    1. limroh says:
      
      July 12, 2024 at 4:47 am
      
      +1
      
      but: … after being 9 years old was taboo.
      
      You are allowed to use “goto” 0-9 (including 9). ;-)
      
      Report comment
      
      Reply
2. Marcos says:
  
  July 10, 2024 at 8:47 am
  
  Yep, most of the hard things to deal with are stuff Djikstra was in the 80s already asking people to not include on their languages.
  
  Report comment
  
  Reply
3. abjq says:
  
  July 10, 2024 at 9:25 am
  
  My goto here is setjmp()/longjmp()
  
  Report comment
  
  Reply
4. wscp says:
  
  July 10, 2024 at 10:47 am
  
  and don’t forget to say please. oh, but do it too much and the compiler will leave in a huff.
  
  Report comment
  
  Reply
Jan says:

July 9, 2024 at 11:47 pm

Most of my loops fail because of using > instead of >= (or < instead of <=) or vice versa. The complicated things we think about thoroughly , the easy things we tend to ignore resulting in some silly buy nasty bugs.

Report comment

Reply
C says:

July 10, 2024 at 1:08 am

The paper itself is one big misconception. The following example doesn’t need a do-while. A simple while is sufficient:
do {
o p e n _ s o c k e t ( ) ;
i f ( c o n n e c t ( ) == SUCCESS ) { b r e a k ; }
c l o s e _ s o c k e t ( ) ;
} w h i l e ( 1 ) ;
Like so:
w h i l e ( 1 ) {
o p e n _ s o c k e t ( ) ;
i f ( c o n n e c t ( ) == SUCCESS ) { b r e a k ; }
c l o s e _ s o c k e t ( ) ;
}

(Such a loop without delays, timeouts and limits in retries is poorly written code. But that’s beside the point.)

Report comment

Reply
1. hammarbytp says:
  
  July 10, 2024 at 3:01 am
  
  Its hard to think of many cases where a do{}while is really needed
  
  Report comment
  
  Reply
  1. Doctor Duck says:
    
    July 10, 2024 at 5:24 am
    
    Isn’t the point of ‘do’ that the loop contents are guaranteed to execute at least once?
    
    Report comment
    
    Reply
    1. C says:
      
      July 10, 2024 at 6:46 am
      
      while(1) does the same. It always executes at least once. If you don’t want to execute you can break. Or you can create a boolean with the condition. and set that to true initially. I often see do-while used in macros as it is the only way to create multiline macros. But I avoid macros like the plague. I prefer inline functions.
      
      Report comment
      
      Reply
    2. PP says:
      
      July 10, 2024 at 6:57 am
      
      Normally used construction is:
      
      do{ … if(something) { break; } …}while(0);
      
      which can be rewritten(but not every time, and the code can look anti-logic):
      
      { … if(something) { break; } …};
      
      And it is used to perform something once with jumping out to error check in case of that error the while(1); is stupid.
      
      Report comment
      
      Reply
  2. sweethack says:
    
    July 10, 2024 at 7:05 am
    
    “do { } while(0)” is needed if GOTO is banned. You can break from the loop with “break” at any expression, skipping the later part. This isn’t a loop by itself, it’s smaller than a function (no epilog, no prolog, no stack saving/loading) and it can access all the “local” variable of the function it’s running into.
    
    Report comment
    
    Reply
    1. C says:
      
      July 10, 2024 at 7:33 am
      
      false. It’s never needed. Except in multi line macros.
      
      Report comment
      
      Reply
      1. BCPL says:
        
        August 3, 2024 at 10:23 am
        
        False. The entire C language is never needed, as everything you can do in c can be done in macros and assembly.
        
        If you’re going to start using synctactic sugar, it’s silly to complain about the syntactic sugar that someone else uses.
        
        Report comment
2. asdf says:
  
  July 10, 2024 at 6:36 am
  
  You could at least try reading the paper before posting. The code you quoted is given as an example of something that syntactically looks like an infinite loop, but may still terminate (through e.g. a longjmp()). Rewriting it as a while()-loop doesn’t change that.
  
  Report comment
  
  Reply
  1. C says:
    
    July 10, 2024 at 6:49 am
    
    “Rewriting it as a while()-loop doesn’t change that.”
    Exactly my point. So use a while instead of a do-while as it’s not needed.
    
    Report comment
    
    Reply
    1. asdf says:
      
      July 10, 2024 at 7:36 am
      
      So your claim that the paper which you didn’t understand is “one big misconception” is based on a completely irrelevant tiny cosmetic non-issue?
      
      Report comment
      
      Reply
      1. V says:
        
        July 11, 2024 at 12:51 am
        
        No goid programmer uses a do while when a while is sufficient.
        
        Report comment
Zoe Nagy says:

July 10, 2024 at 1:47 am

goto is evil, just get a CS degree.

Report comment

Reply
hammarbytp says:

July 10, 2024 at 3:05 am

I like many got taught not to use goto’s in our code, but in later years, you realise that goto’s can make far more legible code if used correctly, and not using them creates deeply nested spaghetti that is largely unmaintainable.

There are a lot of “theoretical” advice that goes against good practice. For example MISRA C did (or still does) mandates a single return point. However doing intial checks (e.g NULL pointer) at the top of the code and exiting on fail makes much better code than entering a giant if..else structure

Report comment

Reply
1. Clara says:
  
  July 10, 2024 at 4:11 am
  
  MISRA C…so named because it’ll make you MISRAble. 😜
  
  Report comment
  
  Reply
  1. abjq says:
    
    July 10, 2024 at 9:27 am
    
    And writing code that complies gives you MRSA
    
    Report comment
    
    Reply
2. C says:
  
  July 10, 2024 at 5:02 am
  
  If I’m only allowed 1 return and don’t want nesting I often do this:
  
  bool success = true;
  
  if (success)
  {
  success = step1();
  }
  
  if (success)
  {
  success = step2();
  }
  
  if (success)
  {
  success = step3();
  }
  
  return success;
  
  The compiler will optimize it.
  
  Report comment
  
  Reply
  1. sweethack says:
    
    July 10, 2024 at 7:09 am
    
    > The compiler will optimize it.
    
    Since the goal isn’t for the compiler to understand your code, but to the (next) developer to understand it, when reading this misrable code, I just make my own opinion on your poor competencies. You could have written `if (!step1() || !step2() || !step3()) return false;` in a single line without creating a useless variable and keeping code readable without having to file your middle finger on the scroll wheel.
    
    Report comment
    
    Reply
    1. C says:
      
      July 10, 2024 at 7:49 am
      
      “Since the goal isn’t for the compiler to understand your code, but to the (next) developer to understand it, when reading this misrable[sic] code, ”
      It’s perfectly readable.
      
      “I just make[sic] my own opinion on your poor competencies.”
      I have over 10 years of experience in embedded programming in C and C++. I rarely use goto or do-while. I always try to use a while instead of a do-while. I’ve written code following many different coding conventions and passing multiple static analysis tools. My code builds with 0 warnings and runs for years without a problem.
      
      “You could have written `if (!step1() || !step2() || !step3()) return false;` in a single line”
      How is writing everything in a single line more readable? You are contradicting yourself.
      I wrote a simple code snippet, but in practice the code blocks contain more steps and it wouldn’t work in a single line. It was intended as a replacement of nested ifs.
      Also your code always returns false. That’s literally guaranteed failure.
      
      “without creating a useless variable”
      Not useless. Very useful in case of debugging. You can put an error code in there. And in a release version it would be optimized away.
      
      “and keeping code readable without having to file your middle finger on the scroll wheel.”
      It only adds a single line of code compared to nested ifs. If this tiny code snippet requires you to scroll you need to buy a monitor with higher vertical resolution.
      
      Report comment
      
      Reply
    2. Greg A says:
      
      July 10, 2024 at 8:12 am
      
      if step1/step2/step3 are longer than 7 characters each then your proposed merger may be less readable. even if they’re brief, if they are logically separate, it can be better to spread them out over multiple lines.
      
      i’m a huge fan of trying to keep code concise, especially in terms of vertical space. i want to see a meaningful chunk of it in one screenful. and i generally detest variables like ‘success’ or ‘done’ that exist just to hold a boolean. really they should be expressed as control flow. but every single one of my preferences is fungible…they all have to be balanced against eachother in a specific context. i will sometimes use goto, or i will use sequences of repeated tests of a success variable.
      
      code is an artform and we can get advantage from using the whole palette and all of the brushes too.
      
      Report comment
      
      Reply
    3. HaHa says:
      
      July 10, 2024 at 5:08 pm
      
      You function would get a better maintainability score.
      Using the count control flow metric. (Name of that metric escapes me).
      
      But that proves that all metrics can be gamed.
      
      Your counting on C behavior to determine how many of the steps are actually executed.
      Expecting subsequent coders to also know that behavior.
      
      Report comment
      
      Reply
    4. C++ says:
      
      July 20, 2024 at 10:52 am
      
      > You could have written `if (!step1() || !step2() || !step3()) return false;`
      
      At that point I’d skip the double inversion and just write `return step1() && step2() && step2();`
      
      Report comment
      
      Reply
3. pelrun says:
  
  July 10, 2024 at 5:09 am
  
  Rules exist for good reasons, and just because there exist cases where breaking them is a better solution, it doesn’t mean that the rule shouldn’t exist. They exist to give you a safe, simple answer to a complex problem, so you can allocate your finite resources to more important issues.
  
  If you don’t know the problems goto can cause, you’ll shoot yourself in the foot 99 times out of 100. It’s only when you have a good understanding of the advantages and disadvantages, *and* can adequately justify the decision, that breaking the rule becomes a sensible choice.
  
  The linux kernel deliberately favours gotos to avoid messy nests of conditionals… but the key word there is “deliberate”. If you just saw a goto in the kernel, decided “hey, if linux uses them, the rule against goto’s must be shit so I can ignore it” and tried to submit a patch… Linus would (rightly) tear you a new one. (…if it somehow got past all the reviewers who are there to protect him from dealing with such garbage.)
  
  Something like MISRA C has a dual purpose as well – it’s not simply a set of carved-in-stone commandments you have to follow or be sent to hell, it’s a style guide. Having a consistent style across a project that can be adhered to and validated against is “best practise” at a much higher abstraction level than just how a particular function can be made better. What the style is, or if it’s even “good” doesn’t even really matter. There’s nothing worse than a programmer in a team that decides they know better and pollutes a consistent codebase with edits that only care about themselves and not those of the team as a whole.
  
  Report comment
  
  Reply
SETH says:

July 10, 2024 at 6:31 am

This is why I generally avoid For loops, and use while/do with if/else break points. This way I can customize when a loop break check happens, and am required to think about how and why the loop should terminate. Edge cases, like zero or negative numbers for indexes, can be built into the loop deliberately.

Report comment

Reply
Greg A says:

July 10, 2024 at 8:28 am

i guess i don’t understand the audience for this paper. static analysis *is* hard but it seems like the punchline (which they don’t really state explicitly??) is that correct static analysis has to use the control flow graph. correct analysis of a parse tree is only possible if you essentially reproduce all the characteristics of the CFG. to which i say, duh.

i guess i’m biased coming from a compilers background. every optimizer, once it starts to consider loops, has to use a control flow graph. and it helps to be rigid — if something doesn’t meet your definition of a loop, it isn’t a loop, QED. you don’t want to start breaking your algorithms to accept malformed loops. loop-optimization algorithms are already complicated enough even with good narrow definitions. better to fail to optimize something because the programmer got too creative than to break something because you rushed in where angels fear to tread.

i guess the problem is faced by the sort of tools that are meant to provide source-level feedback to the programmers. they will want a diagnostic message that refers to the original source, and will want validation of intuitive source-level constructs. but again i say duh: ornament your control flow graph with source information, so that you can reconstruct source-level error messages from your control flow visit. but the visit must be driven by control flow! alternatively, if it’s really what is desired, you can just eagerly and loudly reject anything that doesn’t conform to your narrow definition of a loop…if really what you’re doing isn’t comprehensive static analysis but rather enforcing a style guide.

Report comment

Reply
C says:

July 10, 2024 at 8:37 am

Where did all the comments go? I only see 10/20 comments. There were some interesting discussions. All gone…

Report comment

Reply
1. eriklscott says:
  
  July 10, 2024 at 12:43 pm
  
  I got here kind of late, but that usually happens when a flamewar gets deleted. :-)
  
  Report comment
  
  Reply
  1. Elliot Williams says:
    
    July 11, 2024 at 5:35 am
    
    Because replies to deleted comments don’t have anywhere to be shown, they just disappear. This sucks, but it’s a limitation of our comment system.
    
    If you find yourself replying to obvious trolling or negative BS, just stop and hit “report comment” instead. Don’t waste your time/effort on the negative folks, and certainly don’t follow up their crap with insightful commentary. :)
    
    Report comment
    
    Reply
Christian says:

July 10, 2024 at 8:53 am

“but for other languages it helps to write your code as straightforward as possible to give the static analysis tool a fighting chance”

That is a bit like: “Please adjust your driving to give self-driving cars a chance.” (my AI works fine, as long as we put all cars on rails and have no humans make decisions, buy my product!)

Maybe that static analysis tool made some bad assumption about how to convey code.

Report comment

Reply

Hackaday

Misconceptions About Loops, Or: Static Code Analysis Is Hard

39 thoughts on “Misconceptions About Loops, Or: Static Code Analysis Is Hard”

Leave a ReplyCancel reply

Search

Never miss a hack

If you missed it

Two For The Price Of One: BornHack 2024 And 2025 Badges

Hands On: The Hacker Pager

Power Line Patrols: The Grid’s Eye In The Sky

A History Of Pong

Supersonic Flight May Finally Return To US Skies

Our Columns

Thanks, Tamiya-san

Hackaday Podcast Ep 331: Clever Machine Tools, Storing Data In Birds, And The Ultimate Cyberdeck

This Week In Security: Spilling Tea, Rooting AIs, And Accusing Of Backdoors

When Online Safety Means Surrendering Your ID, What Can You Do?

Farewell Shunsaku Tamiya: The Man Who Gave Us The Best Things To Build

39 thoughts on “Misconceptions About Loops, Or: Static Code Analysis Is Hard”

Leave a ReplyCancel reply

Search

Never miss a hack

Subscribe

If you missed it

Our Columns