Code Craft: Subtle Interrupt Problems Stack Up

October 9, 2015

[Elliot Williams’] column, Embed with Elliot, just did a great series on interrupts. It came in three parts, illustrating the Good, the Bad, and the Ugly of using interrupts on embedded systems. More than a few memories floated by while reading it, some pretty painful, because debugging interrupt problems can be a nightmare.

One of the things I’ve learned to watch out for over the years is a subtlety of stack-based languages like C/C++ that can ensnare the unwary. This problem involves the corruption of arrays of values on the stack during interrupt handling. The fix for it points to another problem, one often exploited by black hats to gain access to systems.

Almost all processors popular with hackers today use a stack. To visualize a stack, think of a stack of plates. You can take the top plate off the stack. You can add plates back on top of the stack. What you can’t do is add or remove plates in the middle or at the bottom. The stack in a processor works the same way: you push data onto the stack to add it and pop data to take it off. And just like with the plates, you can’t remove something from the middle of the stack.

Basics of the Stack

For some strange historical reason, stack diagrams are always drawn growing downward, so the top is at the bottom. I think some hardware guy did that. Actually, it’s because on most common processors stacks start at higher memory addresses and grow toward lower memory addresses.

The diagram should help explain how stacks work. The actions on the stack are shown on the top line and each column is the stack as it changes. For instance, the first action is to push D and E onto the stack and the second to pop one item off the stack.

A CPU has multiple registers. The two that matter here are the stack pointer (SP) and the program counter (PC). The SP always points to the top of the stack. When something is pushed onto the stack, the SP is decreased. (Remember, the stack grows toward lower memory addresses.) When an item is popped, the SP is increased. The amount of change depends on the size of the data pushed or popped.
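A minimal sketch of the downward-growing stack just described, under a toy model: an 8-slot array named `memory` stands in for RAM and an index `sp` plays the stack pointer. Both names are hypothetical, and a real SP holds a byte address that the hardware adjusts by the size of each item. Pushing D and E and then popping one item replays the diagram's first two actions.

```c
#include <assert.h>

#define STACK_SIZE 8

/* Toy model (hypothetical): memory[] stands in for RAM, sp is the index of
   the item currently on top. The stack grows toward lower indices. */
static char memory[STACK_SIZE];
static int sp = STACK_SIZE;        /* empty: SP sits just past the highest slot */

static void push(char value) {
    assert(sp > 0);                /* no room left: stack overflow */
    sp--;                          /* move SP toward lower addresses ... */
    memory[sp] = value;            /* ... and store the new top item */
}

static char pop(void) {
    assert(sp < STACK_SIZE);       /* nothing to pop: stack underflow */
    return memory[sp++];           /* read the top item, then move SP up */
}
```

After `push('D'); push('E'); pop();` the pop returns `'E'` and `sp` moves back up by one, but `memory[6]` still holds `'E'`. Popping only moves the pointer; it doesn't erase anything. That leftover value is the kind of stale data the rest of this article is about.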

The PC points to the code being executed; as instructions execute, the PC steps from one to the next. When a function is called, the call instruction pushes the return address (the address of the instruction after the call) onto the stack and then loads the PC with the address of the first instruction of the called function. That function’s first instructions do some housekeeping, including moving the SP to make room for the function’s local variables.

At the return, the housekeeping is undone and the SP is moved back to point at the stored return address. The return instruction pops that address into the PC, and execution continues in the calling function.
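To make the call and return sequence concrete, here is a toy model, not real hardware: word-sized stack slots, an index `sp`, and a plain number `pc`. The functions `cpu_call`, `reserve_locals`, `release_locals`, and `cpu_ret` are hypothetical names for steps a real CPU and compiler perform; some processors (ARM, for example) keep the return address in a link register instead of pushing it.

```c
#include <assert.h>

/* Toy CPU (hypothetical model): word-sized stack slots, SP as an index,
   PC as a plain number. Real call/return sequences vary by processor. */
enum { STACK_WORDS = 16 };
static unsigned stack_mem[STACK_WORDS];
static unsigned sp = STACK_WORDS;      /* empty, grows toward index 0 */
static unsigned pc = 0;

/* CALL: push the return address (the instruction after the call), then jump. */
static void cpu_call(unsigned target, unsigned call_size) {
    assert(sp > 0);
    stack_mem[--sp] = pc + call_size;
    pc = target;
}

/* Prologue housekeeping: move SP down to make room for locals. */
static void reserve_locals(unsigned words) {
    assert(sp >= words);
    sp -= words;
}

/* Epilogue housekeeping: move SP back up so it points at the return address. */
static void release_locals(unsigned words) {
    assert(sp + words <= STACK_WORDS);
    sp += words;
}

/* RET: pop the saved return address back into the PC. */
static void cpu_ret(void) {
    assert(sp < STACK_WORDS);
    pc = stack_mem[sp++];
}
```

Starting from `pc = 100`, `cpu_call(500, 4)` leaves 104 on the stack and jumps to 500. After `reserve_locals(3)`, `release_locals(3)`, and `cpu_ret()`, the PC is back at 104 and SP is where it started.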

Consider two functions: one that puts data into a buffer array, and another that calls the first and is then interrupted (the interrupt point is marked with a comment in this example).

char* putInBuffer(...params...) {
   char local_buf[25];
   ... code to put something in local_buf ...
   return local_buf;   // BUG: local_buf lives on the stack and is gone once this returns
}

void callingFunc(...params...) {
   ...some code...
   char* buffer = putInBuffer(...params...);
   // note: an interrupt is going to occur right here
   ... code that does something with buffer ...
}

Here are the details:

1. When callingFunc calls putInBuffer, the return address is taken from the PC and pushed onto the stack, and the SP is adjusted.
2. Space for local_buf, 25 bytes, is allocated on the stack by adjusting the SP again.
3. The putInBuffer code puts something into local_buf.
4. At the return, the pointer to local_buf is passed back to callingFunc. The details don’t matter here.
5. The SP is adjusted to point at the return address, releasing the space used by local_buf.
6. The return address is popped from the stack into the PC.
7. Execution continues in callingFunc.
8. Somewhat later, callingFunc uses the data pointed to by buffer.

Consider where the SP is pointing now: above the address where the return address was stored. That means buffer points to an address beyond the top of the stack, in memory that is no longer reserved for local_buf and that the next push will overwrite.

When an interrupt occurs:

1. The PC is pushed onto the stack.
2. Many of the other CPU registers are pushed onto the stack.
3. The PC is loaded with the address of the interrupt code.
4. The interrupt code executes and returns.
5. The CPU registers are restored.
6. The PC is restored.
7. The SP is back where it started, and processing continues in callingFunc.

Assume the interrupt occurred where I marked it in callingFunc.

What happened to the data that was in local_buf?

An Interrupt Ate My Data

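The data in local_buf was clobbered when the interrupt pushed the PC and registers onto the stack, before callingFunc could use it. The interrupt itself left the stack exactly as it found it; the real mistake is that buffer still points to local_buf after putInBuffer has returned, when that storage no longer belongs to anyone. The interrupt just exposed it. As I said, subtle. I’ve seen both inexperienced and experienced developers fall into this pitfall a number of times.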
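The same kind of toy model can replay what the interrupt does to local_buf. This is a sketch under stated assumptions, not how any particular compiler lays out frames: a 32-byte array `ram` is the memory, one byte (0x42) stands in for the return address, and a hypothetical `model_interrupt` pushes six made-up register values (0xA0 to 0xA5). `model_putInBuffer` mimics the buggy function by returning the index of its released local buffer.

```c
#include <assert.h>
#include <string.h>

/* Toy byte-addressed RAM with a downward-growing stack (hypothetical model). */
enum { RAM_SIZE = 32 };
static unsigned char ram[RAM_SIZE];
static int sp = RAM_SIZE;

static void push_byte(unsigned char b) { assert(sp > 0); ram[--sp] = b; }
static unsigned char pop_byte(void) { assert(sp < RAM_SIZE); return ram[sp++]; }

/* Model of the buggy putInBuffer: allocate local_buf on the stack, fill it,
   release it on return, and hand back its (now dangling) location. */
static int model_putInBuffer(const char *text) {
    size_t n = strlen(text) + 1;          /* bytes including the NUL */
    push_byte(0x42);                      /* stand-in for the return address */
    assert(sp >= (int)n);
    sp -= (int)n;                         /* prologue: allocate local_buf */
    memcpy(&ram[sp], text, n);            /* fill local_buf */
    int local_buf = sp;                   /* remember where it lives */
    sp += (int)n;                         /* epilogue: release local_buf */
    (void)pop_byte();                     /* return: pop the return address */
    return local_buf;                     /* points at memory nobody owns now */
}

/* Model of an interrupt: save "PC and registers", run, restore them. */
static void model_interrupt(void) {
    for (unsigned char reg = 0xA0; reg <= 0xA5; reg++) {
        push_byte(reg);                   /* pushes land where local_buf was */
    }
    /* ... the handler itself would run here ... */
    for (int i = 0; i < 6; i++) {
        (void)pop_byte();                 /* restore; SP ends where it began */
    }
}
```

Right after `model_putInBuffer("OK")` returns, the bytes at the returned index still read "OK", which is why the bug hides so well. After `model_interrupt()`, SP is back where it started, yet the first byte at that index is now 0xA3, one of the saved register values.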

It is especially tricky to find this error because returning a non-pointer value computed from local data is perfectly safe. If you return an integer or floating point value, the value itself is copied back to callingFunc, so it doesn’t matter what happens to the stack afterward. What makes the pointer version really nasty is that the code will work correctly most of the time. An intermittent bug like this takes painstaking analysis to find. I’ve seen a developer rewrite a function with this problem five times trying to fix the bug. He never found the problem until I, after looking at it more times than I’d like to admit, was able to point it out.
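A small sketch of why returning by value is safe, using hypothetical functions: `countChars` fills a local array but returns only an int, and `formatValue` wraps its buffer in a struct (an approach the article doesn't use, shown only for contrast) so that the whole buffer is copied to the caller on return. C can’t return a bare array, but it can return a struct that contains one.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical wrapper type: a struct containing an array is returned by value. */
typedef struct {
    char text[25];
} TextBuf;

/* Safe: only the int result is copied back; local_buf can vanish afterward. */
static int countChars(int value) {
    char local_buf[25];
    int n = snprintf(local_buf, sizeof local_buf, "%d", value);
    assert(n >= 0);                      /* snprintf reports errors as negative */
    return n;
}

/* Also safe: the whole struct, array included, is copied to the caller. */
static TextBuf formatValue(int value) {
    TextBuf local;
    snprintf(local.text, sizeof local.text, "value=%d", value);
    return local;
}
```

`countChars(12345)` returns 5 and `formatValue(7)` gives the caller its own copy holding "value=7", no matter what happens to the stack afterward.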

The function putInBuffer can be saved, maybe, by making local_buf a static variable. A static local variable is not kept on the stack but in the same fixed memory area as global variables (there’s a post on the static keyword if you want to learn more). That also means the data from the previous call is still in the variable, and the next call will overwrite it. Sometimes that is an acceptable approach. It looks like this:

char* putInBuffer(...params...) {
   static char local_buf[25];
   ... code to put something in local_buf ...
   return local_buf;
}

void callingFunc(...params...) {
   ...some code...
   char* buffer = putInBuffer(...params...);
   // note: an interrupt is going to occur right here
   ... code that does something with buffer ...
}

Reentrant Functions

I said static maybe fixes the function because doing so makes putInBuffer no longer reentrant. A reentrant function is one that can safely be called again before an earlier call has finished: simultaneously, recursively, or from an interrupt handler that fires while it is running. Simultaneous calls typically come from a multitasking operating system with multiple threads. In that case thread R could call the function and be swapped out, and then thread S calls it too. One of those two threads is going to get the wrong data, because both share the single static buffer.
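A deterministic way to see the reentrancy problem without real threads or interrupts: the sketch below uses the static-buffer putInBuffer with a hypothetical `int value` parameter in place of "...params...", and a plain function, hypothetically named `fakeIsr`, stands in for an interrupt handler or a second thread that calls putInBuffer at the marked point.

```c
#include <stdio.h>

/* Static-buffer version; `value` is a hypothetical stand-in for "...params...". */
static char *putInBuffer(int value) {
    static char local_buf[25];           /* one buffer shared by every caller */
    snprintf(local_buf, sizeof local_buf, "%d", value);
    return local_buf;
}

/* Hypothetical stand-in for an interrupt handler (or another thread). */
static void fakeIsr(void) {
    (void)putInBuffer(999);
}

/* Returns the first character callingFunc finds in its buffer. */
static char callingFunc(void) {
    char *buffer = putInBuffer(1);
    fakeIsr();                           /* the "interrupt" arrives here */
    return buffer[0];                    /* expected '1', but the ISR wrote "999" */
}
```

`callingFunc()` returns `'9'` rather than `'1'`: the handler’s call overwrote the one shared buffer before callingFunc got to use it.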

What about recursion? And what is it? The old programming joke is “recursion: see recursion.” Recursion is when a routine calls itself, and there are classes of problems where a recursive solution is easier. In this case, if a recursive function called putInBuffer and then called itself again before using the data, the inner call would overwrite that data.
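The recursion case can be sketched the same way. Assumptions: the static-buffer putInBuffer takes a hypothetical `int value`, and `countdown` is a hypothetical recursive function that formats its depth, recurses, and only then uses what it formatted.

```c
#include <stdio.h>
#include <string.h>

/* Static-buffer version; `value` is a hypothetical stand-in for "...params...". */
static char *putInBuffer(int value) {
    static char local_buf[25];
    snprintf(local_buf, sizeof local_buf, "%d", value);
    return local_buf;
}

/* Hypothetical recursive function: formats `depth`, recurses, then appends
   the first character of what it formatted to `out`.
   `out` must have room for depth + 1 characters. */
static void countdown(int depth, char *out) {
    if (depth <= 0) {
        return;
    }
    char *mine = putInBuffer(depth);
    countdown(depth - 1, out);           /* inner calls reuse the same local_buf */
    strncat(out, mine, 1);               /* wanted this call's digit, but local_buf now holds "1" */
}
```

With `char out[8] = ""`, `countdown(3, out)` leaves "111" in `out` instead of the intended "123".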

The real solution is to change putInBuffer so the calling program provides both the buffer and its size. The size is critical for avoiding another problem: a buffer overrun exploit, a black hat hack used to compromise systems. A buffer passed into a function has a fixed length, so the function must make sure the data it writes does not run past the end of the buffer. In the best case an overrun just crashes your system. In the worst case, it provides a way for black hats to introduce malicious code. Here’s a better way to write this routine.

char* putInBuffer(char* buffer, const size_t buf_size, ...params...) {
   if (buf_size < ...whatever the output size is going to be...) {
      return NULL;   // or another value callers will understand
   }
   // doesn't matter if an interrupt occurs here: buffer belongs to the caller
   ... code to put something in buffer ...
   return buffer;
}

void callingFunc(...params...) {
   ...some code...
   char buffer[20];
   putInBuffer(buffer, sizeof(buffer), ...params...);
   // note: an interrupt is going to occur right here, and now it's harmless
   ... code that does something with buffer ...
}
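A concrete version of the pattern above, as a sketch: the hypothetical `const char *text` parameter replaces "...params...", so the function copies a string into the caller’s buffer and returns NULL instead of overrunning when it doesn’t fit.

```c
#include <stddef.h>
#include <string.h>

/* Sketch of the caller-provided-buffer pattern. `text` is a hypothetical
   stand-in for "...params...": the data to store in the buffer. */
char *putInBuffer(char *buffer, size_t buf_size, const char *text) {
    size_t needed = strlen(text) + 1;       /* +1 for the terminating NUL */
    if (buffer == NULL || buf_size < needed) {
        return NULL;                         /* refuse rather than overrun */
    }
    /* An interrupt here is harmless: buffer belongs to the caller. */
    memcpy(buffer, text, needed);
    return buffer;
}
```

With `char buffer[20]`, a 19-character string fits (19 characters plus the terminating NUL), while a 20-character string is refused. Checking the size before writing, rather than writing and checking afterward, is what keeps the buffer from ever being overrun.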

Returning the pointer to the buffer is a convenience for the calling routine: it allows the call to be used directly as an argument in other function calls. One typical situation is converting a numeric value to text.
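A sketch of the numeric-to-text case, using a hypothetical helper `intToText` that writes into a caller-provided buffer and returns it, so the call can sit directly inside another call. It relies on C99 `snprintf`, which returns the length the output would need when given a NULL buffer and a size of zero.

```c
#include <stdio.h>

/* Hypothetical helper: writes `value` as decimal text into the caller's
   buffer. Returns buffer on success, or NULL if it is too small. */
char *intToText(char *buffer, size_t buf_size, int value) {
    int needed = snprintf(NULL, 0, "%d", value);   /* length, not counting NUL */
    if (buffer == NULL || needed < 0 || (size_t)needed >= buf_size) {
        return NULL;                                /* would not fit: refuse */
    }
    snprintf(buffer, buf_size, "%d", value);
    return buffer;
}
```

Because it returns the buffer, `printf("%s\n", intToText(buf, sizeof buf, -42));` works in one line. The catch is the NULL return: only nest the call like that when the buffer is known to be big enough (12 bytes covers any value of a 32-bit `int`, assuming that width), or check the result first.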

Stack-related problems live in one of those dirty little corners of C/C++, and of any stack-based language, that can cause hours of hair-pulling frustration. You can see why I’m bald.

42 thoughts on “Code Craft: Subtle Interrupt Problems Stack Up”

Pat says:

October 9, 2015 at 10:25 am

Wait, there’s no way in the world that anyone would consider the first implementation of putInBuffer to be in any way okay. “local_buf” is exactly that – local. When the function returns, that storage should be considered *gone*.

Interrupts aren’t the problem here – you’re using an invalid pointer. In some cases the data might still be OK, but the pointer’s still invalid.

1. CircuitPeople (@circuitpeople) says:
  
  October 9, 2015 at 10:29 am
  
  Some folks still fear the new.
  
  1. CircuitPeople (@circuitpeople) says:
    
    October 9, 2015 at 10:37 am
    
    To be clear, the pointer is actually valid probably since it points to static memory. Maybe. Unless a race condition killed it. Like an interrupt?
    
    http://stackoverflow.com/questions/246564/what-is-the-lifetime-of-a-static-variable-in-a-c-function
    
    1. Skrogh says:
      
      October 9, 2015 at 12:35 pm
      
      In the “first” implementation local_buf was not declared static.
      
2. Rud Merriam says:
  
  October 9, 2015 at 10:56 am
  
  Pat, I’ve seen this done numerous times. It’s only worse today. That’s a side-effect of teaching ‘coding’ without covering some of the fundamentals like how a stack works.
  
  1. Ralph Doncaster (Nerd Ralph) says:
    
    October 9, 2015 at 11:36 am
    
    What, are you saying learning to code Arduino/Wiring leads to problems when doing more serious coding?
    
    For an MCU like an AVR, assembly programming is less complicated than C++, and arguably less complex than C. ARM thumb isn’t much different. Once you’ve learned to be a half-decent assembly programmer, you’ll be a *much* better C or C++ programmer.
    
    1. Ralph Doncaster (Nerd Ralph) says:
      
      October 9, 2015 at 11:37 am
      
      Hmmm… my sarcasm on and sarcasm off brackets around the first sentence seem to have been stripped.
      
      1. Ren says:
        
        October 9, 2015 at 7:34 pm
        
        That’s because…
        a.) Our Evil Overlords strip off the brackets to generate more followup comments by people who were offended.
        (and they also introdude speeling and gramma misteaks to encourage increased comment counts)
        b.) HaD readers are always sarcastic, so putting /Sarcasm=ON\ and OFF tags are redundant.
        c.) Like HTML tags, your brackets were recognized by WordPress and removed before printing.
        (or your browser software strips them off, so you are the only one who didn’t see them)
        d.) Anything that appears to be a sarcastic comment on HaD against their bread and butter Arduino gets editted.
        
        B^)
        
    2. Rud Merriam says:
      
      October 9, 2015 at 11:41 am
      
      You said it, not I. LOL
      
      I learned assembly decades ago and fully agree with you.
      
  2. tekkieneet says:
    
    October 9, 2015 at 1:46 pm
    
    tl;dr Noobs making noob mistakes. You can’t begin to code without understanding how the variables are stored. You can change the behavior in the declaration.
    
    C has enough rope to hang yourself.
    
  3. Pat says:
    
    October 12, 2015 at 1:04 pm
    
    How do you teach C++ without teaching scope?
    
    1. Rud Merriam says:
      
      October 13, 2015 at 9:55 am
      
      Pat, a big part of the issue, and motivation for the article, is that we are seeing a lot of coders who are not taught. They are picking up C/C++ through working with the Arduino. I mentioned seeing this during my career, and one of the most interesting times was as an expert witness in a software copyright lawsuit. Basically, the defendant, The Kid, had taken code from one company to another. One set of utility routines was the smoking gun because the only changes were the function names. But one routine was rewritten over and over again. It had this problem. The Kid was self-taught. He actually did a good job overall, but this problem was beyond his understanding.
      
      1. Pat says:
        
        October 14, 2015 at 9:44 am
        
        No, your original comment was “without covering some of the fundamentals like how a stack works.”
        
        You don’t need to know how a stack works to find this problem. You need to know that an object that falls out of scope no longer exists.
        
      2. Rud Merriam says:
        
        October 14, 2015 at 1:40 pm
        
        Pat, back in my Compuserve days we’d say you and I are in violent agreement. We’re just approaching this problem, the explanation, and the debugging of it from different perspectives. Some of our disagreement is simply semantics. Peace.
        
3. some guy says:
  
  October 9, 2015 at 11:19 am
  
  > When the function returns, that storage should be considered *gone*.
  For a local variable yes – but this is local STATIC, the storage remains valid even after the function exits. So no problem (I didn’t say it is beautiful).
  
  1. some guy says:
    
    October 9, 2015 at 11:28 am
    
    Oops, should have read the article first…
    
    1. some guy says:
      
      October 9, 2015 at 11:30 am
      
      because in the first code there is no static so there is a real problem.
      
4. Brendan says:
  
  October 9, 2015 at 11:20 am
  
  That’s what I noticed too. It’s a dangling pointer caused by that variable falling out of scope, not some subtle interrupt bug. Look at the first pseudo-code example on the dangling pointer wikipedia article, it’s pretty much the same deal.
  
5. Douglas Henke says:
  
  October 9, 2015 at 11:46 am
  
  Pat is absolutely right. To expand on that a little:
  
  1) The reason this is wrong and bad has nothing to do with interrupts specifically. In a single-threaded all-interrupts-disabled context, you’re still boned if you call the first putInBuffer(), then call any other function from callingFunc() before using the result. (In other words, if code like the first example “works”, consider it accidental and temporary.)
  
  2) Any reasonable compiler (invoked with reasonable options) will smack you down for code that resembles the first example. (On gcc 4.8.4 with “-Wall”, I get “function returns address of local variable” which is pretty expository.) Ditto various static analysis tools.
  
  3) Compiling with warnings off is asking for trouble, a request which the universe will fulfill with zero hesitation.
  
  4) Somewhat surprisingly, valgrind didn’t catch the error (in the trivial test case I created equivalent to the first example).
  
  1. Dan says:
    
    October 10, 2015 at 2:52 am
    
    >Compiling with warnings off is asking for trouble, a request which the universe will fulfill with zero hesitation
    
    It’ll only fill it with zeros if you compile in debug; in release it’ll be filled with random crap from something else ;)
    
  2. Pat says:
    
    October 12, 2015 at 1:22 pm
    
    “you’re still boned if you call the first putInBuffer(), then call any other function from callingFunc() before using the result.”
    
    It’s worse than that: you could just end up boned, period. You could actually try using that data immediately, and you’re still screwed. That memory (allocated on the stack) is gone – so the compiler is completely free to reuse it, including creating its own temporaries. It just depends on the architecture itself.
    
    I mean, if you did something like “char c = *buffer” you could be completely boned if the compiler decided to put ‘c’ on the stack. I can’t think of an architecture that would do that offhand (maybe an 8051 or a PIC? Dunno) but obviously there’s lots of code you could put there that might look innocent but you’d still get screwed.
    
    1. Rud Merriam says:
      
      October 13, 2015 at 10:01 am
      
      Pat, sorry, but you just not correct for stack based processors. The memory is not deallocated. It’s still on the stack and reachable which is why this problem can be difficult to spot. Take a look at my stack diagrams. When the data is corrupted by a function call stepping through the code makes it obvious what is happening, i.e. the function call is corrupting the data. It’s more difficult to debug in a multi-tasking system and it reaches near impossibility when an interrupt intervenes. (Which may be the timer interrupt in a preemptive system.) The really nasty part is the intermittent nature. The code may work over 95% of the time, then “Bam!”.
      
      1. Pat says:
        
        October 14, 2015 at 9:33 am
        
        “Pat, sorry, but you just not correct for stack based processors.”
        
        I’m really not wrong. You’re talking about the hardware, but you’re not programming in assembly. You’re programming in C, so you’re not programming for the hardware. You’re programming for the *compiler*. And you’re not following the compiler’s rules. This is where you’re getting confused – here, you’re talking about “stack-based processors,” as if the *architecture* is causing the problem. You then say:
        
        “The memory is not deallocated. It’s still on the stack and reachable”
        
        which shows a confusion between the *architecture* and the *language* – effectively you’re talking about a different layer of abstraction. Hardware can’t ‘deallocate’ memory at all. Plenty of attacks take advantage of this – after code is run, its results are all still stored in RAM. Even if you reboot the system the results are still in RAM somewhere.
        
        ‘Allocation’ of memory occurs at the *language* level. And C and C++ say that when a local variable falls out of scope, it’s *gone*. It’s deallocated, from the compiler’s point of view. Which means the compiler is allowed to do *whatever it wants* with it!
        
        In your case, you’re looking at some compiler’s implementation of what you wrote, and you say “what, it should work fine.” But turn on optimization, and it might not. Switch to an 8051, and it might not. Switch to Weird Architecture That Still Has a C Compiler, and it might not.
        
        You are accessing unallocated memory. That’s architecture specific as to whether or not it will work.
        
        “The code may work over 95% of the time,”
        
        The code may also work 0% of the time. Even without interrupts. It’s totally compiler and architecture dependent. It’s just flat out wrong – it has nothing to do with the interrupt. You’re accessing a dangling pointer. The compiler didn’t know it had to have lifetime outside of the function, so it stored it on the stack.
        
        Once the function returned, the compiler can reuse that stack space all it wants. Some architectures use stack space to preserve registers for various reasons. Some of them do it well before they actually use it just because the code timing is better.
        
        You seem to believe that the stack is always preserved except for function calls. This is wrong.
        
      2. Pat says:
        
        October 14, 2015 at 10:55 am
        
        I just thought of an example which is illustrative. Note that I don’t know that any compiler actually works like this, but they could, easily.
        
        void callingFunc(unsigned char myChar) {
           unsigned char buf2[24];
           unsigned char *buf;
           for (int i = 0; i < 24; i++) buf2[i] = 0;
           buf = putInBuffer(myChar);   // returns a pointer to its own local buffer
           buf2[0] = *buf;              // reads through the dangling pointer
           sendOutMyBuffer(buf2);
        }
        
        What happens here? It looks like callingFunc() allocates space on the stack and initializes it, then calls putInBuffer (which allocates more, and stores something there) and then callingFunc copies that other data over.
        
        The problem is that the compiler can happily recognize that you *didn't use* buf2 before calling putInBuffer except to initialize it, so it can move the initialization and stack allocation until *after* putInBuffer.
        
        So then putInBuffer's buffer, as *well* as callingFunc's buffer, are both overlayed on top of each other on the stack. And initializing callingFunc's buffer overwrote that data.
        
        Why would it do that? It saves stack space, and it's completely identical to what you asked it to do.
        
        This is exactly identical to using memory space on the heap (allocated through malloc) after free() has been called. It's a dangling pointer problem, not an interrupt problem.
        
6. 0xfred says:
  
  October 9, 2015 at 1:06 pm
  
  I found the article really confusing. I really couldn’t get what the author was on about and why the interrupt was the problem – it correctly leaves the stack as it found it.
  
  Pat’s comment totally nailed it though. Way better than the original article.
  
  1. solipso says:
    
    October 13, 2015 at 11:52 am
    
    I agree completely. I got the point of the article only after reading those comments here. Maybe my English is just not on par.
    
RoyTheReaper says:

October 9, 2015 at 10:40 am

Is it just me? I’m getting some CSS in the code examples.

1. Rud Merriam says:
  
  October 9, 2015 at 10:48 am
  
  Hopefully it’s been fixed. There is a bug in the blogging software that messes with code snippets.
  
Charles says:

October 9, 2015 at 11:10 am

“The function must be sure any input data does run past the end of the buffer.”
I think you mean “… does not run…”, right?

And now since I’ve picked at one thing: “hats”, not “hat’s” in the next sentence :)

1. Ren says:
  
  October 9, 2015 at 7:37 pm
  
  Okay, you have now reached your limit of nitpicks for this article.
  See you on the next!
  B^)
  
Bob Alexander says:

October 9, 2015 at 11:24 am

The C spec says that local variables don’t exist after the function (or block) exits. The lesson of this article is not “interrupts can cause subtle problems”. The lesson is “DON’T RETURN POINTERS TO LOCAL VARIABLES!!!!”

1. some guy says:
  
  October 9, 2015 at 11:27 am
  
  >DON’T RETURN POINTERS TO LOCAL VARIABLES!!!!”
  unless they are static. IIRC this is used in the implementation of the C standard library that comes with GCC (and I suppose the guys who wrote this know what they are doing).
  
  1. stephenrwarren says:
    
    October 9, 2015 at 12:25 pm
    
    To state the lesson more precisely: Don’t return pointers to local automatic variables.
    
  2. Bob Alexander says:
    
    October 9, 2015 at 1:16 pm
    
    Returning pointers to local static variables can work, but it is still a bad idea (despite the fact that some standard C library routines do it). It’s really easy for that memory to get overwritten by other calls to the function.
    
    1. Elliot Williams says:
      
      October 9, 2015 at 2:21 pm
      
      Pointers to static variables that “sit inside” functions are even more opaque than shared global variables. If you find yourself doing that, consider a simple global instead. (Well-named, well-documented.)
      
      It’s a tremendous way to shoot yourself in the foot, or obfuscate code if you need to slip a trojan/back-door in on somebody. :)
      
      1. TheRegnirps says:
        
        October 9, 2015 at 10:53 pm
        
        Yes, and besides, static has two different meanings in C depending on where you use it.
        
Niobe says:

October 9, 2015 at 11:36 am

Bruh, you are returning a pointer to an out of scope variable. This is where the undefined behavior is coming from.

Jan says:

October 9, 2015 at 12:05 pm

In my opinion, the biggest problem is the code generated under the hood by higher-level languages such as C++, Delphi and C#, where you as a programmer have _no_ control over how and when it is done.

I have done a lot of interrupt programming, starting with an 8086 (1981) and later (1985) on 8-bit 8051-based boards and systems that we developed in-house to run realtime and semi-realtime applications: think of a fast weighing unit, and also vision (using up to 24 DSPs) for four sets of paired black-and-white and color cameras looking at a moving object.

Most of my time (1997 to 2012) went into DOS (MS-DOS 6.22) systems providing (a) a user interface and (b) semi-realtime calculations and I/O handling, with the 1 ms BIOS interrupt as the high-priority task, hardware interrupts for measured values, the keyboard at normal level, and a background loop for low-priority tasks such as writing to a log file.
The whole system was written in BP 7.0 in protected mode (DPMI, DOS Protected Mode Interface), as the 640 KB of real-mode memory was way too small, while with DPMI you got a maximum of 15 MB even with stacked programs (after startup, the interrupt-based processes start a new shell with another program that can also use up to another 15 MB of memory).

Hardware interrupts are handled in real mode while most code runs in protected mode, so there is a lot of switching between the two modes on an OS that was _not_ written for multitasking.

Another pitfall with BP7 was that the compiler only knew about 80486 instruction prefetch, while we were running these “old” programs on Core Duo processors (after also hacking the default library to get rid of runtime error 200, division by zero, on video init). The problem was solved by using a relative jump to the next instruction, since a jump clears the prefetch cache.

At the moment I am using ANSI C with VS2010 and Delphi (RAD XE2) for Windows XP Embedded and Windows 7 on i7 hardware, where Windows 7 gives a lot of trouble caused by design changes between XP and 7.

Xark says:

October 9, 2015 at 4:29 pm

I suggest Googling “recursion” if you haven’t tried that already. :-)

1. Rud Merriam says:
  
  October 9, 2015 at 6:09 pm
  
  Cute…
  
TheRegnirps says:

October 9, 2015 at 10:54 pm

No one actually uses recursion in embedded programming. I hope. I cringe at the thought.

steelman says:

October 10, 2015 at 12:21 pm

[…] stacks start at higher memory addresses and grow toward lower memory addresses.

This is true only for some CPUs, in particular some of those which provide instructions like push and pop (not all do). A buffer overflow on such machines indeed is more dangerous.
