Code Craft: Subtle Interrupt Problems Stack Up

[Elliot Williams’] column, Embed with Elliot, just did a great series on interrupts. It came in three parts, illustrating the Good, the Bad, and the Ugly of using interrupts on embedded systems. More than a few memories floated by while reading it. Some were pretty painful, because debugging interrupt problems can be a nightmare.

One of the things I’ve learned to watch out for over the years is a subtlety of stack based languages, like C/C++, which can ensnare the unwary. This problem has to do with the corruption of arrays of values on a stack during interrupt handling. The fix for this problem points to another one, often exploited by black hats to gain access to systems.

Almost all processors popular with hackers today use a stack. To visualize a stack think of a stack of plates. You can take the top plate off the stack. You can add some plates back on the top of the stack. What you can’t do is add or remove plates from the middle or the bottom. The stack in a processor works the same way. You push data to put it onto the stack and pop data to take it off. But just like the plates, you can’t remove something from the middle of the stack.

Basics of the Stack

For some strange historical reason in diagrams stacks are always drawn growing downward so the top is the bottom. I think some hardware guy did that. Actually, it’s because stacks start at higher memory addresses and grow toward lower memory addresses.

[Figure: abcd stack diagram]

The diagram should help explain how stacks work. The actions on the stack are shown on the top line and each column is the stack as it changes. For instance, the first action is to push D and E onto the stack and the second to pop one item off the stack.

A CPU has multiple registers. The two of import here are the stack pointer (SP) and the program counter (PC). The SP always points to the top of the stack. If something is pushed onto the stack the SP is decreased. (Remember it grows toward lower memory addresses). If an item is popped, the SP is increased. The amount of change to the SP depends on the size of the data pushed onto the stack.
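
To make the SP bookkeeping concrete, here is a minimal sketch of a descending stack modeled in plain C. The array `stack_mem`, the index `sp`, and the helpers `push8` and `pop8` are hypothetical names invented for this illustration; a real CPU does this in hardware, and the item size and exact push/pop ordering vary by architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_BYTES 16u

/* Simulated stack memory (assumption: one byte per stack item). */
static uint8_t stack_mem[STACK_BYTES];

/* The SP starts just past the highest address, meaning the stack is empty. */
static size_t sp = STACK_BYTES;

/* Push: move the SP toward lower addresses, then store the value. */
static void push8(uint8_t value)
{
    assert(sp > 0);               /* overflow guard for the sketch */
    sp -= 1;
    stack_mem[sp] = value;
}

/* Pop: read the top item, then move the SP back toward higher addresses.
 * The byte is not erased; it simply ends up below the SP. */
static uint8_t pop8(void)
{
    assert(sp < STACK_BYTES);     /* underflow guard for the sketch */
    uint8_t value = stack_mem[sp];
    sp += 1;
    return value;
}
```

Pushing D and then E leaves `sp` two bytes lower than where it started; popping returns E first and moves `sp` back up by one. Note that a pop only moves the SP. The popped byte stays in memory until a later push overwrites it.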

The PC points to the code being executed. As instructions are executed the PC steps from one to another. When a function is called the instruction pushes the contents of the PC onto the stack. The instruction then loads the PC with the address of the first instruction of the called function. The first code sequence executed does some housekeeping and adjusts the SP to allow space for the local variables of the called function.

At the return instruction the housekeeping is undone and the SP is moved to point where the return address was stored. The address is popped to the PC and execution continues in the calling function.

Consider two functions, one that adds data to a buffer array and another that calls the first function but itself will be interrupted (shown as a comment in this example).

char* putInBuffer(...params...) {
   char local_buf[25];
   ... code to put something in local_buf ...
   return local_buf;
}

void callingFunc(...params...) {
   ...some code...
   char* buffer = putInBuffer(....params...);
   // note: an interrupt is going to occur right here
   ... code that does something with buffer ...
}

Here are the details:

  1. When callingFunc calls putInBuffer the location of that call is taken from the PC and pushed onto the stack. The SP is adjusted.
  2. The space for local_buf is allocated on the stack, i.e. 25 bytes are allocated. The SP is adjusted to allow for that space.
  3. The putInBuffer code puts something into local_buf.
  4. At the return, the pointer to local_buf is passed back to callingFunc. The details don’t matter here.
  5. The SP is adjusted to point at the return address.
  6. The return address is popped from the stack into the PC.
  7. Execution continues in callingFunc.
  8. Somewhat later callingFunc uses the data pointed to by buffer.

Consider where the SP is pointing. It is pointing above the address where it stored the PC. This means that the pointer in buffer is pointing to an address beyond the top of the stack.

When an interrupt occurs:

  1. The PC is pushed onto the stack.
  2. Many of the other registers in the CPU are pushed onto the stack.
  3. The PC is changed to point at the code for the interrupt.
  4. The interrupt code executes and returns.
  5. The CPU registers are restored.
  6. The PC is restored.
  7. The SP is back where it started and processing continues in callingFunc.

Assume the interrupt occurred where I marked it in callingFunc. 

What happened to the data that was in local_buf?

An Interrupt Ate My Data

The data in local_buf got clobbered by the interrupt stack manipulation. This happens before callingFunc could use it. As I said, subtle. I’ve seen inexperienced and experienced developers fall into this pitfall a number of times.
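
The sequence can be replayed safely on a simulated stack, without invoking the real bug. This is only a sketch under stated assumptions: `sim_stack`, `sim_sp`, `sim_put_in_buffer`, and `sim_interrupt` are hypothetical names, the return address is modeled as a single byte, and the number of bytes an interrupt saves is a parameter because it differs between processors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_STACK_BYTES 64u
#define LOCAL_BUF_BYTES 25u

static uint8_t sim_stack[SIM_STACK_BYTES];
static size_t sim_sp = SIM_STACK_BYTES;   /* empty descending stack */

static void sim_push(uint8_t v)
{
    assert(sim_sp > 0);
    sim_stack[--sim_sp] = v;
}

static uint8_t sim_pop(void)
{
    assert(sim_sp < SIM_STACK_BYTES);
    return sim_stack[sim_sp++];
}

/* Models putInBuffer: returns the position of local_buf in sim_stack.
 * After the return that position lies below the SP, i.e. it dangles. */
static size_t sim_put_in_buffer(const char *text)
{
    sim_push(0xAA);                       /* stand-in for the return address */
    assert(sim_sp >= LOCAL_BUF_BYTES);
    sim_sp -= LOCAL_BUF_BYTES;            /* make room for local_buf */
    size_t local_buf = sim_sp;
    strncpy((char *)&sim_stack[local_buf], text, LOCAL_BUF_BYTES - 1);
    sim_stack[local_buf + LOCAL_BUF_BYTES - 1] = '\0';
    sim_sp += LOCAL_BUF_BYTES;            /* housekeeping undone */
    (void)sim_pop();                      /* return address popped to the PC */
    return local_buf;
}

/* Models an interrupt: saves saved_bytes of PC and registers, then restores. */
static void sim_interrupt(size_t saved_bytes)
{
    size_t sp_before = sim_sp;
    for (size_t i = 0; i < saved_bytes; i++) sim_push(0xEE);
    for (size_t i = 0; i < saved_bytes; i++) (void)sim_pop();
    assert(sim_sp == sp_before);          /* SP is back where it started */
}
```

Calling `sim_put_in_buffer("hello")` and then `sim_interrupt(32)` leaves the SP exactly where it was, yet the bytes where "hello" lived now hold the saved-register filler. In this model an interrupt that saves only a few bytes overwrites just the tail of the old buffer, which is one way such a bug can appear to work most of the time.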

It is especially tricky to find this error because you can get away with it when returning non-pointer values. If you return an integer or floating point value, the value itself is copied back to callingFunc, typically in a register or into storage that callingFunc owns. It doesn’t matter what happens to the stack in this situation. What makes it really nasty is that the code will work correctly most of the time. An intermittent bug like this takes painstaking analysis to find. I’ve seen a function with this problem rewritten five times by a developer trying to fix the bug. He never found the problem until I, after looking at it more times than I’d like to admit, was able to point it out.

The function putInBuffer can be saved, maybe, by making local_buf a static variable. A static local variable is not kept on the stack but in the global memory space (there’s a post on the static keyword if you want to learn more). That also means the data from the previous call is still in the variable. Sometimes that is an acceptable approach. It looks like this:

char* putInBuffer(...params...) {
   static char local_buf[25];
   ... code to put something in local_buf ...
   return local_buf;
}

void callingFunc(...params...) {
   ...some code...
   char* buffer = putInBuffer(....params...);
   // note: an interrupt is going to occur right here
   ... code that does something with buffer ...
}

Reentrant Functions

I said it’s maybe possible to fix the function using static because by doing so putInBuffer is no longer reentrant. A reentrant function is one that can be called simultaneously or recursively. Simultaneous calls require the use of a multi-tasking operating system with multiple threads. In that case thread R could call the function, be swapped out, and then thread S also calls the function. One of those two threads is going to get the wrong data because it’s stored in the global memory space.

What about recursion? And what is it? The old programming joke is recursion: see recursion. Recursion is when a routine calls itself. There are classes of problems where a recursive solution is easier. In this case, if a recursive function called putInBuffer, and then called itself before using the data, the data is going to be messed up.
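
A small sketch of the static-buffer hazard, with no threads needed to see it. The function `label_static` is a hypothetical stand-in for putInBuffer that formats a number into a static array; the name and the "value=" text are assumptions made for the illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for putInBuffer using a static local buffer. */
static char *label_static(int value)
{
    static char local_buf[25];
    snprintf(local_buf, sizeof local_buf, "value=%d", value);
    return local_buf;
}

/* Two calls hand back the same storage, so the second overwrites the first. */
static void show_shared_storage(void)
{
    char *first = label_static(1);
    char *second = label_static(2);

    assert(first == second);                  /* one buffer, two "results" */
    assert(strcmp(first, "value=2") == 0);    /* the first result is gone */
    printf("%s %s\n", first, second);         /* prints "value=2 value=2" */
}
```

The same overwrite happens if a second thread, or a recursive call, reaches the function before the first caller has finished with its result.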

The real solution is to change putInBuffer so the calling program provides the buffer and the size of the buffer. The buffer size is critical to avoid another problem: a buffer overrun exploit. This is a black hat hack used to compromise systems. A buffer passed into a function must be a fixed length. The function must be sure any input data does not run past the end of the buffer. In the best case this just causes your system to crash. In the worst, it provides a way for black hats to introduce malicious code. Here’s the best way to write this routine.

char* putInBuffer(char* buffer, const int buf_size, ...params...) {
   if (buf_size < ...whatever the output size is going to be...) {
      return NULL;   // or another value callers will understand
   }
   // doesn't matter if interrupt is called here
   ... code to put something in buffer ...
   return buffer;
}

void callingFunc(...params...) {
   ...some code...
   char buffer[20];
   putInBuffer(buffer, sizeof(buffer), ....params...);
   // note: an interrupt is going to occur right here
   ... code that does something with buffer ...
}

It’s a convenience to the calling routine to return the pointer to the buffer. This allows the called routine to be used as an argument in other function calls. One typical situation is converting a numeric value to text.
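
As one possible shape for the numeric-to-text case, here is a sketch of a routine that writes into a caller-provided buffer and returns that buffer. The name `int_to_text` and its signature are assumptions for illustration, not code from the article; it leans on the standard `snprintf`, which never writes more than the size it is given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes value as decimal text into the caller's buffer.
 * Returns buffer on success, or NULL if the text would not fit. */
static char *int_to_text(char *buffer, size_t buf_size, int value)
{
    if (buffer == NULL || buf_size == 0) {
        return NULL;
    }
    int needed = snprintf(buffer, buf_size, "%d", value);
    if (needed < 0 || (size_t)needed >= buf_size) {
        buffer[0] = '\0';         /* leave a valid empty string behind */
        return NULL;
    }
    return buffer;
}

static void show_int_to_text(void)
{
    char text[12];                /* room for a 32-bit int plus terminator */
    char *result = int_to_text(text, sizeof text, 42);
    assert(result == text);
    puts(result);                 /* prints "42" */

    char tiny[3];                 /* too small for "12345" */
    assert(int_to_text(tiny, sizeof tiny, 12345) == NULL);
}
```

Because the storage belongs to the caller, an interrupt or a second call cannot invalidate it. The price of using the return value directly as an argument is that the caller must be sure the buffer is large enough, since a NULL result passed straight to `puts` or `printf("%s")` would be a new bug.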

Stack related problems live in one of those dirty little corners of C/C++, and any stack based language, that can cause hours of hair pulling frustration. You can see why I’m bald.

42 thoughts on “Code Craft: Subtle Interrupt Problems Stack Up”

  1. Wait, there’s no way in the world that anyone would consider the first implementation of putInBuffer to be in any way okay. “local_buf” is exactly that – local. When the function returns, that storage should be considered *gone*.

    Interrupts aren’t the problem here – you’re using an invalid pointer. In some cases the data might still be OK, but the pointer’s still invalid.

    1. Pat, I’ve seen this done numerous times. It’s only worse today. That’s a side-effect of teaching ‘coding’ without covering some of the fundamentals like how a stack works.

      1. What, are you saying learning to code Arduino/Wiring leads to problems when doing more serious coding?

        For an MCU like an AVR, assembly programming is less complicated than C++, and arguably less complex than C. ARM thumb isn’t much different. Once you’ve learned to be a half-decent assembly programmer, you’ll be a *much* better C or C++ programmer.

          1. That’s because…
            a.) Our Evil Overlords strip off the brackets to generate more followup comments by people who were offended.
            (and they also introdude speeling and gramma misteaks to encourage increased comment counts)
            b.) HaD readers are always sarcastic, so putting /Sarcasm=ON\ and OFF tags are redundant.
            c.) Like HTML tags, your brackets were recognized by WordPress and removed before printing.
            (or your browser software strips them off, so you are the only one who didn’t see them)
            d.) Anything that appears to be a sarcastic comment on HaD against their bread and butter Arduino gets editted.

            B^)

        1. Pat, a big part of the issue, and motivation for the article, is we are seeing a lot of coders who are not taught. They are picking up C/C++ through working with the Arduino. I mentioned seeing this during my career and one of the most interesting times was as an expert witness in a software copyright lawsuit. Basically, The Kid defendant had taken code from one company to another. One set of utility routines was the smoking gun because all that was changed were the function names. But one routine was rewritten over and over again. It had this problem. The Kid was self taught. He actually did a good job overall but this problem was beyond his understanding.

          1. No, your original comment was “without covering some of the fundamentals like how a stack works.”

            You don’t need to know how a stack works to find this problem. You need to know that an object that falls out of scope no longer exists.

          2. Pat, back in my Compuserve days we’d say you and I are in violent agreement. We’re just approaching this problem, the explanation, and the debugging of it from different perspectives. Some of our disagreement is simply semantics. Peace.

    2. > When the function returns, that storage should be considered *gone*.
      For a local variable yes – but this is local STATIC, the storage remains valid even after the function exits. So no problem (I didn’t say it is beautiful).

    3. That’s what I noticed too. It’s a dangling pointer caused by that variable falling out of scope, not some subtle interrupt bug. Look at the first pseudo-code example on the dangling pointer wikipedia article, it’s pretty much the same deal.

    4. Pat is absolutely right. To expand on that a little:

      1) The reason this is wrong and bad has nothing to do with interrupts specifically. In a single-threaded all-interrupts-disabled context, you’re still boned if you call the first putInBuffer(), then call any other function from callingFunc() before using the result. (In other words, if code like the first example “works”, consider it accidental and temporary.)

      2) Any reasonable compiler (invoked with reasonable options) will smack you down for code that resembles the first example. (On gcc 4.8.4 with “-Wall”, I get “function returns address of local variable” which is pretty expository.) Ditto various static analysis tools.

      3) Compiling with warnings off is asking for trouble, a request which the universe will fulfill with zero hesitation.

      4) Somewhat surprisingly, valgrind didn’t catch the error (in the trivial test case I created equivalent to the first example).

      1. >Compiling with warnings off is asking for trouble, a request which the universe will fulfill with zero hesitation

        It’ll only fill it with zeros if you compile in debug; in release it’ll be filled with random crap from something else ;)

      2. “you’re still boned if you call the first putInBuffer(), then call any other function from callingFunc() before using the result.”

        It’s worse than that: you could just end up boned, period. You could actually try using that data immediately, and you’re still screwed. That memory (allocated on the stack) is gone – so the compiler is completely free to reuse it, including creating its own temporaries. It just depends on the architecture itself.

        I mean, if you did something like “char c = *buffer” you could be completely boned if the compiler decided to put ‘c’ on the stack. I can’t think of an architecture that would do that offhand (maybe an 8051 or a PIC? Dunno) but obviously there’s lots of code you could put there that might look innocent but you’d still get screwed.

        1. Pat, sorry, but you’re just not correct for stack based processors. The memory is not deallocated. It’s still on the stack and reachable, which is why this problem can be difficult to spot. Take a look at my stack diagrams. When the data is corrupted by a function call, stepping through the code makes it obvious what is happening, i.e. the function call is corrupting the data. It’s more difficult to debug in a multi-tasking system and it reaches near impossibility when an interrupt intervenes. (Which may be the timer interrupt in a preemptive system.) The really nasty part is the intermittent nature. The code may work over 95% of the time, then “Bam!”.

          1. “Pat, sorry, but you’re just not correct for stack based processors.”

            I’m really not wrong. You’re talking about the hardware, but you’re not programming in assembly. You’re programming in C, so you’re not programming for the hardware. You’re programming for the *compiler*. And you’re not following the compiler’s rules. This is where you’re getting confused – here, you’re talking about “stack-based processors,” as if the *architecture* is causing the problem. You then say:

            “The memory is not deallocated. It’s still on the stack and reachable”

            which shows a confusion between the *architecture* and the *language* – effectively you’re talking about a different layer of abstraction. Hardware can’t ‘deallocate’ memory at all. Plenty of attacks take advantage of this – after code is run, its results are all still stored in RAM. Even if you reboot the system the results are still in RAM somewhere.

            ‘Allocation’ of memory occurs at the *language* level. And C and C++ say that when a local variable falls out of scope, it’s *gone*. It’s deallocated, from the compiler’s point of view. Which means the compiler is allowed to do *whatever it wants* with it!

            In your case, you’re looking at some compiler’s implementation of what you wrote, and you say “what, it should work fine.” But turn on optimization, and it might not. Switch to an 8051, and it might not. Switch to Weird Architecture That Still Has a C Compiler, and it might not.

            You are accessing unallocated memory. That’s architecture specific as to whether or not it will work.

            “The code may work over 95% of the time,”

            The code may also work 0% of the time. Even without interrupts. It’s totally compiler and architecture dependent. It’s just flat out wrong – it has nothing to do with the interrupt. You’re accessing a dangling pointer. The compiler didn’t know it had to have lifetime outside of the function, so it stored it on the stack.

            Once the function returned, the compiler can reuse that stack space all it wants. Some architectures use stack space to preserve registers for various reasons. Some of them do it well before they actually use it just because the code timing is better.

            You seem to believe that the stack is always preserved except for function calls. This is wrong.

          2. I just thought of an example which is illustrative. Note that I don’t know that any compiler actually works like this, but they could, easily.

            void callingFunc(unsigned char myChar) {
                unsigned char buf2[24];
                unsigned char *buf;
                int i;
                for (i = 0; i < 24; i = i + 1) buf2[i] = 0;
                buf = putInBuffer(myChar);
                buf2[0] = *buf;
                sendOutMyBuffer(buf2);
            }

            What happens here? It looks like callingFunc() allocates space on the stack and initializes it, then calls putInBuffer (which allocates more, and stores something there) and then callingFunc copies that other data over.

            The problem is that the compiler can happily recognize that you *didn't use* buf2 before calling putInBuffer except to initialize it, so it can move the initialization and stack allocation until *after* putInBuffer.

            So then putInBuffer's buffer, as *well* as callingFunc's buffer, are both overlayed on top of each other on the stack. And initializing callingFunc's buffer overwrote that data.

            Why would it do that? It saves stack space, and it's completely identical to what you asked it to do.

            This is exactly identical to using memory space on the heap (allocated through malloc) after free() has been called. It's a dangling pointer problem, not an interrupt problem.

    5. I found the article really confusing. I really couldn’t get what the author was on about and why the interrupt was the problem – it correctly leaves the stack as it found it.

      Pat’s comment totally nailed it though. Way better than the original article.

  2. “The function must be sure any input data does run past the end of the buffer.”
    I think you mean “… does not run…”, right?

    And now since I’ve picked at one thing: “hats”, not “hat’s” in the next sentence :)

  3. The C spec says that local variables don’t exist after the function (or block) exits. The lesson of this article is not “interrupts can cause subtle problems”. The lesson is “DON’T RETURN POINTERS TO LOCAL VARIABLES!!!!”

    1. >DON’T RETURN POINTERS TO LOCAL VARIABLES!!!!”
      unless they are static. Iirc this is used in the implementation of the C standard library that comes with GCC (and I suppose the guys who wrote this know what they are doing).

      1. Returning pointers to local static variables can work, but it is still a bad idea (despite the fact that some standard C library routines do it). It’s real easy for that memory to get overwritten by other calls to the function.

        1. Pointers to static variables that “sit inside” functions are even more opaque than shared global variables. If you find yourself doing that, consider a simple global instead. (Well-named, well-documented.)

          It’s a tremendous way to shoot yourself in the foot, or obfuscate code if you need to slip a trojan/back-door in on somebody. :)

  4. In my opinion the biggest problem is the code under the hood in these higher level languages such as C++, Delphi and C#, where you as the programmer have _no_ grip on how and when things are done.

    I have done a lot of interrupt programming, starting with an 8086 (1981) and later (1985) on 8 bit 8051 based boards/systems that we developed in house to run realtime and semi realtime applications; think of a fast weighing unit, and also vision (using up to 24 DSPs) for four sets of paired BW and color cameras looking at a moving object.

    Most of my time (1997..2012) was spent on DOS (MSDOS 6.22), creating a: a user interface and b: (semi) realtime calculations and IO handling, with the 1 mSec BIOS interrupt as the high prio task, HwInts for measured values, the keyboard at normal level, and a background for low prio tasks such as writing to a logfile.
    The whole system is done with BP 7.0 in protected mode (DPMI = DosProtectedModeInterface), as the 640 Kb realmode memory was way too small, while with DPMI you get a max of 15 MB even with stacked programs (after startup the interrupt based processes start a new shell with another program, which can also use up to another 15 MB of memory).

    Hardware interrupts are handled in realmode while (most) code runs in protected mode, so there is a lot of switching between these two modes with an OS that was _not_ written to do multitasking.

    Another pitfall with BP7 is/was that the compiler only knew about the 80486 instruction prefetch, while we are running these “old” programs on Core Duo processors (after also hacking the default library to get rid of the runtime 200, div by zero, error on init video). The problem was solved by using a relative jump to the next instruction; a jump clears the prefetch cache.

    At this moment I am using (ansi) C with vs2010 and Delphi (radxe2) for Windows XP (embedded) and Windows 7 on i7 hardware, where the W7 OS gives a lot of trouble caused by design changes between XP and 7.

  5. […] stacks start at higher memory addresses and grow toward lower memory addresses.

    This is true only for some CPUs, in particular some of those which provide instructions like push and pop (not all do). A buffer overflow on such machines indeed is more dangerous.
