Embed With Elliot: Interrupts, the Bad

We love interrupts in our microcontroller projects. If you aren’t already convinced, go read the first installment in this series where we covered the best aspects of interrupts.

But everything is not sunshine and daffodils. Interrupts take place outside of the normal program flow, and indeed preempt it. The microcontroller will put down whatever code it’s running and go off to run your ISR every time the triggering event happens. That power is great when you need it, but recall Spider-Man’s mantra: With great power comes great responsibility. It’s your responsibility to design for the implicit high priority of ISRs, and to make sure that your main code can still get its work done in between interrupt calls.

Put another way, adding interrupts in your microcontroller code introduces issues of scheduling and prioritization that you didn’t have to deal with before. Let’s have a look at that aspect now, and we’ll put off the truly gruesome side-effects of using interrupts until next time.

Starvation: Long Interrupts

Let’s go back to a concrete example. Imagine that you’ve got some input coming in from an accelerometer, and you’d like to do a bunch of math on it to figure out which way your board is pointing, and then maybe you’d like to send that tilt information out to a serial terminal for debugging.

The naïve approach is to handle all the math and serial stuff inside the interrupt. After all, you only need to process the incoming accelerometer data when it’s new, and the interrupt fires once for each new value. It seems like a perfect match, no?

 
ISR(INT1_vect){
	read_accelerometer();
	complicated_math();
	wait_for_serial_line_to_clear();
	send_results();
}

int main(void){
	while(1){
		do_stuff();
		if (emergency_shutdown_pressed()){
			turn_off_killer_laser();
		}
	}
}	

To make the example more drastic, we’ve also implemented an emergency shutdown for a killer laser by polling in the main() loop. Just looking at the main loop, we should be in good shape if do_stuff() runs very quickly, right? The emergency shutdown will get polled quite frequently, right?

Nope. We are hiding away a lot of work in our ISR, and because it’s in an ISR it preempts code running in the main body. So while the ISR is doing heavy math and queuing up data for output, the killer laser is burning through your lab. Clearly, we’ve at least got a priority mismatch here; there’s no way sending numbers to a screen is more important than user safety, right? But even without the priority problem, we’ve got an ISR that does too much.

The problem with heavy-weight interrupts is compounded when you have many inputs handled by interrupts. Your code may end up spending all of its time in ISRs, and have very little time left for the main routine, starving it. If, for instance, you have two long-running ISRs and the second is triggered while the first is still running, and then the first re-triggers while the second is running and so forth, your main loop may never see any CPU time at all.

The golden rule of ISR-writing is this: keep it short and sweet. ISRs are the highest-priority parts of your code, so treat them with respect. Do only the absolute minimum in the interrupt, and allow your code to return to normal as fast as possible.

Trimming your ISRs down to the bare minimum isn’t even very hard, but it requires declaring a few additional variables that can be passed between the ISR and the main flow. That way, instead of handling the once-per-update serial transmission inside the ISR, you can simply flag that it needs handling and let the main routine take care of it. For instance:

 
ISR(INT0_vect){  /* now handles killer laser shutdown */
	turn_off_killer_laser();
}

/* Some volatile variables to pass to/from the ISR */
volatile uint8_t raw_accelerometer_data;
volatile enum {NO, YES} has_accelerometer_data = NO;

ISR(INT1_vect){
	raw_accelerometer_data = read_accelerometer();
	has_accelerometer_data = YES;
}

int main(void){
	while(1){
		do_stuff();

		if (has_accelerometer_data == YES){
			complicated_math();
			wait_for_serial_line_to_clear();
			send_results();
			has_accelerometer_data = NO;
		}
	}
}

We’ve moved the laser kill switch off to the highest-priority interrupt. You may also be required by law to have a physical kill switch in real life, but automated kill switches based on out-of-spec data are often included where danger to humans exists.

Now the accelerometer ISR doesn’t do more than it needs to: it records the new data and sets a flag that lets the main body of the code know that it’s got something new to process. We’ve also made complicated_math() preemptable by putting it in main() instead of an interrupt.

Those of you who followed the interrupt vs. polling debate from the last installment will recognize this as being a hybrid: the interrupt acquires the data, while the main routine polls for new data (in the form of the has_accelerometer_data flag) and does the slower and lower-priority serial handling. This splits up the accelerometer-handling task into low- and high-priority operations, and handles them in the appropriate places. All is well.

Finally, another viable pattern for this example would be to have a text buffer for handling data to be sent out over the serial interface, and poll that buffer each time around the main loop. That way, multiple subroutines can easily share the serial service. For simple situations, the ISR could even write directly to the text buffer without involving flags and so forth. This is one of the nicer services that you get with the Arduino libraries, for instance. We’ll run an article on this very common application soon.
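To make the text-buffer idea concrete, here is a minimal sketch of a ring buffer for outgoing serial bytes. It is not the Arduino implementation; the names tx_buffer_t, tx_put(), tx_get() and the 64-byte size are all made up for illustration, and it assumes a single writer and a single reader.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_BUF_SIZE 64u   /* assumed size; must be a power of two */

typedef struct {
    volatile uint8_t head;        /* next slot to write (producer side) */
    volatile uint8_t tail;        /* next slot to read (consumer side) */
    uint8_t data[TX_BUF_SIZE];
} tx_buffer_t;

/* Queue one byte for sending. Returns false, dropping the byte,
 * if the buffer is full. One slot is always left empty so that
 * "full" and "empty" can be told apart. */
static bool tx_put(tx_buffer_t *b, uint8_t byte)
{
    uint8_t next = (uint8_t)((b->head + 1u) & (TX_BUF_SIZE - 1u));
    if (next == b->tail) {
        return false;
    }
    b->data[b->head] = byte;
    b->head = next;
    return true;
}

/* Fetch the oldest queued byte. Returns false if the buffer is empty. */
static bool tx_get(tx_buffer_t *b, uint8_t *out)
{
    if (b->head == b->tail) {
        return false;
    }
    *out = b->data[b->tail];
    b->tail = (uint8_t)((b->tail + 1u) & (TX_BUF_SIZE - 1u));
    return true;
}
```

The main loop would then call tx_get() once per pass and hand the byte to the serial hardware whenever the transmitter is free, while anything that wants to print calls tx_put(). If both an ISR and the main code write into the same buffer, the put operation needs protecting, which is exactly the kind of problem the next column is about.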

Starvation: Frequent Interrupts

Interrupts are great for handling events fairly speedily, and lightweight ISRs help prevent main loop starvation. But “lightweight” is relative to how frequently the interrupts get called in the first place. No matter how little is going on inside the ISR, there’s always some finite call and return overhead. (There are clever, machine-specific ways to minimize this that are out of scope for this article, but that we’d love to cover later.) And the ISR has to do something after all.

If your ISR is going to be called very, very frequently, even a lightweight interrupt can block the main loop. Imagine that it takes a total of 40 cycles just to get into and out of an interrupt, and the interrupt-triggering event ends up happening every 50 cycles. You’re totally blocked if your ISR takes more than ten cycles to run, and your main loop code is reduced to a crawl no matter what. Even the shortest interrupts take some time for call and return.
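The arithmetic in that example is easy to put into a couple of helper functions. This is only a back-of-the-envelope sketch: the function names are invented here, and real entry and exit costs depend on the chip and on how many registers the ISR has to save.

```c
#include <stdint.h>

/* Cycles left for the main loop in each interrupt period.
 *   overhead: cycles to get into and out of the ISR (40 in the example)
 *   body:     cycles spent doing work inside the ISR
 *   period:   cycles between triggering events (50 in the example)
 * Returns 0 when the ISR eats the whole period and main is starved. */
static uint32_t main_cycles_per_period(uint32_t overhead, uint32_t body,
                                       uint32_t period)
{
    uint32_t isr_total = overhead + body;
    return (isr_total >= period) ? 0u : period - isr_total;
}

/* Share of CPU time left for the main loop, in whole percent (rounded down). */
static uint32_t main_cpu_percent(uint32_t overhead, uint32_t body,
                                 uint32_t period)
{
    if (period == 0u) {
        return 0u;
    }
    return (100u * main_cycles_per_period(overhead, body, period)) / period;
}
```

With the numbers from the text, an ISR body of ten cycles leaves the main loop nothing at all, a five-cycle body leaves it 10% of the CPU, and even a completely empty ISR leaves it only 20%.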

A sneaky case of too-frequent interrupts can occur with push buttons. On one hand, it’s tempting to handle push button input through interrupts, responding to an infrequent input quite rapidly. But real-world push buttons often “bounce” — making and breaking the circuit many times over a couple of milliseconds as the metal plates inside settle into contact with each other. These glitches can trigger interrupts frequently enough that the ISRs can’t keep up, and thus the main loop gets blocked for a little while.

With a pushbutton, you can be pretty sure that it will only bounce for a few milliseconds, limiting the extent of the main loop starvation, so we view this case as technically flawed, but practically benign. But the same rationale goes for any noisy signal, and if noisy inputs can cause interrupt overload, you’ll need to think about how to deal with that noise. Imagine a loose wire on a model rocket guidance system preventing the main loop from getting its work done. For critical systems, some filtering on the input to keep it from bouncing around is probably a good idea.
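Filtering can be done in hardware, but one common software option is to sample the pin at a regular tick and only accept a new level after it has been read the same way several times in a row. The sketch below assumes that approach; debounce_t, debounce_update() and the threshold of four samples are illustrative choices, not something prescribed by this article.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEBOUNCE_SAMPLES 4u   /* assumed threshold: 4 agreeing samples in a row */

typedef struct {
    bool    stable;   /* last accepted (debounced) level */
    uint8_t count;    /* consecutive samples that disagree with 'stable' */
} debounce_t;

/* Feed in one raw sample per tick. Returns the debounced level. */
static bool debounce_update(debounce_t *d, bool raw)
{
    if (raw == d->stable) {
        d->count = 0u;                    /* input agrees: reset the counter */
    } else if (++d->count >= DEBOUNCE_SAMPLES) {
        d->stable = raw;                  /* disagreed long enough: accept it */
        d->count = 0u;
    }
    return d->stable;
}
```

Because the raw pin is only looked at once per tick, a noisy input can no longer trigger a flood of interrupts; the cost is a delay of a few ticks before a real change is recognized.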

Yo Dawg, I Heard You Liked Interrupts…

The AVR hardware does you the favor of turning off the global interrupt enable bit upon entering an ISR. Why? Because you probably don’t want interrupts getting interrupted by other interrupts. If they’re intended to be short bits of code anyway, it’s more reasonable to finish up this ISR and then tackle the next. But there’s another reason you should be wary of interrupts inside your interrupts.

Each call to an ISR or any other function starts off by stashing variables and the current location of the program counter (which keeps track of which instruction the CPU is currently executing) in a region of memory called the stack. Nesting functions inside functions pushes more and more variables onto the stack, and when the stack fills up to overflowing, bad things happen. Because microcontrollers often have relatively limited RAM, this can be a real problem.

If you’re the type of programmer who insists on writing recursive functions, you can at least estimate how many times your function will call itself and figure out if you’re likely to smash the stack. But since interrupts are triggered by external events, they’re out of your control, and allowing interrupts to interrupt each other leaves you hoping that they don’t end up nesting too deep. This is the rationale behind turning off the global interrupt enable bit by default when handling an ISR.
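The recursion estimate mentioned above is just multiplication, sketched here with made-up names. The bytes used per level is an assumption you would have to measure or read out of the compiler’s output for your own code; nothing here is specific to AVR-GCC.

```c
#include <stdbool.h>
#include <stdint.h>

/* Worst-case stack use if a routine that costs frame_bytes per call
 * (saved registers, return address, locals) nests 'depth' levels deep. */
static uint32_t worst_case_stack(uint32_t frame_bytes, uint32_t depth)
{
    return frame_bytes * depth;
}

/* True if that worst case still fits in the RAM set aside for the stack. */
static bool stack_fits(uint32_t frame_bytes, uint32_t depth,
                       uint32_t stack_bytes)
{
    return worst_case_stack(frame_bytes, depth) <= stack_bytes;
}
```

For a recursive function you can plug in a known maximum depth. For nested interrupts there is no such bound unless you impose one, which is the whole point of the paragraph above.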

Of course, you may be running some lower-priority interrupts that you’d absolutely like to get interrupted by further interrupts. Even on AVR-GCC, there’s a provision for doing this, so it’s not like you can’t. Just be judicious when you let interrupts interrupt other interrupts on the smallest micros.

For instance, in our accelerometer-with-killer-laser example above, since we only had two ISRs anyway and they had a very clear priority relationship, we could have either re-enabled the global interrupt enable bit from within the read_accelerometer() function, or defined the entire ISR to be interruptible. On the AVR platform, the “avr/interrupt.h” header lets one define an interruptible interrupt like so:

 
ISR(INT0_vect, ISR_NOBLOCK) {
...
}

Now you know. Just use it responsibly.

Next Column: The Ugly.

In the next Embed with Elliot, we’ll tackle The Ugly: race conditions and failures of atomicity. The short version of the story is that interrupts will strike at the worst possible times, and you’ll have to adjust your thinking correspondingly, or else mayhem ensues: if() blocks can behave “incorrectly” and variables can take on entirely “wrong” values.

But for now, to avoid “The Bad” of using interrupts, just remember to keep your ISRs short and don’t be afraid to subdivide tasks that need different priority levels. Sounds easy enough, no?

28 thoughts on “Embed With Elliot: Interrupts, the Bad”

  1. Excellent background material.
    This also gets very tied up with the choice of a multitasker: preemptive or cooperative.
    It took me many years to appreciate that proper use of interrupts on these “small” systems can allow you to use a faster and simpler cooperative (or round-robin) multitasker. There is no need of preemption if the background interrupts can queue the data stream(s).
    And race/lock conditions (multitasker problems) are far less with a cooperative system. But, of course, you need to design and use it properly.
    Another series of articles?

  2. Great articles. Seems like your suggestion for pushbutton interrupts is to use some kind of hardware-based debounce (such as a schmitt trigger + capacitor or something like that). How would you still debounce in firmware? Run simple debounce code in the ISR?

    1. If your pushbuttons work with interrupts, you could disable your pushed button interrupt, and then set a timer (if you have a spare one) to wait out some “debouncing delay”. And then, when that timer is done ticking, enable interrupt back again.
      Another approach is to set up a timer as a periodic “system tick”, and then do your button polling in that timer’s interrupt, and then store your button values somewhere. That system tick should be slow enough so button bouncing is completely ignored between “ticks”. That’s how it was done in ZX Spectrum (and probably in most other home computers) for example. That machine only had a single “Vsync” interrupt (once after each video frame), and its ROM BASIC used to do exactly that – poll the keyboard and do some other housekeeping stuff in an interrupt 50 times a second. There were no other debouncing methods involved, I think.

      1. I like the system tick concept; I’ve got far more inputs to debounce than I have independent timers, and I’m trying to account for multiple buttons getting pressed concurrently. Seems like the “tick” method won’t take much overhead and should still be quick enough to produce good response time. Thanks!

      2. That’s what I usually do in my small projects that use an 8-bit microcontroller. I have an interrupt that runs at something like 200Hz (5ms). All the interrupt does is set a flag that my main code can poll. I have my code divided up into smaller cooperative tasks. For my charger: check for over voltage/current, scaling ADC values, button task, charging algorithm, display, update to host, user interface etc.

        The advantage is that this can be written in C without writing assembly code to save context/CPU registers, dividing up the stack for different tasks, or making sure that your libraries (e.g. math) are reentrant.
        The bad part is that your code has to be structured so that each task always exits and the next task gets to run. So you can’t have an endless polling loop in the middle of your code. You also have to save your states etc. in globals so that the internal state of each task survives between runs.

        On larger projects, I would just use ARM chip and run a RTOS.

        1. Forgot to mention that my button task does the polling and keep track of the count for each pass that the key press remains the same. Once it passes the debounce threshold, then the button event is noted. Also it handles key repeats/long key pressed with another longer threshold.

      3. If you’re doing the system tick debounce, it’s not enough to consider one interval because a button may be bouncing just then. The interrupt is basically a single point measurement that cannot determine whether the key is up, going up, or down, or going down.

        The probability of getting a bounce at exactly the moment when you’re reading the key state is low, but you will get the inevitable glitch anyways. To decrease the probability you correlate two ticks. The probability that you get a bounce exactly two ticks apart is so small you can be pretty sure the key was really up or down between the interval.

    2. Without a hardware filter, using pin change interrupts on switches is asking for trouble. I usually prefer using a timer interrupt and polling the pin inside the ISR. Just make sure your interrupt period is greater than the “bounce period” of the switch, and it works out very well.

      1. Makes sense, thanks. I suppose if the switch bounce period is 20ms, and I set the ISR period to 50ms, I’d be in good shape (as an example); besides, I probably shouldn’t care about button pushes with a duration under 1/20 s anyway. I’ve got a decent analog ‘scope, but it seems like the best real-world way to characterize buttons is with a one-shot DSO, right?

      2. That’s what I thought as I usually use timer interrupt+polling in my main code as the chip I had don’t usually have that. It turns out that the pin change on one project works quite well for me. Or I might be lucky with the buttons I am using that have very little bounce on that project.

        On the Freescale Kinetis series of ARM chip, you can set up filtering on the I/O pins. :)

      3. Or you can have a little bit of logic in the interrupt if your tics are faster.

        There are a couple of main approaches. Suppose an input from a switch is high, and it goes low as the switch is changed – but maybe jumps high and low a few times, stabilizing within N microseconds.

        One approach is to delay recognizing the transition (in this case, high to low) for N microseconds (implemented as some number of clock tics). For example, after a transition, wait N microseconds and sample again to get the “real” value. Or keep sampling and don’t treat it as a change until the input is consistent for N microseconds (some number of consecutive tics).

        Another is to trigger the event immediately upon the first transition, but not to accept another transition for at least N microseconds. Just ignore that input for some number of tics before you sample it again to begin waiting for the next transition. I tend this way, because once you’ve seen a transition (after being stable), you know what the next state is, so there’s no need to delay detection – just delay long enough for the subsequent bounces before accepting the next transition.

        If however you have a switch which bounces when it’s NOT going to toggle to the other state (say from vibration) then one of the delayed trigger approaches may be good for filtering that out.

    3. I’d rather avoid doing anything with buttons in an ISR. Poll your buttons at the beginning of your main loop, note the pushed ones in a variable, test again at the end, compare with the variable, and then take action. This way you have software debounced buttons. Adding two resistors and a capacitor to the button will solve the bounce problem too.

      1. If you’re testing the button at the beginning and at the end, isn’t that the same as just testing it twice in the beginning?

        The difference in time between the two readings is just a few clock cycles, which might be too short to let the button settle down.

        1. You are right, but it depends on the length and complexity of main loop and functions called inside it (I have this bad habit of making them big). When I know that loop will be rather short (or my oscillator is faster than 100kHz), I increment helper variable and use it as crude software timer. Whenever I use hardware timers, I tend to set them for 1us ticks, and then less critical calls, like user IO are done with software timers.
          Using interrupts for user input is just wrong, unless it’s emergency stop (I would avoid that too by making emergency stop implementation in hardware). I did use Interrupt On Change for user input in one project, but that broke the flow of the program, caused timing issues and loss of data.

          1. No it doesn’t, because the end of the loop just goes right back to the start. It’s the same if you read the buttons twice at the start.

            Your main function is structured like ABA and repeats ABAABAABAABA… instead of ABABABA as you might think it does.

          2. That’s why when second test fails, I reset the storage variable, Because first test follows immediately, it will probably have the same result as second test in last loop. If not, it still needs to be confirmed in second test at the end of the current loop. Delay between the tests will always be one loop long, before event is confirmed. So far it worked. Also if you are concerned, you might add empty delay at the end of the loop. It all depends on execution times of the main loop.

    4. One way would be to use a higher priority timer interrupt routine that disables the key input interrupt after a press until 5ms or so have passed. When the key detected interrupt fires it sets a flag for the main loop, sets a value of 6 in a countdown register and disables its own interrupt enable flag.

      The timer interrupt is handling routine clocks, etc. and it also checks the key detect counter. If >0 then decrement it, if the new value is now =0 then enable the key detect interrupt flag.

      An interrupt routine like this is also nice for set and forget things like “turn this output on for 10 seconds.” The main loop (or another interrupt) turns on the output and sets a value in a countdown register. The same timer interrupt described above watches the register for non zero and decrements. When it hits zero it turns off the output.

    5. I never use interrupt to scan keyboard or button and from my point of view it’s a very bad idea.
      I use a timer (system timer, task, whatever…) that scan the hardware directly every -say- 10ms. You CAN’T have glitch with this method and you always debounce very simply the hardware. Why you can’t have glitch: imagine you press button just when you read register, you have two possibilities: button is pressed -> Ok, take it into account, button is not pressed -> it will be taken into account the next tick. That’s it. A keydown detection is:
      if (lastTickRead == 0 AND currentTickRead == 1) pressed++;
      You can even implement burst pressed with a counter
      if (lastTickRead == 1 AND currentTickRead == 1) counterKeyDown++
      if (counterKeyDown == someValue) { pressed++; counterKeyDown=0;}

      Code is just for demonstration, not verified, I use it in a task (FreeRTOS)

  3. This statement is not correct: “Compilers like AVR-GCC do you the favor of turning off the global interrupt mask bit upon entering an ISR.” It isn’t the compiler but the hardware which turns off the interrupts. This is quite important as it avoids race conditions where the next interrupt occurs before the software has turned them off.

    Quote from the ATMega328P datasheet: “When an interrupt occurs, the Global Interrupt Enable I-bit is cleared and all interrupts are disabled. The user software can write logic one to the I-bit to enable nested interrupts. All enabled interrupts can then interrupt the current interrupt routine. The I-bit is automatically set when a Return from Interrupt instruction – RETI – is executed.”

    1. Thank you for that. I’ll fix the article. You’re totally right that it only makes sense to do in HW as well.

      (Double-embarrassing b/c the next article is essentially on race conditions…)

  4. Wow, great article. I have traditionally avoided interrupts entirely, being wary of their unpredictable behaviors. This was great info on how to avoid the pitfalls to make interrupts less scary. Thanks!

  5. You can also use spare IO pins as an inexpensive profiling tool to see how much time you are spending in an ISR or other routine (and/or how frequently it is being called). Just set an IO bit on entry and clear it on exit. Watch the bits using an oscilloscope or logic analyzer. It’s also a nice way to visualize jitter that might affect some real-time operation.

  6. Just a comment that with real big dangerous things posing real big dangerous risks you don’t do safety interrupts in software. Safety gets its own hardware that directly stops the risk, and if you can’t avoid the software being in the loop then all sorts of stuff has to be done to be able to prove that your design wasn’t at fault if there ever was an accident.

    1. Even when they’re not real-big-dangerous, it never hurts to have a power kill switch.

      A friend of mine has this awesome video from when he was in high school, working on a FIRST team, and they had an out-of-control ISR that led to their robot (which looked like a shopping cart) spinning around really fast, and they’re all trying to chase it to give it the shutdown command… Hilarious. They installed a remote kill switch thereafter.

        1. When your system is in the middle of a swamp you need a watchdog timer to restart everything. The customer cannot wait for you to find the root cause. They need the system running as well as it can until a permanent fix is installed. The system may run for days before the unique sequence of events causes it to lock up. Shutting it down and bringing it back up within minutes is critical.

          There is also the situation where the program gets corrupted in memory and gets into a loop. The watchdog can attempt a restart but system checks during the reboot should keep the system from coming up entirely.

  7. Ugliest interrupt fubar I’ve dealt with: Putting the watchdog service routine in a high-priority timer interrupt. Main code can crash as hard as it likes but the watchdog will never fail!
