HALT In The Name Of Testing

“Did I forget something?” It’s that nagging feeling every engineer has when their project is about to be deployed – it may be a product about to be ramped into production, a low-volume product, or even a one-off like a microsatellite. If you have the time and a few prototypes to spare, though, there are ways to alleviate these worries. The key is a test method which has been used in aerospace, military, and other industries for years – Highly Accelerated Life Testing (HALT).

How to HALT

The idea behind HALT testing can be summed up in four steps:

  • Beat your product to death.
  • Figure out what broke.
  • Fix it, and fix the design.
  • Repeat.

Sounds barbaric, and in many cases it is. HALT testing is often associated with giant test chambers designed to torture anything inside them. Liquid nitrogen shock-cools the chamber as low as -100°C. The Device Under Test (DUT) can soak at that temperature for hours. Powerful heaters then blast the chamber, causing temperature rises of up to 90°C per minute, topping off at up to 200°C. Pneumatic hammers beat on the chamber table, causing vibrations at up to 90 Grms and 10 kHz. Corrosive sprays simulate years of rain and humidity. These chambers are hell on earth for any device unlucky enough to be placed inside them. It’s easy to see why this sort of testing is often referred to as “Shake and Bake”.

A typical HALT chamber

Proper HALT isn’t just about turning the chamber up to 11 and seeing what melts or breaks. Any reliability engineer worth his or her salt will tell you there is a method to the madness. Aerospace and military programs use statistics and apply methods such as the Eyring model to get starting points for their tests. The National Institute of Standards and Technology (NIST) has a website which outlines these models and their equations.

From the hacker’s perspective, though, common sense and back-of-the-napkin calculations will often get you into the ballpark. If your device is made of ABS plastic, cranking the heat to 250°C is more than enough to turn it into a misshapen, melted mess. In that case it would be better to limit your maximum temperature to something below the point at which your materials soften or melt.

For the electronics in our projects, semiconductor manufacturers will often give us enough data to get started. These are the Mean Time Between Failures (MTBF) or lifetime estimates you sometimes find buried in the graphs at the end of a datasheet. Atmel has an FAQ entry for their AVR line of microcontrollers. At 65°C, an AVR can be expected to run for 1929 years. At 85°C, this drops to 509 years. Ramp the temperature up to 105°C, and you’re only going to get 153 years out of your AVR. These are all huge numbers, but you can easily see how large a role temperature plays in reducing the lifetime of electronics.
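
As a back-of-the-napkin check, the three AVR lifetimes quoted above can be run through the Arrhenius model, a common simplification for temperature-driven wear-out. This sketch is not taken from the Atmel FAQ: it assumes lifetime scales as exp(Ea / kT) with a single activation energy Ea, and the function names are made up for illustration.

```python
import math

BOLTZMANN_EV = 8.617333e-5  # Boltzmann constant in eV per kelvin


def activation_energy_ev(life_cool, temp_cool_c, life_hot, temp_hot_c):
    """Activation energy implied by two (lifetime, temperature) points,
    assuming lifetime is proportional to exp(Ea / (k * T))."""
    t_cool = temp_cool_c + 273.15
    t_hot = temp_hot_c + 273.15
    return BOLTZMANN_EV * math.log(life_cool / life_hot) / (1.0 / t_cool - 1.0 / t_hot)


def acceleration_factor(ea_ev, use_c, stress_c):
    """How many times faster the part ages at stress_c than at use_c."""
    t_use = use_c + 273.15
    t_stress = stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV * (1.0 / t_use - 1.0 / t_stress))


# Lifetimes in years, as quoted in the article.
ea_low = activation_energy_ev(1929, 65, 509, 85)
ea_high = activation_energy_ev(509, 85, 153, 105)
af = acceleration_factor(ea_low, 65, 105)

print(f"Ea from 65/85 C points:  {ea_low:.2f} eV")
print(f"Ea from 85/105 C points: {ea_high:.2f} eV")
print(f"Predicted 65 C -> 105 C acceleration: {af:.1f}x")
```

Both pairs of points come out at roughly 0.7 eV, which suggests the quoted figures follow an Arrhenius-style curve. Under that assumption, the same `acceleration_factor` function gives a rough idea of how much a hotter test compresses time.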

Things get even more complicated when one gets into ARM microprocessors. NXP (formerly Freescale, and before that Motorola) has an application note dedicated to determining the life expectancy of their i.MX6 dual and quad-core ARM devices. Temperature is again important, but in this case clock speed also plays a large role.

In many cases it would be hard to get things hot enough for the semiconductors to fail in a reasonable amount of time for a HALT test. However, it’s not just the chips you’re testing. It’s the entire system – every resistor, capacitor, diode, and the board itself. I’ve seen projects that would work fine at room temperature, then fail miserably at 60°C due to clock skew.

Did it break yet?

Instrumented boards inside a HALT chamber

One problem with HALT testing is determining if the DUT is still working. You can’t exactly open the chamber doors and toss some scope probes on when your project is sitting at -100°C in a nitrogen atmosphere.

Commercial chambers are often coupled with multichannel data acquisition systems that continuously monitor the various outputs of the DUT. Thermocouples monitor the exact temperature of the chamber itself as well as your project. If (when) something goes wrong, the time, temperature, and data are all logged.

HALTing on a Shoestring Budget

Owning a HALT test chamber is probably outside the budget of most hackers. There are plenty of test companies out there who will run your device in their chambers for a fee. The resourceful hacker, however, can replicate quite a bit of that test equipment at home.

Heating is easy – just find an old kitchen oven. The typical kitchen oven can easily hit 200°C. Self-cleaning ovens can get close to 500°C. For small chambers, toaster ovens will do. The many reflow oven projects we’ve covered over the years show how easy it is to convert a typical oven to computer control.

A hacked oven used for baking powder coated parts
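
To give a feel for what “computer control” of an oven can mean at its simplest, here is a minimal on/off thermostat sketch with hysteresis. It is an illustration only: many reflow oven projects use PID control instead, and the function name, setpoint, and 2°C band below are assumptions rather than anything from a specific project.

```python
def heater_command(temp_c, setpoint_c, heater_on, hysteresis_c=2.0):
    """Decide whether the heating element should be on.

    The heater switches on below (setpoint - hysteresis), switches off
    above (setpoint + hysteresis), and otherwise keeps its previous state
    so the relay does not chatter around the setpoint.
    """
    if temp_c < setpoint_c - hysteresis_c:
        return True
    if temp_c > setpoint_c + hysteresis_c:
        return False
    return heater_on


# Made-up thermocouple readings in degrees C, for a 100 C setpoint.
readings = [20, 80, 99, 101, 103, 100, 97]
state = False
states = []
for temp in readings:
    state = heater_command(temp, 100.0, state)
    states.append(state)

print(states)
```

In a real build the readings would come from a thermocouple interface and the returned value would drive a relay or solid-state relay; both are left out here because they depend entirely on your hardware.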

Cooling is a bit harder, but not impossible. The simplest cold chamber would be a styrofoam beer cooler and some ice packs. Dry ice will allow you to take things even cooler. The enemy with this sort of cooling is condensation. Commercial chambers replace humid air with nitrogen gas to keep this from happening. Nitrogen cylinders are available at most welding supply houses.

Vibration testing is where the big tools come in – power tools that is. A mains-powered drill rotates at around 2800 RPM. Adding an offset weight to the drill shaft can create a powerful vibration engine. The pneumatic hammers of HALT chambers can be replicated using low-cost air tools.
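
A quick estimate shows what an offset weight on a drill can deliver. The sketch below uses the standard rotating-imbalance relation F = m·r·ω². The 10 g mass and 20 mm offset are made-up example values, and the function names are illustrative.

```python
import math


def rotation_frequency_hz(rpm):
    """Vibration frequency produced by one imbalance per revolution."""
    return rpm / 60.0


def eccentric_force_n(mass_kg, offset_m, rpm):
    """Peak force from a mass spinning at a given offset: F = m * r * omega^2."""
    omega = 2.0 * math.pi * rpm / 60.0  # angular speed in rad/s
    return mass_kg * offset_m * omega ** 2


freq = rotation_frequency_hz(2800)
force = eccentric_force_n(0.010, 0.020, 2800)  # 10 g at 20 mm (assumed values)

print(f"Vibration frequency: {freq:.1f} Hz")
print(f"Peak force: {force:.1f} N")
```

Note that this is a single-frequency shake at under 50 Hz, which is a long way from the broadband random vibration a pneumatic hammer table produces, so treat it as a crude stress source rather than an equivalent.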

For logging the data, the sky’s the limit. Data loggers can be as simple as an Arduino sending digital IO and thermocouple data to a local PC, or as complex as a Raspberry Pi driving GPIOs to run your project through various tests.
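
On the PC side, a logger can be only a few lines. The sketch below assumes the microcontroller prints one text line per sample in the form `temperature_c,io_ok` (for example `85.2,0`). That line format is an assumption for illustration, as is the function name. The input can be any iterable of text lines; a serial port opened with a library such as pyserial would work, but it yields bytes that need decoding first.

```python
import csv
import io
import time


def log_readings(lines, out, clock=time.time):
    """Write timestamped CSV rows from 'temp_c,io_ok' text lines.

    Returns the first (timestamp, temp_c, io_ok) row where io_ok is 0,
    or None if the DUT never reported a failure.
    """
    writer = csv.writer(out)
    writer.writerow(["timestamp_s", "temp_c", "io_ok"])
    first_failure = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            temp_text, ok_text = line.split(",")
            temp_c = float(temp_text)
            io_ok = int(ok_text)
        except ValueError:
            continue  # skip garbled lines rather than abort a long test
        row = (clock(), temp_c, io_ok)
        writer.writerow(row)
        if io_ok == 0 and first_failure is None:
            first_failure = row
    return first_failure


# Stand-in data instead of a live serial port; the fixed clock keeps it repeatable.
sample = io.StringIO("25.0,1\n61.5,1\ngarbage\n85.2,0\n")
log = io.StringIO()
failure = log_readings(sample, log, clock=lambda: 0.0)
print("First failure:", failure)
```

Recording the temperature alongside every pass/fail sample is the important part: when something does break, you want to know at what point in the profile it happened.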

Troubleshooting problems that only happen at high or low temperature can be frustrating to say the least. The best tools in a hacker’s arsenal are a good heat gun and a can of freeze spray. Sections of the board can be heated or cooled to replicate the problem at the bench. Once a suspect part is isolated it’s time to start looking at the signals to see what exactly is happening.

Next time you’re in the design and early testing phases of a project, give HALT a try. It can literally shake some bugs out of your design.

16 thoughts on “HALT In The Name Of Testing”

    1. You’re right. The numbers came from looking at various chamber specs. Some chambers can do 90 Grms and 10 kHz random, but I doubt they can do them both at the same time. I updated the article to reflect this.

    2. What seems funny? 90 g is roughly 900 m/s^2, which at 10 kHz comes to 900/10e3^2 = roughly 9 um of displacement. And if you’re testing electronics assemblies you may only have a few kilograms of mass (including the test table), which means only a few kN of force required. I used to work for a company that built actuators that could do that easily; we’d have considered that a fairly low displacement level at that frequency, actually.

  1. I think you have misconstrued MTBF a bit. The MTBF tells you how long it should take for half of your devices to fail, but it doesn’t (by itself) tell you how soon you should expect the first failure out of some large number of devices. If a device has a high infant mortality rate then it’s possible that some significant percentage of parts will fail very soon and the rest will fail much later. Also, the relevant temperature is the temperature of the device die rather than the ambient temperature.

        1. I should clarify: Those standards allow one to *estimate* failure rates, from which MTBF can be calculated. The actual measurement of MTBF is well defined. However, most quoted MTBF figures will be estimates, rather than measurements.

  2. By the way: For anyone attempting to use nitrogen in HALT tests as described in the article, ensure that safety precautions are taken for the test area (e.g. a small room without ventilation would be a bad idea). In sufficient quantity, nitrogen is a silent killer: it dilutes the atmospheric oxygen, you can’t smell it, and your body isn’t good at sensing oxygen depletion.
    “Oxygen and Nitrogen are the most underestimated gases when it comes to safety hazards”; that’s no insight from me, but the message we got in safety training from one of the world’s biggest gas suppliers.

  3. Adam, this was an interesting read! HALT testing reduces design and product development time. It can also spot weaknesses in an item for improvement. This can increase reliability and customer confidence. You may also know the destruct limit. The ideal time for the testing procedure is generally when product prototypes first become available.
