Date: 2023-12-21

Data retention is a funny thing. Atmel will gladly tell you that the flash memory in an ATmega32A will retain its data for 100 years at room temperature. Microchip says its EEPROMs will retain data for over 200 years. And yet, humanity has barely had a good grasp on electricity for that long. Heck, the silicon chip itself was only invented in 1958. EEPROMs and flash storage are altogether younger themselves.

How can these manufacturers make such wild claims when there’s no way they could have tested their parts for such long periods of time? Are they just betting on the fact that you won’t be around to chastise them in 2216 when your project suddenly fails due to bit rot?

Well, actually, there’s a very scientific answer. Enter the practice of accelerated wear testing.

Faster, Faster Now

EEPROM and NAND flash storage are both immensely important technologies. EEPROMs are used to store firmware for all kinds of devices, as well as things like cryptographic keys and other such largely-static data. Most EEPROMs have data retention ratings for many decades, if not centuries. Flash can be used in much the same way, but it’s also used as mass storage. It’s not quite as good at retention as EEPROM is. Some parts are rated for only a few years if left to sit, particularly at elevated temperatures. Other flash parts can hang on to data for much longer if designed to do so.

The question is, though, how we determine these numbers. Given the impracticality of real-time testing over a century, the industry instead relies on accelerated life testing methods. These techniques involve subjecting memory devices to heightened stress factors, such as elevated temperatures, to expedite the aging process. The underlying principle is based on the Arrhenius equation, which describes how the rate of a chemical reaction climbs steeply, roughly exponentially, as temperature rises.

The degradation of memory cells is fundamentally caused by chemical reactions that take place over long periods of time. Over time spans of decades or centuries, plastics degrade, materials oxidise, and all kinds of other chemical reactions go on. These chemical reactions can damage the tiny structures of a silicon chip responsible for storing data as little electrical charges. It’s almost like when you leave an apple out and it rots away; the phenomenon is often referred to as bit rot.

As per the Arrhenius equation, you can thus model a long period of memory degradation in a much shorter time by elevating the temperature, because the reactions occur at a faster rate. If you’ve studied physics, this should all be pretty familiar. Temperature is really just a measure of how energetically atoms are wiggling around. Thus, if they’re hotter, they’re wiggling around more and it makes sense that reactions would occur faster, what with all these atoms and molecules bobbling about. This explanation is simplified, of course, and won’t get me invited to any real science conferences, but it serves our purposes here.
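To put rough numbers on that idea, here is a minimal Python sketch of the Arrhenius acceleration factor, the ratio between how fast a device ages in the oven and how fast it ages on the shelf. The activation energy of 0.6 eV, the 125 °C bake and the 1,000-hour duration are all assumptions picked for illustration; real values depend on the part and the failure mechanism, and none of these names come from a datasheet.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvin."""
    return temp_c + 273.15


def acceleration_factor(ea_ev, use_temp_c, stress_temp_c):
    """Arrhenius acceleration factor between a use and a stress temperature.

    ea_ev is the activation energy of the failure mechanism in eV.
    A result of 100 means one hour at the stress temperature ages the
    part as much as 100 hours at the use temperature.
    """
    t_use = celsius_to_kelvin(use_temp_c)
    t_stress = celsius_to_kelvin(stress_temp_c)
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use - 1.0 / t_stress)
    return math.exp(exponent)


# Hypothetical example values, not taken from any manufacturer's data.
assumed_ea_ev = 0.6
oven_hours = 1000.0

af = acceleration_factor(assumed_ea_ev, use_temp_c=25.0, stress_temp_c=125.0)
equivalent_years = oven_hours * af / (24 * 365.25)

print(f"Acceleration factor: {af:.0f}")
print(f"{oven_hours:.0f} h at 125 C stands in for about {equivalent_years:.0f} years at 25 C")
```

With these assumed inputs the factor lands in the hundreds, so a bake of a few weeks stands in for a few decades on the shelf. Nudge the activation energy up or down and the answer swings wildly, which is why that one number matters so much.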

Thus, to test a device for long-term data retention, you merely need to place it in a hotter-than-typical environment and run checks on how it holds data over time. Naturally, this is done rigorously and with many samples, so that statistically meaningful conclusions can be drawn. Obviously, there are limits, too. There’s no point testing EEPROMs at 500 °C, where they’ll melt and burn in mere seconds, retaining precisely zero data. However, within the realistic limits of the part, significant insights can be made.

By observing the effects of accelerated aging, predictions can be made about the long-term retention capabilities of these devices. After accelerated aging, the EEPROM and flash memories undergo rigorous data retention tests. The outcomes of these tests are extrapolated to estimate how the devices would perform over extended periods at normal temperatures. This extrapolation, while scientifically grounded, is not without its uncertainties and relies heavily on sophisticated statistical models.
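The extrapolation step can be sketched in a few lines of Python. The idea is to bake parts at several temperatures, note how long they take to fail, fit a straight line to ln(time) against 1/(kT), and then read that line off at room temperature. The bake data below is synthetic: it is generated from an assumed model (0.6 eV, an arbitrary prefactor) purely so the fit has something to recover. Real test data would be noisy and would come with error bars.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def predict_hours(ea_ev, ln_a, temp_c):
    """Time to failure from the model ln(t) = ln(A) + Ea / (k * T)."""
    return math.exp(ln_a + ea_ev / (K_B * (temp_c + 273.15)))


def fit_arrhenius(temps_c, hours_to_failure):
    """Ordinary least-squares fit of ln(t) against 1/(k*T).

    Returns (ea_ev, ln_a): the slope is the activation energy in eV,
    the intercept is the log of the prefactor.
    """
    xs = [1.0 / (K_B * (t + 273.15)) for t in temps_c]
    ys = [math.log(h) for h in hours_to_failure]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, made-up "measurements" generated from an assumed model.
ASSUMED_EA_EV = 0.6
ASSUMED_LN_A = -10.0
bake_temps_c = [150.0, 175.0, 200.0]
bake_hours = [predict_hours(ASSUMED_EA_EV, ASSUMED_LN_A, t) for t in bake_temps_c]

ea_fit, ln_a_fit = fit_arrhenius(bake_temps_c, bake_hours)
years_at_25c = predict_hours(ea_fit, ln_a_fit, 25.0) / (24 * 365.25)

print(f"Fitted activation energy: {ea_fit:.3f} eV")
print(f"Extrapolated retention at 25 C: {years_at_25c:.0f} years")
```

Note how far the room-temperature point sits from the measured ones. The fit only ever sees 150 °C and up, so a small error in the slope turns into a large error in the predicted lifetime.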

While temperature plays a pivotal role in accelerated aging, other factors like voltage variations and humidity are also considered to simulate various stress conditions. This holistic approach ensures a more comprehensive assessment of long-term data retention capabilities.

Due to inherent variations in memory cells, a statistical approach is employed in these tests. By testing a large batch of devices and analyzing the average behavior, more accurate predictions of long-term performance are made. This statistical analysis is crucial in understanding the overall reliability of memory technology.
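One simple example of the statistics involved: if a batch of parts survives a bake with zero failures, how confident can you be about the population as a whole? The sketch below uses the standard zero-failure binomial bound. The sample sizes and the 90% confidence level are assumptions chosen for illustration, not figures from any particular qualification standard.

```python
def max_failure_fraction(samples_tested, confidence=0.9):
    """Upper bound on the true failure fraction after a zero-failure test.

    If the true failure probability is p, the chance that all n samples
    pass is (1 - p) ** n. Setting that equal to (1 - confidence) and
    solving for p gives the largest p still consistent with the result.
    """
    if samples_tested <= 0:
        raise ValueError("samples_tested must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / samples_tested)


# Hypothetical batch sizes, each assumed to pass with zero failures.
for n in (10, 77, 1000):
    bound = max_failure_fraction(n, confidence=0.9)
    print(f"{n:5d} parts, 0 failures: up to {bound:.2%} could still fail")
```

The takeaway is that a clean result from a small batch says surprisingly little. Ruling out rare failures takes a lot of parts, which is part of why these tests are run at scale.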

Key to these longevity tests is the monitoring of specific failure mechanisms, such as charge leakage in memory cells. Understanding these failure modes is essential in predicting data loss and devising strategies to mitigate such risks. These specific failures will occur on their own timeframes and will be more subject to certain conditions than others.

It bears noting that accelerated aging methods aren’t just used for assessing flash memory and EEPROMs; the techniques are applied to everything from archival papers to inks and other such products. These methods aren’t without their negative points, though. Criticisms of these methods revolve around the fact that different chemical reactions can occur at different temperatures, which spoils the correlation between an accelerated aging procedure and what would naturally occur over time at a lower temperature. Correlation can at times be poor, and for many items, particularly those invented recently, we simply haven’t had the chance to compare accelerated aging results with what occurs in real time. At the same time, with the rate that technology moves, it raises the question: will anyone in 2100 care if an ATmega can really store data on an EEPROM for 100 years?
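That criticism is easy to demonstrate with a toy model. The Python sketch below sets up two entirely made-up degradation mechanisms: one with a high activation energy that dominates in the oven, and one with a low activation energy that barely notices the heat but quietly dominates at room temperature. The prefactors and activation energies are hypothetical and chosen only so that the crossover falls between the two temperatures.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def rate(prefactor_per_h, ea_ev, temp_c):
    """Arrhenius rate (per hour) of one mechanism at a given temperature."""
    return prefactor_per_h * math.exp(-ea_ev / (K_B * (temp_c + 273.15)))


# Two made-up mechanisms: (prefactor per hour, activation energy in eV).
STEEP = (3.0e9, 1.0)   # strongly temperature dependent, wins in the oven
SHALLOW = (1.0, 0.3)   # weakly temperature dependent, wins on the shelf


def total_rate(temp_c):
    """Combined degradation rate of both mechanisms."""
    return rate(*STEEP, temp_c) + rate(*SHALLOW, temp_c)


OVEN_C, SHELF_C = 150.0, 25.0

# Naive extrapolation: measure the total rate in the oven, then scale it
# down to shelf temperature using only the steep mechanism's Ea.
scale = rate(1.0, STEEP[1], SHELF_C) / rate(1.0, STEEP[1], OVEN_C)
naive_shelf_rate = total_rate(OVEN_C) * scale
true_shelf_rate = total_rate(SHELF_C)

hours_per_year = 24 * 365.25
print(f"Naive predicted lifetime: {1 / naive_shelf_rate / hours_per_year:.0f} years")
print(f"Two-mechanism lifetime:   {1 / true_shelf_rate / hours_per_year:.0f} years")
```

In this toy setup the oven test sees almost nothing but the steep mechanism, so the naive extrapolation overshoots the two-mechanism lifetime by a couple of orders of magnitude. Real devices aren't this contrived, but the failure mode of the method is the same.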

Despite the rigor and sophistication of these testing methodologies, predicting the performance of memory devices over a century carries inherent uncertainties. Seemingly minor manufacturing changes or unforeseen environmental factors can impact the accuracy of these predictions. Understanding the physics and chemistry at play is key to accurately modelling long-term aging in more human-compatible time frames. Even so, our best models are just that. Until somebody actually checks a given EEPROM or flash part in a century, we can’t know for certain how accurate these models really are.

At the end of the day, most of us don’t have to worry too much about storing data on centuries-long time frames. For those who do, accelerated aging techniques are a highly useful tool in understanding how best to preserve data on those time scales. If you take one thing away from all this, just remember that leaving your flash drives or microcontrollers on a hot surface is going to trash your data far quicker than if you left them somewhere cooler. If your PhD thesis is currently sitting on an old flash drive in a hot car, you’d be best advised to make multiple backups and store it somewhere wiser.