Expiration dates for computer drives? That’s what a line of HP solid-state drives is facing, as the variable that holds their uptime counter is running out of room. When it does, the drive “expires” and, well, no more data storage for you!
There is a series of stages in the evolution of a software developer as they master their art. One of those stages comes in understanding that while they may have a handle on the abstracted world presented by their development environment, they perhaps haven’t considered the moments in which the real computer that lives behind it intrudes. Think of the first time you saw an SQL injection attack on a website, for example, or the moment you realised that a variable type is tied to the physical constraint of the number of memory locations reserved for it. So people who write software surround themselves with an armoury of things they watch out for as they code, and thus endeavour to produce software less likely to break. Firmly in that arena is the size of the variables you use, and what will happen when their limit is reached.
Your Drive Is Good For About 3 Years And 9 Months
Sometimes though even developers who should know better get it wrong, and this week has brought an unfortunate example for the enterprise wing of the hardware giant HP. Their manufacturer has notified them that certain models of solid-state disk drives supplied in enterprise storage systems contain an unfortunate bug, in which they stop working after 32,768 hours of uptime. That’s a familiar number to anyone working with base-2 numbers, and hints at a 16-bit signed integer being used to log the hours of uptime. Such a variable can hold nothing larger than 32,767, so on the next increment it rolls over to a negative value and, rather than the drive believing itself to be in a renewed flush of youth, it will instead stop working.
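As a rough illustration of that rollover, here is a minimal Python sketch. It assumes the counter really is a 16-bit signed integer incremented once an hour, which is an inference from the 32,768 figure rather than anything confirmed about the drive firmware; the `tick` function and the constants are hypothetical names made up for this example.

```python
import ctypes

HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365.25


def tick(hours: int) -> int:
    """Advance a hypothetical 16-bit signed hour counter by one hour."""
    # ctypes.c_int16 does no overflow checking, so an out-of-range value
    # wraps around just as it would in a two's-complement 16-bit register.
    return ctypes.c_int16(hours + 1).value


last_good = 32767                  # largest value a signed 16-bit integer holds
after_rollover = tick(last_good)   # one more hour of uptime
years = 32768 / HOURS_PER_DAY / DAYS_PER_YEAR

print(after_rollover)    # -32768
print(round(years, 2))   # 3.74
```

The 3.74 years works out at roughly three years and nine months of continuous power-on time, which is where the headline figure comes from.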
Egg on the faces of the storage company then, and an urgently-released patch. We suspect that if you own a stack of these drives you will already know about the issue and be nervously pacing the racks of your data centre.
This does raise the question of how such an issue could manifest itself in 2019. We can forgive developers in the 1960s or 1970s for using limited-size variables to store incrementing numbers, because there was little experience of rollover bugs and the hardware of their day was often severely constrained. But as we approach the third decade of the 21st century we should have both the experience and the hardware to avoid the trap.
It’s hardly as though there has not been a series of widely publicised rollovers, such as the Year 2000 so-called “Millennium bug”, which have entered our culture to the extent that they’ve been parodied on The Simpsons and in countless other places. We’ve had jokes about the number of McDonald’s burgers sold rolling over, and on a more serious note we’ve seen space probes crash, and as an industry we’ve got an eye towards the UNIX time rollover in 2038. For this still to be a thing today, where have we gone wrong?
How Should We Be Finding Our Firmware Developers?
It’s a question we have to ask ourselves then: does the effect of Moore’s Law breed complacency? When all the computing devices for which you code have effectively limitless resources, do you lose track of the constraints of the hardware?
This is written by a Hackaday scribe whose formative computing experience came with very limited resources, on a first machine that was an 8-bit home computer with only 1k of memory. With that in hand, or perhaps as a more modern equivalent the experience of coding for one of the smaller microcontrollers, developing with a full awareness of the machine behind the code becomes second nature. When a variable requires two bytes, you know it requires two bytes, because you’ve had to make sure that there is a two-byte space in memory for it. By comparison, it’s easy when declaring an integer variable in a modern IDE for a high-spec machine to forget that its real-world effect may be to reserve just two bytes, and that as a signed value it can thus only count up to 32,767 of whatever it is you are counting.
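To put numbers on what a two-byte variable can hold, the following Python sketch works out the range of a signed integer of a given width, and how long an hour counter of that width would run before wrapping. The helper names `signed_range` and `years_until_rollover` are made up for this illustration, and the hourly increment is an assumption carried over from the drive story rather than a documented detail.

```python
import ctypes

HOURS_PER_YEAR = 24 * 365.25


def signed_range(bits: int) -> tuple[int, int]:
    """Return the (min, max) values of a two's-complement signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def years_until_rollover(bits: int) -> float:
    """Years of continuous uptime before an hourly counter of this width wraps."""
    _, largest = signed_range(bits)
    return (largest + 1) / HOURS_PER_YEAR


print(ctypes.sizeof(ctypes.c_int16))   # 2 (bytes)
print(signed_range(16))                # (-32768, 32767)

for bits in (16, 32, 64):
    print(bits, years_until_rollover(bits))
```

Under those assumptions a 16-bit counter lasts under four years, while moving to 32 bits at a cost of two more bytes pushes the rollover out beyond two hundred thousand years.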
Maybe this will never be a problem that completely goes away. After all, each successive generation must learn about it the hard way, and the old hands will nod sagely while another satellite crashes or an enterprise server fails. Meanwhile, as always, patch early and patch often.
Header image: Phrontis [CC BY-SA 3.0].