How Three Letters Brought Down UK Air Traffic Control

The UK bank holiday weekend at the end of August is a national holiday in which it sometimes seems the entire country ups sticks and makes for somewhere with a beach. This year, though, many of them couldn’t, because the country’s NATS air traffic system went down and left many stranded to grumble in the heat of a crowded terminal. At the time it was blamed on faulty flight data, but news now emerges that the data which brought down an entire country’s air traffic control may not have been faulty at all.

Armed with the official incident report and publicly available flight data, Internet sleuths theorize that the trouble was due to one particular flight: French Bee flight 731 from Los Angeles to Paris. The flight itself was unremarkable, but the data which sent the NATS computers into a tailspin came from two of its waypoints, Devils Lake, North Dakota and Deauville, Normandy, sharing the same DVL identifier. Given the vast distance between the two points, the system believed it was looking at a faulty route, and refused to process it. A backup system automatically stepped in to try and reconcile the data, but it made the same determination as the primary software, so the whole system apparently ground to a halt.
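To make the ambiguity concrete, here is a minimal Python sketch of one way a route parser might pick between two waypoints that share an identifier: choose whichever candidate is nearest to the previous fix on the route. This is purely illustrative and is not how the NATS software works; the `CANDIDATES` table, the `resolve` helper, and the rounded coordinates are all assumptions made for the example.

```python
import math

# Approximate coordinates, rounded for illustration only (assumed values).
CANDIDATES = {
    "DVL": [
        ("Devils Lake (USA)", 48.11, -98.91),
        ("Deauville (France)", 49.31, 0.31),
    ],
}

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def resolve(ident, prev_lat, prev_lon):
    """Hypothetical helper: return the candidate for `ident` closest to the previous fix."""
    return min(
        CANDIDATES[ident],
        key=lambda c: great_circle_km(prev_lat, prev_lon, c[1], c[2]),
    )

# A fix over the English Channel resolves DVL to the French waypoint,
# while a fix over the northern US plains resolves it to the American one.
print(resolve("DVL", 50.0, -1.0)[0])
print(resolve("DVL", 46.9, -96.8)[0])
```

A lookup keyed on the three-letter name alone has no such context, which is how two perfectly valid waypoints can make a valid route look impossible.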

It’s important to note that there was nothing wrong with the flight plan entered by the French Bee pilots, and that early stories blaming faulty data were themselves at fault. However, we are guessing that air traffic software developers worldwide are currently scrambling to check their code for this particular bug. We’re fortunate indeed that safety wasn’t compromised and that inconvenience was the only major outcome.

Air traffic control doesn’t feature here too often, but we’ve previously looked at a much earlier system.

Header image: John Evans, CC BY-SA 2.0.

The Glitch That Brought Down Japan’s Lunar Lander

When a computer crashes, it usually doesn’t leave debris. But when a computer happens to be descending towards the lunar surface and glitches out, that’s a very different story. Turns out that’s what happened on April 26th, as the Japanese Hakuto-R lunar lander made its mark on the Moon…by crashing into it. [Scott Manley] dove in to try and understand the software bug that caused an otherwise flawless mission to go splat.

The lander began the descent sequence as expected at 100 km above the surface. However, as it descended, its estimated altitude fell well below its true altitude. It thought it was at zero altitude while it was still about 5 km above the surface. Confused by the fact it hadn’t yet detected physical contact with the surface, the craft continued to slowly descend until it ran out of fuel and plunged to the surface.

Ultimately it all came down to sensor fusion. The lander merges several noisy sensors, such as accelerometers, gyroscopes, and radar, into one cohesive source of truth. The craft passed over a particularly large cliff that caused the radar altimeter reading to suddenly spike up by 3 km. Like good filtering software, the craft reasoned that the sensor must be getting spurious data and filtered it out. It was now estimating its altitude from its acceleration alone. As anyone who has tried to track an object through space using only gyros and accelerometers can attest, errors accumulate, and suddenly you’re not where you think you are.
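The failure mode is easy to reproduce in miniature. The Python sketch below is a toy one-dimensional estimator, not the lander's actual flight software: it predicts altitude by dead reckoning and only accepts a radar reading that falls within a gate around that prediction. The function name `fuse_altitude`, the 500 m gate, and the descent numbers are all assumptions chosen for illustration.

```python
def fuse_altitude(start_alt, radar_readings, descent_per_step, gate):
    """Toy estimator: accept a radar reading only if it lies within
    `gate` metres of the predicted altitude, otherwise dead-reckon."""
    estimate = start_alt
    rejected = 0
    for radar in radar_readings:
        estimate -= descent_per_step       # inertial prediction for this step
        if abs(radar - estimate) <= gate:
            estimate = radar               # reading is plausible, so trust it
        else:
            rejected += 1                  # reading looks spurious, so ignore it
    return estimate, rejected

# Height above the original terrain level, dropping 100 m per step.
height = [10000 - 100 * (i + 1) for i in range(10)]
# After the fourth step the ground falls away by 3000 m (the "cliff"),
# so the true height above local ground jumps by that amount.
radar = [h if i < 4 else h + 3000 for i, h in enumerate(height)]

estimate, rejected = fuse_altitude(10000, radar, 100, 500)
print(estimate, rejected, radar[-1] - estimate)
```

Once the step change is rejected, every later reading is also outside the gate, so the estimate never recovers and stays 3000 m below the real altitude. A real filter would also accumulate inertial drift on top of that offset.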

We know what you’re thinking: surely they would have run landing simulations to catch errors like these? Ironically, they did. It’s just that after the simulations were run, the landing site for Hakuto-R was changed. Unfortunately, nobody thought to re-run the simulations, and now the Moon has a new lawn ornament.

We’ve previously written about why lunar landings are so hard. While knowing what led to the crash will hopefully prevent a similar fate for future missions, the reality is that remotely landing a robot on a dusty world without the help of GPS is fiendishly difficult and likely will be for some time.

Parsing PNGs Differently

There are millions of tiny bugs all around us, in everything from our desktop applications to the appliances in the kitchen. Hidden, arbitrary conditions that cause unintended outputs and behaviors. There are many ways to find these bugs, but one way we don’t hear about very often is finding a bug in your own code, only to realize someone else made the same mistake. For example, [David Buchanan] found a bug in his multi-threaded PNG decoder and realized that the Apple PNG decoder had the same bug.

PNG (Portable Network Graphics) is an image format, like JPEG, WEBP, or TIFF, that was designed to replace GIF. After a short file signature, the rest of the file consists entirely of chunks. Each chunk carries a four-letter type identifier, and a few chunk types are designated critical: IHDR (the header), IDAT (the actual image data), PLTE (the palette information), and IEND (the last chunk in the file). Compression is via the DEFLATE method used in zlib, which is inherently serial. If you’re interested, there’s a convenient poster about the format from a great resource we covered a while back.
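The chunk structure is simple enough to walk by hand. As a rough Python sketch (not [David Buchanan]'s decoder): a PNG starts with an 8-byte signature, and each chunk is a 4-byte big-endian length, the four-letter type, the payload, and a CRC-32 computed over the type and payload. The helper names `iter_chunks`, `make_chunk`, and `tiny_png` are made up for this example, and the 1x1 test image is built in memory so no file is needed.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def iter_chunks(data):
    """Yield (type, payload) for each chunk in a PNG byte string, checking CRCs."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        payload = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", data[pos + 8 + length:pos + 12 + length])
        if zlib.crc32(ctype + payload) != crc:
            raise ValueError(f"bad CRC in {ctype!r} chunk")
        yield ctype.decode("ascii"), payload
        pos += 12 + length  # length field + type + payload + CRC

def make_chunk(ctype, payload):
    """Serialise one chunk: length, type, payload, CRC-32."""
    crc = zlib.crc32(ctype + payload)
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)

def tiny_png():
    """Build a 1x1 8-bit greyscale PNG in memory."""
    # IHDR: width, height, bit depth, colour type, compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # one filter byte plus one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))

for ctype, payload in iter_chunks(tiny_png()):
    print(ctype, len(payload))
```

Because the image data in the IDAT chunks forms a single zlib stream, a decoder normally has to inflate it from the start, which is what makes splitting the work across threads tricky.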
