Engineering For The Long Haul, The NASA Way

The popular press was recently abuzz with sad news from the planet Mars: Opportunity, the little rover that could, could do no more. It took an astonishing 15 years for it to give up the ghost, and it took a planet-wide dust storm that blotted out the sun and plunged the rover into apocalyptically dark and cold conditions to finally kill the machine. It outlasted its 90-sol design life more than 50 times over, producing mountains of data that will take another 15 years or more to fully digest.

Entire careers were unexpectedly built around Opportunity – officially but bloodlessly dubbed “Mars Exploration Rover-B”, or MER-B – as it stubbornly extended its mission and overcame obstacles both figurative and literal. But “Oppy” is far from the only long-duration success that NASA can boast about. Now that Opportunity has sent its last data, it seems only fitting to celebrate the achievement with a look at exactly how machines and missions can survive and thrive so long in the harshest possible conditions.

Fail Early, Then Stop Failing

Failure is always an option, and recognizing that fact is one of the prices of doing business in space. The early days of space exploration were punctuated with multiple catastrophes, mostly within the first few minutes of launch. That just reflects the difficulty of the endeavor; taming tons of volatile propellants and getting everything in the right place at the right time is a challenging business. Mistakes were made, and many missions were lost.

https://www.youtube.com/watch?v=zVeFkakURXM

But failures, especially high-profile and expensive ones, teach valuable lessons, and NASA is really good at figuring out what went wrong when something fails. NASA has entire labs dedicated to failure analysis, from structural and materials failure to electrical issues and software. They take failure analysis very seriously, to the point of writing their own software, the Root Cause Analysis Tool, or RCAT, to track and evaluate undesired outcomes.

Learning from their mistakes has increased the success rate of missions steadily over the years. Losses of missions due to launch issues are few and far between now compared to the early days. NASA still suffers failures once payloads are in transit or on station, of course. For example, Galileo suffered a serious failure while deploying its high-gain antenna that almost ended its mission to study the Jupiter system. Failure analysis led NASA engineers to conclude that leaving an umbrella-style antenna stowed for four and a half years and not relubricating the system prior to launch is not a good idea.

Galileo had a wonky main antenna and still delivered amazing science. Source: Jet Propulsion Lab

Failure analysis doesn’t just look at problems with hardware; NASA is very serious about finding issues with their processes too. When communication with the Mars Climate Orbiter was lost as the spacecraft entered orbit, it joined a long list of missions that the Red Planet rebuffed. NASA discovered that the root cause was the use of non-SI units in ground-based software used to calculate the thrust of orbital insertion burns, rather than the SI units expected. It also found that warnings from two separate navigators that the spacecraft was not in the right position were ignored because they had not been reported according to policy.
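The unit mix-up is the kind of error that an explicit interface can catch before launch. What follows is only a minimal sketch, not NASA's software: it assumes the quantity crossing the interface is an impulse, which the mishap report describes as having been supplied in pound-force seconds where newton-seconds were expected. The names `Impulse` and `total_impulse` are hypothetical, invented for illustration.

```python
from dataclasses import dataclass

# Exact by definition: 1 lbf = 0.45359237 kg * 9.80665 m/s^2
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical value type that always stores newton-seconds internally."""
    newton_seconds: float

    @classmethod
    def from_newton_seconds(cls, value: float) -> "Impulse":
        return cls(float(value))

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # The conversion happens once, at the boundary, where the unit is known.
        return cls(float(value) * LBF_S_TO_N_S)


def total_impulse(events) -> Impulse:
    """Sum impulses, refusing bare numbers whose unit is unknown."""
    total = 0.0
    for event in events:
        if not isinstance(event, Impulse):
            raise TypeError("expected Impulse, got a value with no declared unit")
        total += event.newton_seconds
    return Impulse(total)
```

With this shape, a producer has to say which unit it is handing over, and a consumer that receives a plain number fails loudly instead of quietly accumulating an error of a factor of about 4.45.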

Engineered to Last

Space exploration is an expensive business, mainly because of the cost of getting useful amounts of hardware out of the deep gravity well we all call home. But the spacecraft themselves are pretty pricey, thanks in part to the engineering that goes into them. When something is intended to operate for decades while traversing millions of miles of the most hostile conditions imaginable, close enough won’t cut it.

Workmanship counts: Spirit kept working for more than six years. Source: JPL

To make sure that interstellar probes, planetary explorers, and even the ground-based systems that support them do not fail, or at least to maximize the time until failure, NASA has developed a massive body of very specific and very stringent workmanship standards. As Gerrit Coetzee pointed out a few years back, the workmanship standards documents are themselves works of great beauty. They cover every conceivable kind of electromechanical assembly, showing the “NASA way” of doing it correctly. How to solder correctly, when to crimp instead, how to prevent PCB damage, how to prevent electrostatic discharge damage, and even how to properly tension wire ties are all covered. For my money, though, the pièce de résistance is the section on lacing wiring harnesses. Pure engineering beauty.

Aesthetics aside, NASA’s workmanship standards and the general engineering principles the agency follows are a huge factor in spacecraft lasting long past their “best by” date. The amazing success of Opportunity was only the latest in a string of long-haul engineering wins for NASA, thanks in large part to principles like building in redundancies at every level of design. That saved the rover’s bacon a number of times, including in 2014 when “amnesia events” with the vehicle’s non-volatile memory led to several system resets. Controllers were able to reconfigure the rover to use only its RAM, and the mission continued for several more years.
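The idea of falling back from a failing non-volatile store to RAM can be sketched in a few lines. This is purely illustrative and says nothing about how the rover's flight software is actually written. Every name here (`FlashStore`, `RamStore`, `RedundantStore`, `max_failures`) is a hypothetical stand-in.

```python
class FlashStore:
    """Stand-in for non-volatile memory that may start failing."""

    def __init__(self):
        self.healthy = True
        self._data = {}

    def write(self, key, value):
        if not self.healthy:
            raise OSError("flash write failed")
        self._data[key] = value


class RamStore:
    """Stand-in for volatile memory: works, but is lost on power-down."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        self._data[key] = value


class RedundantStore:
    """Prefer the primary store; switch to the fallback after repeated failures."""

    def __init__(self, primary, fallback, max_failures=3):
        self.primary = primary
        self.fallback = fallback
        self.max_failures = max_failures
        self.failures = 0
        self.degraded = False  # once True, the primary is no longer tried

    def write(self, key, value):
        if not self.degraded:
            try:
                self.primary.write(key, value)
                self.failures = 0
                return "primary"
            except OSError:
                self.failures += 1
                if self.failures >= self.max_failures:
                    self.degraded = True
        # Either the primary just failed or it has been retired for good.
        self.fallback.write(key, value)
        return "fallback"
```

The design choice worth noting is the latch: after enough consecutive failures the sketch stops trusting the primary store entirely rather than retrying it on every write, which trades persistence for predictable behavior.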

Sticking with seemingly outdated technology is another way NASA gets so much life out of its machines. We’ve covered a few examples of this before, like the use of orbital photo labs for lunar reconnaissance, or the 8-track tape decks used on Voyager and Galileo. Both were tried and true and offered reliability far beyond what could have been achieved with other means.

The computers that NASA chooses to fly into space are also decidedly behind the times compared to what is commercially available at the time the vehicle is built. Galileo, for instance, flew to Jupiter with six RCA COSMAC 1802 8-bit microprocessors, built on sapphire substrates for radiation resistance. Even New Horizons, launched in 2006 and a recent visitor to Ultima Thule, was equipped with a radiation-hardened version of the MIPS R3000 CPU, a RISC chip that first hit the market in 1988. Old, slow, and working beats fancy, fast, and buggy any day of the week.

Moving the Goalposts

There’s another aspect to the success of long-term NASA missions, this one more of a social engineering approach than physical engineering. NASA designs its missions very carefully, in terms of what science gets done, when it gets done, and how resources on a spacecraft are allocated. For Opportunity, NASA got a lot of mileage out of the oft-repeated “it was only supposed to last for 90 days” figure. I won’t quibble with that, but it’s a little unfair to NASA. The vehicle was obviously engineered to last much, much longer than 90 sols (Martian days), and if that dust storm hadn’t been as deep and as long as it was, the rover would probably still be running today. Rather, NASA planned for all the science to get completed within 90 sols, hoping that it would last longer. Every sol past the scheduled end of the science program was gravy.

The Voyager probes are 14 billion miles away now and still working. Source: JPL

This mission extension is something that NASA very much plans for – sending millions of taxpayer dollars out into space without a plan to maximize the return on investment doesn’t work. The Voyager program is a perfect example of this. Technically, the mission for Voyager 1 was over when it flew by Saturn, and Voyager 2’s primary science was completed after its encounter with Neptune. But with the spacecraft still in good shape and with minimal budget needed to continue communicating with them, NASA began the Voyager Interstellar Mission (VIM) that continues to this day, gathering data from interstellar space.

For my part, as impressive as Opportunity’s accomplishments were, and as sad as I felt when that dust storm set in and we stopped hearing from the rover, the real benchmark of space engineering is the Voyager twins. Their RTG power systems should provide enough juice to keep the VIM going for another five years or so. That will be a truly sad moment for me, when the mission that I’ve followed from its launch in 1977 will finally be over. But I’ll take solace in the idea that perhaps someday, an alien civilization will find these exquisite machines and see just what kind of engineering their makers were capable of.

49 thoughts on “Engineering For The Long Haul, The NASA Way”

    1. The TL;DR version. 1. If you ‘borrow’ the bolts holding a satellite to a device that can dump it onto the floor, leave a very prominent note. 2. When someone says “Hey. There’s no bolts holding our expensive satellite to the device that can dump it on the floor.” PAY ATTENTION TO THAT PERSON.

  1. ” Old, slow, and working beats fancy, fast, and buggy any day of the week.”

    And yet how often such is ridiculed as following the “buggy-whip” because one doesn’t embrace progress with an almost religious fervor?

    1. It is often right to ridicule that approach, sometimes it is not. There’s always a balance between following every new technology, and actually getting something done.

          1. Not if you ever read ‘The Experts’, a book full of vignettes of presumed superiority complex of Gen. Westmoreland and the others running the Vietnam War for the U.S. And this, a mere 19 years after Dien Bien Phu.

    2. Respectfully, you are using that sentence out of context, to compare apples to oranges. The buggy whip metaphor isn’t directed at those who don’t embrace progress with a religious fervor; it’s directed toward those who reject change out of hand, with a religious fervor.

    1. As a USAian, I wish the above were true. Still Imperial everywhere that counts. The only metric measurements are on imported stuff. It’s very frustrating, because there’s no good reason (except inertia) for us to keep Imperial measurements.

      1. Money’s the reason. No one wants to replace speed limit signs on some 4.1 million miles of road with km and distance or exit numbers. The cost adds up. Replacing scales that only display in oz or lbs (like my cheap postal scale), updating kitchen cabinets with metric utensils, and educating a significant portion of the 330 million Americans that don’t understand the metric system.

        The USA remains the largest country in the world that has not fully mandated the metric system (along with 2 smaller countries that most people probably couldn’t find on a map).

        1. The question is, how much does it cost to KEEP the US customary units? This damages the country in countless ways, especially in international trade, science and engineering.

          Problems keep piling up, and this is just one of them.

          1. Decals can moderate the cost of modifying road signs. I suspect that’s what was used when the US 55 MPH national speed limit was repealed, because the changes were made so quickly. While doing so is as old as WW II, the trend now is to use weight to measure dry ingredients. Kitchen measuring utensils have been marked with both units for decades now, so most homes are equipped already. Heck, the recently manufactured retro-looking spring scale I purchased for decoration is marked with both units. I graduated from a small rural high school in 1974 understanding the metric system. If the majority of the US population now doesn’t understand the metric system, or can’t learn it in a four-hour course, the USA has big problems. Nothing new about the points you make; likewise, there is nothing new about my counterpoints.

          2. The cost of switching to metric would hardly be limited to road sign decals, and if Miroslav’s question is to be answered seriously, any subtly hidden costs are important to consider in a sober cost-benefit analysis. I have seen several purported cost-benefit analyses by metric boosters, but all I’ve seen so far seem far from disinterested (would welcome links to cost-benefit analyses that you find authoritative).

            Take, for example, US property records – would this mountain need to be totally upturned? Or do we go metric at some designated date? If the latter, how do we cope with transactions that span the turnover date? Who will be responsible for the cost of converting contracts designated in customary units? Who absorbs liability for inevitable errors made in converting documents, many of which are OCR of low-resolution scans of typewritten documents?

            How many government contracts and standards documents would need to be revised? Which federal office will resolve all issues and disputes related to this, and how will the work be funded and staffed?

            I could list many more potential costs, but I’m only trying to make the point that there would be many costs that you may not have considered against supposed benefits to trade, etc.

            As for …”science, and engineering” what exactly is the problem? Nothing now prevents scientists and engineers from calculating and publishing in using SI units without mandate from the government, as any glance at IEEE publications will confirm.

          3. Right. No one is allowed to have two sets of wrenches and no one knows how to convert numbers. Just like no one can speak more than one language. The old US metric was adjusted ages ago to have exact conversions to the French metric. Everything on my Ford truck is SI. You are behind the times. And what are these huge costs to the US?

        2. They were about to replace all the speed limit signs with metric when Congress realized that it was such an unpopular move that it could cost many of them re-election.

        3. “educating a significant portion of the 330 million Americans that don’t understand metric system.”

          thats the sad part. they already DO understand it but don’t realise it… their MONEY…..

          1. Money is measured in dollar and cents, not some magical metric dreamed up by a Frenchman. Comparing money to measurements is a good argument for KEEPING the Imperial system in use. After all, if the US can use dollars, the EU can use Euros, etc etc, why can’t we do the same with measurements?

          2. To james

            Erm money is a valid point. Dollars and cents are in base 10.

            Metric is in base 10.

            Therefore, if you can work with money, you can work with metric.

            Now, if your money was like the old UK system, pounds shillings and pence, I’d accept your objection. But, no, it’s decimal.

        4. Understood.

          But I submit that our failure to follow the measurement system used by the rest of the world, is costing us far more than a few speed limit signs (which nobody ever looks at anyway) and some postal scales (which are probably digital and can be easily switched to grams).

          We should just bite the bullet and do it. It’s not like we have a domestic auto industry to protect any more.

          1. Define “switch”. Does that mean “compel by force of law”, and if so, who would be compelled, under what penalties, and what compliance period? Certainly if you pass a law that compels corporations or government departments, there will be a cost of compliance. Any guess as to what that might be, and who will bear the costs? It’s not enough to blithely imply that cost of compliance is no issue, that it’ll pay for itself. It’s interesting that only “a few” speed limit signs and postal scales are what enter your imagination, and not say, gaging equipment, property records, military documents, computer code, existing inventory of material stock, large equipment, to enumerate a tiny fraction of what would really need to be paid for if it was forced to happen. Road signs (though not insignificant in number) are but a tiny fraction of officially designated units that would need to be changed, and fixation on these bespeaks inexperience and limited imagination. We have an industrial ecosystem in which U.S. customary units are deeply, deeply embedded.

            Next, let’s think more closely about the supposed benefits. Corporations are in no way inhibited from manufacturing products with whole number metric units, should it be profitable to do so. Do you believe there is a vast, untapped “metric market” that has somehow gone unnoticed by those who design, fabricate, and market products for foreign trade? Or that they cannot afford to convert dimensions in documentation when necessary?

            I do not argue that there would be no benefits. I’m an engineer who has designed products in both SI and U.S. customary units (the distinction is increasingly less important with modern CAD software) and who has published papers using S.I. units, and have never felt particularly encumbered by the existence of the U.S. customary system. Occasional inconveniences result from existence of dual systems, but these are near the very bottom of the list of things that could help me better do my job.

        5. This meme is really an oversimplification and not very true. Most other countries including Canada, England, and even Japan still use not-insignificant amounts of non-metric units whether via fiat or common usage. England still uses miles and MPH, as one example. Many countries have significant industries and products that still use non-metric units almost exclusively. Plumbing generally uses imperial standards. And several other standard measurements worldwide are imperial. Especially in electronics, many distance standards are imperial: pin header spacing or SMT sizes, for example. I have my suspicion that computer monitors are usually sold in inches and use DPI, as I don’t think I’ve ever come across one of those listed in metric, though I can’t confirm that.

          Our metrication laws aren’t as strong as other places, however most states (48 of them) allow metric only labeling, and federal law since the 90s disallows imperial only labeling on most packaged goods.

          As usual the truth is somewhere in the gray area, however the meme that America is the last real country that doesn’t use metric is misleading at best.

          1. Car wheel diameters, still in inches everywhere. The Michelin TRX attempt at changing that in the 1980’s was a failure. Tire selections were limited and expensive, so were the wheel designs. If you happen to have a car with TRX wheels (IIRC Ford was the only company to offer TRX in the USA) Cooper Tire now makes TRX spec tires – but they’re no better than the 1980’s originals. Cooper’s primary focus is reproductions, not applying newer innovations to older vehicles. Would be interesting to see tires made for 1920’s cars using decades newer materials, construction techniques and tread designs.

        6. These “but changing to metric is so haaaaaard” posts from the U.S. make me laugh.
          Every other country [modulo the couple of exceptions noted below] has done this. Yes it wasn’t painless, but we did it. Yes the U.S. is larger than most countries, but it’s richer than most too, and let’s face it, there’s far less stuff actually being made in the U.S. anymore…

          The fact is, the U.S. has lost the ability to do the things that are difficult but necessary.

      2. Don’t fix what ain’t broken. It doesn’t matter if the street signs are in kilometers or miles, or whether you buy your beef in pounds or grams – so why should you go through all that trouble? The issues that are attributed to the USCS, like the Mars probe incident, aren’t because of the system itself but because NASA was foolish enough to use mixed measurements.

        There’s an old Russian story about a village idiot who got tired of being treated as the fool, so at the advice of a sage, he started criticizing everything and everyone. By disagreeing with everyone, other people started to doubt themselves and think the idiot might actually be on to something, and changed their opinion about him. He became the village sage, and everyone started coming to him for advice.

        In the same way, pining over the metric system is just a form of xenocentrism/elitism. People disagreeing for the sake of disagreeing, demanding change for the sake of change, because deep down they simply want to count themselves as part of the intelligentsia.

        1. “The issues that are attributed to the USCS, like the Mars probe incident, aren’t because of the system itself but because NASA was foolish enough to use mixed measurements.”

          Let’s be clear – mixing customary units and metric units isn’t the problem. Mixing *any* units is the problem. Metric literally has the same problem: they’ve got tons of units that end up being close, but not quite close enough. There’s no way you should have units that are only a factor of 10 apart, and having time units not base 10 means you’re going to have mixed units which are going to be close enough to be a problem, too. Meters/second is only a factor of 3.6 away from kilometers/hour.

          Believing that the metric system just magically shields you from having to worry about units is insane. The Mars probe was lost because 1 computer was transmitting values to another computer, and they didn’t agree on what those values were, and there were no checks on that interface beforehand. The fundamental problem has nothing to do with units. It’s a failure in interface specification.

          1. Technically speaking, “an hour” isn’t a SI unit, but a convenience unit that is merely accepted to be used alongside of the SI. The official metric unit for time is the second and nothing else.

            If you were really sticking to the Metric System as it is defined by the International System of Units, you wouldn’t buy soda in liters either, you’d buy it in decimal fractions of a cubic meter, or in cubic centimeters, which are equivalent to milliliters. You wouldn’t pump your bike tires up in bars, you’d use kPa, and you wouldn’t drive your car at 100 km/h but at 27.777… m/s (though we could round that up to 30).

            https://en.wikipedia.org/wiki/Non-SI_units_mentioned_in_the_SI

            By and large, there would be roughly equivalent units, but they would be very inconvenient to use because you’d have to use prefixes and long names for just about anything. Many of the units that people actually use are actually convenience units, which makes the situation with the US customary units rather curious: the US is officially metric – only the convenience units are different. Arguing that you shouldn’t be using the gallon, foot, mile, inch, pound… would be a double standard unless you also argue that you can’t use the hour, liter, bar, decibel, tonne, degree… it’s just an arbitrary choice.

          2. What actually caused the probe to crash was ignoring the people who had reported problems when there was still time to fix the error. NASA insisted that proper procedures had to always be followed, even for reporting *extremely urgent* emergencies. When someone says “HEY! I found a problem that if we don’t fix it NOW it could destroy the probe!” PAY ATTENTION and @#%@^%^ proper procedure.

            Any project as complex as a spaceflight must include some flexibility to bypass the rule book when taking the time to follow the rules will take too long to handle a problem.

    1. Maybe it’s nervous laughter? I think I remember seeing an early V2 rocket test where a photographer just barely made it out of the blast zone when the thing came back down and broke apart. They didn’t quite have the “Far enough? Nope, farther…” thing figured out yet.

  2. The “NASA uses super old stuff because they work” is really overstated at this point. It’s not like NASA says “hey we can’t use this new thing! We have to use stuff that’s ten years old!” It’s that NASA’s long mission timelines and specialized environments mean it takes that long to launch something.

    New Horizons was launched in 2006, but it was proposed around 2000-2001. And it didn’t fly a MIPS R3000 ripped from a PlayStation or something. It flew a Mongoose-V, which is a *version* of an R3000 which appeared well after 1988. Rad testing was still being done on it in 1997. It didn’t fly until 2000.

    So think about it this way: New Horizons- a *9 year flight mission*- was designed with a CPU that, at the time, had only accumulated at most a small number of years of usage in space.

    That’s not what you would consider using old, established technology. It just takes a lot of time to do radiation tests and a lot of time to build a spacecraft.

  3. Any public links for the NASA workmanship standards ? the web page seems to be unfinished, and not that complete, and the standards seems to be .. non public documents ? WTF ?!

  4. “But I’ll take solace in the idea that perhaps someday, an alien civilization will find these exquisite machines and see just what kind of engineering their makers were capable of.”

    Fast forward a couple hundred years. V’ger comes back, kills some Klingons and, if not for the heroics of a certain Starship crew, the earth would be destroyed. :)

  5. I wonder how SpaceX performs to NASA in the long run… My guess would be, it does what it is supposed to, at least eventually, but not more. After 90s the warranty is gone and the device just breaks…

  6. Just found this article. Dan refers to the Standards document used by NASA (and more specifically JPL) in building their vehicles. I was fortunate to work with the JPL folks while assigned to an Air Force project in 1991 and learned a tremendous amount about spacecraft wiring. I too learned why you spot tie wiring harnesses instead of using zip ties and do to this day. When my project was complete and the JPL folks left they gifted me with a copy of that book. I still have it and refer to it periodically just to remind myself why these folks are the best in the business.
