Fail of the Week: NASA Edition

There’s a reason we often use the phrase “It ain’t Rocket Science”: real rocket science IS difficult. It is dangerous and complicated, and a lot of things can and do go wrong, often with disastrous consequences. It is imperative that the lessons learned from past failures be documented and disseminated to prevent future mishaps. This is much easier said than done, because a large number of agencies and laboratories work on multiple projects over long periods of time. That is why NASA has set up NASA Lessons Learned, a central, online database of issues documented by contributors from within NASA as well as other organizations.

The system is managed by a steering committee consisting of members from all NASA centers. Public access is limited to a summary of the original driving event, lessons learned and recommendations. But even this information can be quite useful for common folks. For example, this lesson on Guidance for NASA Selection & Application of DC-DC Converters contains several bits of useful wisdom. Or this one about ICs being damaged due to capacitor residual discharge during assembly. If you ever need to add a conformal coating to your hardware, check how Glass Cased Components Fractured as a Result of Shrinkage in Coating/Bonding Materials Applied in Excessive Amounts. Finally, something we have all experienced when working with polarized components: Reverse Polarity Concerns With Tantalum Capacitors. Here is a more specific Technical Note on polarized capacitors (pdf): Preventing Incorrect Installation of Polarized Capacitors.

Unfortunately, all of this body of past knowledge is sometimes still not enough to prevent problems. Case in point is a recently discovered issue on the ISS — a completely avoidable power supply mistake. Science payloads attach to the ISS via holders called the ExPRESS logistics carriers. These provide mechanical anchoring, electrical power and data links. Inside the carriers, the power supply meant to supply 28V to the payloads was found to have a few capacitors mounted the other way around. This has forced the payloads to use the 120V supply instead, requiring them to have an additional 120V to 28V converter retrofit. This means modifying the existing hardware and factoring in additional weight, volume, heat, cost and other issues when adding the extra converter. If you’d like to dig into the details, check out this article about NASA’s power supply fail.

Thanks to [Jarek] for tipping us off about this.

34 thoughts on “Fail of the Week: NASA Edition”

    1. The capacitors were originally supposed to be nonpolarized, but the designs were later changed. It ended up being assembled with the proper capacitors but with the old design. Also, the capacitors haven’t failed yet. They only figured out there was a problem when the stand-in on Earth failed, and found out they only deteriorate above 77°F, which they only reach in space under heavy loads. That is why they are using the 28V supply as little as possible.

      1. Yes, the test stand on Earth failed… during testing, while they already had a bunch in space. They should have tested them thoroughly before launching any… which apparently they didn’t, as 77°F is basically room temp on Earth.

        I’m aware how these things happen… It’s happened to me… but it shouldn’t have made it into production with adequate testing…. also the person that designed the original design with non polar capacitors probably did so to avoid anything like this happening…

          1. This is actually particularly why I try to avoid them wherever I can. I’ve had more than a few Tantalum caps blow up on me because I neglected to mark the polarity on the PCB and just assumed what the orientation was…and well you know what they say about assuming things ;)

    2. There’s lots and lots and lots and lots, many thousands, of things that need to be done correctly for a space mission to succeed. It only takes one or two things going wrong to break it. And it’s mostly done and planned by humans, whose brains really aren’t optimised for this sort of thing. So it’s impressive things go so well so often.

    3. The article says that the capacitors are so heavily derated that they survived this condition for hundreds of hours. So they easily survived the basic powerup tests.

      1. Then obviously a test should have been designed to stress the PSU… I mean seriously. Accelerated testing is a thing.. run it well above normal heat limits and at maximum output… and there you go it probably would have failed within a day. Powering it up and then turning it right off would be a “china export” level of testing… this is NASA we are talking about.

  1. There is a similar system run by NASA for self-reporting aviation incidents. The ASRS (Aviation Safety Reporting System) is handled by NASA rather than the FAA to allay fears that reports will be used in enforcing violations. NASA claims that confidentiality has been maintained for over a million reports and the data helps highlight problems and areas of concern.

      1. I once asked my Army battalion for assistance in setting up and maintaining a comm shack, something I had no experience doing. What I got was a senior NCO visiting and writing a “gig list” of everything in the shack that didn’t conform. And I was later chewed out by my platoon leader for the gigs.

    1. The BIG one where the satellite was bolted to the test stand adapter? But the adapter wasn’t attached to the stand? It’s a good thing some manager got that done over the weekend, with people who never did the job before, without going over the procedure or making sure there was one. Nothing like a real go-getter.

      1. What happened was people working on another project needed some bolts, and the closest source was the tilt table under the NOAA N-PRIME.

        #1 fail. Borrowing hardware from another team’s project. #2 fail. Not leaving a “Hey, we borrowed your bolts.” note. #3 fail. N-PRIME team not checking to see if anyone had borrowed anything before tilting the table.

        The takeaway from that incident is never move anything without first checking to make sure all is as it should be, even if you know there’s no reason for someone to remove essential pieces from it without leaving a note.

        What they needed was strict compartmentalization of tools, tooling, parts etc where absolutely nothing on one project gets touched by anyone working on another project without notification and verification.

        Fortunately it was the final satellite of its series and Lockheed Martin had many extra pieces, needing to make only a few new parts to replace damaged ones. I assume they made extras in case of failures during testing.

    1. Skylab had a fairly large-scale fuckup on launch, but they did an amazing job of fixing it. One problem was a lot of the insulation was ripped off. They came up with solutions on the ground (first a parasol, then a sort of blanket), and the astronauts went EVA to install them. Long, exhausting EVA, and I think the first EVA to actually achieve something useful.

      Skylab was a massive success. Especially when the whole thing started because they had a few extra Saturn V’s and Apollo vehicles hanging around, and were trying to think of a use to put them to. Skylab itself was built from a Saturn V rocket’s third stage, converted into a habitat. It did some amazing stuff. Including the first occurrence of astronauts going on strike, in space, for several days. THAT taught them a lot of useful stuff on the psychological side.

      Skylab went wrong on launch but what a recovery!

      1. Particularly you wouldn’t want tiny drops of solder floating around the place. Eventually, as Skylab learned, all lost objects turn up stuck on the air intake filters. But a little particle of solder, even cooled, could get into someone’s eye or lungs or some electronic circuit. You’d have an easier job smuggling cocaine up than you would a soldering iron.

  2. They for sure spent a lot of time testing the 120-28 converters, or did the team go out and dig up silicon to make their own transistors for it? ;-) From the article in Wired:

    “In just 11 months, the team developed a device that could convert the ISS-supplied 120 volts to 28 volts

  3. I have to give NASA credit, most engineering organizations don’t want to air their dirty laundry. They have at least formalized the first part of the process (implementation is the next part). In smaller organizations it is a face to face, one on one informal process. NASA is far too big (both number of people and sites) for this to work. I highly recommend the Apollo 13 accident report for a good read.
