Teardown Of Boeing 777 Cabin Pressure Control System

Modern passenger airliners are essentially tubes with wings; they just happen to be tubes stuffed full of fancy electronics. Some of the most important of these keep the bits of the tube with humans inside at temperatures and pressures that keep them alive and happy. Case in point: the Boeing 777, of which [Michel] of Le Labo de Michel on YouTube recently obtained the Cabin Pressure Control System (CPCS) for a teardown.

The crucial parts of the system are the two Nord-Micro C0002 piezoresistive pressure transducers, which measure the pressure inside the aircraft. These sensors, one of which is marked as ‘backup’, are read out by multiple ADCs connected to a couple of FPGAs. The system also has an ARINC 429 transceiver for communicating with the other avionics components. Naturally, the multiple PCBs are conformally coated and fitted with vibration-proof interconnects.

Although it may seem like a lot of hardware just to measure air pressure, this kind of hardware is meant to work without errors for years, meaning significant amounts of redundancy and error checking have to be built in. Tragic accidents such as Helios Airways Flight 522, involving a 737-300, highlight the importance of these systems. Although in that case human error had disabled the cabin pressurization, it shows just how hard it can be to detect hypoxia before it is too late.

24 thoughts on “Teardown Of Boeing 777 Cabin Pressure Control System”

  1. I wrote the software for a different brand of cabin pressurization system on large aircraft.

    It had two pressure sensors, one for inside the aircraft and one for outside. A pressure “valve” of sorts sat between the cabin and the outside and adjusted the cabin pressure. Essentially it was a balloon in a hole the size of a cup saucer, and the pressure in the balloon would seek to be 1/2 the pressure difference between inside and outside. If you want cabin pressure to go down deflate the balloon a little, and vice versa.

    Cabin pressure is reduced to an equivalent 10,000 ft while in flight. This reduces stress on the airframe with acceptable comfort for the passengers.

    (This is why babies cry on an aircraft: the pressure goes down and they haven’t yet learned how to clear their eustachian tubes, so their ears ache.)

    The algorithm for cabin pressure is fairly complex: it needs to gently lower the pressure as the aircraft rises (at a rate of less than 200 ft/min for comfort), then raise it again during final approach and landing, and throw the valve completely open while on the ground. Given a target pressure, a PID loop controls both the current pressure and the velocity of change.

    Add to this the need for hysteresis, and note that some airports sit at elevations above 10,000 feet. That could be the source airport, the destination airport, or both.
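The rate limit, PID loop, and hysteresis mentioned above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the commenter's actual code: every name, the gain values, and the choice to express the comfort limit as a slew rate on the commanded cabin altitude are hypothetical.

```c
#include <stdbool.h>

/* Comfort limit quoted in the comment above; the name is hypothetical. */
#define MAX_CABIN_RATE_FT_PER_MIN 200.0

typedef struct {
    double kp, ki, kd;   /* PID gains (illustrative, not tuned values) */
    double integral;     /* accumulated error */
    double prev_error;   /* error seen on the previous step */
} pid_state;

/* Move the commanded cabin altitude toward the target, never faster than
 * the comfort limit. dt_min is the time step in minutes. */
double slew_cabin_altitude(double commanded_ft, double target_ft, double dt_min)
{
    double max_step = MAX_CABIN_RATE_FT_PER_MIN * dt_min;
    double delta = target_ft - commanded_ft;
    if (delta > max_step)  delta = max_step;
    if (delta < -max_step) delta = -max_step;
    return commanded_ft + delta;
}

/* One PID update. Returns a valve command clamped to the range [0, 1]. */
double pid_step(pid_state *s, double setpoint, double measured, double dt)
{
    double error = setpoint - measured;
    s->integral += error * dt;
    double derivative = (error - s->prev_error) / dt;
    s->prev_error = error;

    double out = s->kp * error + s->ki * s->integral + s->kd * derivative;
    if (out < 0.0) out = 0.0;
    if (out > 1.0) out = 1.0;
    return out;
}

/* Hysteresis: the state flips only when the input leaves the dead band,
 * so a value hovering near one threshold does not cause chatter. */
bool update_with_hysteresis(bool state, double value,
                            double on_above, double off_below)
{
    if (value > on_above)  return true;
    if (value < off_below) return false;
    return state;
}
```

In this sketch the slew function feeds a slowly moving setpoint to the PID loop, which is one plausible way to bound the rate of change; the real system may well have done it differently.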

    I was told that if the cabin is depressurized (at 40,000 ft) you go unconscious within 15 seconds. This is a different mechanism from holding your breath, so even if you can hold your breath for two minutes it won’t help. This is why you put the mask over your own face before attending to passengers. Also, I’ve been told that the masks keep you alive, but will not keep you conscious.

    If passengers exit the cabin (at 40,000 ft) they go unconscious quickly, and tumble as they fall, with the result that most of their clothing is blown off. The bodies are found in various states of undress, including completely nude.

    Enjoy your afternoon :-)

      1. The rule is that you have to test (by executing) each and every line of code explicitly, and the function of every line of code has to be traceable back to one of the requirements documents.

        You cannot have “dead code”, which is code that cannot be executed. For example, printf() in the stdio library has formatting modes for floating point and pointers, which are typically not needed and would result in dead code in that library, so you can’t use it. Any print statements have to be serviced by code that you write.

        The reasoning is that if a glitch happened and code starts executing somewhere, it has to be code required for the device. All legitimate functions are heavily guardrailed and any deviation from correct behaviour will trigger a fault of some type, which will either reboot the system or lock it into a fault mode. The crew is trained to deal with devices that go offline, and as my manager once pointed out: an altimeter that goes blank is much safer than an altimeter that freezes, or an altimeter that shows the wrong reading.

        Point of reference: the Therac-25 failed in exactly this way: due to a series of errors, execution jumped into a calibration mode that was supposed to close the shields and perform a series of power measurements. The shields weren’t closed, and the system burned completely through a patient due to the error.

        https://en.wikipedia.org/wiki/Therac-25

        All subroutines in the cabin pressurization software check their assumptions at every step, including that their called parameters are in range, that the loop never drops through, that execution never gets to specific points, and so on. This was the lesson of the Therac: If the shields are not closed before you actually turn the power on, you fail hard.
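A routine that checks its assumptions at every step might look something like the sketch below. It is only an illustration of the style described above: the function, the ADC range, the sample limit, and the fatal() hook are all hypothetical, and a real unit would reboot or lock into a fault mode instead of setting a flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fault hook. Here it only latches a flag so the behaviour
 * can be observed; it stands in for a reboot or a locked fault mode. */
static bool fault_latched = false;
static void fatal(void) { fault_latched = true; }

#define SENSOR_COUNT_MIN 1000L    /* made-up valid raw reading range */
#define SENSOR_COUNT_MAX 60000L
#define MAX_SAMPLES      16u      /* made-up upper bound on sample count */

/* Average n raw readings, checking every assumption along the way.
 * Returns true and writes *out on success; latches a fault otherwise. */
bool average_readings(const long *samples, size_t n, long *out)
{
    /* Called parameters must be in range. */
    if (samples == NULL || out == NULL || n == 0 || n > MAX_SAMPLES) {
        fatal();
        return false;
    }

    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* Each reading must be physically plausible. */
        if (samples[i] < SENSOR_COUNT_MIN || samples[i] > SENSOR_COUNT_MAX) {
            fatal();
            return false;
        }
        sum += samples[i];
    }

    /* Recheck the result even though the inputs were already validated. */
    long avg = sum / (long)n;
    if (avg < SENSOR_COUNT_MIN || avg > SENSOR_COUNT_MAX) {
        fatal();
        return false;
    }

    *out = avg;
    return true;
}
```

The final range check can never fail if the arithmetic is sound, which is exactly why it is there: it catches the corrupted-memory or glitched-execution case that ordinary logic assumes away.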

        At that time (late 90’s) we couldn’t use C++ because it’s too easy to hide the execution path using overrides and such, but I believe that’s been changed. We also didn’t use any realtime libraries, because again you don’t have the source code to validate.

        And apropos of nothing, the system had a lot of spare CPU time, so it used that to constantly check for stack overflow, random memory overwrites, calculation consistency, and all the hardware register settings. For example, unused memory was filled with a pattern, and this pattern was checked continuously during execution. No thrown pointer error could change anything and get away with it. For another example, all enums in the system used different ranges (enum { start = 100, … }) so that an out-of-range value in the enum could be detected. Zero, one, and minus one are not valid values anywhere.
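Two of the checks described above, offset enum ranges and a fill pattern in unused memory, can be sketched as follows. The enum names, the range starting at 100, the pattern value, and the size of the guard region are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Each enum gets its own numeric range, so 0, 1, -1, or a stray value
 * from another enum is never a valid member. Names are hypothetical. */
typedef enum {
    VALVE_FIRST  = 100,
    VALVE_CLOSED = 100,
    VALVE_OPEN   = 101,
    VALVE_LAST   = 101
} valve_state;

/* Range check applied to a raw integer before it is trusted as an enum. */
bool valve_state_is_valid(int v)
{
    return v >= VALVE_FIRST && v <= VALVE_LAST;
}

#define FILL_PATTERN 0xA5A5A5A5u  /* arbitrary recognizable pattern */
#define GUARD_WORDS  8u           /* stand-in for a block of unused memory */

static uint32_t guard_region[GUARD_WORDS];

/* Fill the unused region once at start-up. */
void guard_init(void)
{
    for (size_t i = 0; i < GUARD_WORDS; i++)
        guard_region[i] = FILL_PATTERN;
}

/* Called repeatedly from idle time: any changed word means a stray write. */
bool guard_intact(void)
{
    for (size_t i = 0; i < GUARD_WORDS; i++) {
        if (guard_region[i] != FILL_PATTERN)
            return false;
    }
    return true;
}
```

A zero-initialized or cleared variable then fails the enum check immediately, and a wild pointer that lands in the guard region is noticed on the next idle-time scan.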

        1. Thanks for the really interesting comments – engineering of critical stuff like this is fascinating and really shows just how many failure modes & edge cases there can be, and often this stuff is written in blood from bitter experience and failures that “shouldn’t” have been possible.

          I wonder, with your background, what your take is on self-driving cars where the approach seems to be far more relaxed?

    1. Cabin pressure altitude is kept closer to 2400m (8000ft) and 1800m (6000ft) in more comfortable aircraft.

      Altitude sickness can kick in above 2500m and with many flights over 6 hours people would get sick even though they are inactive if the cabin pressure was that low.

      That surprises people but one of the most noticeable effects of altitude sickness is the impact on sleep. Not just while you’re climbing!

      Fascinating stuff though, do you know much (and can you tell us) about software/hardware safety features?

      1. I can tell you all about it, none of the safety features is proprietary. I’ve put some in a post above.

        The watchdog timer makes sure that the program is not caught in an infinite loop. The code has to periodically “feed” the watchdog (put a value in a register), this resets the timer. If the timer ever counts down, your code is caught in a loop so the watchdog resets the system.

        The watchdog timer on the micro didn’t have enough reliability for the FAA, so the company used an external watchdog timer chip on the board. This didn’t have enough reliability either, so it had to be tested at boot: the boot sequence set a pattern in memory, waited until the watchdog rebooted the system, then erased the pattern and continued.

        When you test the watchdog as part of boot, the chance of an undetected hang becomes roughly the failure probability of the timer chip times that of the processor chip (both have to fail to be a problem).
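The feed-or-reset behaviour and the boot-time self-test described above can be modelled in a few lines. This is a simulation under assumptions, not real hardware access: the counter, the marker word that is presumed to survive a watchdog reset, the magic value, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define WDT_RELOAD     3            /* ticks allowed between feeds (made up) */
#define WDT_TEST_MAGIC 0x5AFE7E57u  /* arbitrary marker pattern */

/* Software model of the external watchdog's countdown. */
static int wdt_counter = WDT_RELOAD;

/* Stand-in for a memory word that survives a watchdog reset. */
static uint32_t wdt_test_marker = 0;

typedef enum {
    BOOT_WAIT_FOR_WATCHDOG = 100,   /* stop feeding and wait to be reset */
    BOOT_CONTINUE          = 101    /* watchdog proven, carry on booting */
} boot_action;

/* Feeding the watchdog restarts the countdown. */
void wdt_feed(void) { wdt_counter = WDT_RELOAD; }

/* One timer tick. Returns true when the countdown expires, which on real
 * hardware would reset the processor. */
bool wdt_tick(void)
{
    if (wdt_counter > 0)
        wdt_counter--;
    return wdt_counter == 0;
}

/* Boot-time self-test: the first pass arms the marker and asks the caller
 * to hang; the pass after the reset sees the marker, erases it, proceeds. */
boot_action boot_watchdog_check(void)
{
    if (wdt_test_marker == WDT_TEST_MAGIC) {
        wdt_test_marker = 0;
        return BOOT_CONTINUE;
    }
    wdt_test_marker = WDT_TEST_MAGIC;
    return BOOT_WAIT_FOR_WATCHDOG;
}
```

If the watchdog chip were dead, the first pass would simply hang forever in this scheme, so a unit with a broken watchdog would never come online.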

        FAA regulations come in “levels” from A through E. Level A would be something highly critical, such as the audible stall alert, level C is something like the cabin pressurization that would cause an emergency landing, and level E is the microwave in the galley. Different levels get different levels of scrutiny.

        The hardware of the cabin pressurization system was level B, but the software was level C (IIRC – it’s been awhile). The hardware had two of the above-mentioned valves for redundancy, and was overspec’d in just about every way possible. Software couldn’t use more than 85% of CPU time or memory (we had to measure this) and so on.

        The testing team was completely separate from the lead coder, they didn’t talk to me at all. They had to set a breakpoint and explicitly trigger the execution of every line of code, looking for conformance to the requirements doc and dead code.

        There were a couple of instances of dead code that were deemed “OK” by analysis. I hooked the “unexpected interrupt” vector as well as all the I/O vectors that weren’t being used and routed them directly to a FATAL() call. The hardware “unexpected interrupt” vector is nigh impossible to trigger :-)
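Routing every unused vector to a fatal handler can be sketched with a plain table of function pointers. Real vector tables are hardware-specific, so treat this as a model only: the table size, the slot number, and the handler names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef void (*isr_fn)(void);

#define NUM_VECTORS 8u   /* hypothetical table size */
#define VEC_TIMER   2u   /* hypothetical slot that is actually used */

static bool fault_latched = false;
static int  timer_ticks   = 0;

/* Stand-in for FATAL(): a real unit would reboot or lock into fault mode. */
static void fatal_isr(void) { fault_latched = true; }

/* The one legitimate handler in this sketch. */
static void timer_isr(void) { timer_ticks++; }

static isr_fn vector_table[NUM_VECTORS];

/* Default every slot to the fatal handler, then install the real ones. */
void install_vectors(void)
{
    for (size_t i = 0; i < NUM_VECTORS; i++)
        vector_table[i] = fatal_isr;
    vector_table[VEC_TIMER] = timer_isr;
}

/* Model of the hardware taking an interrupt on a given vector. */
void dispatch(size_t vec)
{
    if (vec >= NUM_VECTORS) {
        fatal_isr();
        return;
    }
    vector_table[vec]();
}
```

Filling the whole table first means no slot is ever left pointing at whatever happened to be in memory, so a spurious interrupt lands in known code.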

        Basically, what you do is sit around a table and brainstorm ways that the program can go wrong, as well as your own experience of how programs have gone wrong in the past, then write code to detect those cases. Look for thrown pointers in memory, look for overwritten stack, keep track of CPU time, check assumptions everywhere, recheck values and registers that should never change, and so on.

        The program was extremely “stiff”, in that it could not take even a single step out of line without being caught.

        1. Interesting, I work in a totally unrelated area of firmware and this is all very familiar. We go a bit further than these methods, to get better redundancy. I’m a bit surprised that there’s not a pair of redundant systems. Also, how do the valves fail safe? If one valve fails open then having another doesn’t help, surely?

    2. One major problem is that you simply can’t hold your breath very long during a rapid depressurization event without exhaling, or at least you absolutely don’t want to – the internal pressure of the gas in your lungs will cause serious internal trauma if you don’t equalize it with the outside pressure within a fraction of a second.

      I’m pretty sure there are others, as you allude to.

      It’s also very cold up there.

      1. Yes! The max difference could be about 1atm in your chest to 0atm around you (literally in space).

        That’s equivalent to going from 10m below water to the surface instantly. I believe the proper ascent is to take 10 minutes (1m/min), with a pause midway.

        1. You can easily dive to the bottom of a 10 meter deep swimming pool and ascend up again instantly without any issues.
          Decompression sickness from high altitude usually only happens above 7,500 m (24,600 ft)

          1. I did check and apparently 10m descents can be done in a minute.

            However, your example is a totally different matter. The issue isn’t whether you can go down to 10m under water and return to the surface instantly; that is certainly fine.

            The issue is whether you can breathe air pressurized to 2atm and then surface to 1atm while holding it in. Certainly that’s a bad idea!

            When you dive down to 10m in a pool, you take air at 1atm (the surface) down to 2atm (compressing it), and it returns to the volume you originally took in at the surface when you come up.

            If on the other hand you take a lungful at 2atm and hold it until you get to 1atm, that air will expand and cause injury.
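The difference between the two cases is just Boyle's law (P1 × V1 = P2 × V2 at constant temperature). A small worked sketch, assuming an illustrative 6-litre lungful and ignoring temperature and tissue effects; the function name is hypothetical:

```c
/* Boyle's law at constant temperature: P1 * V1 = P2 * V2.
 * Returns the volume the gas occupies after moving from p1 to p2. */
double volume_after_pressure_change(double v1_litres,
                                    double p1_atm, double p2_atm)
{
    return v1_litres * p1_atm / p2_atm;
}
```

A breath taken at the surface and carried down gives volume_after_pressure_change(6.0, 1.0, 2.0), which is 3 litres at 10m, and it returns to the original 6 litres on the way up. A breath taken at 2atm and held to the surface gives volume_after_pressure_change(6.0, 2.0, 1.0), which is 12 litres trying to fit in lungs that hold about 6.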

        2. What you describe is a simple “bounce” that requires no pause.

          If you were staying down for an hour on scuba at 10m, you would take a 3-minute pause at 3m if conditions permit. Older guidelines didn’t require any pause from that depth for less than about three hours.

          The bigger issue is what happens if you hold your breath after breathing compressed air while submerged. It expands on the way up, and even only a foot or two can cause mild damage. Without scuba, the bounce can be done without exhaling.

          If a plane depressurizes, it can cause damage if it happens quickly while you are holding your breath. Fortunately, most people gasp in such an event, and any opening of the airway will save you, even if you are attempting to breathe in.

          The person who will experience trouble is the one who lowers their chin to their chest and holds their breath. That closes the airway so hard that damage from decompression is likely.

          I’m a former editor of the NASA life support journal.

    1. Airbus and Boeing have approximately the same number of issues and crashes.

      Also, MCAS was not a problem; the pilots were in the end the problem.

      People should stop spreading uneducated information, even when it is supposed to be “funny”.

      1. Where did you get the “information” that “MCAS was not a problem, pilots were”? That statement is absolute nonsense. Greed (hiding MCAS from the customers) and bad engineering (insufficient redundancy in the probes, for starters) were the main causes.

      2. Roeland Jansen made a quite absurd statement. You can (probably) read official documentation, but Mentour Pilot is a quite good channel about aviation accidents.

        And in the meantime, I wonder if much changed at boeing. In the end, the management cutting corners and valuing “shareholders” above their core business was the main cause.

        I think the proper solution would have been to fine boeing into bankruptcy, then replace the management, and put the fined money back into boeing to make a restart. But I don’t have the overview / insight to think this through completely. Such a drastic measure would surely have some nasty side effects that have to be managed carefully.

        But as we all know, in the USA, rich people always protect each other, or they weasel out by bribing some key person or whatever.

  2. Interesting. I know it’s a much more modern aircraft, and commercial, but looking at the C-130 equivalent, I think it’s all mechanical/pneumatic, no electronic control (I think). I wonder if electronic or mechanical ends up being more reliable.
