Apple Kernel Code Vulnerability Affected All Devices

Another day, another vulnerability. Discovered by [Kevin Backhouse], CVE-2018-4407 is a particularly serious problem because it is present throughout Apple’s product line, from the MacBook to the Apple Watch. The flaw is in the XNU kernel shared by all of these products.

This is a buffer overflow issue in the error handling for network packets. The kernel expects those packets to have a fixed length but doesn’t check to prevent writing past the end of the buffer. The fact that Apple’s XNU kernel powers all their products is remarkable, but issues like this are a reminder of the potential downside to that approach. Thanks to responsible disclosure, a patch was pushed out in September.

Anatomy of a Buffer Overflow

Buffer overflows aren’t new, but a reminder of what exactly is going on might be in order. In low-level languages like C, the software designer is responsible for managing computer memory manually. They allocate memory, tagging a certain number of bytes for a given use. A buffer overflow is when the program writes more bytes into the memory location than are allocated, writing past the intended limit into parts of memory that are likely being used for a different purpose. In short, this overflow is written into memory that can contain other data or even executable code.
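As a minimal sketch of that failure mode, consider the two C functions below. The names `BUF_SIZE`, `copy_unchecked`, and `copy_checked` are made up for illustration and have nothing to do with Apple's code. The only difference between an overflow and a safe copy is a single comparison against the size of the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16 /* hypothetical size of the destination buffer */

/* Unsafe pattern, shown only for contrast: memcpy trusts len, so any
 * len greater than BUF_SIZE writes past the end of dst. */
void copy_unchecked(unsigned char *dst, const unsigned char *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Safer pattern: reject input that does not fit in a BUF_SIZE-byte dst.
 * Returns 0 on success and -1 if the input was rejected. */
int copy_checked(unsigned char *dst, const unsigned char *src, size_t len)
{
    assert(dst != NULL && src != NULL); /* programming error, not bad input */
    if (len > BUF_SIZE)
        return -1;
    memcpy(dst, src, len);
    return 0;
}
```

Calling `copy_checked` with a 64-byte input and a 16-byte destination returns -1 and leaves memory alone, where `copy_unchecked` would have written 48 bytes past the end of the buffer.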

With a buffer overflow vulnerability, an attacker can write whatever code they wish to that out-of-bounds memory space, then manipulate the program to jump into that newly written code. This is referred to as arbitrary code execution. [Computerphile] has a great walk-through on buffer overflows and how they lead to code execution.

This Overflow Vulnerability Strikes Apple’s XNU Kernel

[Kevin] took the time to explain the issue he found in further depth. The vulnerability stems from the kernel code making an assumption about incoming packets. ICMP error messages are sent automatically in response to various network events. We’re probably most familiar with the “connection refused” message, indicating a port closed by the firewall. These ICMP packets include the IP header of the packet that triggered the error. The XNU implementation of this process assumes that the incoming packet will always have a header of the correct length, and copies that header into a buffer without first checking the length. A specially crafted packet can have a longer header, and this is the data that overflows the buffer.
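To make that kind of length check concrete, here is a hedged sketch in C. It is not XNU code, and `quote_ip_header` and its parameters are hypothetical. It relies only on the IPv4 header layout: the low four bits of the first byte (the IHL field) give the header length in 32-bit words, so a valid header is between 20 and 60 bytes long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IPV4_MIN_HDR 20u /* IHL of 5 words; the 4-bit field caps it at 60 bytes */

/* Hypothetical helper: copy the IPv4 header of pkt into dst, as an ICMP
 * error reply would need to. Returns the number of bytes copied, or 0 if
 * the packet is rejected. */
size_t quote_ip_header(uint8_t *dst, size_t dst_cap,
                       const uint8_t *pkt, size_t pkt_len)
{
    assert(dst != NULL && pkt != NULL); /* programming error, not bad input */

    if (pkt_len < IPV4_MIN_HDR)
        return 0;                       /* too short to hold any IPv4 header */
    if ((pkt[0] >> 4) != 4)
        return 0;                       /* version field is not IPv4 */

    size_t hdr_len = (size_t)(pkt[0] & 0x0F) * 4; /* IHL counts 32-bit words */

    if (hdr_len < IPV4_MIN_HDR)
        return 0;                       /* malformed: header shorter than minimum */
    if (hdr_len > pkt_len)
        return 0;                       /* header claims more bytes than arrived */
    if (hdr_len > dst_cap)
        return 0;                       /* would overflow the destination buffer */

    memcpy(dst, pkt, hdr_len);
    return hdr_len;
}
```

With a 20-byte destination, a packet whose first byte is 0x45 (a 20-byte header) is copied, while one whose first byte is 0x4F (a 60-byte header) is refused instead of spilling 40 bytes past the buffer.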

Because of the role ICMP plays in communicating network status, a closed firewall isn’t enough to mitigate the attack. Even when sent to a closed port, the vulnerability can still trigger. Aside from updating to a patched OS release, the only mitigation is to run the macOS firewall in what it calls “stealth mode”. This mode doesn’t respond to pings, and more importantly, silently drops packets rather than sending ICMP error responses. This mitigation isn’t possible for watchOS and iOS devices.

The good news about the vulnerability is that a packet, malformed in this way, has little chance of being passed through a router at all. An attacker must be on the same physical network in order to send the malicious packet. The most likely attack vector, then, is the public WiFi at the local coffee shop.

Come back after the break for a demonstration of this attack in action.

So far, the vulnerability is only known to crash machines, as seen above. Because of the nature of the problem, it’s likely that this vulnerability will eventually be turned into a full code execution exploit. [Kevin] informed Apple of the issue privately, and they fixed the issue in September updates of macOS and iOS.

81 thoughts on “Apple Kernel Code Vulnerability Affected All Devices”

  1. >>> So far, the vulnerability is only known to crash machines, as seen above
    If you can follow an exact sequence of events that will cause an OS to crash, that is the foot in the door to owning that machine. Depending on the OS it should get far more complex, you then probably need to predict the address space layout randomization (ASLR). But provided there is enough entropy in the randomness pool at boot time, that should be extremely difficult.

      1. On paper, yes, but on an embedded system without a hardware source of high quality entropy the first few pages on initial power on, or even keys generated at the very first power on when the device is being installed can have issues.

    1. In this specific case the reason the machine’s crashing is because he literally is destroying the heap. There’s almost certainly a way to get remote code execution using this, which is why they’re calling it an RCE bug.

    2. i doubt that you can get into the machine because once a panic happens the cpu can no longer execute instructions

      easy way to find out is if you can have a movie or an mp3 playing and crash it the movie or mp3 stops.

      apple designed that panic explicitly to ensure the mac gets rebooted.

      in the old days on classic a type 11 system error was the equiv. and pushing the debugger button and entering g finder or go finder would allow you to get back to the desktop .

      by having the panic the way it is there is no way to break out of the panic and continue working the session was finished and you had to force a restart.

      1. Maybe my wording was unclear. I did not say that after a panic you could control the machine. What I was trying to say is that a panic is an indication that with the right sequence of bytes the panic can be avoided and the machine can be taken over. A panic is an indication that the OS has been jumped off the predicted railway tracks; it indicates that it should be possible to jump the train onto a new track before it crashes (panics).

  2. I don’t understand why we’re still using C-style pointers and buffers. Haven’t we learned, through decades of vulnerabilities mostly based on this same inane programming practice, that the performance gains aren’t worth it? Maybe they were in the era of wire-wrapped backplanes clocked at sub-MHz speeds, when nothing important ran on computers.

    There are languages (most of us learned on one) which explicitly store the length of a string as a separate value, and explicitly check that it fits somewhere before trying to put it there. When better methods exist, what’s making us cling to the bad methods?

    This whole thread applies:

    And I have it in my head that there was another article in the last few months, about programming languages being intentionally elitist, intentionally dangerous, intentionally obtuse, to feed programmers’ convictions that they are exceptional and powerful and can avoid all these traps that the language sets for them. I can’t find that link right now. Anyone?

    1. It’s not the language that is the problem, it’s poor coders. The coders they pump out today are mostly useless.
      They should make it mandatory in schools to learn assembly first and code in that well, then and only then move them onto other languages. Then and only then do you truly understand how to write effective code.

      1. No.

        If the languages are so dangerous that human coders are incapable of writing good code, then the languages are not appropriate for humans to use. You’re demanding superhuman coders, which isn’t realistic.

        We’ve proven, over decades of experience and hundreds of thousands of security vulnerabilities, that superhuman coders are not realistic. Stop insisting that we just need better coders, and fix the languages instead.

          1. For those who are curious about a summary of his answer, his answer is basically “yes, Rust could do this but you lose the safety benefits.” But once you have that “escape hatch” (as he put it) you still have the overall problem: a coder has to know when to use the escape hatch, how to use it properly, and when it can be avoided. And that *still* is going to lead to subtle bugs. The problem is always going to be in the coder whenever you’re in ‘bare metal’ land.

            Of course… it’s even *worse* than all of that because at the bare metal level, *the processor isn’t even the only one accessing the buffers*. You could have a buffer overflow interaction leveraging a DMA engine, for instance, and no amount of language checking will ever help you there. Device access controls help somewhat there in controlling the attack surface, but it’s super hard to predict what smart hackers will manage to figure out.

        1. I think his point was that some programmers have no clue of the effect of their code, which leads to unoptimized code.
          It’s like trying to teach a physicist physics without the math behind it. Sure, he can learn it in the approximate way that physics works with, and he doesn’t need the rigid math behind his formulas on a day-to-day basis, but it can help to understand the tool you are using.

          Every guy I know/knew that was really good at something could explain to me in detail how everything worked from top to bottom. I don’t say you have to be an expert in assembly to program any other language, that is another job, but know at least what you are doing. Don’t look at a computer as if it was a black box where you throw your program in and wait for a response.

          As with everything, there is a history behind the language, and due to backward compatibility we always build on top of the last known technology, even if it is outdated, so there is probably room for improvement in this area too, but my knowledge is insufficient to have any opinion.

        2. “If the languages are so dangerous that human coders are incapable of writing good code, then the languages are not appropriate for humans to use. You’re demanding superhuman coders, which isn’t realistic.”…

          No, you don’t need “superhuman coders”. Look about you there is so much stuff out there that doesn’t crash and works well. This hasn’t been written by aliens. The problem is that good programmers only become good with a lot of dedication. They need to study, practice and be disciplined. Just learning to write a 10 line program in BASIC doesn’t make you (one) a programmer. Managers tend to treat programmers as interchangeable cogs so there is no incentive to do the job right – just hack it and get it done.

          “Stop insisting that we just need better coders, and fix the languages instead.”…

          It really isn’t the language that is the problem. Often (especially these days) programmers get a spec and just don’t get the big picture. The buffer overrun problem that this article is about focuses on the “overrun” and totally misses the fact that the original programmer didn’t even consider that it was possible to get corrupt packets when processing ICMP packets. The correct fix here is not fix this “one” problem but to ensure that such corrupted packets can’t get through in the first place.

          1. “The buffer overrun problem that this article is about focuses on the “overrun” and totally misses the fact that the original programmer didn’t even consider that it was possible to get corrupt packets when processing ICMP packets. ”

            Not trusting what one doesn’t control. The outside world is hostile. Paranoid programming mode on.

          2. Yeah but if you’ve got lots of programmers working on a system, where do you put the protection? You can’t expect every function to run every passed variable past a series of checks. It would be hugely wasteful. You just need someone in charge to decide to put the safety checks in at the appropriate places, and for everyone to know what they can rely on.

            A lot of programming problems are really management problems, once you’re talking about medium-sized systems. I’m sure for insanely large stuff like phone networks they have their own entire paradigms. The US military originally invented Ada as a response to the risks of programming problems. Including actual software bugs, but not just that.

            This is one reason Linux is such a pain in the arse. The people doing the coding on systems don’t communicate nearly enough, and are usually managed completely ad-hoc.

            That said, important stuff like OSes should be left to superhuman programmers. And every one of them should speak asm. Though if you’re doing OS stuff I can’t imagine you wouldn’t know it. There’s a huge amount of common ground.

        3. People are not incapable, they are just not willing to learn how to design and avoid said pitfalls.
          We have perfect languages for writing and speaking, people abuse those also.

      2. This is more market forces than coders. Sure there are some crap coders out there. There are also some very good coders.

        It’s upper management that knows nothing about code that chooses who get the job and they aren’t willing to pay a little extra for quality code.

        1. To compound that we get ‘agile’ development cycles or other styles that focus on being first past the post while ignoring that their product is in flames as it passes the post.
          If you don’t have the time to do it right when will you have the time to do it again.

      3. Because overflows don’t exist in anything coded in assembly. Back in your day, when men were real men and real programs were punched out of cardboard, nothing ever went wrong and software was perfectly secure. Coding one instruction at a time makes it impossible to forget a boundary check, misunderstand a specification, or cut corners to rush something out the door.

        Sure, bud. Sure.

        1. Sure you can fuck up in asm, but a person who knows asm has a good understanding of systems and how it all fits together, how everything works. How everything *really* works!

      4. Agreed, to a point: many of the newest generation have little to no understanding of what’s happening on a low level.
        But asm first would be a very steep learning curve. Maybe go back to ’80s BASIC, then C and some asm, followed by moving to OO languages.
        Java and Javascript should never be first languages as they teach too many bad habits such as using an entire browser engine for a simple UI.

    2. “I don’t understand why we’re still using C-style pointers and buffers. Haven’t we learned, through decades of vulnerabilities mostly based on this same inane programming practice, that the performance gains aren’t worth it?”…

      Some stuff can be written using a very inefficient language that generates very slow executables and the end user doesn’t care. But other stuff is so fundamental and so heavily used that it needs to run as efficiently as possible. How would you feel if someone said “hey from tomorrow you won’t be able to use the internet unless you upgrade your PC to one that runs 100 times faster”?

      1. “How would you feel if someone said “hey from tomorrow you won’t be able to use the internet unless you upgrade your PC to one that runs 100 times faster”?”

        Or stop using XP. ;-)

    3. “I don’t understand why we’re still using C-style pointers and buffers.”

      Ooh! Ooh! I know why!

      Because that’s how the processor *actually works*.

      Think about what you’re saying: yes, there are languages that check lengths and try to protect against things, but fundamentally *they* then need to convert those language primitives into addresses, buffers, etc. Which means the risk of a buffer overflow will always exist – if you protect against it in the language, the risk then moves into the compiler/runtime of the programming language. You’re not eliminating the attack surface, you’re just moving it.

      Now, you might say “yes, but that now means we only need to harden *1* thing – the compiler/runtime of the language,” which is true. But because the compiler/language would then need to become *everything* the computer does, it’d have to have tons of performance optimizations and differing prototypes. Casts, copies, indexing – all of those things could be *very* expensive performance-wise (especially for small objects, like network packets) without pointers.

      And once you have lots of “stupid code tricks” for manipulating things, there’s just no way that the language can protect against all the combinations by design. You’re just going to hit the same problem again.

      Your hypothetical “magic language” is presenting an *abstraction* of the way the code should run, but it won’t actually run that way. That disconnect will generate an attack surface, and it will be *very* hard to find because it’s an interaction between two black boxes: the code and the language. That’s the same reason why Spectre and Meltdown existed, and why so many other super-subtle bugs exist (like Rowhammer, etc.). Hiding the way the computer actually works from the programmer doesn’t protect against bugs, it just makes them harder to find.

      Note that I’m not arguing against “safer” languages. You just don’t want to use them everywhere. Kernels, specifically, can be super-dangerous places for languages like that, because they’re a layer that’s present for *every* code that runs on the device. So adding another black box (the language/runtime) to the system is another attack surface for *everything*.

        1. Makes a strong argument for verified and clean libraries that can be shared regardless of license (BSD ;-)). A network stack (as ubiquitous as they are) that has been gone over carefully, and changed infrequently (less “oh shiny” getting in).

          1. To be honest, I think actually the problem with network stacks is the fact that for *most* cases, we really should just be using dedicated hardware for protocol handling entirely. Imagine an ASIC that handles (only) TCP, ICMP, UDP, ARP, DHCP, etc. For the vast majority of consumers that would be completely fine (and then you could also have raw packet forwarding and a software stack for the few people who need custom protocols). If you’ve ever used a WIZnet IC for interfacing with a microcontroller, basically something like that on steroids.

            This isn’t a new thing, obviously, people have been thinking about a TCP offload engine for years for performance reasons, but I think it could have significant security benefits as well: again, it’s just the removal of an attack surface. You might find a bug in the ASIC that allows you to mess it up (but not gain control) with a corrupted packet, but the system remains unaffected (obviously you wouldn’t want a dedicated processor for this, that just moves the attack surface). TCP offload engines died an ignominious death because, well… pretty much everyone implemented it badly.

            The problem, of course, is that at this point the barrier to entry for something like that is so high that it basically has no chance (although who knows, it could possibly win on power savings for mobile devices if done properly).

          2. The magic in an ASIC implementation is that there’s nothing you can do *with* a bug. It wouldn’t be a generic processor. It might not operate quite right, but it can’t cause a security issue. You can’t execute code.

            I’ve got a UDP implementation for an FPGA which presents verified packets to outbound ports. Is it totally bug free? No, probably not: but there’s literally nothing that a bad packet can do to the downstream processors. Worst thing that would happen is the thing locks up, and a few microseconds later, it resets itself.

    4. I’ll start with this: most of our OS kernels are really old. They were written in the days before even ISO 9899:1999. In those times, there was not anything better.
      IIRC, the Apple kernel traces its lineage through the BSD kernel, which was first released in 1977.
      Linux got started in 1991.
      It’s hard to say how much of the Windows 10 kernel is still based on Windows NT, but consider that was released in 1993.

      The two big new OS kernels are Haiku and Fuchsia. Both look to be C++ after taking a peek at their repos, and the Wikipedia article on Fuchsia makes it sound like it shares a lineage with Haiku (though not a fork).

      Besides that, there are still some good reasons to use C as it provides a reasonable alternative to assembly. The compiler could do more and it in fact can do more in a number of cases. Part of the issue is that legacy code thing and the sheer amount of code out there.
      But that would not have helped in the case of the exploit disclosed here because the length field being checked was being provided from outside the program.

      Could the compiler create runtime checks? Sure, and I have seen compilers with a debug mode that generates runtime checks that could catch issues like this. Those checks significantly impact runtime, and that performance really does matter for the kernel. Just look at the impact mitigating Meltdown and Spectre has at the kernel level.

      Something else that can help is using coding guidelines such as MISRA C and SEI CERT in concert with static analysis tools to help catch errors in the code.

      As for why C, there are still valid reasons that usually don’t become apparent until someone has tried to do low level work in a higher level language.

    5. We are talking about a kernel driver. IMHO having a bug here is not too different from saying, oops, my high-end compiler that explicitly checks the length of strings forgets to check the length of a string destination in some weird scenario. The carelessness which allowed such a bug to go through should not have happened in the first place.

      Second of all, on the kernel layer (especially for networking), it’s likely there are actual genuine performance hits when you store information you don’t need to store or do boundary checks which you don’t need to do (perhaps because your structure when designed correctly does not require them). In fact, doing some of this unnecessary checking may in some instances worsen or allow for DoS attacks, especially when you’re talking about a networking stack. You probably want a language with the flexibility to force the compiler to not always operate with these constructs – and once your language has the flexibility to use pointers, well, it’s about as dangerous as using C if you’re not careful, so is it really worth it?

      I think anyone who’s written something in a high level language and found the performance lacking (probably not a lot of web applications or scripts) must understand that these “unsafe” constructs are not entirely inane. More often than not the answer is a rewrite in a language with those unsafe, unchecked constructs accessible when appropriate. This certainly applies in a networking stack where taking extra time could mean increased vulnerability to perhaps some DoS attacks, slower performance (when it needs to be compatible with even the fastest server networking interfaces), and reduced battery life in portable devices. These losses can be pretty nontrivial.

      Ultimately, I don’t think this is a simple problem and it really isn’t people being elitist about their code. A sufficiently good language and compiler do not exist yet where you can communicate exactly what you want to do in a way that allows compilers to know how to perfectly optimize that code in a meaningful way on most platforms. An approach some people take is to use an inherently unsafe language like C, and use code analysis tools and perhaps extra-language markers to try to figure out when it’s doing genuinely unsafe tasks.

    6. “Haven’t we learned, through decades of vulnerabilities mostly based on this same inane programming practice, that the performance gains aren’t worth it?”

      Maybe TVTropes should have a “Coding Tropes” section? Buffer Overflows certainly qualifies.

  3. Seriously? An overflow vulnerability in this day and age? Have the coders been given no training on how to handle external data? Are there no scrutiny checks for data handling?

    In the ’90s age of naivety it would be reasonable, but today *on a network stack* that’s just bad practice and bad processes.

    1. Having been working in an environment adopting MISRA C and SEI CERT coding guidelines (along with static analysis tools), I agree.
      Just allocating a local for taking data in from an external interface is a terrible idea.

    2. Terrible code, it’s true. Then again, on the hardware side I’ve been complaining about the lack of decoupling capacitors for 20 years. Everyone either hears the warning, or makes the mistake. New coders will repeat old mistakes. One thing that may have changed is that in the past you spent more time with your project. Errors that occurred were cleaned up by those that made them. Nowadays it’s rent a coder, dump a coder, and when the bug finally rears its head, some new guy will be sent in to clean it up. The best learned lessons are when we clean up after ourselves.

    1. I once worked with a guy that was clearly having problems with his project. There was cursing, jumping up out of his chair and storming off, there was slamming of notebooks on desks. After a few hours of this I offered to help. He had been trying to debug some code he had written using a system monitor. He explained that every time he uploaded his executable it would behave strangely. I asked him to show me exactly what he was doing and we went through the whole process. Built the code, uploaded it, entered some test data, ran it, inspected the RAM. “There,” he said, “see, it should be this…” “OK,” I said, “let’s check your test data,” and sure enough the test data had mysteriously changed. Now the funny thing was that it was out by what looked to be the difference between ASCII upper and lower case. So I suggested there might be a problem with the system monitor and it might be touchy about which case hex was entered in. It took this guy three days to finally declare to the world that he had fixed the problem. And guess what, it was exactly as I had predicted…

      I must admit I’m kind of glad when I see a backseat dev posting on a forum – at least it means he’s taking a break from hacking code (and I mean that in the most derogatory sense possible).

  4. I think this shows why it’s probably not a good idea to use common code base for everything.
    Maybe if the Apple Watch had a super efficient OS like QNX or TRON it might actually have a useful battery life.

    1. While I’m a fan of microkernels in general and QNX in particular, that’s not exactly true: QNX has higher overheads than many other systems by design. It uses synchronous message passing with data copying, so normal communication has context switch and copying overheads (sender -> kernel copy -> receiver), whereas a monolithic system can often avoid copying data and generally use a cheaper user/kernel mode switch.

  5. What nobody mentions is how a hijacked account that takes over a device is easily able to set up a pairing and instigate a network which becomes shared, thus making many of the items you all mention very easy to execute.

    I’m no dev or programmer, but this is exactly what I am subject to, and ironically I came across the reporter just the other day, who also seems to be in collaboration with one possible party responsible.
