The big news this week was that OpenSSH has an unauthorized Remote Code Execution exploit. Or more precisely, it had one that was fixed in 2006, that was unintentionally re-introduced in version 8.5p1 from 2021. The flaw is a signal handler race condition, where async-unsafe code gets called from within the SIGALARM handler. What does that mean?
To understand, we have to dive into the world of Linux signal handling. Signals are sent by the operating system, to individual processes, to notify the process of a state change. For example SIGHUP
, or SIGnal HangUP, originally indicated the disconnect of the terminal’s serial line where a program was running. SIGALRM
is the SIGnal ALaRM, which indicates that a timer has expired.
What’s interesting about signal handling in Unix is how it interrupts program execution. The OS has complete control over execution scheduling, so in response to a signal, the scheduler pauses execution and immediately handles the signal. If no signal handler function is defined, that means a default handler provided by the OS. But if the handler is set, that function is immediately run. And here’s the dangerous part. Program execution can be anywhere in the program, when it gets paused, the signal handler run, and then execution continues. From Andries Brouwer in The Linux Kernel:
It is difficult to do interesting things in a signal handler, because the process can be interrupted in an arbitrary place, data structures can be in arbitrary state, etc. The three most common things to do in a signal handler are (i) set a flag variable and return immediately, and (ii) (messy) throw away all the program was doing, and restart at some convenient point, perhaps the main command loop or so, and (iii) clean up and exit.
The term async-signal-safe describes functions that have predictable behavior even when called from a signal handler, with execution paused at an arbitrary state. How can such a function be unsafe? Let’s consider the async-signal-unsafe free()
. Here, sections of memory are marked free, and then pointers to that memory are added to the table of free memory. If program execution is interrupted between these points, we have an undefined state where memory is both free, and still allocated. A second call to free() during execution pause will corrupt the free memory data structure, as the code is not intended to be called in this reentrant manner.
So back to the OpenSSH flaw. The SSH daemon sets a timer when a new connection comes in, and if the authentication hasn’t completed, the SIGALRM signal is generated when the timer expires. The problem is that this signal handler uses the syslog() system call, which is not an async-safe function, due to inclusion of malloc()
and free()
system calls. The trick is start an SSH connection, wait for the timeout, and send the last bytes of a public-key packet just before the timeout signal fires. If the public-key handling function just happens to be at the correct point in a malloc()
call, when the SIGALRM handler reenters malloc()
, the heap is corrupted. This corruption overwrites a function pointer. Replace the pointer with an address where the incoming key material was stored, and suddenly we have shellcode execution.
There are several problems with turing this into a functional exploit. The first is that it’s a race condition, requiring very tight timing to split program execution in just the right spot. The randomness of network timing makes this a high hurdle. Next, all major distros use Address Space Layout Randomization (ASLR), which should make that pointer overwrite very difficult. It turns out, also on all the major distros, ASLR is somewhat broken. OK, on 32-bit installs, it’s completely broken. On the Debian system tested, there’s literally a single bit of ASLR in play for the glibc library. It can be located at one of two possible memory locations.
Assuming the default settings for max SSH connections and LoginGraceTime, it takes an average of 3-4 hours to win the race condition to trigger the bug, and then there’s a 50% chance of guessing the correct address on the first try. That seems to put the average time at five and a quarter hours to crack a 32-bit Debian machine. A 64-bit machine does have ASLR that works a bit better. A working exploit had not been demonstrated as of when the vulnerability write-up was published, but the authors suggest it could be achieved in the ballpark of a week of attacking.
So what systems should we really worry about? The regression was introduced in 8.5p1, and fixed in 9.8p1. That means Debian 11, RHEL 8, and their derivatives are in the clear, as they ship older OpenSSH versions. Debian 12 and RHEL 9 are in trouble, though both of those distros now have updates available that fix the issue. If you’re on one of those distros, particularly the 32-bit version, it’s time to update OpenSSH and restart the service. You can check the OpenSSH version by running nc -w1 localhost 22 -i 1
, to see if you’re possibly vulnerable.
Polyfill
The Polyfill service was once a useful tool, to pull JavaScript functions in to emulate newer browser features in browsers that weren’t quite up to the task. This worked by including the polyfill JS script from polyfill.io. The problem is that the Funnull company acquired the polyfill domain and Github account, and began serving malicious scripts instead of the legitimate polyfill function.
The list of domains and companies caught in this supply chain attack is pretty extensive, with nearly 400,000 still trying to link to the domain as of July 3rd. We say “trying”, as providers have taken note of Sansec’s report, breaking the story. Google has blocked associated domains out of advertising, Cloudflare is rewriting calls to polyfill to a clean cache, and Namecheap has blackholed the domain, putting an end to the attack. It’s a reminder that just because a domain is trustworthy now, it may not be in the future. Be careful where you link to.
Pack It Up
We’re no strangers to disagreement over CVE severity drama. There can be a desire to make a found vulnerability seem severe, and occasionally this results in a wild exaggeration of the impact of an issue. Case in point, the node-ip project has an issue, CVE-2023-42282, that originally scored a CVSS of 9.8. The node-IP author has taken the stance that it’s not a vulnerability at all, since it requires an untrusted input to be passed into node-ip, and then used for an authorization check. It seems to be a reasonable objection — if an attacker can manipulate the source IP address in this way, the source IP is untrustworthy, regardless of this issue in node-ip.
The maintainer, [Fedor] made the call to simply archive the node-ip project in response to the seemingly bogus CVE, and unending stream of unintentional harassment over the issue. Auditing tools starting alerting developers about the issue, and they started pinging the project. With seemingly no way to fight back against the report, archiving the project seemed like the best solution. However, the bug has been fixed, and Github has reduced the severity to “low” in their advisory. As a result, [Fedora] did announce that the project is coming back, and indeed it is again an active project on Github.
Bits and Bytes
[sam4k] found a remote Use After Free (UAF) in the Linux Transparent Inter Process Communication (TIPC) service, that may be exploitable to achieve RCE. This one is sort of a toy vulnerability, found while preparing a talk on bug hunting in the Linux kernel. It’s also not a protocol that’s even built in to the kernel by default, so the potential fallout here is quite low. The problem is fragmentation handling, as the error handling misses a check for the last fragment buffer, and tries to free it twice. It was fixed this May, in Kernel version 6.8.
CocaoPods is a dependency manager for Swift/Objective-C projects, and it had a trio of severe problems. The most interesting was the result of a migration, where many packages lost their connection to the correct maintainer account. Using the CocaoPods API and a maintainer email address, it was possible for arbitrary users to claim those packages and make changes. This and a couple other issues were fixed late last year.
“This Week In Security” is my favorite weekly post. I look forward to next week. :)
I get my weekly cringe reading it.
The cringe is what is being reported about the bad actors.
This OpenSSH RCE is terrifying. Not for servers, servers run 64bit, have ASLR, get updates. But, there are a ton of embedded devices that have SSH on them these days, and run 32bit ARM and will never see updates.
Silver Lining: It gives us a way to get root on those old IoT devices.
I thought / assumed most embedded devices use Dropbear or some other small/minimal SSH server instead of the full fledged OpenSSH.
The exploit specifically mentions glibc on (32bit) x86, whereas the majority of embedded distributions use ulibc, which by nature of omission sounds like it’s not affected.
no, it’s just that the *exploit demonstrated* is glibc-specific. The underlying issues aren’t glibc-specific.
You should mention that this is not exploitable on OpenBSD.
True. It’s sorta implied by omission, and the “why” is yet another rabbit hole.