An issue was discovered in libarchive
through Google’s ClusterFuzz project. Libarchive
is a compression and decompression library, widely used in utilities. The issue is in how the library recovers from a malformed archive: hitting an invalid header causes the working memory to be freed, but file processing can continue even after that memory has been freed, a classic use-after-free. So far an actual exploit hasn’t been revealed, but it’s likely that one is possible. The problem was fixed back in May, but the issue was only just announced, giving time for that update to percolate down to users.
Of note is the fact that this issue was found through Google’s fuzzing efforts. Google runs the oss-fuzz project, which automatically ingests nightly builds from around 200 open source projects and runs ClusterFuzz against them. This process of throwing random data at programs and functions has revealed over 14,000 bugs.
PDF Documents and Embedded Fonts
Adobe recently released a security update for Reader. There are a bunch of vulnerabilities in this update, but the one that caught my eye was CVE-2019-8196. This too was discovered through fuzzing, though in this case the malformed file was produced by randomly mutating a valid PDF document. Changes to an embedded font led to reading from an uninitialized memory location. It’s worrying that after nearly ten years of font-related vulnerabilities, crashes like this are still possible.
Trusted Platform Module Timing Attack
The Trusted Platform Module is a piece of hardware embedded on many motherboards that allows offloading cryptographic operations. The purpose of moving tasks like signing and encryption off the CPU isn’t for speed benefits, in this case. Instead, the TPM is intended to be a trusted platform even if the operating system has been compromised. A new study, aptly named TPM Fail, has illustrated a timing attack that reveals one of the secret keys embedded in the TPM.
Timing attacks turn normal programming paradigms on their head. For instance, when using grep to look for the first instance of a string in a large file, you want the operation to complete as quickly as possible. If the data you’re searching for is found in the top 10% of the file, it’s obvious that the search program should stop searching and return an answer as soon as it’s found. What’s less obvious is the information leaked by the amount of time it takes grep to find your pattern. Imagine running several such searches on the same file, all looking for different strings. Given the timing data and source file, it would be possible to make an educated guess at the search strings.
A typical example of this in the real world is a string comparison routine. If the first characters of two strings are different, the routine can return immediately, but if the first 10 characters are the same and the strings don’t diverge till the 11th character, that routine will take that much longer to run the comparison. Given enough time to observe comparisons, an attacker could work out the secret string one character at a time. The longer the comparison time, the more characters are correct. The standard solution to this dilemma is using constant-time functions. In our string comparison example, this would mean comparing every character of the strings, regardless of how different they are.
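The leaky early-exit comparison and its constant-time fix can be sketched in a few lines of Python. This is an illustrative sketch, not any library’s actual implementation; in real Python code, `hmac.compare_digest` from the standard library does the constant-time job for you.

```python
def leaky_compare(secret: str, guess: str) -> bool:
    # Returns at the first mismatch, so the running time grows with
    # the number of leading characters the attacker has guessed correctly.
    if len(secret) != len(guess):
        return False
    for s, g in zip(secret, guess):
        if s != g:
            return False
    return True

def constant_time_compare(secret: str, guess: str) -> bool:
    # Examines every character regardless of where mismatches occur,
    # so the running time depends only on the length, not the contents.
    if len(secret) != len(guess):
        return False
    diff = 0
    for s, g in zip(secret, guess):
        diff |= ord(s) ^ ord(g)
    return diff == 0
```

The accumulator trick (OR-ing together the XOR of every character pair) is the standard way to avoid a data-dependent branch inside the loop.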
The researchers in question discovered that certain TPM 2.0 implementations failed to use constant-time algorithms for their elliptic curve routines. So how bad is the result? On a vulnerable system, the secret TPM key can be extracted in at most 20 minutes, assuming the attacker can run code on that machine. Even worse, in the case of a VPN like StrongSwan backed by a TPM, the secret key could be extracted from 5 hours of network traffic.
Software implementations can be patched, but hardware TPMs are much harder to fix, as they aren’t flashable by design. The math of how the secret key maps to the timing variations is a bit complicated, but it’s all in the paper, so go take a look.
Zombieload Returns From the Dead
Very appropriately, considering the name, Zombieload has risen once again in the form of Zombieload V2. This time the problem is the Transactional Synchronization Extensions (TSX) in modern Intel processors. Existing mitigations weren’t enough to prevent the TSX Asynchronous Abort (TAA) vulnerability, so this attack wasn’t disclosed with the rest of the Zombieload details. What makes TAA particularly problematic is that the victim process doesn’t need to use the TSX instructions at all; all that’s needed is for the attacker to have access to them. For a vulnerable processor, the ultimate solution is to disable TSX instructions altogether. The Linux kernel documentation has some of the most useful information I’ve found.
Windows Attacks
A pair of attacks against Windows systems came to light recently. First is Ghost Potato, an NTLM reflection attack. NTLM is the authentication solution built into Windows. There have been a bunch of attacks against NTLM over the years, and mitigations against those attacks. One of the side effects of Windows being closed-source is that it’s not always clear how exactly those mitigations work. Ghost Potato is an example of how trivial those mitigations can be to bypass, once an attacker understands how they work.
The essence of an NTLM reflection attack is that an attacker tricks a user into attempting NTLM authentication with a malicious machine. That authentication is then replayed back at the connecting machine in real time, essentially causing the victim to authenticate with their own machine. Once authentication finishes, the attacker can take over the session and act as the authenticated user. One of the ways this attack was mitigated was a record of recently sent cryptographic challenges: incoming challenge messages are compared against those recently sent, and a collision is probably an attack. The problem is in how quickly challenges are aged out of that cache: 5 minutes. It’s possible to stretch an NTLM authentication session past that 5 minute mark. At that point, sending a second bogus connection attempt will push the targeted message off the end of the cache, and the reflection attempt will succeed. It’s clever, and it was fixed in the November Patch Tuesday update to Windows.
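The eviction trick can be illustrated with a toy challenge cache. This is a deliberately simplified model (names and the count-based eviction are my own; Windows’ real cache ages entries out on a timer), but it shows why flooding the cache lets a previously-detected replay slip through.

```python
from collections import OrderedDict

class ChallengeCache:
    """Toy model of a bounded, oldest-first NTLM challenge cache."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()

    def record(self, challenge: bytes) -> None:
        # Remember a challenge we sent; evict the oldest when full.
        self.entries[challenge] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)

    def seen(self, challenge: bytes) -> bool:
        # A hit means an incoming challenge matches one we sent:
        # that collision is treated as a likely reflection attack.
        return challenge in self.entries

cache = ChallengeCache(capacity=2)
cache.record(b"target")   # the challenge the attacker wants to reflect
cache.seen(b"target")     # True: replaying it now would be detected
cache.record(b"bogus-1")
cache.record(b"bogus-2")  # bogus attempts push the target off the end
cache.seen(b"target")     # False: the reflection now goes undetected
```

The same logic applies whether eviction happens by count or by a five-minute timer: an attacker who can outlast or flood the cache defeats the check.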
Bitlocker is the disk/folder encryption solution that’s been baked into Windows ever since Windows Vista. A group of researchers just published a paper detailing the state of the art in password cracking techniques applied to Bitlocker. Their Bitcracker program contains a couple of novel performance improvements. First, they realized that part of the Bitlocker decryption algorithm could be pre-computed. A 256 MB lookup table was built, which shaves a bit of time off of each password attempt.
The second improvement is a way to quickly determine whether a decryption attempt was successful, before doing all the calculations normally required. Bitlocker decryption normally ends with calculating and comparing a Message Authentication Code; if the calculated MAC matches what is expected, the password was correct. The researchers noticed that the output of a proper decryption was recognizable without calculating the MAC. In essence, they were able to quickly determine whether a given password attempt produced plausible output, without completing the process. This shortcut can produce some false positives, but the number of false positives is low enough to make the approach well worth the effort.
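The early-reject idea generalizes to any format with recognizable plaintext: decrypt just enough to run a cheap plausibility check, and only do the expensive verification for candidates that pass. Here is a hypothetical sketch using a toy XOR “cipher”; none of this is Bitcracker’s actual code, and `keystream`, `MAGIC`, and the 8-byte header check are illustrative stand-ins, not Bitlocker’s real key derivation or layout.

```python
import hashlib

def keystream(password: str, n: int) -> bytes:
    # Toy stand-in for the real (much more expensive) key derivation.
    out = hashlib.sha256(password.encode()).digest()
    while len(out) < n:
        out += hashlib.sha256(out).digest()
    return out[:n]

MAGIC = b"-FVE-FS-"  # illustrative recognizable plaintext header

def encrypt(password: str, plaintext: bytes) -> bytes:
    ks = keystream(password, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

def crack(ciphertext: bytes, candidates):
    for pw in candidates:
        # Cheap early check: decrypt only the first 8 bytes and look
        # for the recognizable header before any expensive MAC work.
        head = encrypt(pw, ciphertext[:8])  # XOR is its own inverse
        if head != MAGIC:
            continue  # fast reject: skip the expensive verification
        # (A real cracker would now verify the full MAC to rule out
        # the rare false positive.)
        return pw
    return None

ct = encrypt("hunter2", MAGIC + b"disk contents...")
crack(ct, ["letmein", "password", "hunter2"])  # finds "hunter2"
```

The speedup comes from the fact that almost every wrong password is rejected after decrypting 8 bytes instead of running the full pipeline.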
When using high-end CUDA-enabled GPUs, they were able to test well over 100 million passwords in a day. This isn’t really enough performance to brute-force a good password, but it certainly makes a large dictionary attack trivial. Their conclusion was that it’s still important to use good passwords that don’t appear on password lists.
I always look forward to this column, in a non-twisted way of course, thanks!
Waiting to see if your attack vector gets exposed? ;D
Ah, the venerable old timing based attack.
Another solution to timing attacks is to have a secondary system that adds an unknown, random amount of time to each request. This wouldn’t technically “stop” timing based attacks, only make the answers of said attack “a bit” more fuzzy.
Adding random delay isn’t a “perfect” solution though; preferably one should use functions that always complete in a constant amount of time. And this is laughably easy: just have a timer running in parallel to the task, where the timer is always long enough for the task to finish, and the answer is never returned before the timer is up. And in an ASIC, this is really easy to implement.
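The parallel-timer idea, in software terms: start a deadline when the request arrives and don’t release the answer until it expires, regardless of when the work actually finished. A minimal sketch, assuming a known worst-case runtime (the 50 ms bound here is an arbitrary example):

```python
import time

WORST_CASE_SECONDS = 0.05  # assumed upper bound on the task's runtime

def answer_at_deadline(task, *args):
    # Compute the answer whenever it's ready, but only release it once
    # the fixed deadline has passed, hiding the task's true runtime.
    deadline = time.monotonic() + WORST_CASE_SECONDS
    result = task(*args)
    remaining = deadline - time.monotonic()
    if remaining > 0:
        time.sleep(remaining)
    return result
```

As the next comment notes, this only works when the worst case is short enough that always waiting for it is acceptable.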
But if the task can take anywhere from an instant all the way to many hours to process, then it might be a bit of a drag to always wait hours….
Another mitigation for timing attacks is to lower the resolution of the time itself. As an example, if we only send over answers once every millisecond, then we can’t tell whether the task took 1.2 ms or 1.8 ms to finish, since both are returned after 2 ms. This isn’t an “ideal” solution though, since it still leaks information; it just leaks less precise information.
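Quantizing response times amounts to rounding every completion up to the next tick boundary. A sketch of the comment’s 1 ms example (the tick size is the commenter’s, the function name is mine):

```python
import math

TICK = 0.001  # answers are only released on 1 ms boundaries

def quantized_delay(elapsed: float) -> float:
    # Round the elapsed time up to the next tick: 1.2 ms and 1.8 ms
    # both become 2 ms, so an observer can no longer tell them apart.
    return math.ceil(elapsed / TICK) * TICK
```

Everything within the same tick becomes indistinguishable, but an attacker still learns which tick the work fell into, hence the “less precise, not zero” caveat.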
To make this work better and leak even less, we can make our functions (where applicable) go through the data in a random order. For example, if one compares strings against a list of strings, then we don’t need to start with the first string, or scan through the list in a fully sequential manner; we can jump around a bit. This effectively means that we “scramble” the order of the listed items, except without losing the actual order. And yes, we will need to keep track of which parts of the list we have scanned, to not waste time comparing them again…
This means that the time it takes for a function to complete isn’t governed solely by the input and the length of a static list, but also by the output of a random number generator. (And the average time it takes to find a specific item will actually be unaffected, statistically speaking.)
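The random-order scan the commenter describes might look like this in Python (my own sketch; it still leaks roughly how many positions matched via its runtime, but no longer which positions they were):

```python
import random

def compare_in_random_order(secret: bytes, guess: bytes) -> bool:
    # Visit positions in a random permutation, so the time at which a
    # mismatch is found no longer maps to its position in the data.
    if len(secret) != len(guess):
        return False
    for i in random.sample(range(len(secret)), len(secret)):
        if secret[i] != guess[i]:
            return False
    return True
```

`random.sample` over the index range is the “scramble the order without losing it” bookkeeping: every position is visited exactly once, just not sequentially.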
In the end, if the function is always able to complete the task rather quickly, then use a timer to have it return answers within a constant amount of time.
If the completion time can vary by an exceptionally large amount and you desire at least some performance, then take care to eliminate sources of timing attacks. Only sending over answers at specified time intervals is one solution; adding some additional random time on top is yet another that works well in conjunction with the prior (unless your random number generation produces clear patterns, but since the attacker likely doesn’t know the original time, those patterns should be a lot harder to see). And if one includes some randomness in the order one executes things, then timing attacks start becoming rather hard.
Having a fixed response time is a good defense against remote attacks, but the method the author mentioned of always doing a full comparison is better against local attacks: power consumption and RF leaks can reveal the same type of branching information, because doing a comparison looks different from idly waiting for a timer to expire before returning the results.
This is indeed also a thing to take into consideration. And if there are few enough electrical disturbances on one’s portion of the power grid, then these power variations/noise can travel very far; with good enough equipment, tempest is always possible.
Though, a TPM usually doesn’t consume huge amounts of power, and is a separate chip out on the motherboard, “far” away from the CPUs, in an enclosed server, in a steel cabinet in a data center full of other computers. So it is likely rather hard for the CPUs of that server to even know what load the TPM is under at any given moment. And any external entities outside the data center will have a lot of noise to dig through….
So using a simple timer should be sufficient, since the difference in power consumption is rather negligible after all.
For more power hungry systems, or things running locally on the same CPU, then yes, idly waiting for a timer doesn’t make things all that much more secure as far as malicious code running on the same CPU goes.
Thought I might chime in as someone who’s worked on a few of these… in short, they’re much harder to mitigate than they first appear.
Adding a random delay or noise doesn’t change the exposure as much as it appears; it just requires the attacker to collect more samples to average it away, something they’re doing anyway to account for the randomness introduced by cache misses, scheduling, etc. Frequently, you’re already collecting thousands or millions of samples per recovered bit.
Even constant time versions of the functions don’t reliably obfuscate the key, as things like thermal throttling on the CPU will expose how hard it “worked” during the operation at a pretty granular level.
The best solutions involve writing the function in such a way as to require the same “work” no matter what the outcome is, but that can be incredibly difficult to do without breaking performance horribly.
Yes, for stuff running on the same CPU it gets a lot harder.
But for a separate system, on its own chip, then the CPU is rather oblivious to the efforts of that other chip.
I’ve always thought it really boils down to shared environment… the more you share, the easier it is to communicate. Assuming one process wishes to monitor another:
Two segmented processes on two CPUs sharing a motherboard will have high-bandwidth covert channels and timing attacks using everything from the NICs to accessing memory in a separate NUMA domain. The bandwidth between them can be gigabits per second.
Two separate physical machines sharing a rack can communicate more slowly, using things like sound, temperature and drive access. Now the bandwidth drops into the dozens of bps.
Two machines on opposite sides of the same room can communicate by thermal and electrical load… getting into bits per minute or bits per hour.
Enforcing Mandatory Access Control and High/Low Security is hard!
“This process of throwing random data at programs and functions has revealed over 14,000 bugs.”
Which Google immediately copyrighted as Intellectual Property.
B^)
Good informative article!
I always smile when Tempest is mentioned.