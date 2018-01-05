This week we’ve seen a tsunami of news stories about a vulnerability in Intel processors. We’re certain that by now you’ve heard of (and are maybe tired of hearing about) Meltdown and Spectre. However, as a Hackaday reader, you are likely the person who others turn to when they need to get the gist of news like this. Since this has bubbled up in watered-down versions to the highest levels of mass media, let’s take a look at what Meltdown and Spectre are, and also see what’s happening in the other two rings of this three-ring circus.
Meltdown and Spectre in a Nutshell
These two attacks are similar. Meltdown primarily affects Intel processors, and the kernel fixes (basically workarounds implemented by operating systems) will result in a 5%-30% speed penalty depending on how the CPU is being used. Spectre is not limited to Intel; it also affects AMD and ARM processors, and its kernel fixes are not expected to come with a speed penalty.
Friend of Hackaday and security researcher extraordinaire Joe Fitz has written a superb layman’s explanation of these types of attacks. His use of the term “layman” may be a little more high level than normal — this is something you need to read.
The attack exploits something called branch prediction. To boost speed, these processors keep a record of past branch behavior and use it to predict future branches, then speculatively run ahead along the predicted path. That speculative work can load data into the cache before the processor checks whether you have permission to access it. Obviously you don’t, so the data itself will never be made available for you to read. The exploit uses a clever guessing game: it looks at which of your own memory locations, ones you do have access to, were also pulled into the cache as a side effect. If you’re clever enough, you can reconstruct the restricted data by iterating on this trick many, many times.
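To make the guessing game a little more concrete, here is a minimal sketch in Python. It is a toy model, not an exploit: the "cache" is just a set, the latencies are made-up constants, and ToyCPU, speculative_read, timed_load, and leak_byte are hypothetical names invented for this illustration. The only idea it tries to capture is that a refused access can still leave a footprint whose timing gives the secret away.

```python
# Toy model of a cache-timing side channel. Nothing here touches real hardware:
# the "cache" is a Python set and the latencies are invented constants.

PAGE = 4096            # spacing between probe slots, one slot per possible byte value
FAST, SLOW = 10, 200   # hypothetical latencies for a cache hit and a cache miss


class ToyCPU:
    def __init__(self, secret: bytes):
        self._secret = secret   # stands in for memory the attacker may not read
        self.cached = set()     # probe-slot addresses currently "in cache"

    def flush(self) -> None:
        """Attacker empties the cache so every probe slot starts out slow."""
        self.cached.clear()

    def speculative_read(self, index: int) -> None:
        """The access is refused, but its cache side effect survives.

        The secret byte selects a probe slot, that slot is pulled into the
        cache, and only then does the permission check fail.
        """
        self.cached.add(self._secret[index] * PAGE)
        raise PermissionError("access denied")

    def timed_load(self, address: int) -> int:
        """Return the modeled latency of loading a slot the attacker owns."""
        latency = FAST if address in self.cached else SLOW
        self.cached.add(address)
        return latency


def leak_byte(cpu: ToyCPU, index: int) -> int:
    cpu.flush()
    try:
        cpu.speculative_read(index)
    except PermissionError:
        pass  # the attacker expects the fault and simply carries on
    # Time all 256 probe slots; the single fast one reveals the secret byte.
    timings = [cpu.timed_load(value * PAGE) for value in range(256)]
    return timings.index(min(timings))


def leak(cpu: ToyCPU, length: int) -> bytes:
    return bytes(leak_byte(cpu, i) for i in range(length))


if __name__ == "__main__":
    print(leak(ToyCPU(b"hunter2"), 7))
```

On real hardware the timing measurement is noisy and each byte usually takes several attempts, which is where the "many, many times" comes in.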
For the most comprehensive info, you can read the PDF whitepapers on Meltdown and Spectre.
Frustration from Kernel Developers
These vulnerabilities are in silicon. They can’t be easily fixed with a microcode update, which is how CPU manufacturers usually work around silicon errata (although this appears to be an architectural flaw and not errata per se). An Intel “fix” would amount to a product recall. They’ve already said they won’t be doing a recall, but how would that work anyway? What’s the lead time on spinning up the fabs to replace all the Intel chips in use? Yikes!
So the fixes fall on the operating systems at the kernel level. Intel should be (and probably is behind the scenes) bowing down to the kernel developers who are saving their bacon. It is understandably frustrating to have to spend time and resources patching these vulnerabilities, which displaces planned feature updates and improvements. Linus Torvalds has been throwing shade at Intel — anecdotal evidence of this frustration:
“I think somebody inside of Intel needs to really take a long hard look at their CPU’s, and actually admit that they have issues instead of writing PR blurbs that say that everything works as designed.”
That’s the tamest part of his message posted on the Linux Kernel Mailing List.
Stock Sales Kerfuffle is Just a Distraction
The first thing I did on hearing about these vulnerabilities on Tuesday was to check Intel’s stock price and I was surprised it hadn’t fallen much. In fact, peak to peak it’s only seen about an 8% drop this week and has recovered some from that low.
Of course, it came out that back in November Intel’s CEO Brian Krzanich sold off his Intel stock to the tune of $24 million, bringing him down to his contractual minimum of shares. He likely knew about Meltdown when arranging that sale. Resist the urge to flame on this decision. Whether it’s legal or not, hating on this guy is just a distraction.
What’s more interesting to me is this: Intel is too big to fail. What are we all going to do, stop using Intel and start using something else? You can’t just pull the chip and put a new one in. In the case of desktop computers, you need a new motherboard plus all the supporting stuff like memory. For servers, laptops, and mobile devices, you need to replace the entire piece of equipment. Intel has a huge market share, and silicon has a long production cycle. Speculative execution built on branch prediction has been commonplace in consumer CPUs going back to 1995, when the Pentium Pro brought it to the x86 architecture. This is a piece of the foundation that will be yanked out and replaced with new designs.
CPUs are infrastructure, and this is the loudest bell to date tolling to signal how important their design is to society. It’s time to take a hard look at what open silicon design would bring to the table. You can’t say this would have been prevented with open design. You can say that the path to new processors without these issues would be a shorter one if there were more than two companies producing all of the world’s processors, both of which have been affected by these vulnerabilities.
12 thoughts on “Let’s Talk Intel, Meltdown, and Spectre”
“Branch preditors”
Well, that’s a new term for me, I keep picturing velociraptors when I see it though…
My ‘c’ key must be on holiday. Fixed, thanks!
Velociraptors that wait in tree branches for their prey ?
Hahahaha so true!!
Please put a Twitter warning with the link!
-signed, a non-twit
Second that vote.
Clicked it anyway (you guys carry small bit of trust now)
Hmm, string of something about a librarian and a book.
A proper blog or web page is a far better manner for dispensing any read-worthy info.
A narrow strip of fragmented sentences
is horribly off-putting to try reading and difficult to absorb anything meaningful from.
I am more surprised it took a decade to find an exploit.
AMD still doesn’t admit they are vulnerable to this class of exploit
ARM64 is a far bigger issue as most mobile phones will never see a kernel patch
Oracle is probably thinking they should keep SPARC64 around another few years. ;-)
Joe Fitz’s tweetstorm did help me understand this a bit. But I don’t see how this helps you get data that’s not in your process. Sure, you can find out if the memory from a certain address has been loaded into the cache, but you can’t see that memory’s data … can you? Fitz’s example relies on the “address” (the title of the book) containing a piece of data (the letter of the alphabet). That’s not the case IRL.
(And BTW, why do people write tweetstorms? If you have an entire paragraph to write, why not write it on Facebook? Tweetstorms are contrary to the whole point of Twitter. But I digress.)
I was confused by this in Joe Fitz’s example at first too. But his analogy says that you ask for a book whose title starts with the same letter as the first letter on the first page of the restricted document. You don’t know what that letter is, but it’s in memory and the vulnerability will apparently act on that request (pointer to the memory) even though you don’t have access to that memory space.
Whatever book gets returned is the letter in memory. You didn’t get access to that memory space, but you were able to make a comparison based on what is stored there.
“you ask for a book whose title starts with the same letter as the first letter on the first page of the restricted document”: But what’s the computer analogy of this? The only thing I can ask for is a memory address. I can’t ask for something complicated like “the Sue Grafton novel that corresponds to the first letter of page 1 of that book”.
Oh, wait. That sounds like indirect addressing. So maybe I know that the kernel stores a password at address 0xabcdef00. So I load that into a register and use it with indirect addressing. That means the data at that address was itself treated as an address and *that* address and its data were loaded into the cache. Now, I just have to try loading every possible address until I see a response that’s fast enough to indicate that the address was already in the cache.
Could that be what’s happening? If so, it seems completely impractical. While I’m loading every possible address, I’m bumping data out of the cache and replacing it with my attempts. Pretty quickly, I’ll bump that precious address I indirectly loaded. (Well, maybe not that quickly: we have large caches these days. If I can narrow down my search to a small number of values – e.g. ASCII characters – it might not be so bad.)
If I’m understanding this correctly, it sounds really difficult to exploit this vulnerability.
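The eviction worry raised above can be poked at with a small model. This is only a sketch under loud assumptions: a fully associative cache with strict least-recently-used eviction, and a capacity of 512 lines (roughly a 32 KiB cache with 64-byte lines). Real caches are set-associative and have prefetchers, so treat the numbers as illustrative. LRUCache and find_hot_line are hypothetical names made up for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with least-recently-used eviction."""

    def __init__(self, capacity_lines: int):
        self.capacity = capacity_lines
        self.lines = OrderedDict()

    def load(self, line: int) -> bool:
        """Load a line; return True if it was already cached (a fast access)."""
        hit = line in self.lines
        if hit:
            self.lines.move_to_end(line)        # mark as most recently used
        else:
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict the oldest line
        return hit


def find_hot_line(capacity_lines: int, candidates: int, secret: int):
    """Touch the secret's line, then probe candidates in order.

    Returns the first candidate that hits, or None if the hot line was
    evicted by the probing itself before the scan reached it.
    """
    cache = LRUCache(capacity_lines)
    cache.load(secret)  # footprint left behind by the indirect load
    for guess in range(candidates):
        if cache.load(guess):
            return guess
    return None


if __name__ == "__main__":
    print(find_hot_line(512, 256, 200))        # 256 candidates fit in the cache
    print(find_hot_line(512, 65536, 60000))    # a wide scan evicts the hot line
```

In this model, scanning 256 candidates (one per possible byte value) never pushes the hot line out, while scanning 65,536 candidates evicts it long before the scan gets there. That matches the intuition in the comment: narrowing the search to a small set of values is what keeps the trick workable.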
Mike, I don’t think your understanding and explanation of Meltdown is correct. Most OSs keep their private pages mapped into the virtual address spaces of every process, but protected from rwx access. This helps speed up syscalls. The issue isn’t about prediction. It’s about speculative execution.
Suppose you have a variable P that represents some protected non-readable memory location. You create two variables A and B and place them in separate cache lines. Then you ensure the cache lines for A and B are not loaded, through any variety of means such as reading a cache-sized block of other data. Then you execute the following pseudo code:
if (bit 0 of P) print A; else print B;
The speculative function of the branch predictor will fetch and evaluate the ‘if’ conditional without consideration of protection bits. It will then pre-fetch the value of A or B depending on the value of bit 0 of P. When it does, it will load the cache line containing the resulting variable. Once the execution point catches up to the predictor, the process will experience an exception, as any attempt to read P is illegal. But the damage has been done. Either the cache line containing A or the one containing B, but not both, will be loaded. If you then measure the access time of reading variable B vs A, you can infer the value of bit 0 of P based on the difference. Rinse and repeat for all bits in all values of protected memory.
Not an easy exploit to code or execute… but a potential attack vector all the same.
The currently pursued OS workaround is to not map kernel pages into all VM spaces, even if they are protected.
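The A/B trick in the comment above can be modeled in a few lines of Python. Again, this is a sketch and not working exploit code: the cache is a plain set, FAST and SLOW are invented latencies, and speculate_branch, access_time, leak_bit, and leak_value are hypothetical helpers that only mirror the pseudo code.

```python
FAST, SLOW = 10, 200   # made-up latencies for a cache hit and a cache miss


def speculate_branch(protected_byte: int, bit: int, cache: set) -> None:
    """Model of `if (bit of P) print A; else print B;` running speculatively.

    The chosen variable's cache line is loaded before the fault is raised,
    so the cache state changes even though the read of P is never allowed.
    """
    cache.add("A" if (protected_byte >> bit) & 1 else "B")


def access_time(name: str, cache: set) -> int:
    """Modeled time to read variable A or B after the fault."""
    return FAST if name in cache else SLOW


def leak_bit(protected_byte: int, bit: int) -> int:
    cache = set()  # A and B both start out uncached
    speculate_branch(protected_byte, bit, cache)
    # Whichever variable is faster to read tells us the value of the bit.
    return 1 if access_time("A", cache) < access_time("B", cache) else 0


def leak_value(protected_byte: int) -> int:
    """Rinse and repeat for all eight bits of one protected byte."""
    return sum(leak_bit(protected_byte, bit) << bit for bit in range(8))


if __name__ == "__main__":
    print(hex(leak_value(0xA5)))
```

Leaking one bit per fault is slow, but it only ever needs two cache lines, which sidesteps most of the concern about probing pushing useful data out of the cache.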
This was very helpful. Thank you.