Let’s Talk Intel, Meltdown, and Spectre

This week we’ve seen a tsunami of news stories about a vulnerability in Intel processors. We’re certain that by now you’ve heard of (and are maybe tired of hearing about) Meltdown and Spectre. However, as a Hackaday reader, you are likely the person who others turn to when they need to get the gist of news like this. Since this has bubbled up in watered-down versions to the highest levels of mass media, let’s take a look at what Meltdown and Spectre are, and also see what’s happening in the other two rings of this three-ring circus.

Meltdown and Spectre in a Nutshell

These two attacks are similar. Meltdown is specific to Intel processors, and kernel fixes (basically workarounds implemented by operating systems) will result in a 5%-30% speed penalty depending on how the CPU is being used. Spectre is not limited to Intel; it also affects AMD and ARM processors, and kernel fixes are not expected to come with a speed penalty.

Friend of Hackaday and security researcher extraordinaire Joe Fitz has written a superb layman’s explanation of these types of attacks. His use of the term “layman” may be a little more high level than normal — this is something you need to read.

The attack exploits something called branch prediction, together with the speculative execution it drives. To boost speed, these processors keep a record of past branch behavior and use it to predict which way future branches will go, then start executing down the predicted path before they know whether the guess was right. That speculative work can load data into the cache before the processor checks whether you have permission to access it. Obviously you don't, so the result is thrown away and never made available for you to read. The exploit uses a clever guessing game: by timing how quickly it can read other memory that you do have access to, it can tell what the speculative work left behind in the cache. If you're clever enough, you can reconstruct the restricted data by iterating on this trick many, many times.
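To make the "guessing game" a little more concrete, here is a toy model in C. It is not exploit code: the cache is just a boolean array, the hit and miss latencies are invented numbers, and the hypothetical `speculative_touch` stands in for whatever discarded speculative work touched a secret-dependent cache line. It shows the shape of the usual flush-then-probe technique: evict every probe line, let the secret leave a footprint, then time each line once and pick the fast one.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model only: one probe line per possible byte value.
   The latencies are invented illustrative numbers, not measurements. */
enum { PROBE_LINES = 256, HIT_CYCLES = 40, MISS_CYCLES = 300 };

typedef struct {
    bool cached[PROBE_LINES];   /* is probe line i currently in the cache? */
} toy_cache;

/* Attacker, step 1: evict every probe line. */
static void flush_all(toy_cache *c) {
    memset(c->cached, 0, sizeof c->cached);
}

/* Stand-in for speculative work that is later thrown away: its result is
   discarded, but the secret-dependent cache line it touched stays loaded. */
static void speculative_touch(toy_cache *c, unsigned char secret) {
    c->cached[secret] = true;
}

/* Time one read; like a real read, it also brings the line into the cache. */
static int timed_read(toy_cache *c, int line) {
    assert(line >= 0 && line < PROBE_LINES);
    int cycles = c->cached[line] ? HIT_CYCLES : MISS_CYCLES;
    c->cached[line] = true;
    return cycles;
}

/* Attacker, step 2: time each probe line once; the fast one names the byte. */
static int recover_byte(toy_cache *c) {
    int best = 0, best_cycles = MISS_CYCLES + 1;
    for (int line = 0; line < PROBE_LINES; line++) {
        int t = timed_read(c, line);
        if (t < best_cycles) {
            best_cycles = t;
            best = line;
        }
    }
    return best;
}
```

Calling `flush_all(&c)`, then `speculative_touch(&c, 'K')`, then `recover_byte(&c)` returns `'K'`. Each probe line is measured exactly once, so the fact that timing a line also loads it does not spoil the result.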

For the most comprehensive info, you can read the PDF whitepapers on Meltdown and Spectre.

Update: Check Alan Hightower’s explanation of the Meltdown exploit, left as a comment below. It’s very helpful for understanding how this works.

Frustration from Kernel Developers

These vulnerabilities are in silicon: they can’t be easily fixed with a microcode update, which is how CPU manufacturers usually work around silicon errata (although this appears to be an architectural flaw and not errata per se). An Intel “fix” would amount to a product recall. They’ve already said they won’t be doing a recall, but how would that work anyway? What’s the lead time on spinning up the fabs to replace all the Intel chips in use? Yikes!

So the fixes fall on the operating systems at the kernel level. Intel should be (and probably is behind the scenes) bowing down to the kernel developers who are saving their bacon. It is understandably frustrating to have to spend time and resources patching these vulnerabilities, which displaces planned feature updates and improvements. Linus Torvalds has been throwing shade at Intel — anecdotal evidence of this frustration:

“I think somebody inside of Intel needs to really take a long hard look at their CPU’s, and actually admit that they have issues instead of writing PR blurbs that say that everything works as designed.”

That’s the tamest part of his message posted on the Linux Kernel Mailing List.

Stock Sales Kerfuffle is Just a Distraction

The first thing I did on hearing about these vulnerabilities on Tuesday was to check Intel’s stock price and I was surprised it hadn’t fallen much. In fact, peak to peak it’s only seen about an 8% drop this week and has recovered some from that low.

Of course, it came out that back in November Intel’s CEO Brian Krzanich sold off his Intel stock to the tune of $24 million, bringing him down to his contractual minimum of shares. He likely knew about Meltdown when arranging that sale. Resist the urge to flame on this decision. Whether it’s legal or not, hating on this guy is just a distraction.

What’s more interesting to me is this: Intel is too big to fail. What are we all going to do, stop using Intel and start using something else? You can’t just pull the chip and put a new one in. In the case of desktop computers you need a new motherboard plus all the supporting stuff like memory. For servers, laptops, and mobile devices you need to replace the entire piece of equipment. Intel has a huge market share, and silicon has a long production cycle. Branch prediction has been commonplace in consumer CPUs going back to 1995 when the Pentium Pro brought it to the x86 architecture. This is a piece of the foundation that will be yanked out and replaced with new designs that provide the same speed benefits without the same risks, but that will take time to make it into the real world.

CPUs are infrastructure, and this is the loudest bell to date tolling to signal how important their design is to society. It’s time to take a hard look at what open silicon design would bring to the table. You can’t say this would have been prevented with open design. You can say that the path to new processors without these issues would be a shorter one if there were more than two companies producing all of the world’s processors, both of which have been affected by these vulnerabilities.

150 thoughts on “Let’s Talk Intel, Meltdown, and Spectre”

    1. Second that vote.

      Clicked it anyway (you guys carry a small bit of trust now).
      Hmm, a string of something about a librarian and a book.
      A proper blog or web page is a far better way of dispensing any read-worthy info.

      A narrow strip of fragmented sentences is horribly off-putting to try reading and difficult to absorb anything meaningful from.

      1. Yeah seriously. If you need to number the paragraphs so readers understand what order they are supposed to go in you should really reconsider if that’s the best way to put out your information.

    2. Aye… I agree. Twitter play favourites with whom they enforce their terms of service, otherwise @POTUS would have seen the ban hammer months ago.

      For that reason, I do not use or support Twitter. If Joe Fitz’s discussion is worth reading, then it should be posted to a venue which is more appropriate.

    3. I’m not part of the Twitterati… but I could still read Joe Fitz’s explanation just fine.

      It’s an excellent summary, and everyone should read it, regardless of whether you like Twitter or not.

  1. I am more surprised it took a decade to find an exploit.

    AMD still doesn’t admit they are vulnerable to this class of exploit

    ARM64 is a far bigger issue as most mobile phones will never see a kernel patch

    Oracle is probably thinking they should keep SPARC64 around another few years. ;-)

    1. Reread the article, because you are mistaken: Meltdown IS Intel specific, and AMD (along with basically every other CPU maker) has advised that their CPUs are vulnerable to Spectre.

      My understanding is that Spectre will not get kernel patches, because it’s almost impossible to patch – we’re looking at a programming paradigm shift necessitated by Spectre.

          1. Thanks, PreferLinux, that’s what I was looking for – didn’t see your response until I posted, though.

            Ryan, that’s really not very helpful: not only was it ‘he said, she said’, but you didn’t specify whether it was Meltdown. And exactly how many Apple cores utilise A75 anyway?

        1. I’m quite fond of RISC-V and hope to see further success there; but (to the degree I’ve been able to understand it) these weaknesses aren’t really tied to ISA but to implementation.

          The current state of RISC-V is safe; but largely because RISC-V implementations you can actually get in physical form are in their relative infancy and don’t implement the techniques commonly used for extra speed on bigger cores (just as most of ARM’s vulnerability is with their very newest design, not the zillions of older cores aimed at lower power and complexity; and even Intel, the worst affected of the lot, was safe with some of their earlier parts).

          Hopefully all RISC-V implementations will take advantage of being able to observe others learning hard lessons before they commit to similar mistakes; but it won’t be the ISA that saves them.

        1. So Spectre yes; Meltdown no, if I’m reading their rather vague description correctly (they mention ‘all microprocessors’ rather than the much more specific list for Meltdown).

          Probably won’t sell anyone who wasn’t at least considering; but they probably don’t mind, at all, that Xeon benchmarks are going to regress, potentially substantially, in some of the database and similar workloads that their customers care about.

    2. Any processor that doesn’t flush branch predictors and/or caches at a context switch is vulnerable to “spectre” at least theoretically.

      Only Intel processors are vulnerable to “meltdown”, as it is due to a bug specific to their processors; some claim that some ARM processors have the same kind of design bug, though.

      It’s good that people are waking up to the fact that shared resources means shared secrets – and that there are actually real exploits possible using this sharing. But it isn’t news.

  2. Joe Fitz’s tweetstorm did help me understand this a bit. But I don’t see how this helps you get data that’s not in your process. Sure, you can find out if the memory from a certain address has been loaded into the cache, but you can’t see that memory’s data … can you? Fitz’s example relies on the “address” (the title of the book) containing a piece of data (the letter of the alphabet). That’s not the case IRL.

    (And BTW, why do people write tweetstorms? If you have an entire paragraph to write, why not write it on Facebook? Tweetstorms are contrary to the whole point of Twitter. But I digress.)

    1. I was confused by this in Joe Fitz’s example at first too. But his analogy says that you ask for a book whose title starts with the same letter as the first letter on the first page of the restricted document. You don’t know what that letter is, but it’s in memory and the vulnerability will apparently act on that request (pointer to the memory) even though you don’t have access to that memory space.

      Whatever book gets returned is the letter in memory. You didn’t get access to that memory space, but you were able to make a comparison based on what is stored there.

      1. “you ask for a book whose title starts with the same letter as the first letter on the first page of the restricted document”: But what’s the computer analogy of this? The only thing I can ask for is a memory address. I can’t ask for something complicated like “the Sue Grafton novel that corresponds to the first letter of page 1 of that book”.

        Oh, wait. That sounds like indirect addressing. So maybe I know that the kernel stores a password at address 0xabcdef00. So I load that into a register and use it with indirect addressing. That means the data at that address was itself treated as an address and *that* address and its data were loaded into the cache. Now, I just have to try loading every possible address until I see a response that’s fast enough to indicate that the address was already in the cache.

        Could that be what’s happening? If so, it seems completely impractical. While I’m loading every possible address, I’m bumping data out of the cache and replacing it with my attempts. Pretty quickly, I’ll bump that precious address I indirectly loaded. (Well, maybe not that quickly: we have large caches these days. If I can narrow down my search to a small number of values – e.g. ASCII characters – it might not be so bad.)

        If I’m understanding this correctly, it sounds really difficult to exploit this vulnerability.

        1. It IS really difficult to exploit this vulnerability, but it’s in silicon and can’t be readily fixed by the manufacturer (aside from the aforementioned kernel changes) so it’s sort of a big deal.

          Being able to leak protected memory is a problem, no matter how convoluted the method is by which you do it. Hackers have a lot of free time, remember.

          1. From what I’ve read, Meltdown is worked around completely in the OS. One variant of Spectre (abusing speculative execution of indirect branches) is dealt with in microcode.

          2. Intel has made claims that they could release some microcode updates to “eliminate” some (or all) of the issues raised by Spectre (see what I did there)

            However, the list of processors they will fix is short, and their claims stretch believability anyway. Also, because Intel microcode updates are cryptographically secured and only unpacked within the processor, it would be near impossible for outsiders to see what the hell they actually fixed within the chip.

        2. It’s pretty easy to exploit Meltdown. The paper on it is fairly readable in my opinion.

          Basically, the data is read, despite the fact that you’re not actually allowed to. This shouldn’t be a problem, because the CPU will raise an exception and clear the data. But before the CPU does so, you can use that data due to speculative execution. If you have a variable that is not in the cache, loading it will put it in the cache. You can later check whether it is in the cache or not. If you only load it if a particular bit of the data is a 1, you can later (after the CPU has raised the exception) tell what that particular bit of the data is. It is easy to specify what data you want to read, which bit of it you want, and it is easy to repeat the process indefinitely.

          1. Actually, the exception is never raised. You put the illegal operation after an “if 0” command so that it never happens. However, the speculative part assumes that the command MIGHT happen and runs it. The result is later discarded, so the operation “never happens”, except that now the memory addresses in the cache have changed…

          2. That’s the case with Spectre, but generally not with Meltdown. With Meltdown, the exception can be caught, exit another process, or be suppressed through memory transactions – or do as you say, but that’s much more complex and also unnecessary.

  3. Mike, I don’t think your understanding and explanation of meltdown is correct. Most OSs keep their private pages mapped into the virtual address spaces of every process – but protected from rwx access. This helps speed up syscalls. The issue isn’t about prediction. It’s about speculative execution.

    Suppose you have a variable P that represents some protected non-readable memory location. You create two variables A and B and place them in cache lines that are separate. Then you ensure the cache lines for A and B are not loaded – through any variety of means, such as reading a cache-sized amount of other data. Then you execute the following pseudo code:

    if (bit 0 of P) print A; else print B;

    The speculative function of the branch predictor will fetch and evaluate the ‘if’ conditional without consideration of protection bits. It will then pre-fetch the value of A or B depending on the value of bit 0 of P. When it does, it will load the cache line containing the resulting variable. Once the execution point catches up to the predictor, the process will experience an exception, as any attempt to read P is illegal. But the damage has been done. Either the cache line containing A or B but not both will be loaded. If you then measure the access time of reading variable B vs A, you can infer the value of bit 0 of P based on the difference. Rinse and repeat for all bits in all values of protected memory.

    Not an easy exploit to code or execute… but a potential attack vector all the same.

    The current pursued OS work-around is to not map kernel pages into all VM spaces – even if they are protected.
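Alan's A/B example can be modelled the same way. In this toy sketch (invented latencies, hypothetical names such as `speculate_on_bit` and `leak_byte`), the only thing that survives the fault is which of the two lines was loaded, and the loop repeats the flush, speculate, and compare steps once per bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the A/B example: only the cache state of A and B is modelled.
   The latencies are invented, and the "protected" byte is simply passed in
   as a value the attacker is assumed not to be able to read directly. */
enum { HIT_CYCLES = 40, MISS_CYCLES = 300 };

typedef struct { bool a_cached, b_cached; } ab_lines;

/* Speculative path for "if (bit N of P) read A; else read B;". The fault
   later discards the result, but the line it touched stays cached. */
static void speculate_on_bit(ab_lines *l, unsigned char p, int bit) {
    assert(bit >= 0 && bit < 8);
    if ((p >> bit) & 1)
        l->a_cached = true;
    else
        l->b_cached = true;
}

static int time_read_a(const ab_lines *l) { return l->a_cached ? HIT_CYCLES : MISS_CYCLES; }
static int time_read_b(const ab_lines *l) { return l->b_cached ? HIT_CYCLES : MISS_CYCLES; }

/* "Rinse and repeat": flush both lines, speculate, compare timings, per bit. */
static unsigned char leak_byte(unsigned char protected_p) {
    unsigned char out = 0;
    for (int bit = 0; bit < 8; bit++) {
        ab_lines l = { false, false };          /* A and B both evicted */
        speculate_on_bit(&l, protected_p, bit);
        if (time_read_a(&l) < time_read_b(&l))  /* A is fast, so the bit was 1 */
            out |= (unsigned char)(1u << bit);
    }
    return out;
}
```

For example, `leak_byte(0xA5)` returns `0xA5` in this model.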

      1. Shit yeah! If a metaphor is more complex than the problem itself… Not to mention vague. It’s OK, we’re geeks, we understand speculative execution. Nice explanation, Alan, right to the point and no mess. It’s the first one I actually understand!

        So, seems like any CPU with speculative execution is vulnerable. Maybe future CPUs need to re-jigger the order of things, so you check page permissions before you load them to speculatively execute.

      1. Link? From real benchmarks I’ve seen, instead of just wild speculation, you only get that sort of drop with synthetic IO benchmarks (either network loopback or file with fast SSDs).

    1. Like you said at first, Meltdown isn’t due to branch prediction, but rather just to the fact that the access to P (protected data) is speculatively allowed before consideration of the protection bits. Spectre is much more devious. It takes advantage of branch prediction and speculative execution of the predicted branch, all done in a different context. It shares with Meltdown the fact that it is taking advantage of speculative execution and using the effect on the cache to recover the data it cannot see directly.

    2. Thank you, this is the most clear explanation I have read so far. Surprising that this trick has remained unknown (at least to CPU designers) for this long.
      I don’t really see why this would be difficult to code or exploit, even without an accurate way to measure the access time, it would be easy to average many measurements together and have a pretty good guess of the bit you are trying to read.

    3. So can’t they fix this in future CPU designs by introducing a table in the CPU that holds process IDs and separate things that way even for speculative stuff? Although it would limit the amount of processes and would need a cache added.
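The averaging point made above is easy to check in a model. This sketch adds uniform random jitter to each simulated timing (a made-up noise model, driven by a small fixed-seed linear congruential generator so runs are repeatable), then classifies a bit by comparing the mean of `n` samples to a threshold halfway between the hit and miss latencies. All the numbers are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Toy noise model: each timing sample = true latency + uniform jitter.
   All numbers are illustrative assumptions, not measurements. */
enum { HIT_CYCLES = 40, MISS_CYCLES = 300, JITTER = 400, THRESHOLD = 170 };

/* Small linear congruential generator so runs are repeatable. */
static uint32_t lcg_next(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state >> 8;                 /* drop the weakest low bits */
}

/* One noisy sample; the jitter is roughly uniform in [-JITTER/2, JITTER/2). */
static int noisy_sample(int true_cycles, uint32_t *rng) {
    int jitter = (int)(lcg_next(rng) % JITTER) - JITTER / 2;
    return true_cycles + jitter;
}

/* Average n samples and call it a hit (bit = 1) if the mean is below the
   threshold, which sits halfway between the hit and miss latencies. */
static int guess_bit(int bit_is_one, int n, uint32_t *rng) {
    assert(n > 0);
    long sum = 0;
    int true_cycles = bit_is_one ? HIT_CYCLES : MISS_CYCLES;
    for (int i = 0; i < n; i++)
        sum += noisy_sample(true_cycles, rng);
    return (sum / n) < THRESHOLD;
}
```

With the numbers chosen here, a single sample lands on the wrong side of the threshold about 17% of the time (a jitter of 130 cycles or more in the wrong direction, out of a 400-cycle range), while the mean of 1000 samples is reliably on the correct side.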

  4. “Branch prediction has been commonplace in consumer CPUs going back to 1995 when the Pentium Pro brought it to the x86 architecture. This is a piece of the foundation that will be yanked out and replaced with new designs.”
    I doubt it. Branch prediction and speculative execution are far too critical to performance to throw out. And any replacement would be so major I would guess it’d probably take 10+ years.

    My guess is they’ll change the way caching works so that speculative loads aren’t visible until the instruction loading the data is retired, eliminating the state change caused by speculative execution and the side-channel that gets data out. It would fix both Meltdown and Spectre, and also protect against other cache-based side-channel attacks.
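The retirement-gated cache suggested in this comment can be sketched as a toy model. It only illustrates the commenter's idea and is not a description of any shipping CPU: speculative loads record their lines in a side buffer, which is committed to the visible cache on retirement and simply dropped on a squash. The names and buffer size are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of the suggested design: lines fetched by speculative loads
   wait in a side buffer and only become visible if the loads retire. */
enum { LINES = 256, MAX_PENDING = 16 };

typedef struct {
    bool cached[LINES];         /* cache state an attacker can time */
    int pending[MAX_PENDING];   /* lines fetched by not-yet-retired loads */
    int n_pending;
} shadow_cache;

static void sc_init(shadow_cache *c) { memset(c, 0, sizeof *c); }

/* A speculative load records its line but leaves the visible cache alone. */
static void sc_speculative_load(shadow_cache *c, int line) {
    assert(line >= 0 && line < LINES && c->n_pending < MAX_PENDING);
    c->pending[c->n_pending++] = line;
}

/* Correct prediction: the loads retire and their lines become visible. */
static void sc_retire(shadow_cache *c) {
    for (int i = 0; i < c->n_pending; i++)
        c->cached[c->pending[i]] = true;
    c->n_pending = 0;
}

/* Misprediction or fault: the loads are squashed and leave no trace. */
static void sc_squash(shadow_cache *c) { c->n_pending = 0; }

/* What an attacker's timing probe can observe. */
static bool sc_is_cached(const shadow_cache *c, int line) { return c->cached[line]; }
```

As other commenters point out, gating the cache this way closes only the cache channel; contention for functional units, or power and EM side channels, would be untouched by a model like this.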

    1. You are right, and I agree with you. I didn’t make myself very clear there so I appended that sentence:

      “This is a piece of the foundation that will be yanked out and replaced with new designs that provide the same speed benefits without the same risks — but that will take time to make it into the real world.”

      What I meant is this predictive behavior has been around a long time because it delivers a big speed benefit. I said it’s part of the foundation meaning the rest depends on it and this is hard to just remove without putting something in its place.

    2. “My guess is they’ll change the way caching works so that speculative loads aren’t visible until the instruction loading the data is retired”

      I think Meltdown/Spectre actually showed that the whole idea of speculative execution is really, really hard to secure. Cache lines are pretty much the easiest way for data to get out, but there are things that you can do that would be really nasty. Speculative execution takes up functional units inside the processor, for instance. You could imagine trying to leak data on speculatively-executed code based on how many functional units or what kinds were in use, completely avoiding the cache issue altogether. That code would be a hell of a lot harder and probably slower, but it’d probably work too.

      In the Meltdown paper, for instance, they of course pointed out that in the worst case, you’d still have EM/power sidechannels to try to get data out. This situation isn’t pretty.

      1. Very good points.

        I doubt speculative execution will go away, but they’re sure going to have a bit of work to do to get things like this in better shape. I’m glad it’s not my job…

        1. Even if you get rid of the ability to measure time, you can substitute more subtle things for that. On a multicore system, use an intentional race condition to determine if one core executes something faster than the other. The Meltdown paper points out, for instance, that even in the worst case, you’ve still got the power/EM sidechannel that you basically can never get rid of.

    3. Meltdown seems easy enough to fix in hardware: check the protection bits before doing the speculative load.

      Spectre also seems easy enough to fix in hardware + software; provide a way to clear (or swap out) the BTB during context switches.

      Both of these bugs point out the dangers of speculative execution and make clear that walls between process spaces need to be sturdier.

      1. For Meltdown it’d probably be adequate to check the protection bits after doing the load but before handing the data over to the instruction. I’m not sure why they don’t, because as far as I can guess they should have them at least as soon as they’ve got the data…

      2. For the first variation of Spectre (the one based on an out-of-bounds array access), the fix does seem more difficult. Sure, you can restore the cache to its pre-speculative condition, but, as others (as well as the paper itself) have pointed out, the cache is just one possible means of getting data out of speculative execution.
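The ordering question raised in the Meltdown replies above (whether the protection bits are checked before or after a dependent instruction gets the data) can be shown with a toy model. The page, cache, and function names are hypothetical; `check_before_forward = false` models the ordering Meltdown relies on, and `true` models the fix being suggested.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of a user-mode load from kernel-only memory followed by a
   dependent access. Names are hypothetical; only cache state is modelled. */
enum { LINES = 256 };

typedef struct {
    unsigned char value;
    bool user_readable;            /* false for kernel-only memory */
} toy_page;

typedef struct { bool cached[LINES]; } toy_cache;

/* Transiently execute "v = *page; touch probe[v]".
   check_before_forward == false: data reaches the dependent access first
   and the fault only arrives later (the ordering Meltdown relies on).
   check_before_forward == true: the permission bits are checked before
   the data is handed to any dependent instruction. */
static void transient_load_and_touch(const toy_page *page, toy_cache *c,
                                     bool check_before_forward) {
    assert(page != NULL && c != NULL);
    if (check_before_forward && !page->user_readable)
        return;                    /* fault; dependents never see the data */
    unsigned char v = page->value; /* data forwarded to the next instruction */
    c->cached[v] = true;           /* footprint that survives the roll-back */
}

/* Attacker's probe: first warm line, or -1 if nothing leaked. */
static int probe(const toy_cache *c) {
    for (int i = 0; i < LINES; i++)
        if (c->cached[i])
            return i;
    return -1;
}
```

In the first ordering the probe finds line `0x42` for a kernel page holding `0x42`; with the check moved ahead of forwarding it finds nothing.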

    1. I may be a bit off, but my understanding is Spectre is more what Mike was talking about in the article. It focuses more on the prediction algorithm. If a pre-execution unit assumes a branch will take path A over B based on taking path A 90% of the time over the past quantum time unit, looking at any variance in cache hit/miss rates, odd line fetches, and other phenomena can let you infer things about the data going through the processor. Especially if you train that 90% through other means like non-privileged code execution and the 10% remainder is the code you are interested in. It’s a bit more ethereal in concept. And it encompasses a larger category of attack rather than a specific exploit.

      I liken it more to Colin O’Flynn’s research on hardware based security through power analysis. You can always find some aspect of processor execution that shows some faint harmonic of the data passing through it. The challenge for both is seeing a signal through a sea of noise.

      Just my $.02
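Training a predictor, as described in this comment, can be illustrated with the classic two-bit saturating counter from textbooks. It is an assumption for illustration only: real branch predictors are far more elaborate and largely undocumented. The point is that a branch taken many times in a row is still predicted taken the first time it goes the other way, and that misprediction is the window in which speculative code runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Classic two-bit saturating counter: states 0..3, predict "taken" at 2 or 3.
   A textbook illustration, not a model of any particular CPU's predictor. */
typedef struct { unsigned state; } two_bit_predictor;

static bool predict_taken(const two_bit_predictor *p) { return p->state >= 2; }

/* Once the branch resolves, move the counter one step toward the outcome. */
static void train(two_bit_predictor *p, bool taken) {
    if (taken && p->state < 3) p->state++;
    if (!taken && p->state > 0) p->state--;
    assert(p->state <= 3);
}

/* Predict, then resolve. Returns true when the CPU would have started
   executing down the wrong path before correcting itself. */
static bool run_branch(two_bit_predictor *p, bool actually_taken) {
    bool mispredicted = predict_taken(p) != actually_taken;
    train(p, actually_taken);
    return mispredicted;
}
```

After nine taken outcomes, `run_branch(&p, false)` reports a misprediction, and the counter still predicts taken afterwards because of the two-bit hysteresis.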

    2. Neither are “patchable”, because both are design flaws in modern CPUs… Meltdown has a workaround (don’t give it the (kernel) memory it isn’t allowed to access in the first place instead of just forbidding access to it). Spectre is targeting other programs, using methods that are normal for communicating with it, but tricking the CPU to speculatively execute something application-specific that will leak data via the same cache side-channel.

      1. If microcode can ensure flushing of branch predictor state on a context switch “spectre” can be patched.
        If the OS can do the same it can be patched.

        And I don’t see it as a design flaw, just the consequence of the same flawed thinking that assumes Unix is the crown of secure design.

        1. Another way to patch “spectre”: flush caches when switching context. This is trivial:
          WBINVD added to the context switch code.

          It is a bit of a heavy hammer as it is specified to write back all modified cache lines (on any level) to main memory and then invalidate the cache content. It’s extremely slow. But it would make any type of cache-based side-channel impossible to create.
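The flush-on-context-switch idea can be modelled in a few lines. This toy sketch (hypothetical names, cache state only) uses a boolean array as a cache shared by victim and attacker, with a `flush` flag standing in for WBINVD. It ignores the cost the comment mentions, and it only covers the case where the two processes take turns on one core, not a sibling hyper-thread or another core sharing the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model: one cache shared by victim and attacker taking turns on a core.
   The flush flag stands in for WBINVD (write back dirty lines, invalidate
   all); its large cost is not modelled. */
enum { LINES = 256 };

typedef struct { bool cached[LINES]; } shared_cache;

/* Victim runs; a mispredicted path touches a secret-dependent line. */
static void victim_runs(shared_cache *c, unsigned char secret) {
    c->cached[secret] = true;
}

/* Kernel switches to the next process, optionally flushing the cache. */
static void context_switch(shared_cache *c, bool flush) {
    assert(c != NULL);
    if (flush)
        memset(c->cached, 0, sizeof c->cached);
}

/* Attacker counts how many probe lines are already warm. */
static int warm_lines(const shared_cache *c) {
    int n = 0;
    for (int i = 0; i < LINES; i++)
        n += c->cached[i];
    return n;
}
```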

    3. Meltdown takes advantage of the fact that the access to protected data is speculatively allowed before consideration of the protection bits. Not only the read, but several instructions after it are speculatively performed before being rolled back after the protection violation is revealed. However, those instructions can alter cache state in a way that lets you determine the value that was read.

      Spectre takes advantage of branch prediction for an indirect branch and the speculative processing of instructions following the prediction. You have to find a “jump [address pointer]” branch in the victim’s code, as well as a couple instructions that will read a value (based on an address in a register), construct an address with it, and read another value. (Both of these are common things, easy to find.) You have to train the branch predictor to point at those instructions. You then load the register with the desired address and then perform the call to the victim’s code. The OS does a context switch and runs the victim’s code, but as soon as it sees the branch, it speculatively jumps to the “trained” target address and starts executing that code, which can perform perfectly legal lookups. The CPU soon sees that the branch was mis-predicted and rolls back those instructions, but the effects can still be seen in the cache.

      With either method, you had to set up the cache appropriately beforehand, and then see what changed after (by timing accesses to maybe-cached data, for instance) in order to read back the data which cannot be seen directly.

      1. There are actually a couple variations of Spectre, and I described the second one above.

        The first one involves finding victim code that does a bounds check followed by a double-indirect array access based on a value you provide. You first train the branch predictor with in-bounds data, then give it some data that will cause an out-of-bounds access. The out-of-bounds access is done speculatively, and since it was double-indirect, you can see the effect of the out-of-bounds data that was read in the data cache.

        The example given is: if (x < array1_size) y = array2[array1[x] * 256];
        You control 'x'. This code is run in the victim's address space, not your own.
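The quoted gadget can be turned into a toy simulation. Everything here is a simplified model with hypothetical names: memory is one flat array with a "secret" placed past the end of `array1`, the predictor simply remembers the last outcome of the bounds check, and `array2` is tracked as one cache line per 256-byte stride, so the warmed line number equals the byte that was read.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of: if (x < array1_size) y = array2[array1[x] * 256];
   Memory is one flat array, with array1 at the start and other data after
   it. array2 is tracked as one cache line per 256-byte stride, and the
   predictor just remembers the last outcome of the bounds check. */
enum { MEM_SIZE = 64, ARRAY1_SIZE = 16, STRIDE = 256, LINES = 256 };

typedef struct {
    unsigned char mem[MEM_SIZE];  /* array1 occupies mem[0..ARRAY1_SIZE) */
    bool cached[LINES];           /* warm array2 lines, by index / STRIDE */
    bool predict_in_bounds;       /* predictor's guess for the bounds check */
} victim_state;

static void victim_init(victim_state *v) { memset(v, 0, sizeof *v); }

/* One call of the victim function with an attacker-chosen x. */
static void victim(victim_state *v, unsigned x) {
    assert(x < MEM_SIZE);                      /* keeps the model itself in bounds */
    bool in_bounds = x < ARRAY1_SIZE;
    if (in_bounds || v->predict_in_bounds) {   /* real or speculative execution */
        unsigned idx = (unsigned)v->mem[x] * STRIDE;
        v->cached[idx / STRIDE] = true;        /* survives even if rolled back */
    }
    v->predict_in_bounds = in_bounds;          /* predictor learns the outcome */
}

/* Attacker's probe: lowest warm array2 line, or -1 if none. */
static int first_warm_line(const victim_state *v) {
    for (int i = 0; i < LINES; i++)
        if (v->cached[i])
            return i;
    return -1;
}
```

Training with in-bounds `x`, flushing, then calling `victim(&v, 40)` warms exactly the line numbered by the secret byte; a second out-of-bounds call warms nothing, because the predictor has now seen the check fail.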

  5. Being the dominant player (and taking user software security under black box control by their “management engine”), Intel is easy to blame.
    But the fact that AMD and ARM are exposed too (to some extent, still unclear) shows that the problem is not that of a company, but rather of a whole ecosystem, where the competition toward realizing Moore’s performance increase prophecy has kept security as an afterthought.
    I was wondering if open RISC-V implementations are exposed to Spectre and Meltdown attacks (or variants of them).

  6. ARM Holdings, AMD or Intel, whoever has the first chips to market with Spectre gone (and Meltdown for Intel) will make an absolute fortune, with people replacing hardware. It will be super extra bonus time for all the executives at that company! They have all had six months to work on this in secret behind the embargo, so the more secure chips should be here in six months to a year and a half at a guess.

    Maybe a RISC-V will rise to new heights.

      1. It’ll be pretty old to be resistant to these… It is thought that everything since Pentium, with the exception of Itanium and pre-2011? Atom, is vulnerable. You’re probably better off with some high-performance microcontrollers than that!

        1. No, he’s waiting for Intel to patch this in the silicon next re-spin (or AMD to gain quick favor), and then for Google / Facebook / et al to dump a million “slow” servers on the secondhand market – which we can snap up on the cheap at a negligible real-world performance loss.

          1. I’m fairly sure Google and Facebook’s servers only ever run their own code. So they’ve nothing to worry about, no sneaky processes in there trying to look up forbidden stuff.

            If anything, this is going to end up in malware for PCs. Of course, Windows PCs are ridiculously insecure anyway, and as ever the users are the easiest attack vector.

            Of course it’s a potentially serious problem and will need fixing. But I think for most people it’s a theoretical threat. Serious data users, banks etc, already have policies against using unapproved software. Un-serious users already leak like sieves.

            And this is a read-only leak AIUI, doesn’t allow execution of code at any higher level.

    1. I think the big problem is the too-many-eggs-in-one-basket problem: we’ve become too dependent on too few architectures from too few vendors.
      Intel has a near monopoly on the server space and we’re paying for it now.
      But maybe this might cause RISC-V to rise in popularity and Power to return to the mainstream.

      1. Maybe, but with this problem crossing multiple architectures it’s more like “common problems have similar solutions”. How many different ways are there to do an OOSE while avoiding side-channel attacks?

      2. The problem has nothing to do with architecture and little to do with microarchitecture. It isn’t (theoretically) limited to processors with out-of-order execution (though in practice it is, due to alternatives being less advanced).

        The problem is resource sharing.

        The solution is the same no matter if the processor is x86 or an obscure stack machine – limit sharing over protection domains.

    1. reworked, not revoked

      you can have spare cache lines dedicated to speculative execution, not visible from normal code
      you can have per process BTB (Branch Target Buffer), incidentally this would give you a noticeable IPC boost
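The per-process BTB suggestion can be sketched by tagging each entry with a process ID and ignoring entries owned by someone else. As before, the table size, indexing, and names are hypothetical. When the tag does not match, the model makes no prediction, so the victim waits for the real target instead of speculating into code chosen by another process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy per-process BTB: each entry records the owning process ID, and a
   prediction is only used when that ID matches. Sizes are hypothetical. */
enum { BTB_ENTRIES = 64 };

typedef struct {
    uint64_t target;
    int pid;
    bool valid;
} btb_entry;

typedef struct { btb_entry e[BTB_ENTRIES]; } tagged_btb;

static void tbtb_init(tagged_btb *b) { memset(b, 0, sizeof *b); }

static unsigned slot_of(uint64_t branch_addr) {
    return (unsigned)(branch_addr % BTB_ENTRIES);
}

/* Record a resolved branch, tagged with the process that executed it. */
static void tbtb_update(tagged_btb *b, int pid, uint64_t branch_addr, uint64_t target) {
    btb_entry *e = &b->e[slot_of(branch_addr)];
    e->target = target;
    e->pid = pid;
    e->valid = true;
}

/* Returns true and sets *target only for an entry owned by the same process;
   otherwise the CPU has no prediction and waits for the real target. */
static bool tbtb_predict(const tagged_btb *b, int pid, uint64_t branch_addr,
                         uint64_t *target) {
    assert(target != NULL);
    const btb_entry *e = &b->e[slot_of(branch_addr)];
    if (!e->valid || e->pid != pid)
        return false;
    *target = e->target;
    return true;
}
```

In this model the attacker can still evict the victim's entry from the shared slot, which costs the victim a prediction but no longer steers its speculation.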

  7. Out of curiosity, does anyone know exactly why the AMD and ARM micro-architectures are not vulnerable to Meltdown? The whitepaper ( https://meltdownattack.com/meltdown.pdf ) says “Meltdown exploits side effects of out-of-order execution on modern processors to read arbitrary kernel-memory locations,” but my understanding is that all modern micro-architectures use out-of-order and speculative execution in order to speed up program execution. Do AMD and ARM manage caching of data from out-of-order instructions in such a way that outside instructions are just unable to see data from instructions that have not been committed yet, or does the processor somehow check r/w permissions even on speculative instructions?

    1. My guess, and remember that it isn’t anything more than that: they probably check the permissions bits before passing the data to the instruction. The data has to be retrieved before doing anything with it to leak it, and I’m guessing they check the permissions bits as soon as they’ve looked up the virtual-to-physical address mapping, either before or after retrieving the data but almost definitely before processing the instructions using it. That makes the most sense to me…

  8. Realistically speaking though, just how much of a concern is this? All this does is leak data, right? It doesn’t force your CPU to execute malicious code, right? And for this exploit to be taken advantage of, you have to download and run a piece of malicious code on your system, right? So, therein lies the answer. It seems to me the biggest vulnerability is users who are careless with what they download and run on their system. The key to security, as I see it, is teaching the users to be smarter in their computing activities and not just download and install any piece of software they run across on the net. The biggest weakness in just about any security system is always the human weakness.

    Where I see these types of attacks being most exploited is where someone is trying to crack DRM and the like. In that case the user knows and is deliberately executing a piece of software that’s trying to find protected data, the keys and such used to protect the content. So, this could actually be a good thing for those who support “fair use” and would like to be able to circumvent protection schemes that limit the “fair use” of content they legally purchased.

      1. We can’t blame the browser-makers for a hardware flaw; nonetheless they have a duty to ensure that the client-side code of a website cannot be used to attack the user’s system. I believe they are rolling out patches quickly.

      2. But things like JavaScript and ActiveX have always been a security issue. Allowing websites to execute code on your system is always risky. This just raises the potential for danger a little. Security and convenience have always been a trade-off. Users have just gotten into a bad habit of expecting convenience and taking security for granted.

        A good analogy, I think, is driving. You can argue that if your car has seat belts and air bags, those devices should save your life in a crash, and perhaps they will. However, is it smart and safe to count on that while driving 50 mph over the speed limit, or doing so on bald, worn-out tires that could blow out at any time, or not stopping for red lights, railroad crossings and stop signs? And if you do drive that recklessly and wind up in a fatal crash, would it be fair for your friends and family to ignore all of this and claim your death was the result of the seat belts and air bags not doing enough to protect you from your own wanton recklessness?

        Security has to be a multilayered approach. That way, if one layer fails, you aren’t left completely vulnerable. In the real world, unexpected things do happen from time to time, and sometimes things do fail. If that one thing is the only thing protecting you, you’re in trouble. To use the driving analogy again, hitting a patch of black ice on the highway might not be such a big deal if you’re driving 35 mph, but if you’re driving 85 mph in a 35 mph zone and hit that same patch of ice, the results might be a lot uglier. Perhaps we should see this as a wake-up call: we can’t be lazy or reckless and rely solely on the protection features of our CPUs.

    1. Yeah, and how do you get on with virtual servers, shared hosting, etc., where the whole purpose of the machine is to run code for and from various people, without letting them interact in any way? If you’ve got a service hosted on the same physical server as someone else, they could find stuff of yours, for example your TLS and SSH private keys, client details (including passwords), and anything else of yours that happens to be in memory at the moment. That’s where it is the biggest problem.

      For desktop machines, you’re mostly right. But it still lets an unprivileged program read arbitrary memory, for instance your password manager’s, or defeat ASLR and the like. And the code could come from a browser code-execution exploit or something similar, either letting an attacker do something without an OS-level privilege-escalation exploit or making it easier to do something else (like said privilege escalation).

  9. I read Intel’s FAQ yesterday (https://www.intel.com/content/www/us/en/architecture-and-technology/facts-about-side-channel-analysis-and-intel-products.html), and man, what a trip. Linus is right on: the marketing bullshit is insanely thick. Check out this one:

    “Is this a bug in Intel hardware or processor design?”
    “No. This is not a bug or a flaw in Intel products. These new exploits leverage data about the proper operation of processing techniques common to modern computing platforms, potentially compromising security even though a system is operating exactly as it is designed to. Based on the analysis to date, many types of computing devices — with many different vendors’ processors and operating systems — are susceptible to these exploits.”

    WTF. How can they, with a straight face, claim that this is *not* a bug? And if it’s not a bug, why are they scrambling to patch it?

    1. They claim it’s not a “bug” because the processor is doing exactly what they designed it to do. But you could claim it is a bug in the design, as opposed to the operation of the processor.

      1. You could, and that makes a lie of “or processor design” above.

        Of course it’s a bug! Unless they did it on purpose. The modern world, where people lie like bastards and get away with it. Makes me feel old, remembering a time when people had shame.

          1. The effect of this is, put most briefly, a big security hole. I don’t think Intel put a security hole in on purpose. The actual workings of the hole don’t matter, the mechanism doesn’t matter. Any more than the whys and wherefores of the FDIV or F00F bugs.

            It was never thought to be OK. It simply didn’t occur to anyone until just now. An oversight, a mistake, a bug, just like a software bug, where a program does what it’s told, not what you wanted it to do.

            Yes, it functions according to its design; it fucks up the way it was designed to, but that’s not what they intended it to do. Any more than those Fords that kept exploding when you changed the radio station or whatever it was.

            People care about the security hole. Intel are being ridiculously pedantic, even for a company whose product is logic. Just admit it’s a bug. It obviously is one. Why split hairs? It’s not going to help; it just insults anyone who knows what’s going on. I suppose it might fool some unwitting normies who read about the story: “Phew! It’s not a bug, that’s a relief!”

          2. That’s debatable, but security* of whatever kind has always had a cost, be it in efficiency or in money, and has a reputation for being a hard sell, except to those who value it more than anything else.

            *Minimization of side-channel vectors in any design for one.

          3. “Of course they did it on purpose. No one thought 20+ years ago that it could possibly be a problem, so why ever not?”

            And until just recently, within the last year or so, no one knew it was a problem, right? Also, to be fair, Intel never thought they would keep extending a legacy ISA all these years, trying to push its performance envelope while maintaining backward compatibility all the way to the very beginning of its lineage. Remember, Intel expected x86 to be dead by now, replaced by IA-64. NetBurst was going to be their final iteration of the x86 architecture; then they were going to move to IA-64 and dispense with x86 compatibility. However, users decided they wanted the best of both worlds: continued performance gains and continued backward compatibility. AMD stepped in and extended an already extended instruction set with AMD64, and Microsoft released Windows XP x64. So how do you keep pushing the performance envelope on an architecture like this without resorting to things like branch prediction and speculative execution? It’s interesting that IA-64 isn’t vulnerable to Spectre. I know this “vulnerability” goes back to the P6 core, but if things had gone as Intel planned, x86 would be a dead architecture by now and we’d have moved to a different architecture that isn’t affected.

      2. Agreed. However, I wouldn’t call it a “bug” in the design either, because it is not a mistake in the design – it was deliberate, and until now was thought to be perfectly OK. But it is definitely a design flaw.

  10. @Mike Szczys:

    “Branch prediction has been commonplace in consumer CPUs going back to 1995 when the Pentium Pro brought it to the x86 architecture.”

    Nope: Intel’s Pentium (P5) brought the first branch prediction to Intel’s x86 line, in about 1992/1993.

    I’m not sure if competitors had some form or another in their x86 chips earlier.

    The PPro brought out-of-order execution (OoOE).
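Branch prediction, as discussed above, means keeping some record of how each branch behaved before and guessing the next outcome from it. Below is a minimal Python sketch of one classic textbook scheme, a 2-bit saturating counter per branch address. It is an illustration only, not a claim about how the Pentium or Pentium Pro predictors were actually built; `TwoBitPredictor` and `accuracy` are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict


class TwoBitPredictor:
    """Toy predictor: one 2-bit saturating counter per branch address.

    Counter values 0-1 predict "not taken", 2-3 predict "taken".
    """

    def __init__(self) -> None:
        # Every branch starts in the "weakly not taken" state (1).
        self.counters: defaultdict[int, int] = defaultdict(lambda: 1)

    def predict(self, pc: int) -> bool:
        return self.counters[pc] >= 2

    def update(self, pc: int, taken: bool) -> None:
        # Step one state toward the real outcome, saturating at 0 and 3.
        c = self.counters[pc]
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)


def accuracy(outcomes: list[bool], pc: int = 0x400) -> float:
    """Fraction of correct predictions over one branch's outcome history."""
    predictor = TwoBitPredictor()
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return hits / len(outcomes)


# A loop branch: taken 9 times, then falls through once; the loop runs 10 times.
loop_branch = ([True] * 9 + [False]) * 10
print(f"{accuracy(loop_branch):.2f}")  # prints 0.89
```

For a loop branch like this, the counter mispredicts only the loop exits (plus one warm-up miss), which is why even simple predictors pay off and why a CPU is willing to run ahead on the predicted path before the branch resolves.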

  11. A little off-topic, but any suggestions of a good model/diagram of a modern personal computer in terms of the processors involved?

    It used to be that a PC had a CPU and maybe a few peripheral processors (keyboard controller, disk controller, …). Various peripherals might have their own processors (e.g. the hard disk). [I am thinking of the IBM PC, or the PET, etc.]

    Now a PC is more a network of processors, and you program an abstraction several layers above.
    [CPU has microcode processor, Intel Management Engine or similar; various peripheral device controllers have their own processors.]

    Various security problems (such as these, or some of the attacks on USB controllers, etc.) arise because you can sometimes get around the abstractions.
    (e.g. the cache machinery doesn’t roll back speculative execution correctly)

    So where is a good map of what the players even are in a modern PC?
    (Not the virtual players, but the real players that act to make that virtual device.)

    Or, where is a good place to ask that question?

    Thanks

  12. We can turn various “turbos” on or off to gain speed or battery life.
    Why can we not turn off branch prediction, or security?
    ~We can get there faster, ma, if’n I just get up to 85 (mph / 130kph) or so… We will be okayyyy. What are the odds that them revenooers will notice a nobody like me?”

    1. We can: this is, among other things, what Microsoft’s patch is doing. Intel gave them access to a long-disabled MSR, and this is probably what Intel will enable in the “totally 100% fixed, move along idiots” microcode update.

      https://twitter.com/aionescu/status/948753795105697793 :

      “Alex Ionescu
      So looks like the Windows patches add “Speculation Control” if your CPU supports it, based on some new (old?) MSRs from Intel to control the Branch Target Buffer (BTB).
      7:11 PM – 3 Jan 2018”

  13. “Intel is too big to fail.” Hey, I’m a big fan of Intel products, but I do think the US should have been enforcing antitrust laws so there could have been more diversity and competition. Here’s hoping RISC-V gains some footing.

  14. I read somewhere else that the malicious code can read up to 500 KB/s of protected memory.
    If each bit read triggers an exception, that means 4 million (!) exceptions per second.
    It would have been better just to kill any task that triggers too many exceptions per second.
    An update to the OS supervisor or antivirus should be enough.
    No performance impact involved here.
    Just my two cents.
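The arithmetic in the comment above, and the proposed kill-on-too-many-exceptions rule, can be sketched as follows. The sketch assumes, as the commenter does, one exception per leaked bit and KB = 1000 bytes; `ExceptionRateMonitor` and its one-second sliding window are hypothetical illustrations, not an existing OS or antivirus feature.

```python
from __future__ import annotations

from collections import deque


def exceptions_per_second(bytes_per_second: float, exceptions_per_bit: float = 1.0) -> float:
    """Exceptions/s implied by a leak rate, assuming one fault per bit read."""
    return bytes_per_second * 8 * exceptions_per_bit


class ExceptionRateMonitor:
    """Hypothetical per-task fault counter over a sliding one-second window."""

    def __init__(self, max_per_second: int) -> None:
        self.max_per_second = max_per_second
        self.history: dict[int, deque[float]] = {}

    def record(self, pid: int, t: float) -> bool:
        """Log one exception for `pid` at time `t` (seconds).

        Returns True if the task has exceeded the allowed rate and should be killed.
        """
        window = self.history.setdefault(pid, deque())
        window.append(t)
        # Forget exceptions that happened more than one second ago.
        while window and window[0] <= t - 1.0:
            window.popleft()
        return len(window) > self.max_per_second


print(exceptions_per_second(500_000))  # 500 KB/s with KB = 1000 bytes; prints 4000000.0
```

A threshold rule like this is cheap to evaluate, but any real cutoff would have to be tuned so that legitimate programs that fault often are not killed, and it can only help if the attack actually produces exceptions.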

      1. From Alan Hightower’s comment:
        if (bit 0 of P) print A; else print B;
        The access to P generates an exception; print A or print B is the speculative execution.
        If A is in cache, bit 0 is set; if B is in cache, bit 0 is cleared.
        Am I missing something?

          1. Holy sh**, I did not think this was possible.
            Intel CPUs might have a very deep pipeline (last time I checked, it was 20+ stages) to go that far into speculative execution.
            By design, with only 5–6 stages, RISC CPUs are less prone to this kind of issue.
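The reading of the quoted pseudocode in this exchange can be checked with a toy simulation. A transient access touches A or B depending on a secret bit, the architectural result is thrown away, and the attacker recovers the bit by timing which of A or B is now cached. The Python below is only a model: `ToyCache`, the latency constants and the noise are invented for illustration, and a real attack flushes cache lines and times loads on actual hardware instead of inspecting a Python set.

```python
from __future__ import annotations

import random

CACHED_NS, UNCACHED_NS = 40, 200  # made-up access latencies for the toy model


class ToyCache:
    """Set of cached addresses; the attacker only observes it through timing."""

    def __init__(self) -> None:
        self.lines: set[str] = set()

    def flush(self) -> None:
        self.lines.clear()

    def touch(self, addr: str) -> None:
        self.lines.add(addr)

    def access_time(self, addr: str) -> int:
        # Cached lines are fast, uncached ones slow, plus a little noise.
        base = CACHED_NS if addr in self.lines else UNCACHED_NS
        return base + random.randint(0, 20)


def speculate(cache: ToyCache, secret_bit: int) -> None:
    """Models the transient 'if (bit of P) touch A else touch B'.

    The register results are discarded after the fault; the cache fill is not.
    """
    cache.touch("A" if secret_bit else "B")


def leak_byte(cache: ToyCache, secret: int) -> int:
    recovered = 0
    for i in range(8):
        cache.flush()
        speculate(cache, (secret >> i) & 1)
        # Whichever of A or B loads faster was touched during speculation.
        bit = 1 if cache.access_time("A") < cache.access_time("B") else 0
        recovered |= bit << i
    return recovered


print(hex(leak_byte(ToyCache(), 0x5A)))  # prints 0x5a
```

One bit per flush-speculate-probe round is the simplest encoding; the same idea works with more probe addresses per round to leak several bits at once.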

    1. So you started an anonymous blog with only one article, filled with spelling errors, pushing what seems like a conspiracy theory, and in the About section you claim to be an infamous security researcher without any prior work, and only a reference to a handle with zero results in a Google search. Fake news, anyone?

    2. Sure, the Meltdown & Spectre disclosure is hyped. Some media outlets are click-baiting on the average user’s anxiety and powerlessness. The information leak may not be that severe, because the flaw is quite hard to actually exploit. I haven’t heard of concrete malware exploiting the flaw. Yet.

      However, the hype is (partly) deserved, because the vulnerability is significant in several aspects.

      1. Security flaws discovered in hardware rather than software are quite rare, and it’s much harder to patch hardware than software. In practice, the best option is to rely on software to work around the vulnerability, so the vulnerability itself is here to stay in our machines for decades. And the OS patch will cost us a few % of usable performance (up to 30% according to some sources, but I don’t expect that much for the average user).

      2. The bug is present in nearly all product lines from all major vendors. It doesn’t affect only one product line from one vendor. Maybe 90% of all PCs, smartphones and other devices in use today are affected; certainly a pretty significant fraction.

      3. It surely is not the first example of hardware-induced covert channels for information leaks (see e.g. https://www.theregister.co.uk/2017/03/31/researchers_steal_data_from_shared_cache_of_two_cloud_vms/), but with full read access at 500 KB/s over the whole memory range (speaking of Meltdown), this is not a toy example.

      I am glad the vulnerability was discovered and patched before it was exploited in the wild, like the infamous OpenSSL bug (by the way, Heartbleed did deserve a name and logo, too).

      I hope the media hype raises awareness (there is no absolute security, whatever the vendor says), and reminds people to PATCH THEIR SOFTWARE!

  15. Most unaffected processors (e.g. the RISC-V Rocket) simply don’t use speculative execution. I am surprised that all three of Intel, AMD and ARM (even low-end processors like ARMv7-R) are affected.

    It should be possible to implement speculative execution without information leak. I mean, when a memory access instruction violates access rights, it should either not be executed at all (traditional in-order flow) or speculatively attempted then *fully* reverted once the exception is asserted, i.e. without leaving a trace in the cache or other covert architectural state. In these days of increased cyber vigilance, I was expecting a deep information flow analysis to be performed before tapeout.

    But the fact that none of the major players secured their implementations reveals either that they all share common design patterns (reference designs?) or that they share the same state of mind, in which security is not a primary focus. Either case is bad news for software security. I bet they know how much profit they can make from an additional 1% in performance benchmarks, but can’t estimate how much they could lose due to a fragile security implementation.

    Nevertheless, security flaws found in hardware remain relatively infrequent, and most software flaws can’t remain dormant for a whole 15 years. Or I should say “undisclosed”.

    1. Somewhat ironically, hardware fixes for these flaws in the form of:
      - an additional small L1 cache exclusively for speculative execution
      - separate per-process BTBs

      could result in better performance.
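The “fully reverted” speculation proposed in comment 15, and the first hardware fix in the reply (a small cache used only by speculative execution), can be contrasted in a toy model. This is a hypothetical sketch, not how any shipping CPU works: it ignores cache capacity, coherence and other shared state such as the BTB, and only shows that if speculative fills are discarded on a fault, an A/B timing probe like the one quoted earlier in the thread has nothing left to observe. `Cache`, `transient_access` and `observed` are illustrative names.

```python
from __future__ import annotations


class Cache:
    """Toy cache: the set of addresses currently cached."""

    def __init__(self) -> None:
        self.lines: set[str] = set()


def transient_access(cache: Cache, secret_bit: int, faulted: bool, shadow: bool) -> None:
    """Model one speculative 'touch A or B depending on a secret bit'.

    shadow=False: the fill goes straight into the cache (the leaky design).
    shadow=True: the fill goes into a hypothetical speculation-only buffer that
    is merged into the cache only if the access retires without a fault.
    """
    addr = "A" if secret_bit else "B"
    if not shadow:
        cache.lines.add(addr)
        return
    speculative_buffer = {addr}
    if not faulted:
        cache.lines |= speculative_buffer
    # On a fault the buffer is simply dropped, so no trace remains.


def observed(shadow: bool, secret_bit: int) -> set[str]:
    """Cache contents an attacker could probe after a faulting transient access."""
    cache = Cache()
    transient_access(cache, secret_bit, faulted=True, shadow=shadow)
    return cache.lines


for shadow in (False, True):
    print(shadow, sorted(observed(shadow, 0)), sorted(observed(shadow, 1)))
# prints:
# False ['B'] ['A']
# True [] []
```

In the leaky design the cache contents differ depending on the secret bit; with the shadow buffer both cases look identical, which is the property the comment asks for.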

  16. Been reading every source I could find on all 3 attack vectors (1 for Meltdown, 2 for Spectre) since news dropped last week. I am not a programmer, so it was difficult at first, but I follow the reasoning after reading about 10 separate explanations now.

    I realize this is indeed a hardware flaw, something at the chip level. So it’s easy to assume that a future chip from anyone, including the bungling Intel, could conceivably remove this issue.

    My personal, very real problem: I literally just replaced my 13-year-old Inspiron 9300 with a brand new HP Envy 17 with an 8th-gen Intel i7 processor AS the news broke. As in, I ordered it right after Christmas, and while it was en route to me, news broke on these flaws. My timing is just as bad as with my first smartphone, which was from the first batch of Note 7s. Three days later, the battery-exploding problem came out on those. My luck is crap.

    So for me, what to do? Return this $1500 item and wait another 2 years, until this is maybe fixed at the chip level?

    Of the 16-20 articles I’ve read in detail on Meltdown and Spectre, none has been able to say definitively whether all 3 attack vectors can actually be truly fixed, meaning I have a 30-day countdown to perhaps being stuck with a permanently compromised brand new PC right off the bat. I see everything from microcode updates to OS-level updates, and some of the OS ones are bricking PCs via Microsoft Update right now. This *will* run Linux if I keep it (just Ubuntu), but for now, it’s just Win 10. I am afraid to even turn the thing on in case M$ forces something onto it and bricks it against my will…

    What the hell do I do now? Can someone tell me if it is even conceivably safe in some instance to keep this? My old laptop would be compromised too, but I paid for it long ago… I need an answer before my return window is up, and I have no idea where to get a straight answer.

      1. That’s the conclusion I’ve come to as well, as a worst-case scenario. The only thing is, almost no one makes a decent Ryzen laptop yet! I could hedge my bets and wait until after CES 2018… but the only decent thing out there right now is a 15″ touchscreen HP Envy 360 that just came out. I’d lose the 4K screen I waited years for in my laptop (wanted for 3D modeling and a 4K macro-camera-equipped microscope I have), and 2″ of real estate. And to keep a 1 TB NVMe SSD, it would almost be more money.

        Spectre affects AMD Ryzen, but while the patches for AMD don’t seem to cause the ridiculous performance drop of up to 30%, I still haven’t ascertained whether or not the microcode “patches” completely secure the chips. If they don’t, then I’m not sure anything is worth buying now, because nothing would ever be fundamentally securable at this point. I’d like to keep the Chinese and the NSA out of my damn PC, thanks much.

        I know I’m not the only one screwed now, but I just bought this damn thing after months of research. It’s a smack in the face. Any specific laptops you can recommend with a 4K screen and Ryzen?

    1. First, relax. You are not specifically threatened; I mean, not more than 99% of Internet users. And there is no sign yet of a concrete attack based on this flaw. Anyway, this flaw allows only information leaks, no code injection or data alteration (i.e. the user impact is less severe than e.g. cryptolockers à la WannaCry).
      So do what all of us do: wait for workarounds from your favorite OS vendor and keep installing recommended updates. You will probably notice no difference once patched.
      In the meantime, be especially vigilant about untrusted third-party code (e.g. executable attachments, scripts from suspicious websites, etc.).

      1. Linux has already addressed this issue with patches, so I’m not too worried about people trying to fix it. What I can’t figure out is IF the issue can be fundamentally mitigated by software. I am thinking any fixes may ultimately leave one of the 2 vectors of Spectre effectively unfixable, and thus there would be a permanent exploit at the chip level on my brand new laptop. That’s what honestly worries me. Why should I keep something permanently compromised, and crippled to boot as a result of the other “fixes”? It’s ridiculous.
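On the Linux side, kernels that include the 2018 fixes (4.15 and later, plus many distribution backports) report what the running kernel believes about each issue in files under `/sys/devices/system/cpu/vulnerabilities/` (for example `meltdown`, `spectre_v1`, `spectre_v2`). A small Python sketch that reads them follows; `mitigation_status` is a hypothetical helper name, and the exact status strings vary with the kernel version. This only reports what the kernel has mitigated on this particular machine; it does not settle whether the underlying hardware issue is fundamentally fixable.

```python
from __future__ import annotations

from pathlib import Path

VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def mitigation_status(vuln_dir: Path = VULN_DIR) -> dict[str, str]:
    """Return {issue name: kernel-reported status}, or {} if the kernel lacks this interface."""
    if not vuln_dir.is_dir():
        return {}
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(vuln_dir.iterdir())
        if entry.is_file()
    }


for name, status in mitigation_status().items():
    print(f"{name}: {status}")
```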
