Sushi Roll Helps Inspect Your CPU Internals

[Gamozolabs’] post about Sushi Roll — a research kernel for monitoring Intel CPU internals — is pretty long. While we were disappointed at the end that the kernel’s source is not exactly available due to “sensitive features”, we were so impressed with the description of the modern x86 architecture and some of the work done with Sushi Roll, that we just had to post it. If the post gets you wanting to actually try some of this, you can check out another [Gamozolabs] creation, Orange Slice.

While you probably know that a modern Intel CPU bears little resemblance to the old 8086 processor it emulates, it is surprising, sometimes, to realize just how far it has gone. The very first thing the CPU does is to break your instruction up into microoperations. The execution engine uses some sophisticated techniques for register renaming and scheduling that allow you to run instructions out of order and to run more than one instruction per clock cycle.
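To make the register renaming idea a little more concrete, here is a toy sketch in Python. It is only an illustration under simplifying assumptions: the `rename` function, the three-register machine, and the `p0`, `p1`, ... physical register names are all hypothetical, and real hardware uses a fixed pool of physical registers that it recycles. The point is just that giving every write a fresh physical register leaves only true read-after-write dependencies.

```python
from itertools import count


def rename(instructions, arch_regs=("rax", "rbx", "rcx")):
    """Toy register renamer (illustrative only, not how any real CPU is built).

    Each instruction is a tuple (dest, src1, src2) of architectural register
    names. The result is the same program expressed in physical registers.
    """
    fresh = count()  # endless supply of physical register ids (an assumption)
    # Start with one physical register per architectural register.
    mapping = {reg: f"p{next(fresh)}" for reg in arch_regs}
    renamed = []
    for dest, *srcs in instructions:
        # Sources read whichever physical register currently holds the value.
        phys_srcs = [mapping[s] for s in srcs]
        # Every write gets a brand new physical register.
        mapping[dest] = f"p{next(fresh)}"
        renamed.append((mapping[dest], *phys_srcs))
    return renamed


program = [
    ("rax", "rbx", "rcx"),  # rax = rbx op rcx
    ("rbx", "rax", "rcx"),  # rbx = rax op rcx (really depends on the first)
    ("rax", "rcx", "rcx"),  # rax = rcx op rcx (reuses the name rax only)
]
renamed_program = rename(program)
```

After renaming, the third instruction no longer shares a register with the first two, so a scheduler would be free to run it alongside or ahead of them.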

The purpose of Sushi Roll is to reduce uncertainty in timing so that measurements can reveal short microoperation durations. The kernel does not use locking, nor does it use interrupts, timers, threads, or processes. This allows code to run without a lot of extraneous things affecting timing, like cache evictions or interrupts. Combined with the Intel performance monitoring registers, this allows you to make some very specific measurements.
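To get a feel for the noise a bare-bones kernel avoids, you can time the same tiny piece of work many times from an ordinary user-space process. This is only a rough sketch and has nothing to do with how Sushi Roll itself measures things: it uses Python's `time.perf_counter_ns` rather than the CPU's performance counters, and the `sample_timings` and `work` names are made up for the example.

```python
import statistics
import time


def sample_timings(fn, runs=1000):
    """Time fn() repeatedly and return every sample in nanoseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn()
        samples.append(time.perf_counter_ns() - start)
    return samples


def work():
    # A small, fixed amount of work standing in for the code under test.
    sum(range(100))


samples = sample_timings(work)
fastest = min(samples)                # closest to the undisturbed cost
typical = statistics.median(samples)  # what a run usually costs
slowest = max(samples)                # a run that got disturbed along the way

print(f"min={fastest} ns  median={typical} ns  max={slowest} ns")
```

On a normal operating system the gap between the fastest and slowest samples tends to be large, since interrupts, scheduling, and cache effects all leak into the numbers. That spread is what makes very short events hard to measure without a quieter environment.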

Like we said, we were sorry you can’t get the kernel source to do your own measurements. However, the work is impressive and the background information is still a good read, too.

A lot of this internal trivia seemed unimportant until it became the subject of security exploits. We just can’t get enough of CPU internals.

8 thoughts on “Sushi Roll Helps Inspect Your CPU Internals”

    1. [Megol] I think you’ve missed his point. [Al] did not say that modern Intel CPUs emulate x86; he said specifically that they emulate the 8086 processor.
      I agree. Most obvious is that Intel’s modern x86 cores execute micro-ops, not 8086 opcodes. That abstraction is what allows most all of the advanced features of a modern x86 CPU, including out-of-order exec and reg-renaming to name just a couple.

      Also, ‘then’ is used for progression, not comparison…

      1. Funny enough, saying that modern CPUs emulate x86 is actually not far off the money at all.

        From 1988 to 1995, AMD produced RISC CPUs based on the 29K architecture. This architecture is a RISC architecture that AMD developed, which was based on Berkeley RISC and was a cousin of the Sun SPARC. For a little while they were the most popular RISC chips available and made their way into a lot of laser printers.

        AMD proceeded to produce an x86-compatible processor that was developed completely in-house, and thus no longer relied on a microcode licence from their direct competitor, Intel. To do so they made some simple improvements to the floating point circuitry, adding some nanocode to implement x86 operations missing from 29K. They also added instruction translation circuitry to the front end of the CPU to translate x86 instructions into 29K on the fly.

        This processor was sold as the AMD K5.

        While it was architecturally a dead-end (the popular K6 was based on the Nx686 developed by NexGen, acquired by AMD), many modern CPUs are internally very RISC-y. They merely translate x86 instructions (from a growing array of extensions to the x86 instruction set) into micro-operations that would otherwise be the instruction set of a RISC processor.

    1. Just revel in the fact that you don’t have IME, but I’m sure there’s something somewhat equivalent I guess. I’m actually not too versed on AMD CPUs, but I haven’t heard of as many side-channel vulns. Whether that’s just an effect of lower adoption I also don’t know.

  1. I wonder if there’s some way to use this to automate detection of a thread trying to do a timing-based attack on another thread? Like there should be some signature behavior associated with that. Maybe.

    Would be a good layer of protection to have some way of scoping if a thread is cupping its ear to the wall, so to speak, trying to listen in on a neighboring operation—and shutting that shit down. Not as good as actual secure computing of course, but it looks like we’re going to have a hard time giving this architecture up in the near future. I mean I have my hopes but trying to be realistic.

  2. Anybody know of a similarly detailed explanation of the architecture of current PCs? Something that would show where ALL the programmable execution units (of various sorts) are. (USB controllers, management engines, fan controllers, battery management units, what have you.) Would be interesting to see where all the attack surfaces even are in a modern computer.
