Modern CPUs Are Smarter Than You Might Realize

When it comes to programming, most of us write code at a level of abstraction that could be for a computer from the 1960s. Input comes in, you process it, and you produce output. Sure, a call to strcpy might work better on a modern CPU than on an older one, but your basic algorithms are the same. But what if there were ways to define your programs that would work better on modern hardware? That’s what a pre-print book from [Sergey Slotin] answers.

As a simple example, consider the effects of branching on pipelining. Nearly all modern computers pipeline. That is, one instruction is fetching data while an older instruction is computing something, while an even older instruction is storing its results. The problem arises when you already have an instruction partially executed when you realize that an earlier instruction caused a branch to another part of your code. Now the pipeline has to be backed out and performance suffers while the pipeline refills. Anything that had an effect has to reverse and everything else needs to be discarded.

That’s bad for performance. Because of this, some CPUs try to predict if a branch is likely to occur or not and then speculatively fill the pipeline for the predicted case. However, you can structure your code, for example, so that it is more obvious how branching will occur or even, for some compilers, explicitly inform the compiler if the branch is likely or not.

As you might expect, techniques like this depend on your CPU and you’ll need to benchmark to show what’s really going on. The text is full of graphs of execution times and an analysis of the generated assembly code for x86 to explain the results. Even something you think is a pretty good algorithm — like binary search, for example, suffers on modern architectures and you can improve its performance with some tricks. Actually, it is interesting that the tricks work on GCC, but don’t make a difference on Clang. Again, you have to measure these things.

Probably 90% of us will never need to use any of the kind of optimization you’ll find in this book. But it is a marvelous book if you enjoy solving puzzles and analyzing complex details. Of course, if you need to squeeze those extra microseconds out of a loop or you are writing a library where performance is important, this might be just the book you are looking for. Although it doesn’t cover many different CPUs, the ideas and techniques will apply to many modern CPU architectures. You’ll just have to do the work to figure out how if you use a different CPU.

We’ve looked at pieces of this sort of thing before. Pipelining, for example. Sometimes, though, optimizing your algorithm isn’t as effective as just changing it for a better one.

Where Are All The Cheap X86 Single Board PCs?

If we were to think of a retrocomputer, the chances are we might have something from the classic 8-bit days or maybe a game console spring to mind. It’s almost a shock to see mundane desktop PCs of the DOS and Pentium era join them, but those machines now form an important way to play DOS and Windows 95 games which are unsuited to more modern operating systems. For those who wish to play the games on appropriate hardware without a grubby beige mini-tower and a huge CRT monitor, there’s even the option to buy one of these machines new: in the form of a much more svelte Pentium-based PC104 industrial PC.

Continue reading “Where Are All The Cheap X86 Single Board PCs?”

64-bit And A Display: Minecraft Computers 10 Years Later

Some people build their own computer to play games, while others play games to build their own computer. Minecraft is the prime candidate for the latter, and while you can certainly arrange the blocks to make them look like a computer, we’re of course talking about replicating the actual functionality of a CPU or parts thereof, and/or external components within the game. Many such creations have spawned in the decade since the first Minecraft-built ALU surfaced, and [Rockfarmor] built a 64-bit specimen to add to that list — and made a video to showcase it.

Instead of emulating a common architecture, [Rockfarmor] went for a more home-made approach, and re-used the architecture from an old school assignment (in Swedish) as basis. The result is a simple yet fully functional 64-bit CPU with 32 registers, 32kB main memory and a separate 16kB stack. The instruction set mostly contains ALU and branching operations, but also a few special opcodes to control an additional 64×64 pixel blocks, 64-color display — including drawing circles, lines, and color fills.

More details on the architecture can be found in its documentation and in an older video (with subpar audio circumstances unfortunately). An additional time-lapse video of the initial build is also available, and you will find all of them after break. To simplify development, [Rockfarmor] also wrote a desktop app to program the computer in assembly and upload it straight to the Minecraft version.

As with all computers built in Minecraft, the driving force is redstone, which essentially allows circuit design within the game, and [Rockfarmor]’s is no difference here. He also uses command blocks to avoid the laboriously and slow “wiring” required otherwise, turning it more into a “wireless redstone” circuit.

No doubt, purists will consider this cheating, but another angle would be to see it as Moore’s Law applied to Minecraft computers, considering the computer’s size and speed compared to the first Minecraft ALU. Or maybe as the equivalent of microcode in real-world CPUs? Or then, maybe we should just accept and embrace different options and preferences.

Continue reading “64-bit And A Display: Minecraft Computers 10 Years Later”

Sushi Roll Helps Inspect Your CPU Internals

[Gamozolabs’] post about Sushi Roll — a research kernel for monitoring Intel CPU internals — is pretty long. While we were disappointed at the end that the kernel’s source is not exactly available due to “sensitive features”, we were so impressed with the description of the modern x86 architecture and some of the work done with Sushi Roll, that we just had to post it. If the post gets you wanting to actually try some of this, you can check out another [Gamozolabs] creation, Orange Slice.

While you probably know that a modern Intel CPU bears little resemblance to the old 8086 processor it emulates, it is surprising, sometimes, to realize just how far it has gone. The very first thing the CPU does is to break your instruction up into microoperations. The execution engine uses some sophisticated techniques for register renaming and scheduling that allow you to run instructions out of order and to run more than one instruction per clock cycle.

Continue reading “Sushi Roll Helps Inspect Your CPU Internals”

Unlocking God Mode On X86 Processors

We missed this Blackhat talk back in August, but it’s so good we’re glad to find out about it now. [Christopher Domas] details his obsession with hidden processor instructions, and how he discovered an intentional backdoor in certain x86 processors. These processors have a secondary RISC core, and an undocumented procedure to run code on that core, bypassing the normal user/kernel separation mechanisms.

The result is that these specific processors have an intentional mechanism that allows any unprivileged user to jump directly to root level access. The most fascinating part of the talk is the methodical approach [Domas] took to discover the details of this undocumented feature. Once he had an idea of what he was looking for, he automated the process of checking every possible x86 instruction, looking for the one instruction that allowed running code on that extra core. The whole talk is entertaining and instructional, check it out after the break!

There’s a ton of research poking at the instruction level of complication processors. One of our favorites, also by [Domas], is sandsifter which searches for undocumented instructions.

Continue reading “Unlocking God Mode On X86 Processors”

Pipelining Digital Logic In FPGAs

When you first learn about digital logic, it probably seems like it is easy. You learn about AND and OR gates and figure that’s not very hard. However, going from a few basic gates to something like a CPU or another complex system is a whole different story. It is like going from “Hello World!” to writing an operating system. There’s a lot to understand before you can make that leap. In this set of articles, I want to talk about a way to organize more complex FPGA designs like CPUs using a technique called pipelining.

These days a complex digital logic system is likely to be on an FPGA. And part of the reason we can get fooled into thinking digital is simple is because of the modern FPGA tools. They hide a lot of complexity from you, which is great until they can’t do what you want and then you are stuck. A good example of that is where you are trying to hit a certain clock frequency. If you aren’t careful, you’ll get a complaint from the tool that you can’t meet timing constraints.

Continue reading “Pipelining Digital Logic In FPGAs”

8-Bit Breadboard Computer Is Up To 8 Hours

[Ben Eater] posted some videos of an 8-bit computer with no CPU chip that he built completely on a breadboard a few years ago. After being asked for schematics, he finally admitted that he didn’t have any. So, instead, he decided to rebuild it and keep a video log of each step in the process. You can see his kickoff video, below, but you can also find 30 more recent videos covering topics from the ALU design and troubleshooting to the decimal LED display. He even uses an Arduino to program a EEPROM that he uses to replace a lot of logic.

You probably want to wait until you have some free time as there are around eight hours of videos so far. The videos start off with a simple 555 timer and work up from there. Each piece gets a test separate from the whole, so with luck you won’t have an impossible job trying to troubleshoot the whole thing at the end.

Continue reading “8-Bit Breadboard Computer Is Up To 8 Hours”