
Register Renaming: The Art Of Parallel Processing

In the quest for faster computing, modern CPUs have turned to innovative techniques to optimize instruction execution. One such technique, register renaming, is a crucial part of how modern processors keep many instructions in flight at once. If you’re keen on hacking or tinkering with how CPUs manage tasks, this is one concept you’ll want to understand. Here’s a breakdown of how it works, and you can watch the video below.

In a nutshell, register renaming allows CPUs to bypass the restrictions imposed by a limited number of registers. Consider a scenario where two operations need to use the same register at once: without renaming, the CPU would be stuck, waiting for one to finish before starting the other. Enter the renaming trick: registers are reassigned on the fly, so different instructions can refer to the same logical register while their values physically live in different slots. This drastically reduces idle time and boosts parallel execution. Of course, you also have to ensure that the register you are using has the correct contents at the time you are using it, but there are many ways to solve that problem. The basic technique dates back to some IBM System/360 computers and other high-performance mainframes.
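If you want to play with the idea, here’s a toy sketch in C (purely illustrative, not how any real CPU implements it): a rename table maps each logical register to a freshly allocated physical register, so two instructions that both write logical r1 end up writing different physical registers and no longer have to wait on one another.

```c
#include <stdio.h>

#define NUM_LOGICAL  4   /* registers the ISA exposes */
#define NUM_PHYSICAL 16  /* a real CPU has many more physical than logical */

static int rename_table[NUM_LOGICAL]; /* logical register -> physical register */
static int next_free = NUM_LOGICAL;   /* naive allocator: hand out the next slot */

/* Give a logical destination register a fresh physical register. */
static int rename_dest(int logical) {
    rename_table[logical] = next_free++;
    return rename_table[logical];
}

int main(void) {
    for (int r = 0; r < NUM_LOGICAL; r++)
        rename_table[r] = r; /* initially r0..r3 map straight to p0..p3 */

    /* Two otherwise independent instructions both write logical r1:
     *   mul r1, r2, r3
     *   add r1, r4, r5
     * After renaming, each write lands in its own physical register,
     * so neither instruction has to wait for the other. */
    printf("mul r1 now writes p%d\n", rename_dest(1));
    printf("add r1 now writes p%d\n", rename_dest(1));
    return 0;
}
```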

Register renaming isn’t the only way to solve this problem. There’s a lot that goes into a superscalar CPU.

Continue reading “Register Renaming: The Art Of Parallel Processing”

What Happens If You Speedrun Making A CPU?

Usually, designing a CPU is a lengthy process, especially so if you’re making a new ISA too. This is something that can take months or even years before you first get code to run. But what if it wasn’t? What if one were to try to make a CPU as fast as humanly possible? That’s what I asked myself a couple weeks ago.

Relative ROM size. Left: Stovepipe, center: [Ben Eater]’s, right: GR8CPU Rev. 2
Enter the “Stovepipe” CPU (I don’t have an explanation for that name other than that I “needed” one). Stovepipe’s hardware was made in under 4 hours, excluding a couple of small bugfixes. I started by designing the ISA, which is the simplest ISA I’ve ever made. Instead of continuously adding things to make it more useful, I removed things that weren’t strictly necessary until I was satisfied. Eventually, all that was left were 8 major opcodes and a mere 512 bits to represent it all. That is far less than GR8CPU (8192 bits), my previous CPU in this class, and still less than [Ben Eater]’s breadboard CPU (2048 bits), which is actually less flexible than Stovepipe. All that while taking orders of magnitude less time to create than either larger CPU. How does that compare to other CPUs? And how is that possible?
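For a sense of scale, one purely hypothetical way to account for 512 bits of control ROM is 8 opcodes times 8 microcode steps times 8 control lines; the sketch below is just that arithmetic, not Stovepipe’s actual layout.

```c
#include <stdint.h>
#include <stdio.h>

#define OPCODES       8   /* the 8 major opcodes */
#define STEPS         8   /* hypothetical microcode steps per opcode */
#define CONTROL_BITS  8   /* hypothetical control lines, one byte per step */

static uint8_t control_rom[OPCODES][STEPS]; /* 64 bytes = 512 bits */

int main(void) {
    /* 8 * 8 * 8 = 512 bits; by the same math, a 2048-bit ROM is 4x
     * larger and an 8192-bit ROM is 16x larger, as in the figure above. */
    printf("%zu bits of control ROM\n", sizeof(control_rom) * CONTROL_BITS);
    return 0;
}
```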
Continue reading “What Happens If You Speedrun Making A CPU?”

Why X86 Needs To Die

As I’m sure many of you know, x86 architecture has been around for quite some time. It has its roots in Intel’s early 8086 processor, the first in the family. Indeed, even the original 8086 inherits a small amount of architectural structure from Intel’s 8-bit predecessors, dating all the way back to the 8008. But the 8086 evolved into the 186, 286, 386, 486, and then they got names: Pentium would have been the 586.

Along the way, new instructions were added, but the core of the x86 instruction set was retained. And a lot of effort was spent making the same instructions faster and faster. This has become so extreme that, even though the 8086 and modern Xeon processors can both run a common subset of code, the two CPUs architecturally look about as far apart as they possibly could.

So here we are today, with even the highest-end x86 CPUs still supporting the archaic 8086 real mode, where the CPU can address memory directly, without any redirection. Having this level of backwards compatibility can cause problems, especially with respect to multitasking and memory protection, but it was a feature of previous chips, so it’s a feature of current x86 designs. And there’s more!
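To see just how direct that is: a real-mode address is nothing more than a 16-bit segment shifted left by four bits plus a 16-bit offset, with no page tables or protection checks in between. A minimal sketch:

```c
#include <stdint.h>
#include <stdio.h>

/* 8086 real mode: physical address = segment * 16 + offset.
 * Any code can form any address it likes; nothing stops it. */
static uint32_t real_mode_address(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 4) + offset;
}

int main(void) {
    /* Example: segment 0xB800, the classic CGA text buffer, at offset 0. */
    printf("0x%05X\n", (unsigned)real_mode_address(0xB800, 0x0000)); /* 0xB8000 */
    return 0;
}
```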

I think it’s time to put a lot of the legacy of the 8086 to rest, and let the modern processors run free. Continue reading “Why X86 Needs To Die”

Minecraft In Minecraft On The CHUNGUS II

Minecraft is a simple video game. Well, it’s a simple video game that also has within it the ability to create all of the logic components that you’d need to build a computer. And building CPUs in Minecraft is by now a long-standing tradition.

Enter CHUNGUS II. The Computational Humongous Unconventional Number and Graphics Unit by [Sammyuri] is the biggest and baddest Minecraft computer that we’ve ever seen. So big, in fact, that it was finally reasonable to think about porting a stripped-down version of Minecraft to the computer itself. Yes, that’s right, Minecraft running in Minecraft. (Video embedded below.) Writing the compiler and programming the game brought two more hackers to the party, [Uwerta] and [StackDoubleFlow], and quite honestly, we’re amazed that a team as small as three people pulled this off.

Anyway, once you’ve picked your jaw up off the floor, also check out [Sammyuri]’s video on just the CHUNGUS II computer itself. (Also embedded below.) Seeing the architecture is interesting, even if you don’t speak Redstone as fluently as our heroes here. We love that the assembler creates a block of ROM – out of Minecraft blocks – that you can then cut/paste into the game’s reality.

For a “simple” game about breaking blocks and punching trees, Minecraft has inspired hackers to make the game better both inside and outside of the game itself. For instance, for the latest in performant open-source Minecraft servers, check out Folia. Maybe, one day, they’ll build CHUNGUS II in the real world. It could happen.

Thanks [dbcdr] for the tip!

Continue reading “Minecraft In Minecraft On The CHUNGUS II”

History Of The SPARC CPU Architecture

[RetroBytes] nicely presents the curious history of the SPARC processor architecture. SPARC, short for Scalable Processor Architecture, defined some of the most commercially successful RISC processors of the 1980s and 1990s. SPARC was initially developed by Sun Microsystems, the company most of us associate with it, but while most computer architectures are controlled by a single company, SPARC was championed by dozens of players. The history of SPARC is not simply the history of Sun.

A Reduced Instruction Set Computer (RISC) design is built around an Instruction Set Architecture (ISA) with a limited number of simple instructions, whereas a Complex Instruction Set Computer (CISC) has an ISA comprising more, and more complex, instructions. Because RISC relies on simpler instructions, it generally takes a longer sequence of them to complete the same task that a CISC machine handles with fewer complex instructions. The trade-off is that the simple RISC instructions can usually run faster (at a higher clock rate) and in a highly pipelined fashion. Our overview of the modern ISA battles looks at how the days of CISC are essentially over. Continue reading “History Of The SPARC CPU Architecture”
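To make the trade-off concrete, here is the same memory increment sketched both ways; the x86 and SPARC mnemonics in the comments are only illustrative:

```c
/* CISC style: one instruction reads, adds, and writes memory.
 *   x86:    add dword ptr [counter], 1
 *
 * RISC/SPARC style: separate load, add, and store, each simple enough
 * to run at a high clock rate and flow through a pipeline.
 *   SPARC:  ld   [%o0], %o1
 *           add  %o1, 1, %o1
 *           st   %o1, [%o0]
 */
void increment(int *counter) {
    int tmp = *counter;   /* load  */
    tmp = tmp + 1;        /* add   */
    *counter = tmp;       /* store */
}
```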

Modern CPUs Are Smarter Than You Might Realize

When it comes to programming, most of us write code at a level of abstraction that could be for a computer from the 1960s. Input comes in, you process it, and you produce output. Sure, a call to strcpy might work better on a modern CPU than on an older one, but your basic algorithms are the same. But what if there were ways to define your programs that would work better on modern hardware? That’s what a pre-print book from [Sergey Slotin] answers.

As a simple example, consider the effects of branching on pipelining. Nearly all modern computers pipeline. That is, one instruction is fetching data while an older instruction is computing something, while an even older instruction is storing its results. The problem arises when an instruction is already partially executed by the time you realize that an earlier instruction caused a branch to another part of your code. Now the pipeline has to be backed out, and performance suffers while it refills. Anything that had an effect has to be reversed, and everything else needs to be discarded.

That’s bad for performance. Because of this, some CPUs try to predict whether a branch will be taken or not and then speculatively fill the pipeline for the predicted case. However, you can structure your code so that it is more obvious how branching will occur, or even, with some compilers, explicitly tell the compiler whether a branch is likely to be taken.
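On GCC and Clang, for example, the __builtin_expect built-in is one way to pass such a hint; a minimal sketch:

```c
#include <stddef.h>

/* GCC and Clang let you annotate which way a branch usually goes. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

long sum_positive(const int *v, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlikely(v[i] < 0))   /* hint: negative values are rare */
            continue;
        sum += v[i];
    }
    return sum;
}
```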

As you might expect, techniques like this depend on your CPU, and you’ll need to benchmark to show what’s really going on. The text is full of graphs of execution times and analyses of the generated x86 assembly code to explain the results. Even something you think of as a pretty good algorithm, like binary search, suffers on modern architectures, and you can improve its performance with some tricks. Actually, it is interesting that the tricks work on GCC but don’t make a difference on Clang. Again, you have to measure these things.
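As a taste of the kind of trick involved (a sketch of the general idea, not necessarily the book’s exact code), a branchless lower bound turns the comparison into an offset the compiler can implement as a conditional move, so there is no branch to mispredict:

```c
#include <stdio.h>

/* Branchless lower bound: find the first element >= x in a sorted array.
 * Assumes at least one such element exists. The comparison result scales
 * the pointer bump instead of steering a branch, which compilers can
 * often turn into a conditional move. */
static const int *lower_bound(const int *a, int n, int x) {
    const int *base = a;
    int len = n;
    while (len > 1) {
        int half = len / 2;
        base += (base[half - 1] < x) * half; /* no if: candidate for cmov */
        len -= half;
    }
    return base;
}

int main(void) {
    int a[] = {1, 3, 5, 7, 9, 11};
    printf("%d\n", *lower_bound(a, 6, 6)); /* prints 7 */
    return 0;
}
```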

Probably 90% of us will never need any of the optimizations you’ll find in this book. But it is a marvelous book if you enjoy solving puzzles and analyzing complex details. Of course, if you need to squeeze those extra microseconds out of a loop, or you are writing a library where performance is important, this might be just the book you are looking for. Although it doesn’t cover many different CPUs, the ideas and techniques apply to many modern CPU architectures. You’ll just have to do the work to figure out how they apply if you use a different CPU.

We’ve looked at pieces of this sort of thing before. Pipelining, for example. Sometimes, though, optimizing your algorithm isn’t as effective as just changing it for a better one.

Where Are All The Cheap X86 Single Board PCs?

If we were to think of a retrocomputer, the chances are that something from the classic 8-bit days, or maybe a game console, would spring to mind. It’s almost a shock to see mundane desktop PCs of the DOS and Pentium era join them, but those machines now form an important way to play DOS and Windows 95 games that are unsuited to more modern operating systems. For those who wish to play the games on appropriate hardware without a grubby beige mini-tower and a huge CRT monitor, there’s even the option to buy one of these machines new, in the form of a much more svelte Pentium-based PC/104 industrial PC.

Continue reading “Where Are All The Cheap X86 Single Board PCs?”