Close-up of a CPU

Register Renaming: The Art Of Parallel Processing

In the quest for faster computing, modern CPUs have turned to innovative techniques to optimize instruction execution. One such technique, register renaming, is a crucial component that helps us achieve the impressive multi-tasking abilities of modern processors. If you’re keen on hacking or tinkering with how CPUs manage tasks, this is one concept you’ll want to understand. Here’s a breakdown of how it works and you can watch the video, below.

In a nutshell, register renaming allows CPUs to bypass the restrictions imposed by a limited number of registers. Consider a scenario where two operations need to access the same register at once: without renaming, the CPU would be stuck, having to wait for one task to complete before starting another. Enter the renaming trick—registers are reassigned on the fly, so different tasks can use the same logical register but physically reside in different slots. This drastically reduces idle time and boosts parallel tasking. Of course, you also have to ensure that the register you are using has the correct contents at the time you are using it, but there are many ways to solve that problem. The basic technique dates back to some IBM System/360 computers and other high-performance mainframes.

Register renaming isn’t the only way to solve this problem. There’s a lot that goes into a superscalar CPU.

Continue reading “Register Renaming: The Art Of Parallel Processing”

Tridora: A Full-Custom CPU Designed For Pascal

[Sebastian Lederer] has created Tridora: an unusual stack-based CPU core intended for FPGA deployment, co-developed with its own Pascal compiler. The 32-bit word machine is unusual in that it has not one but three stacks, 16-bit instruction words, and a limited ISA, more like those of the 8-bit world. No multiply or divide instructions will be found in this CPU.

The design consists of about 500 lines of Verilog targeting the Digilent Arty-A7 FPGA board, which is based around the Xilinx Artix-7 FPGA line. [Sebastian] plans to support the Nexys A7 board, which boasts a larger FPGA array but has less RAM onboard. The CPU clocks in at 83 MHz with four clock cycles per instruction, so over 20 MIPS, which is not so shabby for a homebrew design. Wrapped around that core are a few simple peripherals, such as the all-important UART, an SD card controller and a VGA display driver. On the software side, the Pascal implementation is created from scratch with quite a few restrictions, but it can compile itself, so that’s a milestone achieved. [Sebastian] also says there is a rudimentary operating system, but at the moment, it’s a little more than a loader that’s bundled with the program image.

The Tridora Gitlab project hosts the Verilog source, an emulator (written in Golang, not Pascal) and a suite of example applications. We see quite a few custom CPUs, often using older or less popular programming languages. Here’s an FPGA-based Forth machine to get you started. Implementing programming languages from scratch is also a surprisingly common hack. Check out this from-scratch compiler for the Pretty Laughable Programming language.

A Robust Guide To The Xbox 360 Glitch Hack

The Xbox 360 was a difficult console to jailbreak. Microsoft didn’t want anyone running unsigned code, and darn if they didn’t make it difficult to do so. However, some nifty out of the box thinking and tricky techniques cracked it open like a coconut with a crack in it. For the low down, [15432] has a great in-depth article on how it was achieved. The article is in Russian, so you’ll want to be armed with Google Translate for this one.

The article gets right into the juice of how glitch attacks work—in general, and with regards to the Xbox 360. In the specific case of the console, it was all down to the processor’s RESET line. Flicker it quickly enough, and the processor doesn’t actually reset, but nonetheless its behavior changes. If you time the glitch right, you can get the processor to continue running through the bootloader’s instructions even if a hash check instruction failed. Of course, timing it right was hard, so it helps to temporarily slow down the processor.

From there, the article continues to explore the many and varied ways this hack played out against Microsoft’s copy protection across multiple models and revisions of the Xbox 360. The bit with the BGA ball connections is particularly inspired. [15432] also goes even deeper into a look at how the battle around the Xb0x 360’s DVD-ROM drive got heated.

We seldom talk about the Xbox 360 these days, but they used to grace these pages on the regular. Video after the break.

Continue reading “A Robust Guide To The Xbox 360 Glitch Hack”

Screen caps of upgraded BBC Micro, and OS 9 code

BBC Micro: A Retro Revamp With The 68008 Upgrade

The BBC Microcomputer, launched in the early 1980s, holds a special place in computing history. Designed for educational purposes, it introduced a generation to programming and technology. With its robust architecture and community-driven modifications, the BBC Micro remains a beloved project for retro computing enthusiasts. [Neil] from Retro4U has been delving into this classic machine, showcasing the fascinating process of repairing and upgrading his BBC Micro with a 68008 CPU upgrade.

Last week, [Neil] shared his progress, unveiling advancements in his repairs and upgrades. After tackling a troublesome beep issue, he successfully managed to get the BBC running with 32 KB of functional memory, allowing him to boot into BASIC. But he wasn’t stopping there. With ambitions set on installing the 68008 CPU, [Neil]’s journey continued.

The 68008 board offers significant enhancements, including multitasking capabilities with OS-9 and its own hard drive and floppy disk controller. However, [Neil] quickly encountered challenges; the board’s condition revealed the usual broken capacitors and a few other faulty components. After addressing these issues, [Neil] turned his attention to programming the necessary ROM for OS-9.

Looking to get your hands dirty? [Neil] has shared a PDF of the upgrade circuit diagram. You can also join the discussion with fellow enthusiasts on his Discord channel, linked in the video description.

Continue reading “BBC Micro: A Retro Revamp With The 68008 Upgrade”

Mainframe Chip Has 360MB Of On-Chip Cache

It is hard to imagine what a mainframe or supercomputer can do when we all have what amounts to supercomputers on our desks. But if you look at something like IBM’s mainframe Telum chip, you’ll get some ideas. The Telum II has “only” eight cores, but they run at 5.5 GHz. Unimpressed? It also has 360 MB of on-chip cache and I/O and AI accelerators. A mainframe might use 32 of these chips, by the way.

[Clamchowder] explains in the post how the cache has a unique architecture. There are actually ten 36 MB L2 caches on the chip. There are eight caches, one for each core, plus one for the I/O accelerator, and another one that is uncommitted.

A typical CPU will have a shared L3 cache, but with so much L2 cache, IBM went a different direction. As [Clamchowder] explains, the chip reuses the L2 capacity to form a virtual L3 cache. Each cache has a saturation metric and when one cache gets full, some of its data goes to a less saturated cache block.

Remember the uncommitted cache block? It always has the lowest saturation metric so, typically, unless the same data happens to be in another cache, it gets moved to the spare block.

There’s more to it than that — read the original post for more details. You’ll even read speculation about how IBM managed a virtual L4 cache, across CPUs.

Cache has been a security bane lately on desktop CPUs. But done right, it is good for performance.

PCB data sheet of a custom 4-bit microcontroller

Building A Microcontroller From Scratch: The B4 Thinker Project

[Marius Taciuc’s] latest endeavor, the B4 Thinker, offers a captivating glimpse into microcontroller architecture through a modular approach. This proof-of-concept project is meticulously documented, with a detailed, step-by-step guide to each component and its function.

Launched in 2014, the B4 Thinker project began with the ambitious goal of building a microcontroller from scratch. The resulting design features a modular CPU architecture, including a base motherboard that can be expanded with various functional modules, such as an 8-LED port card. This setup enables practical experimentation, such as writing simple assembly programs to control dynamic light patterns. Each instruction within this system requires four clock pulses to execute, and the modular design allows for ongoing development and troubleshooting.

Continue reading “Building A Microcontroller From Scratch: The B4 Thinker Project”

Custom Microcode Compiler, Made In Google Sheets

When homebrewing a CPU, one has to deal with microcode. Microcode is the low-level nuts and bolts of how, precisely, a CPU executes instructions (like opcodes) and performs functions such as updating the cycle counter or handling interrupt requests. To make this task easier, [Bob Alexander] created a microcode compiler built in Google Sheets to help with his own homebrew work, but it’s flexible and configurable enough to be useful to others, as well.

A CPU’s microcode usually lives in read-only memory, and writing the microcode is only one step in the journey. [Bob]’s tool compiles his microcode into files that can be burned into memory (multiple EEPROM chips, in [Bob]’s case) or used as a Verilog program in the case of implementing the CPU in an FPGA. It’s configurable enough to be adapted for other homebrew CPU projects, though one would of course have to re-write the microcode portion.

A read-only version of the spreadsheet makes for some fun browsing, and if it piques your interest enough to get a copy of your own complete with the compiler script, you can do that here. It uses Google Sheets, and writes the output files into one’s Google Drive.

This kind of low-level project really highlights the finer points of just how the hard work of digital computing gets done. A good example is the Gigatron which implemented a RISC CPU using only microcode, memory, and logic gates in the late 70s. We’ve even seen custom microcode used to aid complex debugging.