The 386's main register bank, at the bottom of the datapath. The numbers show how many bits of the register can be accessed. (Credit: Ken Shirriff)

The Convoluted Way Intel’s 386 Implemented Its Registers

May 5, 2025 by Maya Posch 4 Comments

The fact that modern-day x86 processors still pretty much support the same operating systems and software as their ancestors did is quite a feat. Much of this effort had already been accomplished with the release of the 80386 (later 386) CPU in 1985, which was not only the first 32-bit x86 CPU, but was also backwards compatible with 8- and 16-bit software dating back to the 1970s. Making this work transparently was anything but straightforward, as [Ken Shirriff]’s recent analysis of the 80386’s main register file shows.

Labelled Intel 80386 die shot. (Credit: Ken Shirriff)

Using die shots of the 386’s registers and surrounding silicon, it’s possible to piece together how backwards compatibility was implemented. The storage cells of the registers are implemented using static memory (SRAM) as is typical, with much of the register file triple-ported (two read, one write).

Most interestingly is the presence of different circuits (6) to support accessing the register file for 8-, 16- or 32-bit writes and reads. The ‘shuffle’ network as [Ken] calls it is responsible for handling these distinct writes and reads, which also leads to the finding that the bottom 16 bits in the registers are actually interleaved to make this process work smoother.

Fortunately for Intel (and AMD) engineers, this feat wouldn’t have to be repeated again with the arrival of AMD64 and x86_64 many years later, when the 386’s mere 275,000 transistors on a 1 µm process would already be ancient history.

Want to dive even deeper in to the 386? This isn’t the first time [Ken] has looked at the iconic chip.

Mockup of a printed copy of the Little OS Book

One Book To Boot Them All

April 2, 2025 by Heidi Ulrich 19 Comments

Somewhere in the universe, there’s a place that lists every x86 operating system from scratch. Not just some bootloaders, or just a kernel stub, but documentation to build a fully functional, interrupt-handling, multitasking-capable OS. [Erik Helin and Adam Renberg] did just that by documenting every step in The Little Book About OS Development.

This is not your typical dry academic textbook. It’s a hands-on, step-by-step guide aimed at hackers, tinkerers, and developers who want to demystify kernel programming. The book walks you through setting up your environment, bootstrapping your OS, handling interrupts, implementing virtual memory, and even tackling system calls and multitasking. It provides just enough detail to get you started but leaves room for exploration – because, let’s be honest, half the fun is in figuring things out yourself.

Completeness and structure are two things that make this book stand out. Other OS dev guides may give you snippets and leave you to assemble the puzzle yourself. This book documents the entire process, including common pitfalls. If you’ve ever been lost in the weeds of segmentation, paging, or serial I/O, this is the map you need. You can read it online or fetch it as a single 75-page long PDF.

Mockup photo source: Matthieu Dixte

Faster Integer Division With Floating Point

December 22, 2024 by Al Williams 24 Comments

Multiplication on a common microcontroller is easy. But division is much more difficult. Even with hardware assistance, a 32-bit division on a modern 64-bit x86 CPU can run between 9 and 15 cycles. Doing array processing with SIMD (single instruction multiple data) instructions like AVX or NEON often don’t offer division at all (although the RISC-V vector extensions do). However, many processors support floating point division. Does it make sense to use floating point division to replace simpler division? According to [Wojciech Mula] in a recent post, the answer is yes.

The plan is simple: cast the 8-bit numbers into 32-bit integers and then to floating point numbers. These can be divided in bulk via the SIMD instructions and then converted in reverse to the 8-bit result. You can find several code examples on GitHub.

Continue reading “Faster Integer Division With Floating Point” →

Intel Terminates X86S Initiative After Formation Of New Industry Group

December 21, 2024 by Maya Posch 59 Comments

Although the world of the X86 instruction set architecture (ISA) and related ecosystem is often accused of being ‘stale’ and ‘bloated’, we have seen a flurry of recent activity that looks to shake up and set the future course for what is still the main player for desktop, laptop and server systems. Via Tom’s Hardware comes the news that the controversial X86S initiative is now dead and buried. We reported on this proposal when it was first announced and a whitepaper released. This X86S proposal involved stripping 16- and 32-bit features along with rings 1 and 2, along with a host of other ‘legacy’ features.

This comes after the creation of a new x86 advisory group that brings together Intel, AMD, as well as a gaggle of industry giants ranging from HP and Lenovo to Microsoft and Meta. The goal here appears to be to cooperate on any changes and new features in the ISA, which is where the unilateral X86S proposal would clearly have been a poor fit. This means that while X86S is dead, some of the proposed changes may still make it into future x86 processors, much like how AMD’s 64-bit extensions to the ISA, except this time it’d be done in cooperation.

In an industry where competition from ARM especially is getting much stronger these days, it seems logical that x86-oriented companies would seek to cooperate rather than compete. It should also mean that for end users things will get less chaotic as a new Intel or AMD CPU will not suddenly sneak in incompatible extensions. Those of us who remember the fun of the 1990s when x86 CPUs were constantly trying to snipe each other with exclusive features (and unfortunate bugs) will probably appreciate this.

Reverse-Engineering The AMD Secure Processor Inside The CPU

August 18, 2024 by Maya Posch 10 Comments

On an x86 system the BIOS is the first part of the system to become active along with the basic CPU core(s) functionality, or so things used to be until Intel introduced its Management Engine (IME) and AMD its AMD Secure Processor (AMD-SP). These are low-level, trusted execution environments, which in the case of AMD-SP involves a Cortex-A5 ARM processor that together with the Cryptographic Co-Processor (CCP) block in the CPU perform basic initialization functions that would previously have been associated with the (UEFI) BIOS like DRAM initialization, but also loading of encrypted (AGESA) firmware from external SPI Flash ROM. Only once the AMD-SP environment has run through all the initialization steps will the x86 cores be allowed to start up.

In a detailed teardown by [Specter] over at the Dayzerosec blog the AMD-SP’s elements, the used memory map and integration into the rest of the CPU die are detailed, with a follow-up article covering the workings of the CCP. The latter is used both by the AMD-SP as well as being part of the cryptography hardware acceleration ISA offered to the OS. Where security researchers are interested in the AMD-SP (and IME) is due to the fascinating attack vectors, with the IME having been the most targeted, but AMD-SP having its own vulnerabilities, including in related modules, such as an injection attack against AMD’s Secure Encrypted Virtualization (SEV).

Although both AMD and Intel are rather proud of how these bootstrapping systems enable TPM, secure virtualization and so on, their added complexity and presence invisible to the operating system clearly come with some serious trade-offs. With neither company willing to allow a security audit, it seems it’s up to security researchers to do so forcefully.

A 64-bit X86 Bootloader From Scratch

July 14, 2024 by Al Williams 20 Comments

For most people, you turn on your computer, and it starts the operating system. However, the reality is much more complex as [Thasso] discovered. Even modern x86 chips start in 16-bit real mode and there is a bit of fancy footwork required to shift to modern protected mode with full 64-bit support. Want to see how? [Thasso] shows us the ropes.

Nowadays, it is handy to develop such things because you don’t have to use real hardware. An emulator like QEMU will suffice. If you know assembly language, the process is surprisingly simple, although there is a lot of nuance and subtlety. The biggest task is setting up appropriate paging tables to control the memory mapping. In real mode, segments have access to fixed 64 K blocks of memory unless you use some tricks. But in protected mode, segments define blocks of memory that can be very small or cover the entire address space. These segments define areas of memory even though it is possible to set segments to cover all memory and — sort of — ignore them. You still have to define them for the switch to protected mode.

In the bad old days, you had more reason to worry about this if you were writing a DOS Extender or using some tricks to get access to more memory. But still good to know if you are rolling your own operating system. Why do the processors still boot into real mode? Good question.

Comparing X86 And 68000 In An FPGA

June 7, 2024 by Dave Rowntree 15 Comments

[Michael Kohn] started programming on the Motorola 68000 architecture and then, for work reasons, moved over to the Intel x86 and was not exactly pleased by the latter chip’s perceived shortcomings. In the ’80s, the 68000 was a very popular chip, powering everything from personal computers to arcade machines, and looking at its architecture and ease of programming, you can see why this was.

Fast-forward a few years, and [Michael] decided to implement both cores in an FPGA to compare real applications, you know, for science. As an extra bonus, he also compares the performance of a minimal RISC-V implementation on the same hardware, taken from an earlier RISC-V project (which you should also check out !)

Utilizing their ‘Java Grinder’ application (also pretty awesome, especially the retro console support), a simple Mandelbrot fractal generator was used as a non-trivial workload to produce binaries for each architecture, and the result was timed. Unsurprisingly, for CISC architectures, the 68000 and x86 code sizes were practically identical and significantly smaller than the equivalent RISC-V. Still, looking at the execution times, the 68000 beat the x86 hands down, with the newer RISC-V speeding along to take pole position. [Michael] admits that these implementations are minimal, with no pipelining, so they could be sped up a little.

Also, it’s not a totally fair race. As you’ll note from the RISC-V implementation, there was a custom RISC-V instruction implemented to perform the Mandelbrot generator’s iterator. This computes the complex operation Z = Z² + C, which, as fellow fractal nerds will know, is where a Mandelbrot generator spends nearly all the compute time. We suspect that’s the real reason RISC-V came out on top.

If actual hardware is more your cup of tea, you could build a minimal 68k system pretty easily, provided you can find the chips. The current ubiquitous x86 architecture, as odd as it started out, is here to stay for the foreseeable future, so you’d just better get comfortable with it!

Continue reading “Comparing X86 And 68000 In An FPGA” →