The 386's main register bank, at the bottom of the datapath. The numbers show how many bits of the register can be accessed. (Credit: Ken Shirriff)

The Convoluted Way Intel’s 386 Implemented Its Registers

May 5, 2025 by Maya Posch 4 Comments

The fact that modern-day x86 processors still pretty much support the same operating systems and software as their ancestors did is quite a feat. Much of this effort had already been accomplished with the release of the 80386 (later 386) CPU in 1985, which was not only the first 32-bit x86 CPU, but was also backwards compatible with 8- and 16-bit software dating back to the 1970s. Making this work transparently was anything but straightforward, as [Ken Shirriff]’s recent analysis of the 80386’s main register file shows.

Labelled Intel 80386 die shot. (Credit: Ken Shirriff)

Using die shots of the 386’s registers and surrounding silicon, it’s possible to piece together how backwards compatibility was implemented. The storage cells of the registers are implemented using static memory (SRAM) as is typical, with much of the register file triple-ported (two read, one write).

Most interesting is the presence of six different circuits to support 8-, 16-, or 32-bit reads and writes of the register file. The ‘shuffle’ network, as [Ken] calls it, handles these distinct accesses, which also leads to the finding that the bottom 16 bits in the registers are actually interleaved to make this process work more smoothly.
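To make the 8-, 16-, and 32-bit access problem concrete, here is a minimal Python sketch of the programmer-visible behavior of one register such as EAX. It models only the architectural view (EAX, AX, AH, AL), not the silicon; the names `PARTS`, `read`, and `write` are made up for illustration. Note how the high-byte case needs a shift by eight bits, which is the kind of realignment a shuffle network has to provide in hardware.

```python
# Toy model of partial register access on a 32-bit x86 register.
# Each entry is (mask, shift): which bits the access touches and where they sit.
PARTS = {
    "e": (0xFFFFFFFF, 0),  # full 32 bits, e.g. EAX
    "x": (0xFFFF, 0),      # low 16 bits, e.g. AX
    "l": (0xFF, 0),        # bits 0-7, e.g. AL
    "h": (0xFF, 8),        # bits 8-15, e.g. AH (needs a shift)
}


def read(value, part):
    """Return the selected part of a 32-bit register value."""
    mask, shift = PARTS[part]
    return (value >> shift) & mask


def write(value, part, data):
    """Return the register value after writing data into the selected part."""
    mask, shift = PARTS[part]
    cleared = value & ~(mask << shift) & 0xFFFFFFFF  # untouched bits survive
    return cleared | ((data & mask) << shift)


if __name__ == "__main__":
    eax = 0x12345678
    print(hex(read(eax, "x")))         # AX
    print(hex(read(eax, "h")))         # AH
    print(hex(write(eax, "h", 0xAB)))  # only bits 8-15 change
```

Only four of the general-purpose registers (EAX, EBX, ECX, EDX) have the high-byte form, so a real register file does not need this path everywhere.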

Fortunately for Intel (and AMD) engineers, this feat would not have to be repeated with the arrival of AMD64 and x86_64 many years later, when the 386’s mere 275,000 transistors on a 1 µm process would already be ancient history.

Want to dive even deeper into the 386? This isn’t the first time [Ken] has looked at the iconic chip.

Inside The F-4 Attitude Indicator

September 30, 2024 by Al Williams 10 Comments

[Ken] recently obtained an attitude indicator—sometimes called an artificial horizon—from an F-4 fighter jet. Unlike some indicators, the F-4’s can rotate to show pitch, roll, and yaw, so it moves in three different directions. [Ken] wondered how that could work, so, like any of us, he took it apart to find out.

With the cover off, the device is a marvel of compact design. Then you realize that some of the circuit is inside the ball, so there’s even more than it appears at a quick glance. As you might have guessed, there are two separate slip rings that allow the ball to turn freely without tangling wires. Of course, even if you don’t tangle wires, getting the ball to reflect the aircraft’s orientation is an exercise in control theory, and [Ken] shows us the servo loop that makes it happen. There’s a gyroscope and synchros—sometimes known by the trade name selsyn—to keep everything in the same position.

You have to be amazed by the designers of things like this. Sophisticated both electrically and mechanically, rugged, compact, and able to handle a lot of stress. Good thing it didn’t have to be cheap.

We’ve seen inside an ADI before. If you want to make any of this look simple, check out the mechanical flight computers from the 1950s.

[Ken] Looks At The 386

October 16, 2023 by Al Williams 27 Comments

The 80386 was, arguably, Intel’s first modern CPU. The 8086 was commercially successful, but its segmented memory model was stifling. The 80286 also had a protected mode, which differed from the 386’s. [Ken Shirriff] takes the 386 apart for us in a recent blog post.

The 286’s protected mode was less successful than the 386’s because of several key limitations: it was a 16-bit processor with a 24-bit address bus, it still required segment changes to access larger amounts of memory, and it had no good way to switch back into real mode for compatibility. The 386 fixed all that. You could adopt a segment strategy if you wanted to, but you could also load the segment registers once to point to a 4 GB linear address space and then essentially forget them. You also had a virtual 8086 mode that could simulate real mode with some work.
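The numbers behind that comparison are easy to check. The short Python sketch below works out how much memory a 24-bit and a 32-bit address can reach, and how many 64 KiB windows a 16-bit offset needs to cover the 286's full range. The function names are invented for this illustration.

```python
def address_space_bytes(address_bits):
    """Number of distinct byte addresses reachable with the given width."""
    return 1 << address_bits


def segments_needed(address_bits, offset_bits=16):
    """How many offset-sized windows it takes to cover the address space."""
    return address_space_bytes(address_bits) // address_space_bytes(offset_bits)


if __name__ == "__main__":
    MIB = 1 << 20
    GIB = 1 << 30
    print(address_space_bytes(24) // MIB, "MiB on the 286")      # 24-bit bus
    print(address_space_bytes(32) // GIB, "GiB on the 386")      # 32-bit flat
    print(segments_needed(24), "windows of 64 KiB to cover the 286's range")
```

With 32-bit offsets a single segment already spans the whole 4 GB, which is why the segment registers can be loaded once and then left alone.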

The CPU used a 1-micron process, compared to the 1.5-micron process used earlier. The chip had 285,000 transistors (although the 80386SL had many more). That was ten times the number of devices on the 8086. The cheaper 386SX did use the 1.5 micron process for a while, but with a 16-bit external bus, this was feasible. While 285,000 sounds like a lot, a Core i9 has around 4.2 billion transistors. Times have changed.

A smaller design also allowed chips like the 386SL for laptops. The CPU took up only about a fourth of the die. The rest held bus controllers and cache interfaces to cut costs on laptops. That’s why it had so many more transistors.

[Ken] does his usual in-depth analysis of both the die and the history behind this historic device. We spent a lot of time writing protected mode 386 code, and it was nice to see the details of a very old friend. These days, you can get a pretty capable CPU system on a solderless breadboard, but designing a working 386 system took a few extra parts. The 80286 was a stepping stone between the 8086 and 80386, but even it had some secrets to give up.

16 Kbit DRAM Gives Up Its Secrets

September 26, 2023 by Al Williams 5 Comments

[Ken Shirriff] is looking inside chips again. This time, the subject is the MK4116 — a 16 Kbit DRAM chip. Even without a calculator, you know that’s a whopping 2 Kbytes, and while that doesn’t sound impressive, in the late 1970s, it was a modern miracle.

The chip showed up in computers ranging from the TRS-80 to the Xerox Alto and was even a mainstay of arcade video games. While [Ken] thought it would be a pretty predictable teardown, he found several surprises.

Static RAM chips use flip flops and retain their state as long as power is on. That’s convenient, but each flip flop takes multiple transistors, so there is a limit to how many bits you can put on a particular size chip. Dynamic RAM raises that limit because each cell is nothing more than a capacitor and a single transistor. This increases memory density, but the problem is that the capacitor doesn’t hold charge indefinitely. The computer or an associated circuit has to refresh the memory periodically to maintain the contents.

One of the key innovations for this chip was the use of multiplexed address lines so it could use a smaller package. Inside, two banks of capacitors store the bits, and, usually, a computer would use eight chips to store a byte. Of course, each memory bit is made to be as compact as possible. This chip is also made to be very low power when idle. The secret is that it doesn’t use load transistors but instead uses an active pull-up tied to the system clock. Another interesting feature is the sense amplifier, which has to measure the tiny noisy voltage from the capacitors.
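As a rough illustration of multiplexed addressing, the Python sketch below splits a 14-bit cell address into two 7-bit halves that share the same seven pins, one half sent as the row and the other as the column. Which half goes first is a choice made by the surrounding system; putting the high bits in the row here is an assumption, and `split_address` and `join_address` are hypothetical helper names.

```python
ADDRESS_PINS = 7                    # a 16 Kbit part needs 14 bits, sent as 7 + 7
TOTAL_BITS = 1 << (2 * ADDRESS_PINS)
PIN_MASK = (1 << ADDRESS_PINS) - 1


def split_address(addr):
    """Split a cell address into (row, column) for the shared address pins."""
    if not 0 <= addr < TOTAL_BITS:
        raise ValueError("address out of range for a 16 Kbit chip")
    row = addr >> ADDRESS_PINS      # assumption: high half is sent as the row
    col = addr & PIN_MASK           # low half is sent as the column
    return row, col


def join_address(row, col):
    """Inverse of split_address, as the chip would recombine the two halves."""
    return (row << ADDRESS_PINS) | col


if __name__ == "__main__":
    print(TOTAL_BITS, "bits =", TOTAL_BITS // 8, "bytes")
    print(split_address(0x2ABC))
```

Halving the number of address pins this way is what lets a 16 Kbit part fit in a smaller package, at the cost of sending every address in two steps.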

You’ll see all this and more in [Ken’s] write-up. Chips from that era were relatively easy to take apart compared to today’s devices. Want to know how it’s done? [Ken] can tell you. He is well-known for doing a lot of cool stuff, with ICs and even old mainframe and space hardware.

Finding Undocumented 8086 Instructions Via Microcode

July 17, 2023 by Al Williams 23 Comments

Video gamers know about cheat codes, but assembly language programmers are often in search of undocumented instructions. One way to find them is to map out all of a CPU’s opcodes and where there are holes, try those values, and see what happens. Not good enough for [Ken Shirriff]. He prefers examining the CPU’s microcode and deducing what each part of it does.

Microcode is a feature of many modern CPUs. The CPU runs several “microcode” instructions to process a single opcode. For the Intel 8086, there are 512 micro instructions, each with 21 bits. Each instruction has two parts: a part that moves a source to a destination and another that performs some other operation, such as an ALU operation. [Ken] explains it all in the post, including several hidden registers you can’t see, but the microcode can.
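For a sense of scale, the Python sketch below computes the size of that microcode store and shows how a 21-bit word might be pulled apart into a move part and an operation part. The field widths (5 bits of source, 5 bits of destination, 11 bits for everything else) and their ordering are assumptions chosen for illustration, not a layout confirmed by the text above, and `split_micro_instruction` is a hypothetical helper.

```python
MICRO_INSTRUCTIONS = 512
BITS_PER_MICRO_INSTRUCTION = 21

ROM_BITS = MICRO_INSTRUCTIONS * BITS_PER_MICRO_INSTRUCTION
ADDRESS_BITS = (MICRO_INSTRUCTIONS - 1).bit_length()  # bits to select one word


def split_micro_instruction(word, src_bits=5, dst_bits=5):
    """Split a word into (source, destination, operation) fields.

    Assumed layout, most significant first: source, destination, operation.
    """
    if not 0 <= word < (1 << BITS_PER_MICRO_INSTRUCTION):
        raise ValueError("micro-instruction must fit in 21 bits")
    op_bits = BITS_PER_MICRO_INSTRUCTION - src_bits - dst_bits
    source = word >> (dst_bits + op_bits)
    destination = (word >> op_bits) & ((1 << dst_bits) - 1)
    operation = word & ((1 << op_bits) - 1)
    return source, destination, operation


if __name__ == "__main__":
    print(ROM_BITS, "bits of microcode,", ADDRESS_BITS, "address bits")
    print(split_micro_instruction((3 << 16) | (7 << 11) | 0x155))
```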

Searching for holes in the opcode table.

Some of the undocumented instructions are probably not useful. They are either impractical or duplicate a function you can already do another way. Not all of the instructions are there for technical reasons. For example, opcode D6, commonly known as SALC for “Set AL to Carry”, seems to exist only as a trap for anyone making a carbon copy of Intel’s microcode. When other companies like NEC made 8086 clones, having an undocumented instruction would strongly suggest they just copied Intel’s intellectual property (in NEC’s case, they didn’t).

Other cases happen where an instruction just doesn’t make sense. For example, you can pop all segment registers, and though it is not documented, you can deduce that POP CS should be opcode 0F. The problem is there is no sane reason to pop CS off the stack. The instruction works; it just isn’t useful. The opcodes from 60-6F are conditional jumps that are no different from the instructions at 70-7F because of decoding. There is no reason to document both identical instruction ranges.
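Both observations can be sketched in a few lines of Python. The documented POP opcodes for ES, SS, and DS follow a regular bit pattern, so filling in the missing register gives 0F for CS. For the jump ranges, one way to picture "because of decoding" is a decoder that ignores bit 4 of the opcode; that mechanism is an assumption made here for illustration, and the function names are invented.

```python
# Segment registers in their 2-bit encoding order.
SEGMENT_REGISTERS = ["ES", "CS", "SS", "DS"]


def pop_segment_opcode(name):
    """Opcode for POP <segment register>, following the pattern 000ss111."""
    return (SEGMENT_REGISTERS.index(name) << 3) | 0x07


def jump_condition(opcode):
    """Condition code of a conditional jump in the 60-7F range.

    Assumption: bit 4 is ignored, so 6x and 7x decode to the same thing.
    """
    if opcode & 0xE0 != 0x60:
        raise ValueError("opcode is not in the 60-7F range")
    return opcode & 0x0F


if __name__ == "__main__":
    for reg in SEGMENT_REGISTERS:
        print(f"POP {reg}: {pop_segment_opcode(reg):02X}")
    print(jump_condition(0x64) == jump_condition(0x74))
```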

The plot thickens when you go to two-byte instructions. You’ll find plenty of instructions of dubious value. You don’t hear much about undocumented instructions anymore. Why? Because modern CPUs have enough circuitry to dedicate some to detecting illegal instructions and halting the CPU. But the 8086 was squeezed too tight to allow for such a luxury. Good thing for people like us who enjoy solving puzzles.

You can still get a modern CPU to tell you more about instructions even if it won’t run them. Even the 80286 had some secret opcodes.

String Operations The Hard(ware) Way

April 7, 2023 by Al Williams 7 Comments

One of the interesting features of the 8086 back in 1978 was the provision for “string” instructions. These took the form of prefixes that would repeat the next instruction a certain number of times. The next instruction was meant to be one of a few string instructions that operated on memory regions and updated pointers to the memory region with each repeated operation. [Ken Shirriff] examines the 8086 die up close and personal to explain how the 8086 microcode pulled this off and it is a great read, as usual.

In general, the string instructions wanted memory pointers in the SI and DI registers and a count in CX. The flags also have a direction bit that determines if the SI and DI registers will increase or decrease on each execution. The repeat prefix could also carry a condition. A plain REP prefix executes the following string instruction until CX is zero. The REPZ and REPNZ prefixes do the same but also stop early if the zero flag is clear (REPZ) or set (REPNZ) after an operation. The instructions can work on 8-bit or 16-bit data and, oddly, as [Ken] points out, the microcode is the same either way.
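The behavior of the repeat prefixes can be sketched in software. The Python model below is a simplification: memory is a flat `bytearray`, segments are ignored, and interrupts are not modeled. `rep_movs` mimics a repeated MOVSB or MOVSW, and `repne_scasb` mimics a repeated byte scan that keeps going only while the zero flag stays clear. Both function names are invented for this illustration.

```python
def rep_movs(memory, si, di, cx, width=1, direction_down=False):
    """Model REP MOVSB (width=1) or REP MOVSW (width=2) on a bytearray."""
    step = -width if direction_down else width   # direction flag picks the sign
    while cx != 0:
        memory[di:di + width] = memory[si:si + width]
        si = (si + step) & 0xFFFF                # pointers are 16-bit registers
        di = (di + step) & 0xFFFF
        cx -= 1
    return si, di, cx


def repne_scasb(memory, di, cx, al):
    """Model REPNE SCASB with the direction flag clear.

    Scans for the byte al; stops when it is found or when CX reaches zero.
    """
    zero_flag = False
    while cx != 0:
        zero_flag = memory[di] == al             # the compare sets the zero flag
        di = (di + 1) & 0xFFFF
        cx -= 1
        if zero_flag:                            # REPNE repeats only while ZF is clear
            break
    return di, cx, zero_flag


if __name__ == "__main__":
    mem = bytearray(b"hello\0\0\0\0\0")
    print(rep_movs(mem, si=0, di=5, cx=5), bytes(mem))
    print(repne_scasb(bytearray(b"abcdef"), di=0, cx=6, al=ord("d")))
```

In this model the pointers and the count are updated before the flag is tested, so after a successful scan DI points one past the match.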

[Ken] does a great job of explaining it all, so we won’t try to repeat it here. But it is more complicated than you’d initially expect. Partially this is because the instruction can be interrupted after any operation. Also, updating the SI and DI registers has to account not only for increment or decrement but also for the byte or word size in play. Worse still, an unaligned word has to be broken up into two different accesses. That is a lot of logic to put in a relatively small amount of silicon.

Even if you never design a microcoded CPU, the discussion is fascinating, and the microphotography is fun to look at, too. We always enjoy [Ken’s] posts on little CPUs and big computers.

Silicon Sleuthing: Finding An Ancient Bugfix On The 8086

November 28, 2022 by Al Williams 31 Comments

Few CPUs have had the long-lasting influence that the 8086 did. It is hard to believe that when your modern desktop computer boots, it probably thinks it is an 8086 from 1978 until some software gooses it into a more modern state. When [Ken] was examining an 8086 die, however, he noticed that part of the die didn’t look like the rest. Turns out, Intel had a bug in the original version of the 8086. In those days you couldn’t patch the microcode. It was more like a PC board — you had to change the layout and make a new one to fix it.

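The affected area is the Group Decode ROM, which is responsible for categorizing instructions based on the type of decoding they require. While it is marked as a ROM, it is more of a programmable logic array. The bug was pretty intense. If an interrupt followed either a MOV SS or POP SS instruction, havoc ensued: the stack segment and stack pointer are loaded by separate instructions, so an interrupt arriving between them would push its return information onto a stack at an inconsistent address.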
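To see why that timing matters, here is a small Python sketch of a stack switch on the 8086. The top of the stack sits at the physical address formed from SS and SP, and changing stacks takes two instructions, one for each register. The register values are made up for the example and `physical` is a hypothetical helper.

```python
def physical(segment, offset):
    """8086 physical address: segment shifted left four bits plus offset."""
    return ((segment << 4) + offset) & 0xFFFFF   # 20-bit address bus


def stack_tops(old_ss, old_sp, new_ss, new_sp):
    """Top-of-stack address before, in the middle of, and after a switch."""
    before = physical(old_ss, old_sp)
    between = physical(new_ss, old_sp)   # SS already loaded, SP not yet
    after = physical(new_ss, new_sp)
    return before, between, after


if __name__ == "__main__":
    # Example values only: switch from stack 1000:0100 to stack 2000:0200.
    before, between, after = stack_tops(0x1000, 0x0100, 0x2000, 0x0200)
    print(f"before {before:05X}, between {between:05X}, after {after:05X}")
```

An interrupt that arrives in the "between" state pushes onto an address that belongs to neither the old stack nor the new one, which is why interrupts have to be held off until the instruction after the SS load has run.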
