8087's 4-bit adder block. (Credit: Ken Shirriff)

The Adder At The Heart Of Intel’s 8087 FPU

As simple as the concept of adding two numbers appears at first glance, doing it in the 1970s in Intel’s 8087 FPU with its 69-bit adder was still a tall order. This is namely the core feature that many features like tangents, cosines and exponentiation rely on, so it had to be basically perfect. In a recent die-level analysis of the 8087 [Ken Shirrif] dives into the structure, layout and functioning of this ‘beating heart’ of this piece of semiconductor history.

The Intel 8087 adder and associated registers. (Credit: Intel)
The Intel 8087 adder and associated registers. (Credit: Intel)

Although anyone can build a simple binary adder out of off-the-shelf parts including 74-series logic ICs, the problem is to make it fast so that the 69th bit doesn’t have to wait for e.g. a carry to trickle all the way through the preceding bits. The main way that this is solved is by breaking addition into 4-bit blocks, reducing the problem by a factor of four, along with an optimized Manchester carry-chain carry-lookahead implementation.

The main advantage of this variation of a carry-lookahead is that it reduces the number of required transistors, without sacrificing too much performance. Later on Intel would switch to the faster, but more transistor-intensive Kogge-Stone adder.

Implementing this entire adder with NMOS technology and wiring it all up to the rest of the die required a lot of ingenuity on the side of the Intel engineers, as previously noted this adder is effectively always used in any operation at some stage. This necessitates many surrounding registers and in turn circuitry to manage these, with part of the complexity handled in microcode and part in silicon.

How The Intel 8087 FPU Knows Which Instructions To Execute

An interesting detail about the Intel 8087 floating point processor (FPU) is that it’s a co-processor that shares a bus with the 8086 or 8088 CPU and system memory, which means that somehow both the CPU and FPU need to know which instructions are intended for the FPU. Key to this are eight so-called ESCAPE opcodes that are assigned to the co-processor, as explained in a recent article by [Ken Shirriff].

The 8087 thus waits to see whether it sees these opcodes, but since it doesn’t have access to the CPU’s registers, sharing data has to occur via system memory. The address for this is calculated by the CPU and read from by the CPU, with this address registered by the FPU and stores for later use in its BIU register. From there the instruction can be fully decoded and executed.

This decoding is mostly done by the microcode engine, with conditional instructions like cos featuring circuitry that sprawls all over the IC. Explained in the article is how the microcode engine even knows how to begin this decoding process, considering the complexity of these instructions. The biggest limitation at the time was that even a 2 kB ROM was already quite large, which resulted in the 8087 using only 22 microcode entry points, using a combination of logic gates and PLAs to fully implement the entire ROM.

Only some instructions are directly implemented in hardware at the bus interface (BIU), which means that a lot depends on this microcode engine and the ROM for things to work half-way efficiently. This need to solve problems like e.g. fetching constants resulted in a similarly complex-but-transistor-saving approach for such cases.

Even if the 8087 architecture is convoluted and the ISA not well-regarded today, you absolutely have to respect the sheer engineering skills and out-of-the-box thinking of the 8087 project’s engineers.