An interesting detail about the Intel 8087 floating point processor (FPU) is that it’s a co-processor that shares a bus with the 8086 or 8088 CPU and system memory, which means that somehow both the CPU and FPU need to know which instructions are intended for the FPU. Key to this are eight so-called ESCAPE opcodes that are assigned to the co-processor, as explained in a recent article by [Ken Shirriff].
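To make the ESCAPE mechanism a little more concrete, here is a minimal sketch of the check that both chips effectively perform on each opcode byte. It assumes the standard 8086 encoding, in which the eight ESC opcodes share the bit pattern 11011xxx (0xD8 through 0xDF) and a ModR/M "mod" field of 11 means there is no memory operand. The function names are made up for illustration and do not come from Ken's article.

```python
# Hypothetical helper names; the bit patterns follow the standard 8086 encoding.
ESC_MASK = 0b11111000     # compare only the top five bits of the opcode
ESC_PATTERN = 0b11011000  # 0xD8: the shared prefix of all eight ESC opcodes


def is_esc_opcode(opcode: int) -> bool:
    """Return True if the opcode byte is one of the eight ESC opcodes."""
    return (opcode & ESC_MASK) == ESC_PATTERN


def has_memory_operand(modrm: int) -> bool:
    """Return True if the ModR/M byte selects a memory operand (mod != 11)."""
    return (modrm >> 6) != 0b11


esc_opcodes = [b for b in range(256) if is_esc_opcode(b)]
print([hex(b) for b in esc_opcodes])
```

For example, the bytes D9 07 (load a 32-bit float from the address in BX) pass both checks, while D8 C1 (add two stack registers) is an ESC instruction without a memory operand.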
The 8087 thus watches the bus for these opcodes, but since it doesn’t have access to the CPU’s registers, sharing data has to occur via system memory. The address for this is calculated by the CPU and read from by the CPU, with this address registered by the FPU and stored for later use in its BIU register. From there the instruction can be fully decoded and executed.
This decoding is mostly done by the microcode engine, with conditional instructions like cos featuring circuitry that sprawls all over the IC. Explained in the article is how the microcode engine even knows how to begin this decoding process, considering the complexity of these instructions. The biggest limitation at the time was that even a 2 kB ROM was already quite large, which resulted in the 8087 using only 22 microcode entry points and relying on a combination of logic gates and PLAs to fully implement the entire ROM.
Only some instructions are directly implemented in hardware at the bus interface (BIU), which means that a lot depends on this microcode engine and the ROM for things to work half-way efficiently. The need to solve problems such as fetching constants resulted in a similarly complex-but-transistor-saving approach for such cases.
Even if the 8087 architecture is convoluted and the ISA not well-regarded today, you absolutely have to respect the sheer engineering skills and out-of-the-box thinking of the 8087 project’s engineers.

Um, the die photo shown at the beginning of the article is not at all the same as the die photo in Ken’s analysis of the 8087. In the 8087, the microcode ROM is in the middle of the chip. Simple mistake, or AI slop? After reading Ken’s fantastic (as usual) analysis, I tend to believe that this article could use some attention.
My mistake. The die photo shown at the beginning of the article is a section of the full 8087 die that I failed to recognize from later in Ken’s article. I still maintain, though, that the text of the article could use some work.
The 8087 really was a neat little chip! 😎
It turned an average IBM typewriter into a real computer (number cruncher)!
I fondly remember how much of a difference an NPU made at drawing vector graphics in AutoSketch on DOS!
Or how it sped up Mandelbrot programs (fractal generators)..
Ok, it maybe didn’t accelerate the drawing process itself (pixel setting), but it did speed up calculating the co-ordinates.
It’s a bit sad that DOS software support in general wasn’t as good as it could have been.
Because, by the late 80s, when 80287/80387 FPUs were current, the price of an i8087 had dropped significantly.
Most high-level compilers of the mid-80s offered x87 support, and an 80186/80286 and up could emulate an 8087 in software.
And games using vector graphics instead of bitmaps were very common by mid-80s, too!
All those text-adventures with graphics come to my mind. Oo-Topos, Rendezvous with Rama, etc.
The x87 could have helped here in a similar way it did on AutoCAD/AutoSketch.
Providing two different binaries for 8086 and 8086+8087 also was an option, I think.
PS: Anyone remember the FWAIT instruction, too?
It strictly speaking wasn’t needed anymore on 386/387 and up, I vaguely remember.
By that time, the communication link between CPU and FPU had changed quite a bit, but the software side remained compatible.
The earlier 80287 could be used on early 80386 motherboards, also.
Very well said! The 8087 designers did a great job, I think.
There’s an interesting interview that I’ve seen a while ago: https://www.youtube.com/watch?v=L-QVgbdt_qg
The 80-Bit internal precision is something that’s often forgotten, I think.
Most newer designs cap out at 64-Bit or so. Especially Power Macs in the 90s come to mind here.
When precision matters, even SSE on x86 platform wasn’t a match to good ol’ x87.
Only AVX was a true game changer here, I assume.
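To put numbers on the precision point: a 64-bit IEEE double carries a 53-bit significand, while the x87 80-bit extended format carries a 64-bit one. The sketch below is only an illustration of that difference. The helper round_to_significand is a made-up name that rounds a positive integer to a given number of significand bits (round to nearest, ties to even); it is not a model of the 8087's actual rounding hardware.

```python
def round_to_significand(value: int, bits: int) -> int:
    """Round a positive integer to `bits` significand bits (nearest, ties to even)."""
    if value.bit_length() <= bits:
        return value  # already fits exactly
    shift = value.bit_length() - bits
    quotient, remainder = divmod(value, 1 << shift)
    half = 1 << (shift - 1)
    if remainder > half or (remainder == half and quotient & 1):
        quotient += 1
    return quotient << shift


n = 2**53 + 1                              # needs 54 significant bits
as_double = round_to_significand(n, 53)    # 64-bit double: 53-bit significand
as_extended = round_to_significand(n, 64)  # x87 extended: 64-bit significand

print(as_double == n)           # False: the +1 is rounded away
print(as_extended == n)         # True: 64 bits hold the value exactly
print(float(n) == float(2**53)) # True: Python floats are 64-bit doubles
```

The same integer that a double silently rounds is kept exactly in the 80-bit format, which is the extra headroom meant here.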
Looking back, I’m a bit sad that the Weitek NPUs didn’t catch on (3167 and 4167).
They were memory-mapped devices and could co-exist with an x87 co-processor.
Both physically and software-wise. Well, more or less.
EMM386 was needed to enable support on DOS,
which in turn moved the whole DOS environment into V86 mode (makes some Real-Mode applications unhappy).
But they could be used at the same time in Protected-Mode otherwise (x87 used i/o ports, Weitek a reserved memory region).
Physically, a Weitek enhanced socket with an adapter could take both an 80387 and a Weitek at the same time.
The 80486 motherboards didn’t need that because the x87 FPU (NPU) was built into 486DX CPU,
so the Weitek socket (if available) was solely for Weiteks.
Another reason why they were so popular in CAD and 3D rendering, I guess.
The 486 platform was peak in early 90s when multimedia was a thing! 😎
Anyway, the interesting thing is that both x87 and Weitek Abacus did supplement each other so well! 😃
The x87 was great at high-precision and at performing floating-point math (but could do integer too),
while the Weitek did single-precision math at fast speeds.
In some ways, I think, the Weitek was a predecessor to MMX (which shared x87 registers for context-switching reasons to avoid special support in Windows 3.1 and other multitasking environments; both couldn’t work in tandem sadly).
MMX focused on integer math (in its role as a SIMD instruction),
while the Weitek could do some floating-point math derived from integer values.
But even here, the x87 family was a step ahead (to MMX).
Some third-party x87 chips of early 90s could do matrix math and memory-mapped i/o in their native, non-intel compliant mode.
These exotic chips were supported in Autodesk 3D Studio Release 3 or 4, I assume. And other specialised CAD/CAM software.
Unfortunately, this ended after the 80386 motherboards were superseded.
The i486 moved the NPU (FPU) into the CPU itself, making the dedicated 80387 socket (or Weitek enhanced 387 socket) obsolete.
So the market for x87 math co-processors slowly ended (except for Weitek).
Only those 486 systems using 486 upgrade chips on 386 mainboards (486DLC, 486SXL, the IBM 486 Blue Lightning etc) were excepted.
They could be paired with various math co-processors, still.
A 33 MHz or 40 MHz 386DX/486DLC with a matching math co-processor was quite a powerhouse in the early to mid-90s, still. 😃
Until the Pentium (586) blew everything prior out of the water!
Well, except for the early 60 MHz model(s) with the FDIV bug! 😉
NPUs (aka FPUs) are really fascinating if we go beyond the realms of Multiplan, Lotus 1-2-3 or Excel!
They used to make math beautiful! I recommend reading the classic “Everything You Always Wanted To Know About Math Coprocessors”.
It’s a free text file (coproa16a.txt) that is about as old as the world wide web! 🙂
An HTML version can be found here: https://dougx.net/gaming/coproc.html
❤️