A Die-Level Look At The Pentium FDIV Bug

The early 1990s were an interesting time in the PC world, mainly because PCs were entering the zeitgeist for the first time. This was fueled in part by companies like Intel and AMD going head-to-head in the marketplace with massive ad campaigns to build brand recognition; remember “Intel Inside”?

In 1993, Intel was making some headway in that regard, and the splashy launch of its new Pentium chip that year was a huge event. Unfortunately, an esoteric bug in the floating-point division module soon came to the public’s attention. [Ken Shirriff]’s excellent account of that kerfuffle goes into great detail about the discovery of the bug, which was found by [Dr. Thomas R. Nicely] as he searched for prime numbers. It’s a bit of an understatement to say this bug created a mess for Intel. The really interesting stuff is how the so-called FDIV bug, named after the floating-point division instruction it affected, was actually implemented in silicon.
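For a sense of what the bug looked like from the software side, here is a minimal sketch of the division check that circulated widely at the time. The function name `fdiv_residual` is ours, not from [Ken]’s article, and the operands are the commonly quoted test pair rather than anything stated above. On a correct FPU the residual is exactly zero; affected Pentiums were widely reported to return a residual of 256.

```python
def fdiv_residual(x: float, y: float) -> float:
    """Return x - (x / y) * y, which should be 0.0 for this operand pair.

    On hardware with a correct divider the quotient is accurate enough
    that multiplying back rounds to x exactly. A wrong quotient shows
    up as a nonzero residual.
    """
    quotient = x / y
    return x - quotient * y


# Commonly quoted operand pair for the FDIV check (assumption: taken from
# widely circulated accounts of the bug, not from the article above).
X, Y = 4195835.0, 3145727.0

if __name__ == "__main__":
    print("quotient:", X / Y)                  # correct value is about 1.33382
    print("residual:", fdiv_residual(X, Y))    # flawed Pentiums reportedly gave 256
```

Any modern machine should print a residual of zero, which is exactly why a box of old Pentiums is more fun for this experiment.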

We won’t presume to explain it better than [Professor Ken] does, but the gist is that floating-point division in the Pentium relied on a lookup table implemented in a programmable logic array (PLA) on the chip. The bug was caused by five missing table entries, and [Ken] was able to find the corresponding PLA defects on a decapped Pentium. What’s more, his analysis suggests that Intel’s characterization of the bug as a transcription error is a bit misleading; the pattern of the missing entries in the lookup table is more consistent with a mathematical error in the program that generated the table.
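To see why a handful of bad table cells can wreck a division, here is a toy sketch of a radix-4 digit-recurrence divider, the general family that table-driven dividers like this belong to. Everything in it is an assumption made for illustration: the function `srt_divide`, the `hole` predicate, and the band it covers are hypothetical, the digit is computed by exact rounding rather than looked up from truncated bits, and a “missing” entry is modelled as a cell that returns 0 where a nonzero digit was needed. It is not Intel’s table or algorithm.

```python
from fractions import Fraction


def srt_divide(x, d, steps=16, hole=None):
    """Toy radix-4 digit-recurrence division for Fractions x, d in [1, 2).

    Each step picks a quotient digit in {-2, ..., 2} and updates the
    partial remainder w. `hole(shifted, d)` is a hypothetical predicate
    marking a region where the digit "table" wrongly yields 0.
    """
    w = x / 4                    # scaled so that |w| <= 2d/3 initially
    q = Fraction(0)
    for j in range(1, steps + 1):
        shifted = 4 * w
        # Ideal digit: nearest integer to shifted/d, clamped to the digit set.
        digit = max(-2, min(2, round(shifted / d)))
        if hole is not None and hole(shifted, d):
            digit = 0            # simulated bad cell
        w = shifted - digit * d  # remainder stays bounded only if digit was right
        q += Fraction(digit, 4 ** j)
    return 4 * q                 # undo the initial scaling by 4


def toy_hole(shifted, d):
    """Hypothetical narrow band of remainder/divisor ratios with a bad cell."""
    return Fraction(7, 4) <= shifted / d < Fraction(15, 8)


if __name__ == "__main__":
    ok = srt_divide(Fraction(3, 2), Fraction(5, 4), hole=toy_hole)
    bad = srt_divide(Fraction(9, 5), Fraction(1), hole=toy_hole)
    print(float(ok))   # close to 1.2: this input never lands in the band
    print(float(bad))  # far from 1.8: one wrong digit is never recovered
```

The point of the sketch is the failure mode: once a digit that should have been nonzero comes back as 0, the partial remainder leaves the range the remaining digits can correct, so the error sticks. Inputs that never touch the bad region divide perfectly, which fits the picture of a bug that most users never noticed.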

The Pentium bug was a big deal at the time, and in some ways a master class on how not to handle a complex technical problem. To be fair, this was the first time something like this had happened on a global scale, so Intel didn’t really have a playbook to go by. [Ken]’s account of the bug and the dustup surrounding it is first-rate, and if you ever wanted to really understand how floating-point math works in silicon, this is one article you won’t want to miss.

10 thoughts on “A Die-Level Look At The Pentium FDIV Bug”

  1. Nostradarmus:

    “When the GATES of logic misalign,
    Numbers shall falter in their prime.
    Calculations awry, trust cast aside,
    The maker of chips shall pay the tide.”

    (emphasis added…) it cost them billions iirc.

  2. I worked at Intel in the processor and chipset validation labs during this. It was crazy – long nights, testing every piece of software available at the time with machines hooked up to debugger. Games, productivity apps, anything and everything.

  3. I still have a Pentium 90 with the FDIV bug. Complete with its original VLB motherboard, which is possibly more rare than the CPU itself.

    Originally, I thought it might be funny to hang onto it for a number of years and then file a claim for replacement long after any reasonable person might do so. As a naive teen at the time, I pictured a scenario where Intel would laugh and replace it with their latest and greatest offering, rather than the more likely reality of telling me to beat it.

    As time marched on, the nostalgia this old machine brought was worth more to me, so I never followed through with that plan. Instead, I keep it along with a small number of retro devices, which on occasion is brought out for some old school tinkering.

    As a home user, the FDIV bug itself never caused me any problems, despite many years of faithful service.

  4. Hah….
    “How many Intel Engineers does it take to screw in a light bulb? 1.9999999999”

    This was also the first time I heard the word “Errata”. Not allowing customers to RMA bad hardware but providing a software “fix” to OS makers instead. Today I can almost appreciate the cleverness of the product team at Intel. But back then this was still very close to “it is now safe to turn off your computer”. Maybe I did not want to run Windows today; maybe I had a native DOS app. Errata would not work for those.

  5. For those interested in the deep details of computer math, Shirriff’s article is fascinating. Seeing the PLA LUT made me think “This is an error caused by truncation being used instead of rounding.” My opinion is reinforced by footnote 21.

    He claims “The logic in a carry-lookahead adder gets more and more complex for each bit so a carry-lookahead adder is impractical for large words.” I have to disagree here. For N bits, the carry-lookahead burden is proportional to N*log(N). This can be tolerable when fast addition is required. When doing fast parallel multiplication of large numbers, the overhead of lookahead is negligible.

    1. Okay, I cracked up when I saw the bad PLA logic because my instant reaction was “WTF Intel, if you have do-not-care entries in logic you don’t set them to zero, you set them to simplify the logic”…
      and then I realized that would have avoided the problem. And it’s what Intel did to fix it.

      So it’s a bad math error and bad engineering! Either one would’ve avoided it.

      Every time I see a register decoder where the unuseds are zero and mention shadow registers it amazes me.
