Just How Bad Was The Intel iAPX432?

A square red circuit board is shown on a black workbench. The circuit board houses two large chips in the upper left corner, each with a large heat sink attached.

Processor design over the last few decades has moved toward RISC processors that aim to implement a few simple operations very efficiently. For a while, though, the trend was toward ever-more-complex CISC designs that let programmers implement complex behaviors using as few instructions as possible. Few processors took this approach further than the Intel iAPX432. This hyper-CISC processor was a commercial failure, largely due to its notoriously poor performance, but [MarkTheQuasiEngineer]’s benchmark suggests that this notoriety wasn’t totally deserved.

The first step before running a benchmark was to build a computer around the processor. The iAPX432 was implemented in three chips, two of which acted as the general data processor (GDP), and one of which handled input and output. [Mark] built an SBC (design and code here) that houses the two GDP chips and an FPGA for I/O. The 432 did have a well-deserved reputation for efficiently turning electricity into heat, and the original voltage regulator failed rather quickly.

The 432 was designed to use machine code which was almost a high-level language, with built-in object-oriented programming. It had over 200 operators, some of which implemented complex object-oriented operations, and a wide variety of data types, but it had no directly-accessible general-purpose registers. It also had a very complex addressing system that allowed both direct and indirect addressing. For better performance, [Mark] used direct addressing.

For the benchmark, [Mark] implemented a spigot algorithm to calculate the digits of pi. The results were somewhat surprising: calculating 2048 digits, the 432 beat his previous retro-processor benchmarks; an Intel 8086 running the same algorithm took 2.5 times as long. Based on the results of this hand-written code, [Mark] speculates that the 432’s poor performance had more to do with poor compiler optimization than with the fundamental design.
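For readers who haven't met a spigot algorithm, here is a minimal Python sketch of the classic Rabinowitz-Wagon variant, which produces decimal digits of pi one at a time using only small-integer arithmetic on an array. This is an illustration, not [Mark]'s code: which spigot variant he used, and how he wrote it for the 432 and the 8086, is not stated here. The function name `spigot_pi_digits` and the `guard` parameter are made up for this example.

```python
def spigot_pi_digits(n_digits, guard=10):
    """Return the first n_digits decimal digits of pi as a string ("3141...").

    Rabinowitz-Wagon spigot. `guard` (an assumption of this sketch) adds
    extra digits of working precision so the requested ones are settled.
    """
    total = n_digits + guard
    length = 10 * total // 3 + 1      # array size suggested by the algorithm
    a = [2] * length                  # pi = 2 + 1/3*(2 + 2/5*(2 + 3/7*(...)))
    out = []                          # released digits; out[0] is a dummy 0
    predigit, nines = 0, 0            # held digit and count of pending 9s

    for _ in range(total):
        carry = 0
        # Multiply by 10 and normalise from the right-hand end of the array.
        for i in range(length - 1, 0, -1):
            x = 10 * a[i] + carry
            a[i] = x % (2 * i + 1)
            carry = (x // (2 * i + 1)) * i
        x = 10 * a[0] + carry
        a[0] = x % 10
        q = x // 10                   # next candidate digit (may be 10)

        if q == 9:
            nines += 1                # hold 9s until we know about a carry
        elif q == 10:
            out.append(predigit + 1)  # carry ripples into the held digits
            out.extend([0] * nines)
            predigit, nines = 0, 0
        else:
            out.append(predigit)      # no carry: held digits are final
            out.extend([9] * nines)
            predigit, nines = q, 0

    # Skip the dummy leading 0 and drop the guard digits.
    return "".join(str(d) for d in out[1:n_digits + 1])


if __name__ == "__main__":
    print(spigot_pi_digits(50))
```

The inner loop is nothing but integer multiply, divide, and remainder over an array in memory, which is why this kind of routine is a reasonable, if narrow, test of a processor's arithmetic and memory addressing.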

We’ve covered some of the history of this troubled chip before. For a similarly ambitious but ill-fated Intel project, check out the history of Itanium.

25 thoughts on “Just How Bad Was The Intel iAPX432?”

  1. Ironically we cannot go that way anymore because it would be inherently incompatible with C, in the sense that C won’t be able to use its advantages. C has unfortunately made many of these opportunities a dead end.

        1. The features to support objects should be generic enough that you could map them from a compiler. But building a compiler to use them would be a lot of effort, for little gain.

    1. Turing machines, remember. Any computer language can be run on any computer, though there may be differences in efficiency and some languages or libraries may be better able to leverage the features of that computer than others.

      So, no, C did not create a dead end. It, and its libraries, can be and were adapted to new architectures.

      1. If anything, I think C’s simplicity and use as a “high level assembler” may have facilitated processor development, illustrating how a minimal set of operations can be used to implement anything and pointing out some core loop optimizations.

        1. I hate C being described as a high level assembler, because it’s actually a combination of a high level assembler and a runtime ABI.

          You can make an assembly language that looks practically identical to C with only a few restrictions, but the standard library and function calls with arguments are so familiar to people it still confuses them.

    2. These weren’t opportunities, they were bad ideas to begin with. It’s better to implement high level abstractions in software because if security issues are found due to the underlying architecture, you patch in software rather than needing to scrap the hardware benefits.

      In fact, plenty of programming languages already scrap processor features in favor of flexibility, standardization, and avoiding security issues.

      1. The tradition for the last 30 or so years has been to mix patchable microcode and fast path instructions. That’s paid off well in stemming architectural security holes and fixing unforeseen corner cases.

        The downside is it’s a given that any new core design will leak performance over time as ucode patches accumulate and more uops are burned checking for exploit conditions.

        1. How is that a downside? The alternative is that an old processor becomes unusable due to security issues.

          Making a processor that’s somehow magically secure from the start just isn’t possible, so mitigations are always going to be needed. This isn’t a modern thing – older processors were hilariously insecure, it’s just that the attacking technology hadn’t developed to that point.

  2. There were multiple directions computer architecture could have gone, and in fact we have taken several of them.

    At the same time the RISC/CISC/VLIW diversification was occurring, we also introduced vector processing, originally developed to meet the needs of scientific and engineering computation (weather forecasting, hunting for oil,…) but now found in every graphics co-processor. That’s largely a separate axis, of course.

    Internally, many modern processors still run VLIW microcode; that’s what handles all of the out-of-order and speculative execution details. RISC is basically the observation that if you keep the microcode simple enough, and are willing to give up those assists or implement them in the compiler rather than hardware, you can just let people program in the microcode and call that your architecture.

    (One of my favorite classes as an undergrad (6.032) was one that started with “a transistor inverts; use it in the right ranges and it’s a logical inverter, add an input and you have a NOR gate, and you can build anything from nor gates” and worked its way up through each layer of simplifying abstractions to show their implications for system, architecture, and ultimately language design, designing a multiprocessor bus-grant microcode machine along the way. My undergrad thesis, in fact, was laying out an actual implementation of that system… and discovering and fixing a lurking error in the bus handshake design as a side effect. I’m not sure whether my implementation was ever actually built; I’m guessing not.)

    1. Oh, I should have added “bit-sliced” to that list of architectural adjectives… though that’s an implementation detail related to the level of off-the-shelf ICs available at the time.

    2. What do you mean “most processors internally do VLIW”? There are GPUs that are VLIW, but a processor that translates x86 to VLIW was Transmeta’s idea, not modern PCs.

      x86 uops are typically around 64 bits, which is way smaller than what would be considered VLIW.

    1. Not if you consider that this CPU was introduced before the 186 and 286 from Intel. The 8086 was very much current at the time, so the comparison is between CPUs that you could buy. And bus width is not everything. Many early mainframes had wide buses; still, an 8-bit 8086 might be much faster.

      Bus width is not a measure of performance.

    2. The 8086 is not 8-bit. It’s a 16-bit CPU. And the 432 is only “sort of” 32-bit (it only has a 16-bit ALU internally, and most of its buses are only 16-bit).

      Both architectures were designed in the late 70’s as successors to the 8080; the 432 was Intel’s intended CPU for microcomputers, while the 8086 was designed as the temporary stopgap CPU when the 432 project started.

      You could argue that the Motorola 68k, which also implements 32-bit features with a 16-bit ALU, would be a closer comparison, and it’d be interesting to see how those two stack up also, but I hardly think comparing the ’86 and 432 is misleading.

    3. The 8086 is 16-bit. And most computations are 16-bit, even if 32-bit types have been used. Open Watcom has a quite decent runtime for 32-bit division/multiplication and uses a single 16-bit operation, supported by the processor, in many suitable cases.

  3. In a government systems division that I worked for, we designed a 432 board to test its feasibility for an upgrade to our current CPUs. Our main interest was to test how well it did when compiling Ada code, which the military was interested in using. The project never went beyond our R&D project.

  4. A CISC is effectively a microcoded RISC so conceptually there’s really not a lot of need for a formal CISC these days. The idea made more sense when memory was relatively slow and expensive but we now can readily build many creative architectures (such as multiprocessors), so there’s no reason to hide the microcode unless you’re trying to protect sensitive IP from competitors.

    The underlying idea behind CISC processors is sound, though. Computers peel (or at least should peel) like onions, with successive layers tuning the resulting architecture to be more application friendly (and, hopefully, secure). This is why conversations about the suitability of ‘C’ compared to other languages really are moot: C is designed to mesh with a particular (albeit common) type of low level architecture to support the next layers of the model. (You can write entire user applications using it if you really, really, had no other choice, but it’s not what it was designed for.) The 432 was designed to support high level business oriented languages, so it was definitely one of those “seems like a great idea at the time” things.

    (What really set computer architectures back generations was the PC. It was a cheap re-engineering of a 1960s era minicomputer, a great idea but most people never really saw it in context, they grew up thinking that this is the totality of a computer.)
