What do the HP-1000 and the DEC VAX 11/730 have in common with the video games Tempest and Battlezone? More than you might think. All of those machines, along with many others from that time period, used AM2900-family bit slice CPUs.
The bit slice CPU was a very successful product that could only have existed in the 1970s. Today, if you need a computer system, there are many CPUs and even entire systems on a chip to choose from. You can also get many small board-level systems that would probably do anything you want. In the 1960s, you had no choices at all. You built circuit boards with gates on them using transistors, tubes, relays, or (maybe) small-scale IC gates. Then you wired the boards up.
It didn’t take a genius to realize that it would be great to offer people a CPU chip like you can get today. The problem is the semiconductor technology of the day wouldn’t allow it — at least, not with any significant amount of resources. For example, the Motorola MC14500B from 1977 was a one-bit microprocessor, and while that had its uses, it wasn’t for everyone or everything.
The Answer
The answer was to produce as much of a CPU as possible in a chip and make provisions to use multiple chips together to build the CPU. That’s exactly what AMD did with the AM2900 family. If you think about it, what is a CPU? Sure, there are variations, but at the core, there’s a place to store instructions, a place to store data, some way to pick instructions, and a way to operate on data (like an ALU — arithmetic logic unit). Instructions move data from one place to another and set the state of things like I/O devices, ALU operations, and the like.
Sure, that’s an oversimplification, but it can be stretched to describe most traditional CPUs. The AMD chips provided a 4-bit data path that could be chained together. Other chips in the family could manage memory (including, optionally, DMA) and take care of bookkeeping between slices. You could build an 8-bit machine with two slices, a 12-bit machine with three, and so on.
Not only did this allow fewer ICs than using conventional chips, it also allowed bipolar logic which — especially at the time — was faster but not as dense as NMOS or CMOS. Chips like the AM2900 family let you create flexible CPUs. They fit your application and ran fast compared to what you might be able to do using other methods.
Microcoding
Microcode is common in many CPUs, and bitslice CPUs were no exception. For example, you might have a very long microcode instruction where each register has a separate read and write line. If you had eight registers, that’s 16 bits just in those controls. Then you might also have a function code (4 bits) and a bit indicating if the condition codes should update. Now, each “instruction” is 21 bits. That’s longer than you want for, say, an 8-bit machine, so you define instructions that execute microcode.
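The arithmetic above (16 + 4 + 1 = 21 bits) can be made concrete with a small sketch. The field order, bit positions, and function names here are assumptions chosen for illustration; they are not the microword format of any real AM2900-based machine.

```python
# Hypothetical 21-bit microword layout (illustrative only):
#   bits 0-7   : per-register read enables (8 registers)
#   bits 8-15  : per-register write enables
#   bits 16-19 : ALU function code
#   bit  20    : update condition codes
NUM_REGS = 8
FUNC_BITS = 4
MICROWORD_BITS = 2 * NUM_REGS + FUNC_BITS + 1


def pack_microword(read_regs, write_regs, func, update_cc):
    """Pack the control fields into a single integer microword."""
    if not 0 <= func < (1 << FUNC_BITS):
        raise ValueError("function code must fit in 4 bits")
    word = 0
    for r in read_regs:
        if not 0 <= r < NUM_REGS:
            raise ValueError("bad register number")
        word |= 1 << r                      # read-enable line for register r
    for r in write_regs:
        if not 0 <= r < NUM_REGS:
            raise ValueError("bad register number")
        word |= 1 << (NUM_REGS + r)         # write-enable line for register r
    word |= func << (2 * NUM_REGS)          # ALU function field
    word |= int(bool(update_cc)) << (2 * NUM_REGS + FUNC_BITS)
    return word
```

With every control line asserted, the packed word fills exactly 21 bits, which is why such a word is too wide to serve directly as the instruction of an 8-bit machine.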
For example, an instruction to add register A to register B and leave the result in B might have three microcode steps. The first would gate registers A and B to the data bus and assert the code that makes the ALU add. Then, the second microinstruction would put the result on the data bus and command the B register to read it. The final microcode instruction would jump to the main part of the microcode that reads the next instruction and continues the program.
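Those three steps can be modeled as a tiny interpreter. This is only a toy sketch: the micro-op names (`alu_add`, `load_from_alu`, `jump_fetch`) and the dictionary encoding are hypothetical, standing in for the bit fields a real microword would carry.

```python
# Toy model of the three microcode steps for "ADD A, B" (result left in B).
# Micro-op names are hypothetical; real designs encode them as bit fields.
ADD_A_B = [
    {"op": "alu_add", "src": ("A", "B")},  # step 1: gate A and B, ALU adds
    {"op": "load_from_alu", "dst": "B"},   # step 2: result on the bus, B reads it
    {"op": "jump_fetch"},                  # step 3: return to instruction fetch
]


def run_microprogram(program, regs, width=8):
    """Execute micro-ops in order; return the registers and a trace of ops run."""
    mask = (1 << width) - 1
    alu_out = 0
    trace = []
    for step in program:
        op = step["op"]
        trace.append(op)
        if op == "alu_add":
            a, b = step["src"]
            alu_out = (regs[a] + regs[b]) & mask   # wrap at the data path width
        elif op == "load_from_alu":
            regs[step["dst"]] = alu_out
        elif op == "jump_fetch":
            break                                  # hand control back to fetch
        else:
            raise ValueError("unknown micro-op: " + op)
    return regs, trace
```

Calling `run_microprogram(ADD_A_B, {"A": 5, "B": 7})` leaves 12 in `B`; the 8-bit default width is an assumption and only affects where the sum wraps.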
That’s how you’d typically design a bitslice CPU. An AM2909, AM2910, or AM2911 (microprogram sequencers) would address a microprogram store ROM, which would feed commands to an array of AM2901 chips. The 40-pin AM2901 came in several variations (e.g., AM2901B), each having technology improvements to make them smaller and faster.
The microprogram store would then orchestrate the fetching of instructions and their execution. Presumably, the microprogram was relatively small compared to the real software, so your ROM with the microcode could be smaller than the ROM containing your actual application.
Cooperation
Of course, you can’t do everything just by adding another chip. The chips have to cooperate. For example, each chip has an open-collector F=0 output. You tie all of these outputs together with a pull-up resistor. If any CPU slice has a non-zero result, it will pull the shared line low. Therefore, if the line is high, then the entire result (however many bits that may be) must be zero.
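The wired zero-detect is easy to model in software. The sketch below is a behavioral illustration only; the function names are made up, and the pull-up resistor is represented by nothing more than the default "high" value when no slice pulls the line low.

```python
def slice_releases_line(nibble):
    """One slice's open-collector output: True means it lets the line float high."""
    return (nibble & 0xF) == 0


def wired_zero_line(result, num_slices):
    """Shared line state: high (True) only if no slice pulls it low."""
    nibbles = [(result >> (4 * i)) & 0xF for i in range(num_slices)]
    return all(slice_releases_line(n) for n in nibbles)
```

The useful property is that the check scales to any word width without extra logic: adding a slice just adds one more output to the same wire.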
Handling carry is also a problem. If your first slice adds 1111+0001 the answer isn’t really 0000. It is 0000 + a carry. You can simply wire up each Cn+4 output to the next chip’s Cn input to get ripple carry, but that will create a speed penalty that gets worse as you add slices. You can also use an AM2902 to “look ahead” for better performance. Each 2902 could handle four slices or 16 bits. If you wanted to go beyond that, you could use one AM2902 to look ahead for up to another four AM2902s, each of which handled four CPU slices. Presumably, it would be possible to extend this scheme further if you wanted to go beyond 64 bits, although in 1975, that might not have been your biggest problem with building a machine that large.
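To see the difference between the two carry schemes, here is a behavioral sketch in Python. It is a simplification under stated assumptions: the real parts work with generate and propagate signals at the gate level, while this model derives a per-slice "generate" and "propagate" flag from the 4-bit sums. All function names are hypothetical.

```python
def slice_add(a, b, carry_in):
    """Add two 4-bit values plus carry; return (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def ripple_add(a, b, num_slices, carry_in=0):
    """Chain slices so each carry out feeds the next slice's carry in."""
    result, carry = 0, carry_in
    for i in range(num_slices):
        s, carry = slice_add(a >> (4 * i), b >> (4 * i), carry)
        result |= s << (4 * i)
    return result, carry


def lookahead_carries(a, b, num_slices, carry_in=0):
    """Compute every slice's carry in directly from generate/propagate flags.

    Returns a list of num_slices + 1 booleans; the last is the final carry out.
    """
    gen, prop = [], []
    for i in range(num_slices):
        total = ((a >> (4 * i)) & 0xF) + ((b >> (4 * i)) & 0xF)
        gen.append(total > 15)     # slice produces a carry on its own
        prop.append(total == 15)   # slice passes an incoming carry along
    carries = [bool(carry_in)]
    for i in range(num_slices):
        # c[i+1] = G_i | P_i*G_(i-1) | ... | P_i*...*P_0*c0, with no chaining
        c = all(prop[: i + 1]) and bool(carry_in)
        for j in range(i + 1):
            c = c or (gen[j] and all(prop[j + 1 : i + 1]))
        carries.append(c)
    return carries
```

In `ripple_add`, slice N cannot finish until slice N-1 has produced its carry, which is the speed penalty described above. In `lookahead_carries`, each carry depends only on the flags and the initial carry, so in hardware they can all be resolved in parallel.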
Shifting and multiplication required cooperation, too. It was common to use a multiplexer at each end of the chain to determine the source of new bits when shifting. It just depended on what you needed.
Customization
That’s one interesting thing about using bit slice. You could design just what you needed. Why build a 32-bit machine if you needed 24 bits for the task at hand? Why use multiplexers to enable a rotate instruction that you will never use?
These days, we take a building block and make it fit our problem. With bitslice, you made a CPU that exactly fit what you needed. There were many tidbits about how to do different operations like fetching instructions, multiplying, or byte swapping in the AM2900 data book.
That data book also shows the chips you’d need, like AM2902s or the 48-pin AM2903 “Superslice” with extendable registers, multipliers, division, and other special circuits onboard.
Evolution
The AM2900 family was very successful. The original AM2901 started out on a very large die using low-power Schottky and could operate at 80 nanoseconds. By 1978, three years later, the AM2901B die was less than half the size and could handle 50 nanoseconds. By 1981, the AM2901C used ECL internally and could do 37 nanoseconds on the same die. By 1979, you could even get a floating point coprocessor (the AMD9511).
These were simple devices because you needed multiple chips to support them and multiple AM2901s to do anything bigger than four bits. The original AM2901, for example, had only 540 gates onboard. Yet they found their way into everything from workstations and music synthesizers to video games. Because of their popularity, there were many second-source suppliers for the device, and it is still possible to find new old stock. There were even Soviet copies produced.
More Bitslice
Many of the CPUs made with the AM2900 were proprietary or military. But if you want to see one that has a good bit of documentation, here’s a 1980 Master’s thesis on implementing a Nova 1200-compatible CPU using the technology — well, part of one, anyway.
The AM2900 wasn’t the only game in town. In 1974, National Semiconductor’s IMP and Intel’s 3000-series were available. In addition, Motorola, Texas Instruments, Fairchild, Raytheon, and others made similar devices. But for various reasons, the AM2900 is what most people think of when they remember bitslice CPUs. In fact, the Master’s thesis about the Nova CPU also has a table of other bitslice tech and the reason they didn’t use any of the other ones. For example, some of the devices used PMOS, which was slow. Others used ECL, a fast technology with a deserved reputation for being difficult to use. Another thesis from 1976 has similar logic for selecting the AM2900.
Want more?
[Ken Shirriff] took an ECL variant of the AM2901 apart. There’s also a book from 1980 you can read. There is also a ton of documents on a Gopher server (not kidding). If your browser doesn’t handle Gopher — and that won’t surprise us — try one of the many Gopher proxies. The 16-bit computer design example is especially worth a look. Want a more complex example? Here’s a blazing-fast 8080 CPU built with bitslice. Over on Hackaday.io, [zpekic] recently built this 8080 and ran tiny Basic on it (see the video if you don’t believe it).
The Xerox Star 8010 used the AM2900 in 1981. Cost less than $17,000! Luckily, you can emulate one if you like. For that matter, you can sort of emulate the AM2900 using Java, although it might not work for every possible design (tip: download from the releases).
> You could build an 8-bit machine with two slices, a 24-bit machine with three, and so on.
Shouldn’t that be six slices for a 24-bit machine?
Nice article. When I think of bit-slice CPU designs, I go back to the original: the 74181. Less functional than the AM2901, of course, but 10’s of thousands of minicomputers were built using it. I worked for Digital Equipment Corp. in the 70’s and most of the early PDP-11 CPUs were 74181-based. It is a good chip for newcomers to CPU design to study because it is very understandable.
DEC was flexible. My vax-11/730 uses 2901s, as did the FP11 floating point card. Whatever worked.
Yep. Whoops
The AM2900’s were used well into the 1980’s (e.g., the Vax 11/730 was introduced in 1982)
Xerox Star for example.
http://www.righto.com/2020/04/inside-am2901-amds-1970s-bit-slice.html
We once had a couple of HP1000’s where I work, complete with 9-track tape drives, disk packs, and “blinkenlight” panels. When they were scrapped, I went through a lot of the circuit boards to satisfy my historical interest in such things.
I don’t recall seeing any AM2900s. I do, however, distinctly remember seeing 74181 (ALU) chips. I was under the impression that the “cpu” in our machines was implemented exclusively in TTL chips, with the 74181s part of its ALU.
In light of your claim that the HP1000 “used AM2900-family bit slice CPUs”, I’m not sure why the CPU card in our machines would have featured 74181s, unless they were part of some kind of “math coprocessor” logic.
…interesting…
The HP-1000 A-series model A600 from 1981 used the AM2900.
I designed the HP-1000 A600. It used 4 AMD 2901B, AMD 2904, AMD 2910 and registered PROMs for microstore. Because of the 2901’s capabilities, there was never a bug in the HP-1000 instruction set. That can’t be said for many custom chip implementations. Start to manufacturing release, the project took 9 months. Since the 2901 family was bipolar logic, the system operated over the temperature range of -40C to 80C (with no heat sinks). Check the temperature range for Intel’s or AMD’s latest CPUs. Overall it was a fun project.
I worked quite a bit on the HP 1000 family back in the day. There were several different CPUs. The A series used some form of 2900 bit slice. https://www.hpl.hp.com/hpjournal/pdfs/IssuePDFs/1984-02.pdf
The 74181 was available 5 years before the AM2900.
Went to AMD building in Woking when I was younger and asked nicely and got lots of data manuals on their bit slice processors and other parts. Must have been late 70s early 80s.
Very interesting reading for me at the time.
I wish I had not gotten rid of them now.
You and me both. Load testing my bookshelf with that and the blue Motorola set.
a lot of this becomes much relevant with the rise of the classic risc pipeline. the 1970’s were a very uncivilised, unenlightened era in CPU design.
https://en.wikipedia.org/wiki/Classic_RISC_pipeline
much less relevant*
Given that the general purpose CPUs have been various flavors of RISC for a long time now, micro-coding is no longer relevant from that aspect, but it can be still effectively used in custom processors, for example implementing data protocols or fast control functions. I wrote a “custom CPU” which converts ASCII stream in Intel HEX format to memory writes and vice versa, sort of memory DMA to serial UART. Another crazy idea I have (but no time for it) is to implement a Tiny Basic CPU in micro-code. https://hackaday.io/project/181664-intel-hex-files-for-fpgas-no-embedded-cpus
While working at Q1 in the mid 70’s to early 80’s I used the 2900 series to build one of the [if not the] first bit-slice disk controller on the East coast – it was 8 bits wide, had 256 32 bit words of microcode, and could do up to 5 different functions at the same time – it ran at 5MIPS and controlled two 29Meg drives.
The fun part was building a RAM emulator for the microcode, and then writing a compiler for the microcode.
I socketed the divider chip for the clock circuit, and could substitute a de-bounced push button so that I could single step the microcode. Later, at a different company, I developed a way of using two interleaved burst error correction chips from AMD so that the unit could correct up to 22 bit burst errors instead of the 11 bit errors a single chip could correct.
Cool stuff! Did you save any of the documentation or source code for those projects? In case you want to “re-live” the era, here is my custom microcode compiler, I wonder if some of the concepts will be familiar :-) https://hackaday.io/project/172073-microcoding-for-fpgas
I still have a copy of Mick and Brick on my bookshelf. A colleague of mine built a microcoded S/370 compatible CPU using 2900’s. A fascinating part that held a world of possibilities, or so it seemed at the time.
I built a 16-bit processor using 4x 2901s and a 2910 microprogram sequencer in 1985 for my undergrad EE thesis. The whole thing was wirewrapped. Fun but very demanding project on top of all the labs and coursework.
I built a display processor out of 2901s (US Patent 4,860,218) but couldn’t get the 2910 to run fast enough, so I abandoned the notion of a program counter entirely. In addition to its ALU functions, each instruction had a branch condition field and two branch addresses (somewhat like SNOBOL4’s :S & :F). This way I only had to deal with the delay of the address multiplexer, which was much faster than the 2910. As a result, it didn’t actually matter what the order of instructions in memory was. The entire program was a doubly-linked list. Gives entirely new meaning to the term “spaghetti code”.
One input of the branch condition multiplexer was tied off to a logic 1, which when selected allowed unconditional progress. The other inputs were connected to ALU results, etc.
Fun stuff.
Sweet! I use the same idea to generate microcode control unit (VHDL code) from my mcc compiler – “if (cond) then … else …” statement in microcode becomes a 2 -> 1 MUX where select is output of the conditional code MUX. That way microcode counter is simply a register with 2 multiplexed inputs. I reserved first 4 codes to implement RETURN, REPEAT, NEXT, FORK. Essentially, each microinstruction is at the same time an IF statement that can go anywhere, as well as do standard program control operations in 1 clock cycle.
The Data General Nova 4C (for sure) was 2901s and microcoded. I think a lot of the DG machines used 2901s, but I only worked on the Nova 4…built a microcoded serial terminal card that supported 16 asynch terminals and used the 2901 with a modified (by me) microcoded instruction set to preprocess the terminal input streams before they got to the main processor (mostly things like line editing…processing backspaces, etc, so it only passed serial data to the processor as full command lines when it saw a newline character)
The microcoded DG machines were easily upgraded simply by changing the microcode PROMS, and DG went to great lengths with epoxy covering them, serialising them, etc to keep people (and independent repair places) from doing that.
When I went to USN soldering school we were taught how to dig the epoxy off to repair damaged CKB’s. It is not hard at all. Just use a solder iron with a flat tip and start scraping the epoxy or conformal coating. Comes right off.
I was sure my battlezone was 6502 inside (R).
I shall have to consult the schematics.
The main game code runs on a 6502 at 1.5 MHz but to do the 3D calculation a 16 bit (8.8) fixed point math coprocessor running at 3 MHz is used made from four AM2901s, a couple PROMs, and a little TTL logic. Those ALU slices and the sound circuits are on the smaller 2nd circuit board called the Math Box.
The Math Box has 31 instructions plus a piece of self-test microcode. Most of the instructions just move 8 bit 6502 data to or from various 16 bit fixed point Math Box registers but a few kick off a series of computations. For instance, command $17 will perform R7=(R0*R4)-(R1*R5)+R2; R8=(R1*R4)+(R0*R5)+R3; result = R8/R7. Each of those Rx registers has an 8 bit mantissa and an 8 bit exponent.
Battlezone had a 6502 and a vector calculator “mathbox” done in TTL logic. But not a 2901. I remember Rock-Ola Starcastle was based on 2901s, not sure about Tempest.
Both Battlezone and Tempest use the same Math Box design made from four Am2901s, six 256×4 bit PROMS, one 32×8 bit PROM, two 4 bit counters, and an 8 bit latch. I have both Battlezone and Tempest Math Boxes sitting right here and I’ve reverse engineered the microcode.
Star Castle used a custom 12 bit CPU made from three Raytheon 25LS181 four bit ALU slices with a 74LS182 lookahead carry generator but it did not make use of any Am2901s.
Thanks for correcting. Maybe I was thinking about Asteroids.
Sorry, the mathbox was indeed done with 2901’s. Don’t know how I recalled it was done with TTL only.
In the early 80’s a disk drive-tester company where I worked called Cambrian Systems had an S100 system that had 4 2901’s on a board (vector machine wire wrapped), and a local bus connecting to memory and other interface boards to test hard drives. The system ran CP/M and tester was programmed in FORTH!
The Nova 1200 had a single 74181 ALU, as I recall. Its 16 bit instruction and data took 4 cycles for the instruction calculation, and another 4 cycles to advance the program counter, which was also a 16 bit variable – mostly just adding 1 to fetch the next instruction. Didn’t matter much as the memory wasn’t solid state, as I recollect. The memory had something like a microsecond access time.
Core memory in those old Novas. I have a core memory card in my attic somewhere. The Nova 4 went to 2901s for the speed. I have a Nova 3 front panel staring at me from my bookshelf. Should hook an Arduino up to it and get those lights blinking…
The ICL PERQ computers (both PERQ1 and PERQ2) used the 2900 family and huge boards full of ECL. The PERQ 2 I had had four 4K word banks of microcode RAM that could be switched on the fly. The CPU handled 24 bit data (iirc) with 2MB of RAM. The microcode included a graphics operation library and managed bit blit and mouse pointers on a 1280×1024 1bit/pixel display. Very slick for the 1979-1983 era.
I generally just lurk here, but this tickled my fancy. In the early 1980’s I worked at a company named Metheus that built several “high end” graphics controllers using 2900 series chips. They were microcoded, of course, and were very high performance (at the time). The thinking
I got cut off…. I loved micro-coding. Being able to do multiple things simultaneously in a single instruction required a broad perspective. At some point, I wrote a macro-assembler for it (in Lex and Yacc) that was smart enough to collapse multiple adjacent ‘instructions’ into a single micro-code word. I don’t miss those days, but they sure were fun.
Ask me about a multi 4004 micro-processor based holographic read-only memory system sometime!
John P
Wait… Metheus… in Portland, OR? That name and your name are triggering some very dusty neurons. We may have met.
Please… It was Hillsboro, I think. Portland definitely was not that big! It was a Tektronix spin-off that also tried to get into EE CAD tools and even developed a 68K based workstation that had VM. It took some interesting tricks to get that to work!
Yup, exactly how my home-brew microcode compiler works too, except written in C#.
And I am definitely asking about 4004-based multiprocessing!
I had a Metheus ISA video card in a long gone PC and it was amazing. For a very brief moment in time, you guys did amazing things.
I worked for a now-defunct medical computing company in the 1970s. They had a series of CAT scanners that initially used a 16-bit processor to collect data and reconstruct images. It took ~30 minutes for the first image to be generated from raw data. Image data was collected in sets (parallel planes of the patient) and the first image required a long calibration step; successive images in the set were faster because they could reuse the calibration data, but still required a few minutes per image (about five minutes). A specialized hardware co-processor based on AM2900 bit slices was offered later and first-image reconstruction dropped from 30 minutes to about 5 minutes. A big part of the preprocessing of image data required realigning fan-beam data into parallel beams, so the co-processor had special instructions to swizzle the fans into parallels. Reconstruction required “de-scattering” to compensate for x-ray scattering and a couple of digital signal processing instructions were added to accomplish this (a sort of filtering). These features could not be added to mainstream processors because they were too specialized and the market too small to justify the design and manufacturing expenses.
Data General’s MP/200 and Nova 4 were both designed around 2900-based data paths.
As a teen, I checked out a book from the library on designing a computer using that bitslice chip. Was quite interesting and got deep into microcode. And honestly, I was annoyed at what to me looked like an obvious oversight. The microcode implemented both a conditional jump and an unconditional jump, both of which took the same number of clock cycles. Additionally, the conditional jump selected the condition to test via a multiplexer. The issue was that there were fewer potential conditions than there were multiplexer inputs. To me, it was obvious that a separate implementation of a jump instruction was unnecessary since you could tie one of the unused multiplexer inputs to “true” and then replace the unconditional jump with a conditional jump that was always “true”. Doing that would slightly simplify the design with zero cost.
2901 bit slices are alive and growing well here. My 16 bit CPU has 2 extra slices:) I need to hack it down to 20 bits. They hide among the 22v10c’s and the 74ls645’s. I have each slice on a tiny pcb with io and mar buffer. This lets me play with every word size I like. Finding vintage TTL is harder, because 74xx is still valid for homebrew projects.
The Pixar Image Computer, which I had the privilege to program in microcode while at Pixar during its earliest days, was built from these. Challenging but fun job. https://en.wikipedia.org/wiki/Pixar_Image_Computer