Reverse Engineering The iPhone’s Ancestor

By all accounts, the ARM architecture should be a forgotten footnote in the history of computing. What began as a custom coprocessor for a computer developed for the BBC could easily have found the same fate as National Semiconductor’s NS32000 series, HP’s PA-RISC series, or Intel’s iAPX series of microprocessors. Despite these humble beginnings, the first ARM processor has found its way into nearly every cell phone on the planet, as well as tablets, set-top boxes, and routers. What made the first ARM processor special? [Ken Shirriff] posted a bit about the ancestor of the iPhone.

The first ARM processor was inspired by a few research papers from Berkeley and Stanford on Reduced Instruction Set Computing, or RISC. Unlike the Intel 80386, which came out the same year as the ARM1, the ARM had only a tenth the number of transistors, used one-twentieth the power, and supported only a handful of instructions. The idea was that a smaller, simpler instruction set would lead to a faster processor overall.

That doesn’t mean there isn’t interesting hardware on the first ARM processor; for that, you only need to look at this ARM visualization. In terms of silicon area, the largest parts of the ARM1 are the register file and the barrel shifter, each of which serves a very important function in this CPU.

The first ARM chip makes heavy use of registers – all 25 of them, holding 32 bits each. Each bit in a single register consists of two read transistors, one write transistor, and two inverters. This memory cell is repeated 32 times vertically and 25 times horizontally.
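
Functionally, that cell layout gives you a register file with two read ports and one write port. Here’s a minimal C sketch of that port structure, purely as an illustration of the behavior rather than the ARM1’s transistor-level circuit (the function names are made up):

```c
#include <stdint.h>
#include <stdio.h>

#define NUM_REGS 25   /* the ARM1 register file: 25 registers of 32 bits each */

static uint32_t regfile[NUM_REGS];

/* One "cycle" of a register file with two read ports and one write
 * port -- the same port structure the two read transistors and one
 * write transistor per bit cell provide in silicon. */
static void regfile_cycle(int read_a, int read_b,
                          int write_reg, uint32_t write_val,
                          uint32_t *out_a, uint32_t *out_b)
{
    *out_a = regfile[read_a];
    *out_b = regfile[read_b];
    regfile[write_reg] = write_val;
}

int main(void)
{
    uint32_t a, b;

    regfile_cycle(0, 1, 2, 0xDEADBEEF, &a, &b);  /* read r0 and r1, write r2 */
    regfile_cycle(2, 2, 0, 0, &a, &b);           /* read r2 back on both ports */
    printf("r2 = 0x%08X\n", a);                  /* prints r2 = 0xDEADBEEF */
    return 0;
}
```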

The next-largest component of the ARM1 is the barrel shifter. This is a device that allows binary arguments to be shifted left or right, or rotated, by any amount up to 31 bits. The barrel shifter is constructed from a 32 by 32 grid of transistors whose gates are connected by diagonal control lines; by activating the right diagonal, any argument can be shifted or rotated.
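
The operations the shifter implements are easy to play with in software. This C sketch shows a rotate and a shift by an arbitrary amount; it models what the barrel shifter computes, not the diagonal transistor matrix that computes it:

```c
#include <stdint.h>
#include <stdio.h>

/* Rotate a 32-bit value right by 0..31 places, the kind of operation
 * the ARM1 barrel shifter performs in a single pass. */
static uint32_t ror32(uint32_t x, unsigned int amount)
{
    amount &= 31;              /* only rotations of 0..31 are meaningful */
    if (amount == 0)
        return x;
    return (x >> amount) | (x << (32 - amount));
}

int main(void)
{
    printf("0x%08X\n", ror32(0x000000FFu, 8));  /* prints 0xFF000000 */
    printf("0x%08X\n", 0x000000FFu << 4);       /* plain left shift: 0x00000FF0 */
    return 0;
}
```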

In modern terms, the ARM1 is a fantastically simple chip. For one reason or another, though, this chip would become the grandparent of billions of devices manufactured this year.

Exponential Growth In Linear Time: The End Of Moore’s Law

Moore’s Law states that the number of transistors on an integrated circuit will double about every two years. This law, coined by Intel and Fairchild co-founder [Gordon Moore], has been a truism since its introduction in 1965. From the introduction of the Intel 4004 in 1971, through the Pentiums of 1993, to the Skylake processors introduced last month, the law has mostly held true.
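
To get a feel for how aggressive that doubling really is, here’s a quick back-of-the-envelope calculation in C. It starts from the roughly 2,300 transistors of the 4004, the commonly cited figure, and is only meant to illustrate the arithmetic of the trend:

```c
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double t0 = 2300.0;   /* roughly the transistor count of the Intel 4004 (1971) */
    const int years[] = { 1971, 1981, 1993, 2005, 2015 };

    for (int i = 0; i < 5; i++) {
        double doublings = (years[i] - 1971) / 2.0;   /* one doubling every two years */
        printf("%d: ~%.0f transistors if the trend holds\n",
               years[i], t0 * pow(2.0, doublings));
    }
    return 0;
}
```

Run out to 2015, the trend predicts chips approaching ten billion transistors, which is in the neighborhood of where the largest dies of the day actually landed.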

The law, however, promises exponential growth in linear time, and that promise is ultimately unsustainable. This is not an article about the future roadblocks that will end [Moore]’s observation, but one arguing that the expectations of Moore’s Law have already ended. They ended quietly, sometime around 2005, and we will never again see transistor density, processor speed, graphics performance, or memory density double every two years.

Continue reading “Exponential Growth In Linear Time: The End Of Moore’s Law”

Raspberry Pi Halt and Catch… Well, Halt

As far back as we can remember, there have always been hacks, exploits, and just curiosity about undocumented CPU instructions. The Z80 had them. Even the HP41C calculator had some undocumented codes. The HCF (Halt and Catch Fire) instruction was apocryphal, but we always heard the old video controller chips could be coaxed into blowing up certain monitors. You don’t hear too much about things like that lately, perhaps because fewer people are working in assembly language.

[Sergi Àlvarez i Capilla] not only works in assembly language, he was writing an ARM assembler when he noticed something funny. Instruction encodings follow a regular pattern, and some of the possible patterns were missing. What to do? [Sergi] lost no time trying them out.

Continue reading “Raspberry Pi Halt and Catch… Well, Halt”

Teach Yourself Verilog with this Tiny CPU Design

You probably couldn’t write a decent novel if you’d never read a novel. Learning to do something often involves studying what other people did before you. One problem with trying to learn new technology is finding something simple enough to start your studies.

[InfiniteNOP] wanted to get his feet wet writing CPUs and developed a simple 8-bit architecture that would be a good start for a classroom or self-study. It is a work in progress, so there may still be a few bugs to squash, but squashing bugs might be educational too. You can read the documentation in the HACKING file for details on the architecture. Briefly, the top four bits of each instruction encode the operation, while the bottom four bits select the register operands (there are four registers).
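
Since the architecture only fixes the opcode in the top four bits, exactly how the bottom four bits split between operands is an assumption here; with four registers, two 2-bit fields is the obvious choice. A hedged C sketch of a decoder for that kind of 8-bit format (field names are made up and won’t match [InfiniteNOP]’s sources exactly):

```c
#include <stdint.h>
#include <stdio.h>

/* 8-bit instruction word: [7:4] opcode, [3:2] destination register,
 * [1:0] source register -- one plausible way to spend four operand
 * bits on a machine with four registers. */
struct decoded {
    uint8_t opcode;
    uint8_t rd;
    uint8_t rs;
};

static struct decoded decode(uint8_t insn)
{
    struct decoded d;
    d.opcode = (insn >> 4) & 0x0F;
    d.rd     = (insn >> 2) & 0x03;
    d.rs     =  insn       & 0x03;
    return d;
}

int main(void)
{
    struct decoded d = decode(0x3B);   /* opcode 3, rd = r2, rs = r3 */
    printf("opcode=%u rd=r%u rs=r%u\n", d.opcode, d.rd, d.rs);
    return 0;
}
```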

[InfiniteNOP] used the Xilinx tools to simulate and synthesize the CPU, but we thought it might be a good excuse to play with EDAPlayground. You can find a testbench that works with EDAPlayground, although you’ll probably want to update the CPU files to match the latest version.

Continue reading “Teach Yourself Verilog with this Tiny CPU Design”

Designing a CPU in VHDL for FPGAs: OMG.

If you’ve been thinking about playing around with FPGAs and/or are interested in CPU design, [Domipheus] has started a blog post series that you should check out. Normally we’d wait until the whole series is done to post about it, but it’s looking so good that we thought we’d share it with you while it’s still in progress. So far, there are five parts.

In Part One, [Domipheus] goes through his rationale and plans for the CPU. If you’re at all interested in following along, this post is a must-read. The summary, though, is that he’s aiming to make a stripped-down 16-bit processor with basic arithmetic and control flow on a Spartan 6 FPGA (a miniSpartan6+ board), and to write an assembler for it.

In Part Two, [Domipheus] goes over the nitty-gritty of getting VHDL code rendered and uploaded to the FPGA, and as an example builds up the CPU’s eight registers. If you’re new to FPGAs, pay special attention to the test bench code at the end of the post. Xilinx’s ISE package makes building a test suite for your FPGA code pretty easy, and given the eventual complexity of the system, it’s a great idea to have tests set up for each stage. Testing will be a recurring theme throughout the rest of the posts.

In Part Three, [Domipheus] works through his choices for the instruction set and starts writing the instruction set decoder. In Part Four, we get to see an ALU, and the jump commands are implemented. Part Five builds up a bare-bones control unit and connects the decoder, ALU, and registers together to do some math and count up.
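
To make the scale of these parts concrete, a behavioral model of such an ALU fits in a handful of lines. This C sketch with a few operations and a zero flag is only an illustration of the idea, not [Domipheus]’s VHDL:

```c
#include <stdint.h>
#include <stdio.h>

/* A toy 16-bit ALU: pick an operation, get a result and a zero flag
 * that a conditional jump could test -- the same shape as the unit
 * wired up in Parts Four and Five, but not the actual design. */
enum alu_op { ALU_ADD, ALU_SUB, ALU_AND, ALU_OR };

static uint16_t alu(enum alu_op op, uint16_t a, uint16_t b, int *zero)
{
    uint16_t r;
    switch (op) {
    case ALU_ADD: r = a + b; break;
    case ALU_SUB: r = a - b; break;
    case ALU_AND: r = a & b; break;
    default:      r = a | b; break;
    }
    *zero = (r == 0);
    return r;
}

int main(void)
{
    int z;
    uint16_t r = alu(ALU_SUB, 5, 5, &z);
    printf("result=%u zero=%d\n", r, z);   /* result=0 zero=1 */
    return 0;
}
```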


We can’t wait for further installments. If you’re interested in this sort of thing, and are following [Domipheus]’s progress, be sure to let him know: we gotta keep him working.

Of course, this isn’t the first time anyone’s built a soft-CPU in an FPGA. (The OMG was added mostly to go along with the other TLAs.) Here’s a tiny one, a big one, and a bizarre one.

Discrete Transistor Computer Is Not Discreet

Every few years, we hear about someone building a computer from first principles. This doesn’t mean getting a 6502 or Z80, wiring it up, and running BASIC. I’m talking about builds from the ground up, starting with logic chips or even just transistors.

[James Newman]’s 16-bit CPU built from transistors is something he’s been working on for a little under a year now, and it’s shaping up to be one of the most impressive computer builds since the days of Cray and Control Data Corporation.

The 10,000-foot view of this computer is a machine with a 16-bit data bus and a 16-bit address bus, built entirely out of individual circuit boards containing single OR, AND, and XOR gates, decoders, multiplexers, and registers. These modules are laid out on 2×1.5 meter frames, each of them carrying a schematic of the computer printed out with a plotter. The individual circuit modules sit right on top of this schematic, and if you have enough time on your hands, you can trace out every signal in this computer.

The architecture of the computer is more or less the same as any 16-bit processor. There are four general purpose registers, a 16-bit program counter, a stack pointer, and a status register. [James] already has an assembler and simulator, and the instruction set is more or less what you would expect from a basic microprocessor, although this thing does have division and multiplication instructions.

The first three ‘frames’ of this computer, containing the general purpose registers, the state and status registers, and the ALU, are already complete. Those circuits are mounted on towering frames made of aluminum extrusion. [James] already has 32 bytes of memory wired up, with each individual bit having its own LED. This RAM display will be used for the Game of Life simulation once everything is working.

While this build may seem utterly impractical, it’s not too different from a few notable and historical computers. The fastest computer in the world from 1964 to ’69 was built from individual transistors, and had even wider busses and more registers. The CDC6600 was capable of running at around 10MHz, many times faster than the estimated maximum speed of [James]’ computer – 25kHz. Still, building a computer on this scale is an amazing accomplishment, and something we can’t wait to see running the Game of Life.

Thanks [aleksclark], [Michael], and [wulfman] for sending this in.

The Oldland CPU 32-bit FPGA Core

Field Programmable Gate Arrays (FPGAs) let you program any logic you’d like onto a chip. You write your logic using a hardware description language, then flash it to the FPGA. You can even design your own processor and flash it to the chip.

That’s exactly what [jamieiles] has done with the Oldland CPU. It’s an open source 32-bit CPU core that you can synthesize for use on an FPGA. Not only can you browse through all the Verilog code in the GitHub repo, but there’s also a bunch of tools for working with this CPU core.

Included with the package is oldland-rtlsim, which lets you simulate the processor on a PC. The oldland-debug tool lets you connect to the processor for programming and debugging over JTAG. Finally, there’s a GNU toolchain port that lets you build C code for the device.

Going one step further, [jamieiles] built a full SoC around the Oldland core. This has SPI, UART, timers, and more features you’d expect to find in a microcontroller. It can be flashed to the relatively cheap Terasic DE0-Nano board.

[jamieiles] has also ported u-boot to the processor, and the next thing on the list is the Linux kernel. If you’ve ever been interested in how CPUs actually work, this is a neat project to look through. If you want more open source CPU cores, check out OpenCores.