Bootstrapping The Old Fashioned Way

The PDP-11, the Altair 8800, and the IMSAI 8080 were some of the heroes of the computer revolution, and they have something in common — front panel switches, and a lot of them. You probably have a fuzzy idea about those switches, maybe from reading Levy’s Hackers, where the painful process of toggling in programs is briefly described. But how exactly does it work? Well, thanks to [Dave Plummer] of Dave’s Garage, we now have a handy tutorial. The exact computer in question is a reproduction of the IMSAI 8080, the computer made famous by a young Matthew Broderick in WarGames. [Dave] managed to score the reproduction, and a viewer saved him the time of assembly.

The example program is a Larson scanner, AKA making a pulse of light sweep back and forth along a strip of LEDs. [Dave] starts with the assembly code, a scant 11 lines, and runs it through an assembler available online. That gives us machine code, but there’s no hex keypad for input, so each byte has to be keyed in as eight binary switch positions. To actually program the machine, you set the address switches to your start-of-program location and the data switches to your first byte. The “deposit” switch stores that byte, while the “deposit next” switch increments the address and then stores the value. That means you don’t have to key in an address for every instruction, just the data. Get to the end of the program, confirm the address is set back to the start, and flick run. Hope you toggled everything in correctly. If so, you’re rewarded with a friendly scanner pattern so reminiscent of ’80s TV shows. Stick around after the break to see the demonstration!
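For the curious, here’s a minimal Python sketch, not [Dave]’s 8080 listing, that models the two halves of the process: how deposit and deposit-next build a program up in memory, and the bouncing-light pattern the toggled-in program produces. The byte values and helper names are purely illustrative.

```python
# Illustrative model of the IMSAI front-panel "deposit" / "deposit next"
# procedure, plus the Larson-scanner pattern the program produces.

memory = [0x00] * 256          # a tiny slice of the 8080 address space
address_register = 0           # what the address switches currently select

def deposit(address, data):
    """Set the address switches, set the data switches, flick DEPOSIT."""
    global address_register
    address_register = address
    memory[address_register] = data & 0xFF

def deposit_next(data):
    """DEPOSIT NEXT: increment the address first, then store the byte."""
    global address_register
    address_register += 1
    memory[address_register] = data & 0xFF

# Toggling in a program: only the first byte needs an address set by hand.
program = [0x3E, 0x01, 0xD3, 0xFF]   # example opcodes, not the real 11-line listing
deposit(0x0000, program[0])
for byte in program[1:]:
    deposit_next(byte)

# The Larson-scanner effect itself: one lit bit bouncing along 8 LEDs.
def larson(steps=20, width=8):
    pos, step = 0, 1
    for _ in range(steps):
        print(format(1 << pos, f"0{width}b").replace("0", ".").replace("1", "#"))
        if pos + step < 0 or pos + step >= width:
            step = -step          # bounce at either end of the strip
        pos += step

larson()
```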
Continue reading “Bootstrapping The Old Fashioned Way”

Ztachip Accelerates Tensorflow And Image Workloads

[Vuong Nguyen] clearly knows his way around artificial intelligence accelerator hardware, creating ztachip: an open source implementation of an accelerator platform for AI and traditional image processing workloads. Ztachip (pronounced “zeta-chip”) contains an array of custom processors and is not tied to one particular architecture. Ztachip implements a new tensor programming paradigm that [Vuong] has created, which can accelerate TensorFlow tasks, but is not limited to that. In fact, it can process TensorFlow in parallel with non-AI tasks, as the video below shows.

A RISC-V core, based on the VexRiscv design, is used as the host processor handling the distribution of the application. VexRiscv itself is quite interesting. Written in SpinalHDL (a Scala-based HDL), it’s super configurable, producing a Verilog core ready to drop into the design.

A Digilent Arty-A7, an Arducam, and a VGA PMOD are all you need

From a hardware design perspective, the RISC-V core hooks up to an AXI crossbar, with all the AXI-Lite buses muxed as is usual for the AMBA AXI ecosystem. The Ztachip core and a DDR3 controller are also connected, together with a camera interface and VGA video output.

Other than providing an FPGA-specific DDR3 controller and AXI crossbar IP, the rest of the design is generic RTL. This is good news. The demo below deploys onto an Artix-7-based Digilent Arty-A7 with a VGA PMOD module, and little else is needed. Pre-built Xilinx IP is provided, but targeting a different FPGA shouldn’t be a huge task for the experienced FPGA ninja.

Ztachip top level architecture

The magic happens in the Ztachip core, which is mostly an array of Pcores. Each Pcore has both vector and scalar processing capability, making it super flexible. The Tensor Engine (internally this is the ‘dataplane processor’) is in charge here, sending instructions from the RISC-V core into the Pcore array together with image data, as well as streaming video data out. That camera is only a 0.3 MP Arducam, and the video is VGA resolution, but give it a bigger FPGA and those limits could be raised.

This domain-specific approach uses a highly modified C-like language (with a custom compiler) to describe the application that is to be distributed across the accelerator array. We couldn’t find any documentation on this, but there are a few example algorithms.

The demo video shows a real-time mix of four algorithms running in parallel: object classification (Google’s TensorFlow MobileNet-SSD, a pre-trained AI model), Canny edge detection, Harris corner detection, and optical flow, which gives it a predator-like motion vision.
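For a concrete feel of what three of those four streams compute, here’s a rough Python/OpenCV stand-in run on an ordinary webcam instead of ztachip’s Pcore array. It assumes OpenCV and NumPy are installed; the camera index, thresholds, and parameters are arbitrary, and the MobileNet-SSD classification stage is omitted.

```python
# Rough PC-side equivalent of the demo's classical vision streams.
import cv2
import numpy as np

cap = cv2.VideoCapture(0)                     # any webcam stands in for the Arducam
ret, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    edges = cv2.Canny(gray, 100, 200)                         # Canny edge detection
    corners = cv2.cornerHarris(np.float32(gray), 2, 3, 0.04)  # Harris corner response
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)  # dense optical flow

    # (Object classification would run a MobileNet-SSD model here; omitted.)
    cv2.imshow("edges", edges)                # show one of the streams
    prev_gray = gray
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```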

[Vuong] reckons it is 5.5x more computationally efficient than a Jetson Nano and 37x more efficient than Google’s Edge TPU. These are bold claims, to say the least, but who are we to argue with a clearly incredibly talented engineer?

We cover many AI-related topics, like this AI-assisted tap-typing gadget, for starters. And not wanting to forget about the original AI hardware, the good old-fashioned neuron, we’ve got that covered as well!

Continue reading “Ztachip Accelerates Tensorflow And Image Workloads”

Front and back of a handheld 6502 computer with bubble LED displays

The Pocket265 Is A Pocket-Sized 6502 Single-Board Computer

Single-board computers have been around ever since microprocessors became affordable in the 1970s and never went away. Today we have Raspberry Pis and LattePandas, while back in the ’70s and ’80s there were the Ferguson Big Board, the KIM-1 and a whole array of Intel SDK boards. Although functionally similar to their modern counterparts with a CPU, RAM, ROM and some basic peripherals, the old boards were huge compared to today’s tiny platforms and typically required a rather beefy power supply to operate.

It doesn’t have to be that way though, as [Aleksander] shows with the Pocket265: a handheld 6502 single-board computer somewhat reminiscent of the famous KIM-1. Like that classic machine, it’s got a hexadecimal keypad to enter programs using machine code and a row of LED displays to show the programs’ output. Unlike the KIM, the Pocket265 is small enough to hold in one hand and uses bubble LED displays, which make it look more like a programmable calculator from the 1970s. It comes with a lithium battery that makes it truly portable, as well as a sleek 3D printed case to make it more comfortable to hold than a bare circuit board.

The single ROM chip contains a monitor program that runs the basic user interface. It also makes programming a bit less tedious by implementing a number of system calls to handle things like user input and display output. A serial EEPROM provides local data storage, while a UART with a USB interface handles data transfer to other computers. If you’re interested in building and programming such a machine yourself, [Aleksander] helpfully provides code examples as well as full hardware documentation on his GitHub page.
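As an illustration of what a monitor’s system calls buy you (this is not the Pocket265 firmware, and every name below is hypothetical), here is the idea sketched in Python: user programs call the monitor’s input and output routines instead of driving the keypad and displays themselves.

```python
# Hypothetical sketch of a ROM monitor's "system calls".

def sys_read_key():
    """Block until a keypad digit is entered and return its value (stubbed)."""
    return int(input("key> "), 16)

def sys_display(value):
    """Show a byte on the LED displays (stubbed as a hex print)."""
    print(f"[{value:02X}]")

# A tiny "user program" written only against the monitor's calls:
def user_program():
    a = sys_read_key()
    b = sys_read_key()
    sys_display((a + b) & 0xFF)   # add two keys, show the 8-bit result

user_program()
```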

The 6502 remains a firm favorite among hardware hackers: some projects we recently featured with this CPU include one beautifully made machine, this easy-to-build single-board computer and this huge breadboard-based contraption. Looking for something smaller? Try this tidy little board or this 6502 coupled to an FPGA.

Tesla’s Dojo Is An Interesting CPU Design

What do you get when you cross a modern superscalar out-of-order CPU core with more traditional microcontroller aspects such as no virtual memory, no memory cache, and no DDR or PCIe controllers? You get the Tesla Dojo, which Chips and Cheese recently did a deep dive on.

It starts with a comparison to the IBM Cell processor. The Cell of the mid-2000s featured SPEs (Synergistic Processing Elements): smaller cores focused on vector processing or other specialized workloads. They didn’t access main memory directly and had to be given tasks by the fully featured CPU. Dojo has 1.25 MB of SRAM with five ports that it can use as working memory, but it has no cache or virtual memory. It uses DMA to get the information it needs via a mesh system. The front end pulls RISC-V-like (heavily MIPS-inspired) instructions into a small instruction cache and decodes eight instructions per cycle.
Continue reading “Tesla’s Dojo Is An Interesting CPU Design”
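For a sense of what “no cache, just local SRAM and DMA” means for the programmer, here is a deliberately simplified Python sketch of the Cell-style explicit staging model the article compares Dojo to. Nothing here is Dojo’s real API; the dma_fetch helper, tile size, and kernel are all stand-ins.

```python
# Explicit data staging: software copies tiles into a small local buffer
# instead of relying on a hardware cache, double-buffering as it goes.
def dma_fetch(src, offset, length):
    """Stand-in for a DMA engine: copy a tile from 'main memory' to local SRAM."""
    return src[offset:offset + length]

def process(tile):
    return sum(tile)           # placeholder compute kernel

main_memory = list(range(1_000_000))
TILE = 4096                    # must fit in the (small) local SRAM
total = 0

# Fetch the next tile while "computing" on the current one.
current = dma_fetch(main_memory, 0, TILE)
for offset in range(TILE, len(main_memory), TILE):
    nxt = dma_fetch(main_memory, offset, TILE)   # would overlap with compute in hardware
    total += process(current)
    current = nxt
total += process(current)
print(total)
```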

Magnetic Maniac Manages Mangled Memory

Ahh, floppy disks. Few things carry nostalgia quite like a floppy — either 3½-inch or 5¼-inch, depending on which generation of hacker you happen to be. (And yes, we hear you grey-beards, 8-inch floppies were definitely a thing.) The real goodies aren’t the floppies themselves, but what they carried, like Wolfenstein 3D, Commander Keen, DOS, or any number of other classics from the past. Unfortunately, a bunch of these floppy disks aren’t carrying anything anymore, as bit rot eventually catches up with them. Even worse, on some trashed floppies, even a format operation fails. Surely these floppies are destined for the trash, right?
Continue reading “Magnetic Maniac Manages Mangled Memory”

Coffee With Kernighan

There was an interesting tidbit buried in a Computerphile video released last week (below the break), featuring professors [David Brailsford] and [Brian Kernighan] having a chat over coffee. Among other topics, they discuss the history and current state of various text processing tools. We learn that [Kernighan] has taken on a summer project of updating the AWK text processing language to handle UTF-8 text, an omission he admits is embarrassing in this day and age. He is also working on a second edition of The AWK Programming Language book, which hasn’t been updated since being first released in 1988.
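Here’s a quick illustration, in Python rather than AWK, of the kind of problem the update addresses: a byte-oriented tool sees more “characters” in UTF-8 text than are really there, and can split one in half.

```python
# Byte-oriented versus character-aware handling of UTF-8 text.
text = "naïve café"            # contains two multi-byte UTF-8 characters
raw = text.encode("utf-8")

print(len(text))   # 10 characters, what a UTF-8-aware length() should report
print(len(raw))    # 12 bytes, what a byte-oriented length() would count

# Substrings go wrong the same way: slicing bytes can split a character.
print(raw[:3])                                    # b'na\xc3' -- ends mid-character
print(raw[:3].decode("utf-8", errors="replace"))  # 'na' plus a replacement character
```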

[Brian Kernighan] is a legend in the world of Unix and computing, having worked at Bell Labs during the ’70s when Unix and C were developed. Among the many accomplishments in his career, he is well known as the co-author, with [Dennis Ritchie], of The C Programming Language, first published in 1978 and still in use decades later, as a co-creator of the AWK mentioned above, and for major updates to troff. More recently, he co-authored The Go Programming Language book in 2015.

If an updated UTF-8-capable AWK interests you, keep an eye on the AWK GitHub repository, where [Kernighan] anticipates an update once he wraps his head around git a little better. We’re happy to see [Brian] so active at 80 years old. If you want to learn more about those early days at Bell Labs, we reviewed [Kernighan]’s very interesting UNIX: A History and a Memoir a couple of years ago.

Continue reading “Coffee With Kernighan”

Recreating DOOM On A Homebrew 8-Bit CPU

[James Sharman] has been working away on an 8-bit CPU of his own design. Naturally, with his computing device largely functional, the obvious question was asked: can it run DOOM? [James’] latest video explores this question, showing just how close he was able to get.

[James’] 8-bit pipelined CPU also has its own UART, VGA adapter, and sound adapter, all built up from discrete components on various PCBs. There’s also a custom interface for a SNES controller as an input device. However, it’s fundamentally well below the specs that DOOM originally required at launch. His 8-bit CPU runs at just 4 MHz, with 64 KB of RAM. This compares poorly to the 32-bit, 33 MHz Intel 386 chips and 4 MB of RAM originally recommended to run the game.

In lieu of running the real thing, [James] demonstrated the limitations of his machine by coding his own demo, nicknamed Doomed. It’s able to average 19 fps video output at a resolution of 80×60, and consists of over 5,000 lines of hand-written assembly code. Fundamentally, it’s a basic 3D engine not dissimilar to Wolfenstein 3D, though without any actual gaming interactions involved.
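The underlying technique is classic raycasting: cast one ray per screen column, find the nearest wall, and draw a vertical slice whose height shrinks with distance. Below is a heavily simplified Python sketch of that idea (a generic illustration, not a port of [James]’s assembly) that renders a single ASCII frame from a hard-coded map; the map, resolution, and constants are all made up.

```python
# Minimal Wolfenstein-style raycaster: one ASCII frame from a tiny map.
import math

MAP = [
    "########",
    "#......#",
    "#......#",
    "#......#",
    "########",
]
W, H = 80, 24                 # "screen" size in characters
px, py, pa = 2.5, 2.5, 0.0    # player position (col, row) and view angle
FOV = math.pi / 3

def cast(angle, max_depth=16.0, step=0.02):
    """March a ray outward until it enters a '#' cell; return the distance."""
    d = 0.0
    while d < max_depth:
        d += step
        x = int(px + math.cos(angle) * d)
        y = int(py + math.sin(angle) * d)
        if MAP[y][x] == "#":
            return d
    return max_depth

# One ray per column: store the apparent wall height for that column.
heights = []
for col in range(W):
    angle = pa - FOV / 2 + FOV * col / W
    dist = cast(angle) * math.cos(angle - pa)   # correct fisheye distortion
    heights.append(int(H / max(dist, 0.1)))     # nearer walls draw taller

# Paint the frame row by row: wall slice centered vertically, blank elsewhere.
for row in range(H):
    line = ""
    for col in range(W):
        wall = heights[col]
        top, bottom = (H - wall) // 2, (H + wall) // 2
        line += "#" if top <= row < bottom else " "
    print(line)
```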

[James] could have simply stated the machine won’t run DOOM. However, trying to get something similar up and running was a useful learning experience, and in his own words, highly satisfying. This attitude of pushing on in the face of adversity is what propels many other DOOM porting efforts.

Continue reading “Recreating DOOM On A Homebrew 8-Bit CPU”