Ztachip Accelerates Tensorflow And Image Workloads

[Vuong Nguyen] clearly knows his way around artificial intelligence accelerator hardware, creating ztachip: an open source implementation of an accelerator platform for AI and traditional image processing workloads. Ztachip (pronounced “zeta-chip”) contains an array of custom processors and is not tied to one particular architecture. It implements a new tensor programming paradigm that [Vuong] has created, which can accelerate TensorFlow tasks but is not limited to that. In fact, it can process TensorFlow in parallel with non-AI tasks, as the video below shows.

A RISC-V core, based on the VexRiscv design, is used as the host processor handling the distribution of the application. VexRiscv itself is quite interesting. Written in SpinalHDL (a hardware description language embedded in Scala), it’s super configurable, producing a Verilog core ready to drop into the design.
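To give a flavor of what that looks like, here is a minimal SpinalHDL sketch of our own (a toy example, not code from VexRiscv): a parameterized counter that elaborates straight to Verilog.

```scala
import spinal.core._

// A tiny, parameterized SpinalHDL component. The width parameter is plain
// Scala, so one source file can elaborate into differently sized cores.
class Counter(width: Int) extends Component {
  val io = new Bundle {
    val enable = in Bool()
    val value  = out UInt(width bits)
  }
  val count = Reg(UInt(width bits)) init(0)
  when(io.enable) {
    count := count + 1
  }
  io.value := count
}

object CounterVerilog extends App {
  // Writes Counter.v, ready to drop into any Verilog flow.
  SpinalVerilog(new Counter(width = 8))
}
```

VexRiscv takes the same idea much further, using Scala to decide which pipeline features get generated at all.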

A Digilent Arty-A7, an Arducam, and a VGA PMOD are all you need

From a hardware design perspective, the RISC-V core hooks up to an AXI crossbar, with all the AXI-Lite buses muxed as is usual for the AMBA AXI ecosystem. The Ztachip core as well as a DDR3 controller are also connected, together with a camera interface and VGA video output.
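The design itself is generic RTL (more on that below), but for illustration, here is roughly how a similar hookup can be expressed with SpinalHDL’s Axi4CrossbarFactory. This is not the actual ztachip top level; the bus names and address map are invented.

```scala
import spinal.core._
import spinal.lib._
import spinal.lib.bus.amba4.axi._
import spinal.lib.bus.misc.SizeMapping

// Illustrative only: two CPU-side masters and two memory-mapped slaves
// standing in for the DDR3 controller and the peripheral space.
class CrossbarSketch extends Component {
  val axiConfig = Axi4Config(addressWidth = 32, dataWidth = 32, idWidth = 4)

  val cpuIBus = slave(Axi4ReadOnly(axiConfig)) // instruction fetches
  val cpuDBus = slave(Axi4(axiConfig))         // data accesses
  val ddr     = master(Axi4(axiConfig))        // off to the DDR3 controller
  val periph  = master(Axi4(axiConfig))        // camera / VGA style peripherals

  val crossbar = Axi4CrossbarFactory()
  crossbar.addSlaves(
    ddr    -> SizeMapping(0x80000000L, 0x20000000L), // 512 MB DDR window
    periph -> SizeMapping(0xF0000000L, 0x00100000L)  // 1 MB of peripherals
  )
  crossbar.addConnections(
    cpuIBus -> List(ddr),
    cpuDBus -> List(ddr, periph)
  )
  crossbar.build()
}
```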

Other than an FPGA-specific DDR3 controller and AXI crossbar IP, the rest of the design is generic RTL. This is good news. The demo below deploys onto an Artix-7-based Digilent Arty A7 with a VGA PMOD module, and little else is needed. Pre-built Xilinx IP is provided, but targeting a different FPGA shouldn’t be a huge task for the experienced FPGA ninja.

Ztachip top level architecture

The magic happens in the Ztachip core, which is mostly an array of Pcores. Each Pcore has both vector and scalar processing capability, making it super flexible. The Tensor Engine (internally, the ‘dataplane processor’) is in charge here, sending instructions from the RISC-V core into the Pcore array together with image data, as well as streaming video data out. The camera is only a 0.3 MP Arducam, and the video output is VGA resolution, but give it a bigger FPGA and those limits could be raised.

This domain-specific approach uses a highly modified C-like language (with a custom compiler) to describe the application that is to be distributed across the accelerator array. We couldn’t find any documentation on this, but there are a few example algorithms.

The demo video shows a real-time mix of four algorithms running in parallel: object classification (Google’s TensorFlow MobileNet-SSD, a pre-trained AI model), Canny edge detection, Harris corner detection, and optical flow, which gives it a predator-like motion vision.

[Vuong] reckons that, efficiency-wise, it is 5.5x more computationally efficient than a Jetson Nano and 37x more than Google’s Edge TPU. These are bold claims, to say the least, but who are we to argue with a clearly incredibly talented engineer?

We cover many AI-related topics, like this AI assisted tap-typing gadget, for starters. And not wanting to forget about the original AI hardware, the good old-fashioned neuron, we got that covered as well!

Continue reading “Ztachip Accelerates Tensorflow And Image Workloads”

DIY LED Cube For The Masses

No matter its size or shape, an LED brings out the curiosity in every hardware nerd, and is the lifeblood of badge life around the planet. Then there is the LED cube, which literally takes LEDs to all sides. [Tomverbeure] had his own adventure creating an LED cube by piecing together Pixel Purses and a Cisco 3G modem.

A quick search for Pixel Purse on the internet reveals a toy handbag with an LED matrix embedded in one side. [Tomverbeure] tore down 12 of these to get two panels for each side of his creation. After a little experimenting with PCB corner brackets, he finally got it right and was able to merge the pieces together to form the cube.

Next comes the brain of the operation: an FPGA from an HWIC-3G-CDMA modem. Cisco routers have extension slots, and the HWIC connector on this particular card has usable GPIOs that connect directly to the Altera FPGA. Inside the FPGA, a RISC-V soft CPU generates images that get processed and dispatched by a hardware block. [Tomverbeure] gives a detailed explanation of the implementation of all the blocks, which were written in SpinalHDL. The video below shows the project in action.
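As a taste of what such a block can look like, here is a minimal SpinalHDL sketch of a stream-processing stage. It is purely illustrative and not from [Tomverbeure]’s design: it simply inverts 8-bit pixel values while keeping the ready/valid handshake intact.

```scala
import spinal.core._
import spinal.lib._

// Illustrative stream-processing stage: invert 8-bit pixel values.
// The real design's blocks and interfaces will differ.
class PixelInvert extends Component {
  val io = new Bundle {
    val pixelsIn  = slave Stream(UInt(8 bits))
    val pixelsOut = master Stream(UInt(8 bits))
  }
  // translateWith keeps the valid/ready handshake and swaps the payload.
  io.pixelsOut << io.pixelsIn.translateWith(U(255, 8 bits) - io.pixelsIn.payload)
}
```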

We love the detail that [Tomverbeure] provides and hope it does not drive up the price of the Pixel Purse too much. If you are looking for a more fine-pitched cube, look no further than this one. If you end up making your own, be sure to send us a link.

COSMAC ELF Lives Again, In FPGA

Looking around at the personal computing market today, there seems to be plenty of choice. In reality, though, almost everything runs on hardware from a very small group of companies, and software is often available across platforms. This wasn’t the case in the personal computing boom of the 70s and 80s, when different computers were wildly different in hardware and even architecture. The COSMAC ELF was one of the more interesting specimens from this era, and this one has been meticulously reproduced on an FPGA.

The original hardware was based on an RCA 1802 microprocessor and had a rudimentary (by today’s standards) set of switches and buttons as the computer’s inputs. It was low cost, even for the time, and was one of the first single-board computers available. This recreation is coded in SpinalHDL, and the simplicity of the original hardware makes it relatively easy to understand. The FPGA implementation is cycle-accurate to the original, too, which makes it nearly perfect even without any of the original hardware.

The project’s creator, [Winston] aka [wel97459], found that SpinalHDL made this project fun to work on (and released his code on his GitHub page), and was able to get the code down to just 1500 lines to recreate the original hardware. It’s very impressive, and also an accessible read for anyone interested in some of the more unique computers offered during the early computer renaissance in the 70s.
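If you are wondering what those 1500 lines might look like, here is a toy sketch of our own (not [Winston]’s code) showing how SpinalHDL’s state machine library can express the 1802’s machine-cycle cadence, where each fetch or execute cycle spans eight clock ticks.

```scala
import spinal.core._
import spinal.lib._
import spinal.lib.fsm._

// Toy sketch only: alternate fetch and execute machine cycles,
// each lasting eight clock ticks, as on the RCA 1802.
class CycleCadence extends Component {
  val io = new Bundle {
    val fetching = out Bool()
  }

  val tick = Counter(8) // the eight clocks of one machine cycle
  tick.increment()

  val fsm = new StateMachine {
    val fetch   = new State with EntryPoint
    val execute = new State

    fetch.whenIsActive {
      when(tick.willOverflow) { goto(execute) }
    }
    execute.whenIsActive {
      when(tick.willOverflow) { goto(fetch) }
    }

    io.fetching := isActive(fetch)
  }
}
```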

VexRISC-V Exposed

If you want to use FPGAs, you’ll almost always use an HDL like Verilog or VHDL. These are layers of abstraction, just as a C compiler is a layer above machine language or assembly code. There are other challengers to the throne, such as SpinalHDL, which have small but enthusiastic followings. [Tom] has a post about how the VexRiscv CPU leverages SpinalHDL to make an extremely flexible system that is as efficient as plain Verilog. He says the example really shows off why you should be using SpinalHDL.

As with conventional programming languages, it is easy to find niche languages that attract a little attention and either take off (say, C++, Java, or Rust) or just sort of fade away. The problem is you can’t ever tell which ones are going to become major and which are just flashes in the pan. Is SpinalHDL the next big thing? We don’t know.

Continue reading “VexRISC-V Exposed”

VexRiscv: A Modular RISC-V Implementation For FPGA

Since an FPGA is just a sea of digital logic components on a chip, it isn’t uncommon to build a CPU using at least part of the FPGA’s circuitry. VexRiscv is an implementation of the RISC-V CPU architecture using a language called SpinalHDL.

SpinalHDL is a high-level language conceptually similar to Verilog or VHDL, and it compiles down to either, so it should be compatible with most toolchains. VexRiscv shows off SpinalHDL well since it is very modular: you can add instructions, an MMU, JTAG debugging, caches, and more.
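To give an idea of what that modularity looks like in practice, here is a sketch in the style of the demo configurations shipped with the VexRiscv repository. Exact plugin names and options vary between versions, so treat it as illustrative rather than copy-paste ready.

```scala
import spinal.core._
import vexriscv._
import vexriscv.plugin._

// Illustrative VexRiscv configuration, loosely following the "smallest"
// demo in the repository, with a multiplier and divider bolted on.
// Plugin names and parameters may differ in your checkout.
object GenMyCpu extends App {
  SpinalVerilog {
    new VexRiscv(
      VexRiscvConfig(
        plugins = List(
          new IBusSimplePlugin(
            resetVector = 0x80000000L,
            cmdForkOnSecondStage = false,
            cmdForkPersistence = false
          ),
          new DBusSimplePlugin(
            catchAddressMisaligned = false,
            catchAccessFault = false
          ),
          new DecoderSimplePlugin(catchIllegalInstruction = false),
          new RegFilePlugin(regFileReadyKind = plugin.SYNC, zeroBoot = false),
          new IntAluPlugin,
          new SrcPlugin(separatedAddSub = false, executeInsertion = false),
          new LightShifterPlugin,
          new HazardSimplePlugin(),
          new BranchPlugin(earlyBranch = false, catchAddressMisaligned = false),
          new CsrPlugin(CsrPluginConfig.smallest),
          new MulPlugin, // add or drop plugins to trade area for features
          new DivPlugin
        )
      )
    )
  }
}
```

Swapping a plugin in or out regenerates a different Verilog core, which is exactly the kind of flexibility that is painful to achieve in plain Verilog or VHDL.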

Continue reading “VexRiscv: A Modular RISC-V Implementation For FPGA”