Ztachip Accelerates TensorFlow And Image Workloads

[Vuong Nguyen] clearly knows his way around artificial intelligence accelerator hardware, having created ztachip: an open source implementation of an accelerator platform for AI and traditional image processing workloads. Ztachip (pronounced “zeta-chip”) contains an array of custom processors and is not tied to one particular architecture. It implements a new tensor programming paradigm that [Vuong] has created, which can accelerate TensorFlow tasks but is not limited to them. In fact, it can run TensorFlow in parallel with non-AI tasks, as the video below shows.

A RISC-V core, based on the VexRiscv design, is used as the host processor and handles distributing the application across the accelerator. VexRiscv itself is quite interesting. Written in SpinalHDL (a hardware description language embedded in Scala), it's highly configurable and produces a Verilog core that is ready to drop into the design.

A Digilent Arty-A7, an Arducam and a VGA PMOD are all you need

From a hardware design perspective, the RISC-V core hooks up to an AXI crossbar, with all the AXI-Lite buses muxed, as is usual in the AMBA AXI ecosystem. The Ztachip core and a DDR3 controller are also connected to the crossbar, together with a camera interface and VGA video output.
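To make the crossbar's role a little more concrete, here is a minimal Python model of just its address-decoding step: each transaction from the RISC-V host is routed to exactly one slave according to an address map. The map below (region names, base addresses and sizes) is hypothetical and not taken from the ztachip sources, and arbitration, bursts and the AXI handshake are not modelled at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One slave port on the crossbar, occupying [base, base + size)."""
    name: str
    base: int
    size: int

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


# HYPOTHETICAL address map for illustration only; the real SoC map may differ.
ADDRESS_MAP = [
    Region("ddr3", 0x8000_0000, 0x1000_0000),     # 256 MiB of DDR3
    Region("ztachip", 0x0C00_0000, 0x0010_0000),  # accelerator control registers
    Region("camera", 0x0C10_0000, 0x0000_1000),   # camera interface
    Region("vga", 0x0C10_1000, 0x0000_1000),      # VGA output
]


def decode(addr: int, regions=ADDRESS_MAP) -> str:
    """Return the name of the slave a crossbar would route `addr` to."""
    matches = [r.name for r in regions if r.contains(addr)]
    if len(matches) != 1:
        # In AXI, an interconnect typically answers unmapped addresses with a DECERR response.
        raise ValueError(f"address {addr:#x} maps to {matches or 'no slave'}")
    return matches[0]
```

For example, `decode(0x8000_0040)` returns `"ddr3"`, while an access to `0x0` raises, which stands in for the decode error a real interconnect would signal.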

Other than the FPGA-specific DDR3 controller and AXI crossbar IP, the rest of the design is generic RTL. This is good news. The demo below deploys onto an Artix-7 based Digilent Arty-A7 with a VGA PMOD module and little else. Pre-built Xilinx IP is provided, but targeting a different FPGA shouldn't be a huge task for the experienced FPGA ninja.

Ztachip top-level architecture

The magic happens in the Ztachip core, which is mostly an array of Pcores. Each Pcore has both vector and scalar processing capability, making it very flexible. The Tensor Engine (internally called the ‘dataplane processor’) is in charge here: it sends instructions from the RISC-V core into the Pcore array together with image data, and also streams video data out. The camera is only a 0.3 MP Arducam and the video is VGA resolution, but given a bigger FPGA, those limits could be raised.
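One simple way to picture an array of processors sharing a video frame is to cut the frame into horizontal bands, one per core. The sketch below does exactly that in Python; the core count `N_PCORES` and the band-splitting scheme are assumptions for illustration, since the article doesn't describe how ztachip actually partitions work across its Pcores. It also shows why the numbers line up: a 640×480 VGA frame is 307,200 pixels, roughly the Arducam's 0.3 MP.

```python
from typing import Callable, List

Image = List[List[int]]


def split_rows(height: int, n_cores: int) -> List[range]:
    """Split `height` rows into `n_cores` contiguous bands whose sizes differ by at most one."""
    base, extra = divmod(height, n_cores)
    bands, start = [], 0
    for core in range(n_cores):
        rows = base + (1 if core < extra else 0)
        bands.append(range(start, start + rows))
        start += rows
    return bands


def dispatch(image: Image, n_cores: int, kernel: Callable[[Image], Image]) -> Image:
    """Run `kernel` on each band (one band per hypothetical Pcore) and stitch the results.

    The bands are processed one after another here; on an accelerator they would run concurrently.
    """
    out: Image = []
    for band in split_rows(len(image), n_cores):
        out.extend(kernel(image[band.start:band.stop]))
    return out


def invert(rows: Image) -> Image:
    """A pointwise example kernel: the 8-bit negative of each pixel."""
    return [[255 - p for p in row] for row in rows]


WIDTH, HEIGHT = 640, 480  # VGA: 640 * 480 = 307,200 pixels
N_PCORES = 8              # HYPOTHETICAL core count, not taken from the ztachip design
```

With these values each band is 60 rows. A pointwise kernel like `invert` splits cleanly; stencil kernels such as edge or corner detection would also need each band to carry a few "halo" rows from its neighbours, which this sketch leaves out.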

This domain-specific approach uses a highly modified C-like language (with a custom compiler) to describe the application that is to be distributed across the accelerator array. We couldn’t find any documentation on this, but there are a few example algorithms.

The demo video shows a real-time mix of four algorithms running in parallel: object detection (Google's TensorFlow MobileNet-SSD, a pre-trained AI model), Canny edge detection, Harris corner detection, and optical flow, which gives it predator-like motion vision.
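For a feel of what one of those demo kernels computes, here is a plain-Python sketch of the textbook Harris corner response, R = det(M) - k·trace(M)². It is not ztachip's implementation (whose C-like source isn't reproduced here); the 3×3 Sobel gradients, the 3×3 box window (instead of the more usual Gaussian) and k = 0.04 are our own assumptions.

```python
def sobel(img):
    """Return (ix, iy) gradient images from 3x3 Sobel kernels; border pixels stay 0."""
    h, w = len(img), len(img[0])
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column, weighted 1-2-1.
            ix[y][x] = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                        - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row, weighted 1-2-1.
            iy[y][x] = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                        - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
    return ix, iy


def harris_response(img, k=0.04):
    """Harris response R = det(M) - k * trace(M)^2, with M summed over a 3x3 box window."""
    ix, iy = sobel(img)
    h, w = len(img), len(img[0])
    r = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            r[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return r
```

On a synthetic image containing one bright square, R comes out positive at the square's corner, negative along its straight edges and zero in flat areas. Because every pixel's window is computed independently, this kind of kernel maps naturally onto a parallel processor array.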

[Vuong] reckons it is 5.5x more computationally efficient than a Jetson Nano and 37x more so than Google's Edge TPU. These are bold claims, to say the least, but who are we to argue with a clearly incredibly talented engineer?

We cover many AI-related topics, like this AI-assisted tap-typing gadget, for starters. And not wanting to forget about the original AI hardware, the good old-fashioned neuron, we've got that covered as well!

6 thoughts on “Ztachip Accelerates TensorFlow And Image Workloads”

  1. Advert? Are you sure? It’s very much an open source hardware and software project by one man, trying to get it up and out into the world. I don’t see any big corporation behind this.
    Did you *look* at the linked GitHub? It’s pretty comprehensive!

  2. There are lots of things I don’t come to HaD to see. So I just scroll past and don’t click. Just sayin’…

    Never mind the fact that HaD is owned by a for-profit company, or that it has employees who don’t and probably can’t work for free (we all have to eat). I bet a site dedicated to exactly what any one of us wants to see would fail. There would be a lot less to show, so days would go by with no new posts. That would lead to fewer clicks. Users, seeing nothing changed, would visit less often, if at all. So it would also have fewer users who could come around submitting referrals. That would make the site even less updated. The cycle accelerates.

    On the other hand having a bit of exposure to articles one might not be immediately interested in might expand one’s horizons. Leading to more projects. Leading to more articles. And a healthier, thriving community.

    But if you think you can succeed with “YourPersonalDefinitionOfHackDaily.org”, go ahead. Let’s see how it goes. Drop a link here and I’ll even go check it out. Good luck!

  3. Sounds like a fascinating development and something to keep an eye on. I am short a few wetware, software and hardware tools for effectively evaluating it at present; as such, I will reserve judgement.

    I smell marketroid weaselspeak in the “computationally efficient” claim, though. It makes me think that this isn’t gonna replace a Jetson for a while yet, due to clock speed or core-count constraints and maybe expense.
