New Part Day: The RISC-V Chip With Built-In Neural Networks

After exploring a few random online shops one day, [David] (thanks for sending this in, by the way) ran across a very interesting chip. It’s a dual-core, RISC-V chip running at 400MHz. There’s 6 MB of SRAM on the CPU, and there’s 2MB for convolutional neural network acceleration. There is, apparently, WiFi on some versions. There are already SDKs available on GitHub, and a bare chip costs a dollar or two. Interested? Log in to Taobao, realize Taobao does pre-orders, and all this can be yours.

This is a preorder (apparently you can do that as a seller on Taobao), but the Sipeed M1 K210 is available as a ‘core’ board with 72 pins in a one-inch square package, as a version with WiFi, or as a complete development board with an OV2640 camera, 2.4-inch LCD, microphone, and onboard USB. There are videos of this chip running a face detection routine. It found Obama.

A bit of googling tells us this chip comes from a company named Kendryte, and here the specs are repeated: this is a dual-core RISC-V with an FPU, a bunch of RAM, and can run TensorFlow. Documentation is available, although the datasheet will need to be translated, and as of this writing there’s a GitHub filled with SDKs and examples, with some of the repos updated in the last hour.

Over the years we’ve seen a few RISC-V chips given development boards, and you can buy them right now. The HiFive 1 is an exceptionally powerful microcontroller with processing power that puts it right up against the Teensy (which is built around a Freescale chip), but it’s also fairly expensive. We’re not sure the Arduino Cinque (also RISC-V) ever made it to production, but again, expensive. The idea that a RISC-V microcontroller could be available for just a few dollars is very interesting, and it even comes with SDKs and utilities to make the chip useful.
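
For readers wondering what "convolutional neural network acceleration" covers in practice, here is a minimal plain-Python sketch of the layer types the vendor lists for the accelerator in the comments below (convolution, batch normalization, activation, pooling). It is an illustration only: the function names are made up for this example and are not part of the Kendryte SDK, and the real hardware works on quantized, multi-channel data rather than small Python lists.

```python
# Illustrative only: single-channel, plain-Python versions of common CNN layers.
# None of these names come from the Kendryte SDK; they are hypothetical helpers.

def conv2d(image, kernel):
    """Stride-1, no-padding 2D convolution (cross-correlation, as CNN frameworks use)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def batch_norm(fmap, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize with fixed (inference-time) statistics, then scale and shift."""
    scale = gamma / (var + eps) ** 0.5
    return [[(v - mean) * scale + beta for v in row] for row in fmap]

def relu(fmap):
    """Clamp negative activations to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool2x2(fmap):
    """2x2 max pooling with stride 2; a trailing odd row or column is dropped."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]) - 1, 2)]
        for r in range(0, len(fmap) - 1, 2)
    ]

# A 5x5 image with a vertical edge, and a 2x2 kernel that responds to that edge.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
edge_kernel = [[-1, 1], [-1, 1]]

features = conv2d(image, edge_kernel)            # 4x4 feature map
normalized = batch_norm(features, mean=0.0, var=1.0)
activated = relu(normalized)
pooled = max_pool2x2(activated)                  # 2x2 output
print(pooled)
```

An accelerator of this kind is useful because a network such as tiny-yolo repeats this convolution, normalization, activation, pooling pattern many times over many channels, which is slow on a general-purpose core.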

87 thoughts on “New Part Day: The RISC-V Chip With Built-In Neural Networks”

        1. We have best price for those solution, Sipeed M1 module in $6, M1w(with wifi) in $7.5, the simple dev board with LCD, 2M pixel Camera, I2S MIC, Speaker PA, on board downloader, in $15.

          1. You are extremely close to competing well with Espressif, but there are a few issues so far. Price-wise you are a bit higher ($7.50 vs <$5), but your system is a dual-core 400 MHz 64-bit RISC-V based system, which is amazing.

            Buying it through Taobao is a pain. While I did find your listings on there, it is a painful process buying it in the USA versus just getting it off AliExpress or other sites. Not to mention, of course, that the listing is all in Chinese.

            The documentation on your site through https://kendryte.com/downloads/ is very easy to find, but the documentation for both standalone and FreeRTOS is not in English. Espressif put tons of work into their documentation and tooling.

            Your website in general loads **very** slowly for me in the USA. You should consider using a global/USA CDN like Cloudflare so downloading your 1.34 MB FreeRTOS PDF doesn't take over 20 seconds (meager 150 KB/s) on a 100 Mbit/s connection.

            How old is your GCC toolchain? Is it pre GCC 8? Have you considered supporting LLVM/Clang?

            It is a shame the IC doesn't have a bit more RAM so we can run mainline Linux on it. At that point, you would do some serious damage to Espressif via being able to run Linux on your chip.

  1. After reading the headline all that came to my mind is a quote from the film Mission Impossible 1

    Luther Stickell: “I’m talking about the 686 prototypes, with the artificial intelligence RISC chip”

    1. I was just watching that movie recently and had a fun dive down the Wikipedia rabbit hole after searching for those specifications. I am obsessed with accurate representations of hacker gear. Mr. Robot is probably the best. What are your favorite movie references? In other news, this chip sounds really cool.

    1. nvm.. looks like it’s the same guy under a different brand :P

      It’s great to see that the SDK uses CMake btw. It’s going to be so much easier to integrate this into other IDEs

  2. Hi, thanks for your report!
    I’m zepan, the shopkeeper of the Sipeed Tech Taobao shop. Sipeed M1 is our first AI module and our second RISC-V module.
    I need to correct some mistakes in your article and supplement some information.
    1. The chip name is K210, not K201; and it is RV64GC.
    2. It has 8MB of high-speed SRAM, not 6MB.
    3. 5.9MB of that SRAM can be used for convolutional neural network acceleration, so it is possible to run small models like tiny-yolo v2 and MobileNet, as you see in the face detection routine video.
    4. It wasn’t in preorder mode before 10.7, but it was so popular that it sold out on 10.7, and I changed it to preorder mode.

    More information:
    1. It can run at up to 800MHz if you want, giving you more than 0.5TOPS and 240FPS detection speed @ QVGA.
    2. It has low power consumption, unlike ARM boards; it only consumes 0.35W when running the face detection routine.
    3. Its IO speed in a simple test is up to 300MHz.
    4. It is possible to run OpenWrt on it as it has an MMU, but RAM is too tight.
    5. We’re adding MicroPython to the board, and soon you can simply run Python on it; you know Python is the best language for DL.
    6. We also have a Mic Array board, and another fancy board in development.
    7. We have the best price for these solutions: the Sipeed M1 module is $6, the M1w (with WiFi) is $7.5, and the simple dev board with LCD, 2M pixel camera, I2S mic, speaker PA, and on-board downloader is $15.
    8. We will have a model shop soon; you can sell your models on our online model shop.
    9. Our Telegram link: https://t.me/joinchat/IoJz2UoLTnlscC0WrCCNrg
    Our QQ groups: 878189804 for AI, 826307240 (3k members group) for common discussion.

    1. Micropython was the first thing that I thought of, when seeing this chip.
      Great that you’re already working on it.

      Would you mind also selling the boards somewhere else?
      It’s not easy for us Westerners to order from Taobao. AliExpress would be nice, or maybe you could talk Banggood into selling those things. They’re already a good source for ESP8266/ESP32 stuff, so this would fit perfectly.

      1. Hi, my other board (based on Linux), LicheePi, is on Banggood already, and I will put Sipeed M1 on Crowd Supply this month. I have many other, better AI boards in development; the board in the picture is just the simple version, and the board on Crowd Supply will be much fancier ~

    1. I think Taobao requires you to sign in now before the options work. I bought them through an agent in the end anyway though (although you can create an account successfully with an overseas phone number).

  4. OV2640 because it’s cheap and dead simple to interface to.
    The trade-off is poor imaging ability compared to newer sensors.
    Well lit, daytime, indoor, etc. is fine. Outside of that, YMMV drastically.
    Just think of the use case carefully.

    1. Hi, we have an OV5640 with M12 lens option too; the small OV2640 option is in the cheapest $15 kit (the dev board with LCD, 2M pixel camera, I2S mic, speaker PA, on-board downloader, etc.).
      Also, we have an HDR solution for some applications.

      1. and sadly again, it’s an 8 year old sensor with a small form factor and poor image quality.
        But it’s easy to interface, well supported code, and cheap.

        Garbage in, garbage out goes the saying.
        Or in other words, a better sensor up front and you’ll get better results in the back in different and poorer lighting conditions.
        Depending on your application – as long as it’s indoors in a well lit office you’ll have no problems.

        1. Hi, what you are focusing on is just the sensor. The chip supports the DVP interface, so you can swap in any DVP-interface sensor, like the AR0330, which is often used in CDRs or IPCs.
          And we have a MIPI2DVP adapter, so you can connect new MIPI cameras too.

  5. What sort of neural network accelerator does this use? I ask because with documentation it may be possible to harness that computing power for other applications than neural network based machine learning. (e.g. if it resembles a set of DSP cores you may be able to do other calculations with the same primitives or if it resembles an FPGA you may be able to configure it for other tasks). I was disappointed that the Movidius chip from Intel didn’t come with hardware docs but rather just an opaque blob to convert neural network definitions (topology + weights) to a configuration image.
    While neural networks are the hotness at this moment there are scores of similar signal processing applications that would benefit from this sort of computational power (many small accumulators with programmable sequencing and dependency mapping). Most such workloads could be crammed somewhat awkwardly into the TensorFlow framework but it’d be nice to understand the underlying hardware anyway, both for non-traditional workloads and for that matter for optimizing the speed/performance balance when building a traditional neural network.

    1. The Kendryte K210 contains a convolutional neural network accelerator, the KPU, which is a cluster of 64 KLUs (Kendryte Arithmetic Logic Units). A KLU is a collection of all kinds of mathematical operation units, which can carry out various calculations, such as convolutional layers, pooling layers (including max pooling and average pooling), batch normalization layers, and activation layers (supporting custom activations, e.g. ReLU, sigmoid). It is low power but high efficiency.

      This demo video shows K210 face detection (based on tiny-yolo):

      https://youtu.be/dcoc0GrYujM

      This shows K210 voice recognition:

      https://youtu.be/Qd3UwzvL9r8

      It has an APU (audio processing unit) which can process microphone array input and do audio beamforming and sound source localization.

      https://youtu.be/wwKYoEVyWXc

      Most of the toolchain will be open-sourced, and the AI demo and KPU usage demo will be open-sourced soon.

      https://github.com/kendryte

    2. Kendryte is a series of AI chips focused on IoT, and the first generation is named K210. It has a dual-core 64-bit RISC-V CPU, which supports the RV64IMAFDC (RV64GC) ISA. It can process convolutional neural networks.

      It has a KPU (Knowledge Processing Unit, which can process convolutional neural networks) inside; the KPU has 64 KLUs (a KLU is the KPU’s Arithmetic Logic Unit). It can calculate convolutional layers, pooling layers (including max pooling and average pooling), batch normalization layers, and activation layers (including ReLU, sigmoid, and other custom activation functions).

      Also, the K210 has an APU (audio processing unit) which can process microphone array input and do audio beamforming and sound source localization.

      k210 face detection:
      https://youtu.be/dcoc0GrYujM

      k210 voice recognition
      https://youtu.be/Qd3UwzvL9r8

      k210 sound field imaging using microphone array
      https://youtu.be/wwKYoEVyWXc

      Most importantly, the whole toolchain will be open-sourced here.
      https://github.com/kendryte

      1. The most important thing you MUST do is to offer comprehensive English documentation.
        While reading the feature list, I had to think of the success of the Espressif ESP chips, but this thing is in the same price class and all in all far better, although it differs from the ESP32 and lacks some of the ESP32 features.
        But it has some outstanding features:
        – a simple board is in the sub-$10 price region
        – it has a high frequency in relation to other cheap controllers
        – it is dual-core
        – it is the FIRST RISC-V chip that is affordable
        – it has AI
        So go ahead and translate the documentation quickly. Then success will come by itself.

        One question: Are the “AI ALUs” directly programmable (i.e. are the registers completely documented) or are they only usable through some APIs from a blob?

        1. Hi, in the tech manual, the whole KPU register region is listed, but the usage details are not done yet; they should be done this year.
          And the $6 module price is the initial price; if you order a good quantity, the price can be in the $5 region.
          In addition, there are many interesting features that are not documented yet; maybe they will be released in the future.

          1. Is there example code for using the KPU? And ultimately, are you envisioning people will write custom code for the KPU? Or are there plans to have it targeted by something like OpenCL?

  6. is it just me or does this seem like a significant game changer? and an important milestone, specifically for what it is and the insanely low price.
    and here i thought the ESP32 was a big deal. man was i wrong

    1. I think it’s been mentioned before, that it can take a month for it to get picked out of the tip queue for follow up, if it’s been busy lately. Then there’s some lead time eaten up when the author it’s assigned to does due diligence and maybe plays phone tag with the manufacturer for a week, or slogs something out over one email a day due to opposite time zones.

  7. The K210 looks really powerful and interesting. Didn’t expect RISC-V controllers to be available this soon.

    Unfortunately the K210 is packaged in a BGA with 0.65mm pitch – that is hard to solder without a pick’n’place and a proper reflow oven. This makes it hard for hobbyists to design their own boards with this IC.

    1. You have to create an account at Taobao, or purchase it through a Taobao agent.
      In the latter case, you’ll log in automatically, using their account, and their mail address.
      They’ll receive the goods in China, and forward them to you.

  8. Wow, I can’t wait to get one in my hands. I’m interested in how good the power management will be. Static RAM can mean a really low power sleep mode with memory retention, and 8MB is plenty of memory! With that you can do things impossible on ESP32, STM32 etc., like advanced audio or graphics processing, route planning or even running a dumb web browser. And with good power management you can do all of these on a device running forever on small batteries, like old Palm or Psion palmtops.

    1. The Audio Processor looks pretty impressive but I’m afraid it will be in some way locked for hobby use. Unfortunately in the current docs there is almost nothing about power management (only stopping the clock for DMA).

      1. The APU is open source too; I have an alpha version already, and it will be released soon.
        It doesn’t have power management, and in dual-core WFI mode, the power consumption is about 30mW.

        1. Thanks for your response! 30mW looks good but on the Kendryte site it is <300mW. Have you tested what the minimal power consumption is without losing memory? Can you disable one core, slow down the clock or stop it completely? If it is really static RAM it should theoretically be possible to stop all clocks completely. An open source APU is great news.

          1. Hi, 300mW is the power consumption when running face detection.
            As I said in my last post, in dual-core WFI mode, power consumption is 30mW. WFI is an instruction that makes the CPU Wait For Interrupt.

  9. I’m thinking something like this could be trained to do accurate pose self-estimation for a system like the VIVE. Along with the processor and other inputs it would be very good for robot kinematics.

  10. I would love to be able to buy a couple of these for school. I can’t read the foreign language, and Google Translate didn’t work for me. Is there a way I can just buy 3 to 5 full kits? When MicroPython gets working, it would be awesome to pair it with scratch, as that is what I teach 5th graders for my robotics class.
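
The power figures quoted in the thread above (about 30 mW with both cores in WFI, about 300 mW while running face detection) are enough for a back-of-the-envelope battery estimate. The sketch below is only that: the two power numbers are the vendor's claims from the comments, the battery capacity is a hypothetical value picked for illustration, and regulator losses, peripherals, and self-discharge are ignored.

```python
# Power figures as quoted in the comment thread (vendor claims, not measured here).
ACTIVE_MW = 300.0  # running face detection
IDLE_MW = 30.0     # both cores waiting in WFI

def average_power_mw(duty_cycle, active_mw=ACTIVE_MW, idle_mw=IDLE_MW):
    """Average draw when the chip is active for the fraction `duty_cycle` of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_mw + (1.0 - duty_cycle) * idle_mw

def runtime_hours(battery_mwh, avg_mw):
    """Ideal runtime in hours; ignores regulator losses, self-discharge, and peripherals."""
    return battery_mwh / avg_mw

# Hypothetical battery: 1000 mAh at a nominal 3.7 V, i.e. about 3700 mWh.
BATTERY_MWH = 1000 * 3.7

for duty in (0.0, 0.1, 1.0):
    avg = average_power_mw(duty)
    hours = runtime_hours(BATTERY_MWH, avg)
    print(f"duty {duty:.0%}: {avg:.0f} mW average, about {hours:.0f} h")
```

Under these assumptions even a fully idle chip lasts only around five days on such a battery, which suggests that palmtop-style battery life would need a deeper sleep mode than WFI.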
