256-Core RISC-V Megacluster

Supercomputers are always an impressive sight to behold, but also completely unobtainable for the ordinary person. But what if that wasn’t the case? [bitluni] shows us how it’s done with his 256-core RISC-V megacluster.

While the CH32V family of microcontrollers it’s based on isn’t nearly as powerful as what you’d traditionally find in a supercomputer, [bitluni] does use them to demonstrate a defining property of supercomputers: many, many cores working on the same task in parallel.

To recap our previous coverage, a single “supercluster” is made from 16 CH32V003 microcontrollers connected to each other over an 8-bit bus, with an LED on each and the remaining pins going to an I/O expander. The megacluster is in turn made from 16 of these superclusters, arranged in pairs across 8 “blades”. On each blade, a CH32V203 per supercluster acts as a bridge to the megacluster’s main 8-bit bus, which is controlled by one last CH32V203.
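With 16 superclusters of 16 nodes each, the 256 cores conveniently fit into a single address byte. The sketch below illustrates how a bridge MCU might decide whether a packet on the main bus is destined for its local supercluster; the address layout and function names are illustrative assumptions, not [bitluni]’s actual protocol.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical addressing: 16 superclusters x 16 nodes = 256 cores,
 * so one byte encodes a full address. High nibble: supercluster,
 * low nibble: node within it. This is an assumption for illustration,
 * not the real firmware's packet format. */
typedef uint8_t node_addr_t;

static inline uint8_t cluster_of(node_addr_t a) { return a >> 4; }
static inline uint8_t node_of(node_addr_t a)    { return a & 0x0F; }

/* A CH32V203 bridge would forward a packet from the megacluster bus
 * onto its local supercluster bus only if the destination lives there. */
static bool bridge_should_forward(uint8_t my_cluster, node_addr_t dest)
{
    return cluster_of(dest) == my_cluster;
}
```

Filtering at the bridge like this keeps local supercluster traffic off the shared main bus, which matters when 16 bridges are contending for it.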

[bitluni] goes into detail about designing PCBs that break KiCad and managing an overcrowded bus with 16 participants, culminating in a mesmerizing showcase of blinking LEDs that shows RC oscillators aren’t all that accurate.

16 thoughts on “256-Core RISC-V Megacluster”

  1. Just in case anybody else is wondering, here are the ballpark specs of the components used.

    CH32V203 (the bridge): Qingke V4B core, up to 144MHz system clock frequency.
    Single-cycle multiplication and hardware division.
    20KB SRAM, 64KB Flash (maybe up to 64KB SRAM for some variants).

    CH32V003 (the compute node): QingKe 32-bit RISC-V2A core, supporting 2 levels of interrupt nesting.
    Maximum 48MHz system main frequency.
    2KB SRAM, 16KB Flash.

    While not amazing specs, it might be fun to dust off the old Occam skills. Although I’m not sure what the current state of RISC-V Occam compilers is.

  2. Hmm… it is probably about the price:
    >Qingke V4B, up to 144MHz system clock frequency.
    I found ~$1.50 per MCU.

    QingKe 32-bit RISC-V2A processor,
    10 cents?

    And to investigate multicore principles, it is probably better to have as many cores as possible.

  3. Make it calculate Mandelbrots in real time! Make it a fight against Steve Ciarcia’s Mandelbrot Machine. :)

    I always found Steve’s Mandelbrot Machine a great introduction into multiprocessing.

    These kinds of things are basically educational, so the next challenge would be to teach people how they can write applications for these kinds of computers.

    Mandelbrot/fractals, traffic simulation, fluid dynamics, any task that can be massively parallelized.

    1. We built a Beowulf cluster in college just to use up all the old computers we had lying around, and to see if we could do it.

      “Sweet, we got it to work… now what do we do with it?” The guy who owned it tried to set it up as an automated music-sharing server for mIRC, back when it was still free and all you had to do was copy-paste the right string into the main window of a channel to request a download. Then he tried parallel processing for CFD in Matlab. It sort of worked.

      It’s likely bitcoin mining in someone’s closet now.

  4. While this is pretty cool, I wonder if a bunch of cheap FPGAs in this configuration (with DDR3 or something for each) would offer any significant processing power. Enough to run small LLMs?

    The question I guess I’m asking is whether DIY high-performance computing is truly out of reach for the average tinkerer. Are we limited to buying the latest and greatest GPU from Nvidia, Intel, or AMD?
