Supercomputers are always an impressive sight to behold, but also completely unobtainable for the ordinary person. But what if that wasn’t the case? [bitluni] shows us how it’s done with his 256-core RISC-V megacluster.
While the CH32V microcontrollers it’s based on aren’t nearly as powerful as what you’d traditionally find in a supercomputer, [bitluni] does use them to demonstrate a key property of supercomputers: many, many cores doing the same task in parallel.
To recap our previous coverage, a single “supercluster” is made from 16 CH32V003 microcontrollers connected to each other with an 8-bit bus, each with its own LED and with the remaining pins wired to an I/O expander. The megacluster is in turn made from 16 of these superclusters, put in pairs on 8 “blades” with a CH32V203 per square acting as a bridge between the supercluster and the megacluster’s main 8-bit bus, which is controlled by one last CH32V203.
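To make the numbering concrete: 16 superclusters of 16 cores give 256 cores, and two superclusters per blade give 8 blades. The Python sketch below maps a flat core index to (blade, supercluster, local core) under a hypothetical numbering scheme, with cores numbered consecutively within each supercluster and superclusters numbered consecutively in pairs per blade. It is an illustration of the hierarchy, not [bitluni]’s actual addressing.

```python
CORES_PER_SUPERCLUSTER = 16
SUPERCLUSTERS = 16
SUPERCLUSTERS_PER_BLADE = 2
TOTAL_CORES = CORES_PER_SUPERCLUSTER * SUPERCLUSTERS  # 256


def locate(core_id):
    """Map a flat core index (0..255) to (blade, supercluster, local core).

    Hypothetical scheme: consecutive numbering within each supercluster,
    superclusters numbered consecutively, two per blade.
    """
    if not 0 <= core_id < TOTAL_CORES:
        raise ValueError(f"core_id must be in 0..{TOTAL_CORES - 1}")
    supercluster, local_core = divmod(core_id, CORES_PER_SUPERCLUSTER)
    blade = supercluster // SUPERCLUSTERS_PER_BLADE
    return blade, supercluster, local_core


if __name__ == "__main__":
    for cid in (0, 37, 255):
        print(cid, locate(cid))
```

Under this scheme a message for core 37 would go from the main controller to blade 1, then to local core 5 within supercluster 2.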
[bitluni] goes into detail about designing PCBs that break KiCad and managing an overcrowded bus with 16 participants, culminating in a mesmerizing showcase of blinking LEDs that shows RC oscillators aren’t all that accurate.
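The LED drift is easy to quantify. If two chips toggle an LED after the same number of clock ticks, but their RC oscillators run fast or slow by fractions e_a and e_b, their blink phases slide apart by |e_a - e_b| seconds per second, so they end up in opposite states after (T/2)/|e_a - e_b| seconds for a blink period T. The sketch below uses an assumed ±1% clock error purely for illustration; it is not a datasheet figure for the CH32V003.

```python
def led_on(t_seconds, freq_error, period_s=1.0):
    """LED state of a chip whose clock runs fast (+) or slow (-) by freq_error.

    The chip believes t_seconds * (1 + freq_error) have elapsed; the LED is on
    during the first half of each nominal blink period.
    """
    local_time = t_seconds * (1.0 + freq_error)
    return (local_time % period_s) < period_s / 2


def time_to_antiphase(err_a, err_b, period_s=1.0):
    """Seconds until two initially synchronized LEDs are half a period apart."""
    delta = abs(err_a - err_b)
    if delta == 0:
        return float("inf")  # identical clocks never drift apart
    return (period_s / 2) / delta


if __name__ == "__main__":
    t = time_to_antiphase(+0.01, -0.01)  # assumed +/-1% clock errors
    print(f"opposite after {t:.1f} s:", led_on(t, +0.01), led_on(t, -0.01))
```

With the assumed errors, two LEDs that start in sync are blinking in opposition after about 25 seconds, which is why 256 free-running LEDs quickly turn into a shimmering pattern.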
As cool as it is, it is not mega; it is not even kilo.
Mega would be something in the NSA basement.
Yeah, just 1024x of those babies:
https://www.adapteva.com/announcements/epiphany-v-a-1024-core-64-bit-risc-processor/
…and the story is… today is 2024…
Interesting exercise, kudos.
I wonder if this concept could have worked as a GreenArrays-style device, creating a kind of FORTHy FPGA?
Also, forgot to say this project is fascinating, kudos to bitluni!
That is what I am interested in as well. Communication with direct neighbors would be a game changer.
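A toy model of what neighbor-only communication looks like: lay 256 cores out as a 16×16 grid and let each one update its value using only its four direct neighbors, the pattern used by stencil codes such as heat diffusion. The sketch below is a hypothetical single-threaded simulation with wraparound edges (a torus); on real hardware each cell would be a core exchanging values over point-to-point links instead of a shared bus.

```python
def neighbor_step(grid):
    """One diffusion step where each cell sees only its 4 direct neighbors.

    Assumes a square n x n grid with n >= 3. Edges wrap around (torus), so
    every cell has exactly four neighbors, and averaging each cell with its
    neighbors conserves the grid's total.
    """
    n = len(grid)
    return [
        [
            (grid[i][j]
             + grid[(i - 1) % n][j] + grid[(i + 1) % n][j]
             + grid[i][(j - 1) % n] + grid[i][(j + 1) % n]) / 5
            for j in range(n)
        ]
        for i in range(n)
    ]


if __name__ == "__main__":
    size = 16  # 16 x 16 = 256 cores
    grid = [[0.0] * size for _ in range(size)]
    grid[0][0] = 100.0  # one "hot" core
    for _ in range(10):
        grid = neighbor_step(grid)
    print(round(sum(map(sum, grid)), 6))  # total is conserved
```

Each step needs only four short-range transfers per core, so the communication cost stays constant as the grid grows, unlike a single shared bus.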
Still, the old Epiphany-based Parallella is cheap and faster ;)
Just in case anybody else is wondering what the ballpark specs of the components used are:
CH32V203:
Qingke V4B, up to 144MHz system clock frequency.
Single-cycle multiplication and hardware division.
20KB SRAM, 64KB Flash (possibly up to 64KB SRAM on some variants).
CH32V003:
QingKe 32-bit RISC-V2A processor, supporting 2 levels of interrupt nesting
Maximum 48MHz system main frequency
2KB SRAM, 16KB Flash
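Using those figures, here is a quick back-of-the-envelope total for the 256 CH32V003 cores alone, ignoring the CH32V203 bridges and controller. The summed clock rate is not a meaningful performance number, since the cores communicate over shared 8-bit buses rather than sharing memory.

```python
CH32V003 = {"sram_kb": 2, "flash_kb": 16, "max_clock_mhz": 48}  # figures quoted above
CORES = 16 * 16  # 16 superclusters of 16 cores


def aggregate(spec, count):
    """Multiply every per-chip figure by the number of chips."""
    return {key: value * count for key, value in spec.items()}


if __name__ == "__main__":
    total = aggregate(CH32V003, CORES)
    print(total)  # {'sram_kb': 512, 'flash_kb': 4096, 'max_clock_mhz': 12288}
```

That works out to 512KB of SRAM and 4MB of flash across the cluster, but split into 2KB and 16KB pieces that each core sees only its own share of.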
While the specs aren’t amazing, it might be fun to dust off the old Occam skills, although I’m not sure what the current state of RISC-V Occam compilers is.
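For anyone without a RISC-V Occam compiler at hand, the core idea (processes running in PAR and talking only over channels) can be mimicked in Python. This is a rough sketch: Python threads stand in for Occam processes, and a queue.Queue with maxsize=1 is only a buffered approximation of Occam’s unbuffered, synchronous channels. The None end-of-stream marker is a convention of this sketch, not an Occam feature.

```python
import queue
import threading


def producer(out_ch, n):
    """Send 0..n-1 down the channel, then an end-of-stream marker."""
    for i in range(n):
        out_ch.put(i)
    out_ch.put(None)  # sketch-specific end-of-stream marker


def squarer(in_ch, out_ch):
    """Read numbers, send their squares on, and pass the marker through."""
    while (x := in_ch.get()) is not None:
        out_ch.put(x * x)
    out_ch.put(None)


def run_pipeline(n):
    """Run producer -> squarer -> consumer concurrently, like Occam's PAR."""
    a = queue.Queue(maxsize=1)  # approximates a synchronous channel
    b = queue.Queue(maxsize=1)
    results = []

    def consumer():
        while (y := b.get()) is not None:
            results.append(y)

    procs = [
        threading.Thread(target=producer, args=(a, n)),
        threading.Thread(target=squarer, args=(a, b)),
        threading.Thread(target=consumer),
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(5))  # [0, 1, 4, 9, 16]
```

On the cluster, each of these processes would map naturally onto its own core, with the bus playing the role of the channels.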
Hmm... it is probably about the price:
>CH32V203:
>Qingke V4B, up to 144MHz system clock frequency.
I found ~$1.50 per MCU.
>CH32V003:
>QingKe 32-bit RISC-V2A processor,
10 cents?
And to investigate the multicore principles, it is probably better to have as many cores as possible.
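Taking the rough prices from this thread at face value (~$1.50 for a CH32V203 and a questioned ~10 cents for a CH32V003; both are commenters’ estimates, not quotes), the arithmetic behind the “more cores per dollar” argument looks like this. Prices are kept in integer cents to avoid floating-point rounding.

```python
PRICE_CENTS = {"CH32V203": 150, "CH32V003": 10}  # rough figures from the comments


def cluster_cost_cents(part, cores):
    """Chip cost only; ignores PCBs, passives, and the bridge MCUs."""
    return PRICE_CENTS[part] * cores


def cores_for_budget(part, budget_cents):
    """How many chips of one type a fixed budget buys."""
    return budget_cents // PRICE_CENTS[part]


if __name__ == "__main__":
    for part in PRICE_CENTS:
        cost = cluster_cost_cents(part, 256)
        print(f"256 x {part}: ${cost / 100:.2f}")
    print(cores_for_budget("CH32V003", cluster_cost_cents("CH32V203", 256)))
```

In other words, the chip budget for 256 CH32V203s ($384) would buy about 3840 CH32V003s, while 256 CH32V003s come to roughly $25.60.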
Make it calculate Mandelbrots in real time! Make it a fight against Steve Ciarcia’s Mandelbrot Machine. :)
https://archive.org/details/198810_byte_magazine_vol_13_10_hypertext_affordable_80386s_pdf__mlib/page/282/mode/2up
https://archive.org/details/198811_byte_magazine_vol_13_12_parallel_processing_next_project_management_pdf__mlib/page/398/mode/2up
https://archive.org/details/198812_byte_magazine_vol_13_13_mac_supplement_groupware_benchmark_update_pdf__mlib/page/326/mode/2up
I always found Steve’s Mandelbrot Machine a great introduction to multiprocessing.
These kinds of things are basically educational, so the next challenge would be teaching people how to write applications for these kinds of computers.
Mandelbrot/fractals, traffic simulation, fluid dynamics, any task that can be massively parallelized.
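As a first exercise of that kind, here is a sketch of how a Mandelbrot render could be split across 256 cores: core k computes every 256th row starting at row k, an interleaved assignment that tends to spread the expensive rows near the set evenly. The cores are simulated sequentially here, and the row assignment and image bounds are assumptions for illustration, not how Steve Ciarcia’s machine or [bitluni]’s cluster divide the work.

```python
def escape_count(c, max_iter):
    """Iterations before z = z*z + c escapes |z| > 2 (max_iter if it never does)."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2:
            return n
    return max_iter


def render_row(y, width, height, max_iter):
    """Compute one image row over the region [-2.5, 1] x [-1.25, 1.25]."""
    im = -1.25 + 2.5 * y / height
    return [escape_count(complex(-2.5 + 3.5 * x / width, im), max_iter)
            for x in range(width)]


def rows_for_core(core_id, n_cores, height):
    """Interleaved assignment: core k gets rows k, k + n_cores, ..."""
    return range(core_id, height, n_cores)


def render_parallel(width, height, n_cores=256, max_iter=50):
    """Assemble the image from each simulated core's rows."""
    image = [None] * height
    for core in range(n_cores):  # each iteration stands in for one core
        for y in rows_for_core(core, n_cores, height):
            image[y] = render_row(y, width, height, max_iter)
    return image


def render_serial(width, height, max_iter=50):
    """Single-core reference render for comparison."""
    return [render_row(y, width, height, max_iter) for y in range(height)]


if __name__ == "__main__":
    for row in render_parallel(64, 24):
        print("".join("#" if n == 50 else "." for n in row))
```

Because every row depends only on its own coordinates, the cores never need to talk to each other until the results are collected, which is what makes fractals such a friendly first workload for a bus-limited cluster.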
*mumblemumble*Beowulf*mumblemumble*
We built a Beowulf cluster in college just to use up all the old computers we had lying around, and to see if we could do it.
“Sweet, we got it to work… now what do we do with it?” The guy who owned it tried to set it up as an automated music-sharing server for mIRC back when it was still free and all you had to do was copy and paste the right string into the main window of a channel to request a download. Then he tried parallel processing for CFD in Matlab. It sort of worked.
It’s likely bitcoin mining in someone’s closet now.
While this is pretty cool, I wonder if a bunch of cheap FPGAs in this configuration (with DDR3 or something for each) would offer any significant processing power. Could they run small LLMs?
The question I guess I’m asking is whether DIY high-performance computing is truly out of reach for the average tinkerer. Are we limited to buying the latest and greatest GPU from Nvidia, Intel, or AMD?
2 questions:
Can you mine bitcoin?
Can it run Doom?
Yeah, I know, evil sense of humor…