Here’s something fun. Our hacker [Willow Cunningham] has sent us a copy of their homework. This is their final project for the “ECE 574: Cluster Computing” course at the University of Maine, Orono.
It was enjoyable having a good look at everything in this project. The project is a “cluster” of 5x Raspberry Pi Pico microcontrollers, with one head node acting as the leader and four compute nodes that work on tasks. The software for both types of node is written in C. The head node is connected to a workstation via USB 1.1, allowing the system to be controlled with a Python script.
The cluster is configured to process an embarrassingly parallel image convolution. The input image is copied into the head node via USB; the head node then divvies it up and distributes it to the n compute nodes via I2C, one node at a time, roughly as sketched below. Results are given for n = {1, 2, 4} compute nodes.
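Here’s a minimal sketch of what that split-and-send step might look like on the head node, using the Pico SDK’s blocking I2C write. The node addresses, pin choices, image dimensions, and one-transfer-per-node framing are illustrative assumptions, not [Willow]’s actual protocol.

    #include <stdint.h>
    #include "hardware/i2c.h"
    #include "pico/stdlib.h"

    #define IMG_W   256   // assumed grayscale image width in pixels
    #define IMG_H   256   // assumed image height
    #define N_NODES 4
    // Hypothetical 7-bit I2C addresses for the four compute nodes.
    static const uint8_t node_addr[N_NODES] = { 0x10, 0x11, 0x12, 0x13 };

    // Divide the image into horizontal bands and push one band to each
    // compute node over I2C, one node at a time (as in the write-up).
    static void distribute_image(const uint8_t *img, int n_nodes) {
        int rows_per_node = IMG_H / n_nodes;   // assume it divides evenly
        for (int i = 0; i < n_nodes; i++) {
            const uint8_t *band = img + (size_t)i * rows_per_node * IMG_W;
            size_t band_bytes = (size_t)rows_per_node * IMG_W;
            // Blocking write; returns bytes written or an error code.
            i2c_write_blocking(i2c0, node_addr[i], band, band_bytes, false);
        }
    }

    int main(void) {
        stdio_init_all();
        i2c_init(i2c0, 400 * 1000);            // 400 kHz fast-mode I2C (assumed)
        gpio_set_function(4, GPIO_FUNC_I2C);   // SDA on GP4 (assumed pins)
        gpio_set_function(5, GPIO_FUNC_I2C);   // SCL on GP5
        gpio_pull_up(4);
        gpio_pull_up(5);
        // ... receive the image from the workstation over USB, then e.g.:
        // distribute_image(image_buffer, N_NODES);
    }

Even at an assumed 400 kHz, the bus moves well under 50 kB/s of payload before protocol overhead, which goes a long way toward explaining a result like the one below.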
It turns out that the work of distributing the data dwarfs the compute by three orders of magnitude. The result is that the whole system gets slower the more nodes we add. But we’re not going to hold that against anyone. This was a fascinating investigation, and we were impressed by [Willow]’s technical chops. This was a complicated project with diverse hardware and software challenges, and they did a great job making it all work, then reported the negative result honestly, in the best scientific tradition.
It was fun reading their journal, in which they chronicled their progress and frustrations during the project. Their final report, in IEEE format, was created using LaTeX and Overleaf; at only six pages it is an easy and interesting read.
If you’re interested in cluster tech, be sure to check out the 256-core RISC-V megacluster and a RISC-V supercluster for very low cost.
An excellent project and an excellent learning exercise. Nice work. :) And this experience will inform their work for years. This is why doing these projects is so important: to learn. A core principle of science is reproducibility and not taking someone’s word for it. Detractors of these kinds of projects say “what is the point, just get a cheap PC” and they are missing the point and the fun in building these mini clusters.
Good work Willow Cunningham. :)
+1
Now you know why PCIe 7.0 is going to be so fast.
Seems that another student could use this project as a springboard to implement faster data transfers via less conventional means.
hahah yeah my heart was exploding with “that the work of distributing the data dwarfs the compute by three orders of magnitude!!!” but then there it was right there on the page
As is so often the case, it’s quicker to do the job yourself than to delegate it to someone else…
But why waste time on hardware when the entire project could be done with QEMU on a single 8-core CPU? It’s not like it’s still 2008 and dual-core Wolfdale is the duck’s guts.
You’re no fun! :)
12- and 20-core CPUs are cheap. A 14700 is around $200 if you go looking. Tough to beat for raw compute. Did a BMW Blender render in 24 seconds on the one I built. The 14400 is probably the price/performance leader though. Although when I see a $154 12700KF (Amazon) or a $170 Ryzen 7700 (SZCPU) it gives me pause.
The point is the challenge, not the result.
I was just saying if you are going QEMU why stop at 8 cores?
8 is midrange, soon to be entry level.
Also a modern $80 6-core like the 8400F can probably do 8 cores’ worth of work.
Power to run them, though, is not cheap.
Is it convolution or deconvolution in the image?
I am not an expert but I do believe convolution is the correct term. https://en.wikipedia.org/wiki/Convolution#Discrete_convolution
Looks like deconvolution, since it’s deblurring; but since deconvolution is just convolution with the inverse point spread function, I guess you can also call it ‘convolution with a kernel for sharpening’.
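For anyone wondering what ‘convolution with a kernel for sharpening’ boils down to, here’s a tiny C sketch; the 3x3 sharpening kernel and the skipped borders are illustrative assumptions, not necessarily what the report used.

    #include <stdint.h>

    // Classic 3x3 sharpening kernel (illustrative choice only).
    static const float sharpen[3][3] = {
        {  0, -1,  0 },
        { -1,  5, -1 },
        {  0, -1,  0 },
    };

    // Convolve a grayscale image with the kernel above, skipping the
    // one-pixel border for simplicity and clamping the result to 0..255.
    static void convolve3x3(const uint8_t *in, uint8_t *out, int w, int h) {
        for (int y = 1; y < h - 1; y++) {
            for (int x = 1; x < w - 1; x++) {
                float acc = 0.0f;
                for (int ky = -1; ky <= 1; ky++)
                    for (int kx = -1; kx <= 1; kx++)
                        acc += sharpen[ky + 1][kx + 1] * in[(y + ky) * w + (x + kx)];
                if (acc < 0.0f)   acc = 0.0f;
                if (acc > 255.0f) acc = 255.0f;
                out[y * w + x] = (uint8_t)acc;
            }
        }
    }

A handful of multiply-accumulates per pixel, which is exactly why the compute side is so cheap next to shuffling the pixels around.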
Maybe he shouldn’t have used I2C for inter-node communication? It’s all about the learning
I was thinking the same. I hate I2C with a passion. Slow and unreliable. SPI would have been better. I think you can even implement QSPI with the PIO blocks.
There’s enough (in this case unused) I/O to make an extremely high bandwidth custom communications channel between these chips, but I somewhat understand the desire to go with something standard, to reduce complexity of the overall project.
Yes, I was thinking the same thing. It would be interesting to run the experiment again using SPI. Just for fun, naturally. I would bet that the system would still be I/O bound even if we used SPI or QSPI.
Eyup
QSPI, or even better if the project allows it, a shared RAM module.
Granted, one has to order the accesses, but that shouldn’t be too hard to do.
But kudos anyway to him, that’s a very nice project as is!
I think a Pico can be used as a virtual RAM chip, so it’s possible that could handle the accesses.
Also if the project scales up it might be better to use bare RP2040s (or the new chip) and share the power rails. Imagine an SBC-sized device with 10-20 RP2040 chips on it 😂
I think SPI with DMA would be nearly ideal for moving image rows between devices. Or maybe a 4- or 8-bit parallel bus with DMA. I did not see DMA in the docs, but the RP2040 can do it.
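For reference, here’s roughly what feeding the SPI TX FIFO from DMA looks like with the Pico SDK; the pins, baud rate, and one-shot channel setup are assumptions for illustration, not anything from the project.

    #include "hardware/spi.h"
    #include "hardware/dma.h"
    #include "pico/stdlib.h"

    // Send one buffer (e.g. a band of image rows) out SPI0 using a DMA
    // channel paced by the SPI TX data request, leaving the CPU free.
    static void spi_dma_send(const uint8_t *buf, size_t len) {
        int chan = dma_claim_unused_channel(true);
        dma_channel_config cfg = dma_channel_get_default_config(chan);
        channel_config_set_transfer_data_size(&cfg, DMA_SIZE_8);
        channel_config_set_read_increment(&cfg, true);            // walk through buf
        channel_config_set_write_increment(&cfg, false);          // fixed SPI FIFO address
        channel_config_set_dreq(&cfg, spi_get_dreq(spi0, true));  // pace on SPI TX

        dma_channel_configure(chan, &cfg,
                              &spi_get_hw(spi0)->dr,  // write: SPI data register
                              buf,                    // read:  source buffer
                              len,                    // transfer count
                              true);                  // start immediately
        dma_channel_wait_for_finish_blocking(chan);
        dma_channel_unclaim(chan);
    }

    int main(void) {
        spi_init(spi0, 8 * 1000 * 1000);       // 8 MHz, an assumed baud rate
        gpio_set_function(18, GPIO_FUNC_SPI);  // SCK  (assumed pin mapping)
        gpio_set_function(19, GPIO_FUNC_SPI);  // MOSI
        gpio_set_function(16, GPIO_FUNC_SPI);  // MISO
        // ... spi_dma_send(row_buffer, row_bytes);
    }

Even a modest SPI clock would move the same bytes an order of magnitude or two faster than I2C, so it would be a fun rerun of the experiment.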
Might be fun to use the cluster to run some machine learning stuff.