One way to get around limitations in computing resources is to throw more computers at the problem. That’s why even cheap consumer-grade computers and phones have multiple cores in them. In supercomputing, it is common to have lots of processors with sophisticated sharing mechanisms.
[Henk Verbeek] decided to take 80 inexpensive PIC32 chips and build his own cluster programmed in — of all things — BASIC. The devices talk to each other via I2C. His example application plots fractals on another PIC32-based computer that has a VGA output. You can see a video of the device in action, below.
The slave boards are simple and use wire jumpers to select a different address on the I2C bus. Each has a multi-color LED that shows when it is working on a task and when the task is complete. So from a blinking light perspective, the computer is a success.
One problem with setups like this is having an efficient way to communicate between processors. [Henk] found that I2C is the bottleneck. Even though he has 80 CPUs, he found the fractal program bogged down if you applied more than twelve processors to the job.
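To see why a shared bus caps the benefit of adding processors, here is a rough Amdahl-style model. It assumes the computation splits evenly across workers while every result still has to cross one serial bus. The function names and the numbers are hypothetical illustrations, not measurements from [Henk]'s cluster.

```python
def run_time(workers, compute_time, bus_time):
    """Total time when compute divides evenly but all traffic shares one bus."""
    return compute_time / workers + bus_time


def speedup(workers, compute_time, bus_time):
    """Speedup relative to a single worker doing everything."""
    return run_time(1, compute_time, bus_time) / run_time(workers, compute_time, bus_time)


if __name__ == "__main__":
    # Hypothetical workload: 120 time units of math, 10 units of bus traffic.
    COMPUTE, BUS = 120.0, 10.0
    for n in (1, 2, 4, 8, 12, 20, 40, 80):
        print(f"{n:2d} workers: speedup {speedup(n, COMPUTE, BUS):.2f}x")
```

With these made-up numbers the speedup can never exceed (120 + 10) / 10 = 13, no matter how many boards are added, because the bus time does not shrink.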
One nice thing about Hackaday is that you never have to explain why you did something like this. The fact is, this probably isn’t very practical as a parallel supercomputer. But it is still an interesting and educational project and might be the most CPUs we’ve ever seen running BASIC together.
Clusters of Raspberry Pis, of course, are nothing new. We’ve also looked at some that are more practical.
[Video: the PIC32 cluster in action]
To make better use of all these CPUs, try pipe-lining the work through a few processors to complete each unit of work.
what form/protocol do you envisage this “pipeline” to take/use???
Since throughput seems to be the problem, have every 13th processor compress the data and send it on to the end where it is uncompressed and compiled. Not sure if you’ll save time, which is the whole point.
the best way to “save time” with a distributed Mandelbrot solution like this is the “divide and conquer” mode that Fractint uses.
rather than calculating line by line, you divide your target area in 2 and check the corners. when you check whether a point is “in” the set, don’t bother with the square root: if the squared magnitude gets over 4, it’s out.
recursion on a microcontroller with limited stack space can be tricky though
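A minimal sketch of both ideas, under stated assumptions: the escape test compares the squared magnitude with 4 so no square root is needed, and the subdivision uses an explicit work stack instead of recursion. This version checks the whole border of each rectangle rather than only the corners, which is a swapped-in, more conservative variant and not necessarily what Fractint does. All names and defaults here are hypothetical.

```python
def escape_count(cx, cy, max_iter=50):
    """Iterations until z = z*z + c leaves the radius-2 circle (max_iter if never)."""
    zx = zy = 0.0
    for i in range(max_iter):
        zx2, zy2 = zx * zx, zy * zy
        if zx2 + zy2 > 4.0:  # |z|^2 > 4 is the same test as |z| > 2
            return i
        zx, zy = zx2 - zy2 + cx, 2.0 * zx * zy + cy
    return max_iter


def render(width, height, xmin, xmax, ymin, ymax, max_iter=50, min_size=4):
    """Return (grid, evaluated): iteration counts and how many pixels were computed."""
    grid = [[None] * width for _ in range(height)]
    evaluated = 0

    def pixel(px, py):
        nonlocal evaluated
        if grid[py][px] is None:
            cx = xmin + (xmax - xmin) * px / (width - 1)
            cy = ymin + (ymax - ymin) * py / (height - 1)
            grid[py][px] = escape_count(cx, cy, max_iter)
            evaluated += 1
        return grid[py][px]

    stack = [(0, 0, width - 1, height - 1)]  # inclusive rectangle corners
    while stack:
        x0, y0, x1, y1 = stack.pop()
        border = set()
        for x in range(x0, x1 + 1):
            border.add(pixel(x, y0))
            border.add(pixel(x, y1))
        for y in range(y0, y1 + 1):
            border.add(pixel(x0, y))
            border.add(pixel(x1, y))
        if len(border) == 1:
            # Uniform border: assume the interior matches and fill it without math.
            value = border.pop()
            for y in range(y0 + 1, y1):
                for x in range(x0 + 1, x1):
                    grid[y][x] = value
        elif x1 - x0 < min_size or y1 - y0 < min_size:
            # Too small to split further: compute the interior directly.
            for y in range(y0 + 1, y1):
                for x in range(x0 + 1, x1):
                    pixel(x, y)
        else:
            xm, ym = (x0 + x1) // 2, (y0 + y1) // 2
            stack.extend([(x0, y0, xm, ym), (xm, y0, x1, ym),
                          (x0, ym, xm, y1), (xm, ym, x1, y1)])
    return grid, evaluated
```

The fill step is a heuristic: it can miss detail thinner than a pixel. The stack holds only four integers per pending rectangle, which is easier to bound on a small microcontroller than nested call frames.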
I ordered parts for a similar build last week, but my design will not be using I2C or BASIC, and it is a little more inspired by http://apt.cs.manchester.ac.uk/projects/SpiNNaker/project/
I am thinking of using the STM32 “bluepill” as it has 3x UART ports, so I can connect each node to up to 3 other nodes (with full duplex on each connection?). no shared bus. it will be really funny to write the network protocol to handle data buffering and node failures, or dynamic expansion of the network?
I also plan to have a bootloader that can reflash the FW, so you can push FW upgrades to individual nodes in the network (because a proper multi-device jtag interface is boring).
but this is one of those “i buy the parts now, then if i get the time, i will implement this” projects.
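As a rough illustration of routing around failed nodes in a mesh where each board has at most three links, here is a breadth-first search over a link table. The six-node topology and the `find_route` helper are hypothetical; a real protocol would also need buffering, acknowledgements, and failure detection.

```python
from collections import deque


def find_route(links, src, dst, failed=frozenset()):
    """Shortest hop path from src to dst that avoids failed nodes, or None."""
    if src in failed or dst in failed:
        return None
    previous = {src: None}  # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            route = []
            while node is not None:
                route.append(node)
                node = previous[node]
            return route[::-1]
        for neighbor in links[node]:
            if neighbor not in previous and neighbor not in failed:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical six-node mesh: every node uses exactly three UART links.
LINKS = {
    0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 5],
    3: [0, 4, 5], 4: [1, 3, 5], 5: [2, 3, 4],
}
```

For example, `find_route(LINKS, 0, 5)` goes through node 2, while `find_route(LINKS, 0, 5, failed={2})` detours through node 3.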
you can always get access to the real thing if you just want to try your ideas without bothering with infrastructure..
http://apt.cs.manchester.ac.uk/projects/SpiNNaker/project/Access/
see this:
https://www.youtube.com/watch?v=khRPnlDekIg
80 PIC32! 1 should be enough! I used Fractint in the ’80s on PCs of the time, whose CPUs ran at about the same clock speed as a PIC32MX. Fractint used only integer computation, so I infer that a single PIC32MX could do the job as fast as an 80386 of the time, using a port of Fractint to the PIC32MX.
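To show what an integer-only Mandelbrot loop can look like, here is a small fixed-point sketch. The 16-fractional-bit format and the helper names are assumptions chosen for illustration; this is not Fractint's actual number format or code.

```python
FRAC_BITS = 16          # hypothetical format: 16 fractional bits
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0
FOUR = 4 * ONE          # fixed-point representation of 4.0


def to_fixed(x):
    """Convert a float to the fixed-point integer format."""
    return int(round(x * ONE))


def fmul(a, b):
    """Fixed-point multiply. On a 32-bit MCU the product needs a 64-bit intermediate."""
    return (a * b) >> FRAC_BITS


def escape_count_fixed(cx, cy, max_iter=64):
    """Escape-time iteration using only integer add, multiply, and shift."""
    zx = zy = 0
    for i in range(max_iter):
        zx2 = fmul(zx, zx)
        zy2 = fmul(zy, zy)
        if zx2 + zy2 > FOUR:  # squared magnitude above 4 means the point escaped
            return i
        zy = 2 * fmul(zx, zy) + cy
        zx = zx2 - zy2 + cx
    return max_iter
```

A point inside the set, such as `escape_count_fixed(to_fixed(-1.0), to_fixed(0.0))`, runs to the iteration limit, while a distant point escapes after a step or two.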
I suspect that most of the computing power is going to interpreting BASIC.
This. Why pick a slow interpreted language that runs about 80x+ too slow, then try to make a cluster out of it (with a not good enough interconnect)? Just write it in another language and you won’t need the cluster for the same performance.
or just use wolfram alpha or matlab etc etc. I think you’re missing the point.
The setup definitely looks cool, but there looks to be plenty of room for improvement in the firmware.
If your intent is speed, why in the world would you use BASIC?
Functional programming is perfect for a project like this.
For high speed communications, I would use the built-in parallel port peripheral:
http://ww1.microchip.com/downloads/en/DeviceDoc/60001128H.pdf
You can even set it up to use DMA. So incoming data (and outgoing data) just show up in the local micro’s RAM with no CPU intervention!
Well that’s fast, but it’s not exactly a great bus architecture on its own. How would you build a cluster with that port?
All micros share the same Parallel Data and Address bus.
Set up the Parallel Master port for auto address increment.
Set the slave parallel port on each micro to a different address (or addresses).
Set both DMA controllers to the same data length transfer.
Master micro puts data chunks to be processed by slave micros into the DMA RAM.
DMA controller takes care of sending the data chunk via the parallel bus to the first slave micro.
Then the DMA controller auto increments to the next slave micro and sends the next data chunk… etc for all the slave micros…
Slave micros get a DMA interrupt once the entire data chunk has been placed in its local RAM.
Slave micro then does its processing on the data and places the results back in DMA RAM.
Slave micro then raises an interrupt to the master micro (or master polls a status byte in DMA RAM).
Master micro initiates a DMA transfer to get the processed data back from the slave micros.
Repeat as needed. ;)
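The steps above can be sketched as a tiny host-side simulation. It models only the data flow (scatter by auto-incrementing address, process on "transfer complete", gather after a status check); it is not PIC32 register code, and the class names, addresses, and the squaring workload are all hypothetical.

```python
class SlaveNode:
    """Stands in for one slave micro with a DMA-visible RAM buffer."""

    def __init__(self, address, work):
        self.address = address
        self.work = work      # function applied to each received value
        self.ram = None       # models the DMA RAM holding the results
        self.done = False     # status byte the master can poll

    def dma_receive(self, chunk):
        # Models the DMA-complete interrupt: process, then publish results.
        self.ram = [self.work(value) for value in chunk]
        self.done = True


class SharedBus:
    """Models the shared parallel data/address bus driven by the master."""

    def __init__(self, slaves):
        self.by_address = {slave.address: slave for slave in slaves}

    def scatter(self, start_address, chunks):
        address = start_address
        for chunk in chunks:
            self.by_address[address].dma_receive(list(chunk))
            address += 1      # auto-increment to the next slave

    def gather(self, start_address, count):
        results = []
        for address in range(start_address, start_address + count):
            slave = self.by_address[address]
            if not slave.done:
                raise RuntimeError(f"slave {address:#x} has not finished")
            results.append(slave.ram)
            slave.done = False  # ready for the next round
        return results


def run_round(chunks, base_address=0x10):
    """One scatter/process/gather cycle with one slave per chunk."""
    slaves = [SlaveNode(base_address + i, lambda v: v * v) for i in range(len(chunks))]
    bus = SharedBus(slaves)
    bus.scatter(base_address, chunks)
    return bus.gather(base_address, len(chunks))
```

In this model `run_round([[1, 2], [3, 4]])` hands one chunk to each of two slaves and collects the squared values in address order.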
Hmm I haven’t used the PIC parallel bus for more than direct communication with a display, so I have no idea about its features. But if the slave mode can be set to reply to a specific address request, that would probably work. For some reason, I didn’t even consider an address bus XD
I don’t get why people who otherwise seem quite intelligent seem incapable of turning their phone the right way. https://www.youtube.com/watch?v=Bt9zSfinwFA
+1, Darren! I think sites should start rejecting videos that are not recorded horizontally. How many grade school Christmas plays have you seen from this season with everybody’s “kiddos” that were recorded vertically?
“I think sites should start rejecting videos that are not recorded horizontally. ”
This!
This.
For a higher speed bus, could he not use the DMA channels to interface the UART without much CPU control and daisy chain them?
As a learning experiment, this is great, but it is by no means efficient; one PIC32 can do this job. Don’t get me wrong, this is neat, and I’m sure the author learned a lot.
A bad idea whose time has come.
How about putting an FPGA in the middle, implement lots of i2c interfaces, all attached to a bunch of dual port BRAMs (like the ones within a Spartan 6) and get the data really flowing between the cores. :) Glue logic.
For this kind of build, a standard message passing interface has proven to work well:
Connect all neighbors (up, down, left, right, maybe also diagonal) with a parallel bus and design the software around that model.
Take a look at the Intel Paragon parallel computer
6 links gives you the ability to do a 3D torus network.
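A short sketch of how six links per node map onto a 3D torus: each node has a +1 and -1 neighbor along each of three axes, with coordinates wrapping around at the edges. The 4 x 4 x 5 arrangement is a hypothetical way to lay out 80 boards, and `torus_neighbors` is an illustrative helper.

```python
def torus_neighbors(node, dims):
    """Return the six wrap-around neighbors of node=(x, y, z) in a torus of size dims."""
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


# Hypothetical layout: 4 * 4 * 5 = 80 nodes.
DIMS = (4, 4, 5)
```

For the corner node `(0, 0, 0)` the "-1" neighbors wrap to the far faces, for example `(3, 0, 0)` and `(0, 0, 4)`, so every board has exactly six peers and no edge cases.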