80-PIC32 Cluster Does Fractals

One way to get around limitations in computing resources is to throw more computers at the problem. That’s why even cheap consumer-grade computers and phones have multiple cores. In supercomputing, it is common to have lots of processors connected by sophisticated mechanisms for sharing data.

[Henk Verbeek] decided to take 80 inexpensive PIC32 chips and build his own cluster programmed in, of all things, BASIC. The devices talk to each other over I2C. His example application plots fractals on another PIC32-based computer that has a VGA output. You can see a video of the device in action below.

The slave boards are simple and use wire jumpers to set each board’s address on the I2C bus. Each has a multi-color LED that shows when it is working on a task and when the task is complete. So from a blinking-light perspective, the computer is a success.

One problem with setups like this is finding an efficient way for the processors to communicate. [Henk] found that I2C is the bottleneck: even though he has 80 CPUs, the fractal program bogged down when more than twelve processors were applied to the job.

One nice thing about Hackaday is that you never have to explain why you did something like this. The fact is, this probably isn’t very practical as a parallel supercomputer. But it is still an interesting and educational project, and it might be the most CPUs we’ve ever seen running BASIC together.

Clusters of Raspberry Pis, of course, are nothing new. We’ve also looked at some that are more practical.

26 thoughts on “80-PIC32 Cluster Does Fractals”

      1. Since throughput seems to be the problem, have every 13th processor compress the data and send it on to the end, where it is decompressed and assembled. Not sure if you’ll save time, which is the whole point.
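One simple way to try the compression idea, assuming each node returns a row of 8-bit iteration counts (a format the article does not describe), is run-length encoding: fractal images tend to contain long runs of identical values, especially inside the set. This is a minimal C sketch; `rle_encode` and `rle_decode` are hypothetical helpers, not part of [Henk]’s BASIC code.

```c
#include <stddef.h>
#include <stdint.h>

/* Run-length encode a row of 8-bit iteration counts (hypothetical format:
 * one byte per pixel). Output is (count, value) pairs; runs are capped at
 * 255 so the count fits in one byte. Returns the number of bytes written,
 * or 0 if the output buffer is too small. */
size_t rle_encode(const uint8_t *in, size_t n, uint8_t *out, size_t out_cap)
{
    size_t o = 0;
    size_t i = 0;
    while (i < n) {
        uint8_t value = in[i];
        size_t run = 1;
        while (i + run < n && in[i + run] == value && run < 255)
            run++;
        if (o + 2 > out_cap)
            return 0;                 /* not enough room for another pair */
        out[o++] = (uint8_t)run;
        out[o++] = value;
        i += run;
    }
    return o;
}

/* Expand (count, value) pairs back into pixels. Returns the number of
 * pixels written, or 0 if the input is malformed or the output is too small. */
size_t rle_decode(const uint8_t *in, size_t n, uint8_t *out, size_t out_cap)
{
    size_t o = 0;
    if (n % 2 != 0)
        return 0;
    for (size_t i = 0; i < n; i += 2) {
        uint8_t run = in[i];
        uint8_t value = in[i + 1];
        if (o + run > out_cap)
            return 0;
        for (uint8_t k = 0; k < run; k++)
            out[o++] = value;
    }
    return o;
}
```

Whether this wins anything depends on how compressible the rows are and how long the extra encode and decode steps take, which is exactly the uncertainty the commenter raises.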

        1. The best way to “save time” with a distributed Mandelbrot solution like this is the “divide and conquer” mode that Fractint uses.

          Rather than calculating line by line, you divide your target area in two and check the corners. When you check whether a point is “in” the set, don’t bother taking the square root: if the squared magnitude gets over 4, it’s out.

          Recursion on a microcontroller with limited stack space can be tricky, though.
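A minimal C sketch of the subdivision idea, written iteratively with a small fixed-size stack instead of recursion to sidestep the stack-space concern. It uses the Mariani-Silver variant, which checks the whole border of each rectangle rather than only the corners: if every border pixel has the same iteration count, the interior is filled with that value; otherwise the rectangle is split in two along its longer side. The escape test compares `zr*zr + zi*zi` against 4, so no square root is needed. The image size, view window, `MAX_ITER` and `STACK_CAP` are arbitrary assumptions, and this is not Fractint’s actual code.

```c
#include <stdint.h>

#define IMG_W     64
#define IMG_H     48
#define MAX_ITER  64
#define STACK_CAP 32

static uint8_t img[IMG_H][IMG_W];   /* iteration count per pixel */

/* Escape-time test: iterate z = z^2 + c and stop once |z|^2 > 4,
 * which is equivalent to |z| > 2 but needs no square root. */
static uint8_t mandel_iter(double cr, double ci)
{
    double zr = 0.0, zi = 0.0;
    int n = 0;
    while (n < MAX_ITER) {
        double zr2 = zr * zr, zi2 = zi * zi;
        if (zr2 + zi2 > 4.0)
            break;
        zi = 2.0 * zr * zi + ci;
        zr = zr2 - zi2 + cr;
        n++;
    }
    return (uint8_t)n;
}

/* Map a pixel to the complex plane: x in [-2, 1), y in [-1.2, 1.2). */
static uint8_t pixel(int x, int y)
{
    return mandel_iter(-2.0 + 3.0 * x / IMG_W, -1.2 + 2.4 * y / IMG_H);
}

typedef struct { int x0, y0, x1, y1; } Rect;   /* inclusive bounds */

void render_subdivide(void)
{
    Rect stack[STACK_CAP];
    int sp = 0;
    stack[sp++] = (Rect){0, 0, IMG_W - 1, IMG_H - 1};

    while (sp > 0) {
        Rect r = stack[--sp];
        uint8_t first = pixel(r.x0, r.y0);
        int uniform = 1;

        /* Compute the border and check whether it holds a single value. */
        for (int x = r.x0; x <= r.x1; x++) {
            img[r.y0][x] = pixel(x, r.y0);
            img[r.y1][x] = pixel(x, r.y1);
            if (img[r.y0][x] != first || img[r.y1][x] != first)
                uniform = 0;
        }
        for (int y = r.y0; y <= r.y1; y++) {
            img[y][r.x0] = pixel(r.x0, y);
            img[y][r.x1] = pixel(r.x1, y);
            if (img[y][r.x0] != first || img[y][r.x1] != first)
                uniform = 0;
        }

        int w = r.x1 - r.x0 + 1, h = r.y1 - r.y0 + 1;
        if (uniform || w <= 4 || h <= 4 || sp + 2 > STACK_CAP) {
            /* Uniform border: fill the interior. Otherwise (small rectangle
             * or no stack room): compute the interior pixel by pixel. */
            for (int y = r.y0 + 1; y < r.y1; y++)
                for (int x = r.x0 + 1; x < r.x1; x++)
                    img[y][x] = uniform ? first : pixel(x, y);
            continue;
        }

        /* Split along the longer side; the halves share the dividing line. */
        if (w >= h) {
            int mid = r.x0 + w / 2;
            stack[sp++] = (Rect){r.x0, r.y0, mid, r.y1};
            stack[sp++] = (Rect){mid, r.y0, r.x1, r.y1};
        } else {
            int mid = r.y0 + h / 2;
            stack[sp++] = (Rect){r.x0, r.y0, r.x1, mid};
            stack[sp++] = (Rect){r.x0, mid, r.x1, r.y1};
        }
    }
}
```

Filling from a uniform border is a heuristic that relies on the Mandelbrot set being connected; at coarse resolutions it can occasionally miss thin filaments. In a cluster, each rectangle popped off the stack is also a natural unit of work to hand to a different node.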

  1. I ordered parts for a similar build last week, but my design will not use I2C or BASIC, and it is a little more inspired by http://apt.cs.manchester.ac.uk/projects/SpiNNaker/project/

    I am thinking of using the STM32 “Blue Pill”, as it has three UART ports, so I can connect each node to up to three other nodes (with full duplex on each connection?). No shared bus. It will be really fun to write the network protocol to handle data buffering and node failures, or maybe even dynamic expansion of the network?

    I also plan to have a bootloader that can reflash the firmware, so you can push firmware upgrades to individual nodes in the network (because a proper multi-device JTAG interface is boring).

    But this is one of those “I buy the parts now, and if I get the time, I will implement this” projects.
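With point-to-point UART links and no shared bus, every byte stream needs framing and error detection before buffering, routing or failure handling can be layered on top. This is a minimal C sketch of one possible frame format using a Fletcher-16 checksum; the field layout, `FRAME_START` marker and payload limit are assumptions for illustration, not part of the commenter’s design or of any STM32 library.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_START       0x7E  /* hypothetical start-of-frame marker */
#define FRAME_MAX_PAYLOAD 32    /* hypothetical payload limit */
#define FRAME_OVERHEAD    6     /* start, dst, src, len, 2 checksum bytes */

/* Fletcher-16 checksum: two running sums modulo 255. */
uint16_t fletcher16(const uint8_t *data, size_t len)
{
    uint16_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < len; i++) {
        sum1 = (uint16_t)((sum1 + data[i]) % 255);
        sum2 = (uint16_t)((sum2 + sum1) % 255);
    }
    return (uint16_t)((sum2 << 8) | sum1);
}

/* Build a frame: START, dst, src, len, payload[len], ck_hi, ck_lo.
 * The checksum covers dst through the end of the payload.
 * Returns the frame length, or 0 if it does not fit. */
size_t frame_encode(uint8_t dst, uint8_t src, const uint8_t *payload,
                    uint8_t len, uint8_t *out, size_t cap)
{
    if (len > FRAME_MAX_PAYLOAD || cap < (size_t)len + FRAME_OVERHEAD)
        return 0;
    out[0] = FRAME_START;
    out[1] = dst;
    out[2] = src;
    out[3] = len;
    for (uint8_t i = 0; i < len; i++)
        out[4 + i] = payload[i];
    uint16_t ck = fletcher16(out + 1, (size_t)len + 3);
    out[4 + len] = (uint8_t)(ck >> 8);
    out[5 + len] = (uint8_t)(ck & 0xFF);
    return (size_t)len + FRAME_OVERHEAD;
}

/* Returns 1 if a received frame is well formed and its checksum matches. */
int frame_check(const uint8_t *f, size_t n)
{
    if (n < FRAME_OVERHEAD || f[0] != FRAME_START)
        return 0;
    uint8_t len = f[3];
    if (len > FRAME_MAX_PAYLOAD || n != (size_t)len + FRAME_OVERHEAD)
        return 0;
    uint16_t ck = fletcher16(f + 1, (size_t)len + 3);
    return f[4 + len] == (uint8_t)(ck >> 8) && f[5 + len] == (uint8_t)(ck & 0xFF);
}
```

Because there is no byte stuffing, the start marker can also appear inside a payload, so a receiver has to trust the length field after it sees a start byte and resynchronize when a checksum fails. A node whose ID does not match `dst` would forward the frame on one of its other links.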
    You can always get access to the real thing if you just want to try your ideas without bothering with the infrastructure:
    http://apt.cs.manchester.ac.uk/projects/SpiNNaker/project/Access/
    see this:

  2. 80 PIC32s! One should be enough! I used Fractint in the ’80s on PCs whose CPUs ran at about the same clock speed as a PIC32MX. Fractint used only integer arithmetic, so I infer that a single PIC32MX running a port of Fractint could do the job as fast as an 80386 of that era.

      1. This. Why pick an interpreted language that runs 80x or more too slow, then try to build a cluster out of it (with an inadequate interconnect)? Just write it in a compiled language and you won’t need the cluster to get the same performance.
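To make the single-chip argument concrete, this is a minimal C sketch of an integer-only escape-time loop. It assumes a fixed-point format of signed 32-bit values with 28 fractional bits (a range of about ±8), with 64-bit intermediate products; the format, `to_fix` and `MAX_ITER` are assumptions for illustration, not Fractint’s own code or number format.

```c
#include <stdint.h>

#define FRAC_BITS 28                        /* 28 fractional bits, range about +/-8 */
#define FIX_ONE   ((int32_t)1 << FRAC_BITS)
#define MAX_ITER  64

/* Host-side setup helper only; the iteration loop below uses no floating point. */
static int32_t to_fix(double v) { return (int32_t)(v * FIX_ONE); }

/* Integer-only escape-time iteration for c = cr + i*ci (both fixed point).
 * Right-shifting negative values is implementation-defined in C;
 * GCC performs sign extension (an arithmetic shift). */
int mandel_fixed(int32_t cr, int32_t ci)
{
    const int64_t four = (int64_t)4 << FRAC_BITS;
    int32_t zr = 0, zi = 0;
    int n;
    for (n = 0; n < MAX_ITER; n++) {
        int64_t zr2 = ((int64_t)zr * zr) >> FRAC_BITS;
        int64_t zi2 = ((int64_t)zi * zi) >> FRAC_BITS;
        if (zr2 + zi2 > four)               /* |z|^2 > 4: escaped, no sqrt needed */
            break;
        /* 2*zr*zi: shifting by one bit less doubles the product. */
        int32_t new_zi = (int32_t)((((int64_t)zr * zi) >> (FRAC_BITS - 1)) + ci);
        zr = (int32_t)(zr2 - zi2 + cr);
        zi = new_zi;
    }
    return n;
}
```

Because the loop exits as soon as |z|² exceeds 4, and |c| stays within the usual ±2 window, `zr` and `zi` stay below about 6 in magnitude, so they fit the 32-bit format and their products fit in 64 bits.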

      1. All micros share the same parallel data and address bus.

        Set up the Parallel Master Port for auto address increment.
        Set the slave parallel port on each micro to a different address (or addresses).
        Set both DMA controllers to the same transfer length.

        Master micro puts the data chunks to be processed by the slave micros into DMA RAM.
        DMA controller takes care of sending the first data chunk over the parallel bus to the first slave micro.
        Then the DMA controller auto-increments to the next slave micro and sends the next data chunk, and so on for all the slave micros.

        Slave micros get a DMA interrupt once the entire data chunk has been placed in its local RAM.

        Slave micro then processes the data and places the results back in DMA RAM.
        Slave signals the master micro with an interrupt (or the master polls a status byte in DMA RAM).

        Master micro initiates a DMA transfer to get the processed data back from the slave micros.

        Repeat as needed. ;)
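The scatter, process and gather cycle described above can be modeled on a host in a few lines of C. This is a single-threaded sketch only: an array of `SlaveWindow` structs stands in for each slave’s DMA-visible RAM on the shared bus, the status byte plays the role of the interrupt or polled flag, and the “processing” is a placeholder that doubles each value. `NUM_SLAVES`, `CHUNK_LEN` and all function names are hypothetical and do not correspond to real PIC32 PMP or DMA registers.

```c
#include <stdint.h>

#define NUM_SLAVES 4   /* hypothetical; the article's cluster has 80 nodes */
#define CHUNK_LEN  8   /* hypothetical words per chunk */

enum { SLOT_IDLE = 0, SLOT_DATA_READY = 1, SLOT_RESULT_READY = 2 };

/* Stand-in for one slave's DMA-visible RAM window on the shared bus. */
typedef struct {
    volatile uint8_t status;        /* handshake byte the master can poll */
    uint16_t data[CHUNK_LEN];
    uint16_t result[CHUNK_LEN];
} SlaveWindow;

static SlaveWindow bus[NUM_SLAVES];

/* Master: hand one chunk to each slave, stepping the slave address each time
 * (the job the PMP's auto address increment would do in hardware). */
void master_scatter(const uint16_t *work)
{
    for (int addr = 0; addr < NUM_SLAVES; addr++) {
        for (int i = 0; i < CHUNK_LEN; i++)
            bus[addr].data[i] = work[addr * CHUNK_LEN + i];
        bus[addr].status = SLOT_DATA_READY;   /* slave would get a DMA interrupt */
    }
}

/* Slave: process its chunk (placeholder work: double each value) and
 * flag the result for the master. */
void slave_service(int addr)
{
    if (bus[addr].status != SLOT_DATA_READY)
        return;
    for (int i = 0; i < CHUNK_LEN; i++)
        bus[addr].result[i] = (uint16_t)(bus[addr].data[i] * 2);
    bus[addr].status = SLOT_RESULT_READY;
}

/* Master: poll each status byte and copy back any finished results.
 * Returns the number of chunks collected on this pass. */
int master_gather(uint16_t *results)
{
    int collected = 0;
    for (int addr = 0; addr < NUM_SLAVES; addr++) {
        if (bus[addr].status != SLOT_RESULT_READY)
            continue;
        for (int i = 0; i < CHUNK_LEN; i++)
            results[addr * CHUNK_LEN + i] = bus[addr].result[i];
        bus[addr].status = SLOT_IDLE;
        collected++;
    }
    return collected;
}
```

A gather pass before any slave has finished collects nothing, and slaves can complete in any order; the status byte is what keeps master and slaves from reading half-written buffers.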

        1. Hmm, I haven’t used the PIC parallel bus for more than direct communication with a display, so I have no idea about its features. But if the slave mode can be set to reply to a specific address request, that would probably work. For some reason, I didn’t even consider an address bus XD

    1. +1, Darren! I think sites should start rejecting videos that are not recorded horizontally. How many grade-school Christmas plays have you seen this season, with everybody’s “kiddos”, recorded vertically?

  3. For this kind of build, a standard message-passing interface has proven to work well:
    connect each node to its neighbors (up, down, left, right, maybe also diagonal) with a parallel bus and design the software around that model.
    Take a look at the Intel Paragon parallel computer.
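A minimal C sketch of neighbor addressing in a 2D mesh, assuming nodes are numbered row-major in an 8 x 10 grid (chosen only because it gives the article’s 80 nodes; diagonal links are left out). `mesh_next_hop` uses dimension-order routing, which corrects the column first and then the row; it is a common deadlock-free scheme for meshes, shown here as an illustration rather than as a description of how the Paragon routed messages.

```c
#define MESH_ROWS 8
#define MESH_COLS 10   /* 8 x 10 = 80 nodes */

enum MeshDir { DIR_UP, DIR_DOWN, DIR_LEFT, DIR_RIGHT };

/* Node IDs are row-major: id = row * MESH_COLS + col.
 * Returns the neighbor's ID, or -1 at the edge of the mesh. */
int mesh_neighbor(int id, enum MeshDir d)
{
    int row = id / MESH_COLS, col = id % MESH_COLS;
    switch (d) {
    case DIR_UP:    return row > 0 ? id - MESH_COLS : -1;
    case DIR_DOWN:  return row < MESH_ROWS - 1 ? id + MESH_COLS : -1;
    case DIR_LEFT:  return col > 0 ? id - 1 : -1;
    case DIR_RIGHT: return col < MESH_COLS - 1 ? id + 1 : -1;
    }
    return -1;
}

/* Dimension-order routing: correct the column first, then the row.
 * Returns the next node on the path to dst, or id itself on arrival. */
int mesh_next_hop(int id, int dst)
{
    int col = id % MESH_COLS, dcol = dst % MESH_COLS;
    if (col < dcol) return mesh_neighbor(id, DIR_RIGHT);
    if (col > dcol) return mesh_neighbor(id, DIR_LEFT);
    int row = id / MESH_COLS, drow = dst / MESH_COLS;
    if (row < drow) return mesh_neighbor(id, DIR_DOWN);
    if (row > drow) return mesh_neighbor(id, DIR_UP);
    return id;
}
```

Routing from node 0 (top-left) to node 79 (bottom-right) takes 9 hops along the row and then 7 down the column, 16 in total. Adding the diagonal links the comment mentions would cut that worst case to 9.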
