80-PIC32 Cluster Does Fractals

December 19, 2016

One way to get around limitations in computing resources is to throw more computers at the problem. That’s why even cheap consumer-grade computers and phones have multiple cores in them. In supercomputing, it is common to have lots of processors with sophisticated sharing mechanisms.

[Henk Verbeek] decided to take 80 inexpensive PIC32 chips and build his own cluster programmed in — of all things — BASIC. The devices talk to each other via I2C. His example application plots fractals on another PIC32-based computer that has a VGA output. You can see a video of the device in action, below.

The slave boards are simple and use wire jumpers to select a different address on the I2C bus. Each has a multi-color LED that shows when it is working on a task and when the task is complete. So from a blinking light perspective, the computer is a success.

One problem with setups like this is having an efficient way to communicate between processors. [Henk] found that I2C is the bottleneck. Even though he has 80 CPUs, he found the fractal program bogged down if you applied more than twelve processors to the job.
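To see why a shared bus can cap the useful processor count, here is a toy model in Python. It is not [Henk]'s code or measurement: the function names and both cost constants are invented for illustration, picked only so that the sweet spot of the model lands at twelve workers. The assumption is that compute time divides evenly among workers while every extra worker adds a fixed amount of serialized bus traffic.

```python
def run_time(workers, compute_cost=144.0, bus_cost_per_worker=1.0):
    """Toy model: compute is split evenly, bus traffic is serialized.

    Both cost constants are made-up illustration values, not measurements.
    """
    return compute_cost / workers + bus_cost_per_worker * workers


def best_worker_count(max_workers=80, **costs):
    """Return the worker count with the lowest modelled run time."""
    return min(range(1, max_workers + 1), key=lambda n: run_time(n, **costs))


if __name__ == "__main__":
    for n in (1, 4, 12, 40, 80):
        print(f"{n:2d} workers -> {run_time(n):6.1f} time units")
    print("best:", best_worker_count())
```

In this model the run time falls until the bus term takes over, after which adding workers makes things slower rather than faster.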

One nice thing about Hackaday is that nobody has to ask why you did something like this. The fact is, this probably isn’t very practical as a parallel supercomputer. But it is still an interesting and educational project and might be the most CPUs we’ve ever seen running BASIC together.

Clusters of Raspberry Pis, of course, are nothing new. We’ve also looked at some that are more practical.

[wpvideo G99TUlky]

26 thoughts on “80-PIC32 Cluster Does Fractals”

Ken says:

December 19, 2016 at 10:10 pm

To make better use of all these CPUs, try pipelining the work through a few processors to complete each unit of work.

1. cyberteque says:
  
  December 19, 2016 at 10:39 pm
  
  what form/protocol do you envisage this “pipeline” to take/use???
  
  1. John says:
    
    December 21, 2016 at 2:44 pm
    
    Since throughput seems to be the problem, have every 13th processor compress the data and send it on to the end where it is uncompressed and compiled. Not sure if you’ll save time, which is the whole point.
    
    1. cyberteque says:
      
      December 21, 2016 at 5:54 pm
      
      the best way to “save time” with a distributed Mandelbrot solution like this is the “divide and conquer” mode that Fractint uses.
      
      rather than calculating line by line, you divide your target area in 2 and check the corners. and when you check whether a point is “in” the set, don’t bother with the square root: compare the squared magnitude against 4, and if it gets over 4, it’s out.
      
      recursion on a microcontroller with limited stack space can be tricky though
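A minimal Python sketch of the escape test described above, assuming the usual z → z² + c iteration. The names `escape_count` and `max_iter` are illustrative and are not taken from Fractint or from the cluster's BASIC code. The point is only that comparing the squared magnitude against 4 is equivalent to comparing the magnitude against 2, so no square root is needed.

```python
def escape_count(cr, ci, max_iter=100):
    """Return the iteration at which c = cr + ci*i escapes, or max_iter.

    A point is treated as "out" once |z|^2 exceeds 4, which is the same
    as |z| > 2 but avoids the square root.
    """
    zr = zi = 0.0
    for i in range(max_iter):
        zr2 = zr * zr
        zi2 = zi * zi
        if zr2 + zi2 > 4.0:
            return i
        # z = z*z + c, written out in real and imaginary parts
        zi = 2.0 * zr * zi + ci
        zr = zr2 - zi2 + cr
    return max_iter


if __name__ == "__main__":
    print(escape_count(0.0, 0.0))   # the origin never escapes
    print(escape_count(1.0, 0.0))   # escapes after a few iterations
```

The loop reuses the two squares for both the test and the update, which also saves two multiplications per iteration.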
      
Anton Fosselius (@MaidenOne) says:

December 19, 2016 at 10:41 pm

Ordered parts for a similar build last week, but my design will not be using I2C or BASIC, and is a little more inspired by http://apt.cs.manchester.ac.uk/projects/SpiNNaker/project/

I am thinking of using stm32 “bluepill” as it has 3x UART ports, then i can connect each node to up to 3 other nodes (with full duplex on each connection?). no shared bus. it will be really fun to write the network protocol to handle data buffering and node failures, or dynamic expansion of the network?

I also plan to have a bootloader that can reflash the FW, so you can push FW upgrades to individual nodes in the network (because a proper multi-device jtag interface is boring).

but this is one of those “i buy the parts now, then if i get the time, i will implement this” projects.
you can always get access to the real thing if you just want to try your ideas without bothering with infrastructure..
http://apt.cs.manchester.ac.uk/projects/SpiNNaker/project/Access/
see this:
https://www.youtube.com/watch?v=khRPnlDekIg

Jacques1956 says:

December 20, 2016 at 5:36 am

80 PIC32! 1 should be enough! I used the Fractint software in the 80’s on PCs of the time, whose CPUs were running at about the same clock speed as a PIC32MX. Fractint only used integer computation, so I infer that a single PIC32MX could do the job as fast as an 80386 of the time using a port of Fractint to PIC32MX.

1. Stickben says:
  
  December 20, 2016 at 5:59 am
  
  I suspect that most of the computing power is going to interpreting BASIC.
  
  1. meh says:
    
    December 20, 2016 at 6:07 pm
    
    This. Why pick a slow interpreted language that runs about 80x+ too slow, then try to make a cluster out of it (with a not good enough interconnect)? Just write it in another language and you won’t need the cluster for the same performance.
    
    1. Benjamin Bird says:
      
      December 20, 2016 at 6:30 pm
      
      or just use wolfram alpha or matlab etc etc. I think you’re missing the point.
      
      1. stickben says:
        
        December 21, 2016 at 11:55 am
        
        The setup definitely looks cool, but there looks to be plenty of room for improvement in the firmware.
Stickben says:

December 20, 2016 at 6:01 am

If your intent is speed, why in the world would you use BASIC?

1. FrankenPC says:
  
  December 20, 2016 at 8:29 am
  
  Functional programming is perfect for a project like this.
  
Wolf says:

December 20, 2016 at 6:24 am

For high speed communications, I would use the built in Parallel port peripheral
http://ww1.microchip.com/downloads/en/DeviceDoc/60001128H.pdf
You can even set it up to use DMA. So incoming data (and outgoing data) just show up in the local micro’s RAM with no CPU intervention!

1. Fuchs says:
  
  December 20, 2016 at 7:42 am
  
  Well that’s fast, but it’s not exactly a great bus architecture on its own. How would you build a cluster with that port?
  
  1. Wolf says:
    
    December 20, 2016 at 1:52 pm
    
    All micros share the same Parallel Data and Address bus.
    
    Set up the Parallel Master port for auto address increment.
    Set the slave parallel port on each micro to a different address (or addresses).
    Set both DMA controllers to the same data length transfer.
    
    Master micro puts data chunks to be processed by slave micros into the DMA RAM.
    DMA controller takes care of sending the data chunk via the parallel bus to the first slave micro.
    Then the DMA controller auto increments to the next slave micro and sends the next data chunk… etc for all the slave micros…
    
    Slave micros get a DMA interrupt once the entire data chunk has been placed in its local RAM.
    
    Slave micro then does its processing on the data and places the results back in DMA RAM.
    Sets interrupt to master micro (or master polls a status byte in DMA RAM).
    
    Master micro initiates a DMA transfer to get the processed data back from the slave micros.
    
    Repeat as needed. ;)
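The sequence above can be modelled in a few lines of Python. This is only a simulation of the data flow, not PIC32 firmware: `SlaveMicro`, `scatter_gather`, `dma_write`, `on_dma_interrupt` and the `status_done` flag are hypothetical stand-ins, and the real parallel port and DMA setup involves register configuration that is not shown here.

```python
class SlaveMicro:
    """Hypothetical stand-in for one slave micro on the shared parallel bus."""

    def __init__(self, address, work):
        self.address = address
        self.work = work           # function applied to each received chunk
        self.dma_ram = None        # models the buffer that DMA fills
        self.status_done = False   # models the status byte the master polls

    def dma_write(self, chunk):
        """Models the master's DMA copying a chunk into this slave's RAM."""
        self.dma_ram = list(chunk)
        self.status_done = False

    def on_dma_interrupt(self):
        """Models the slave's handler: process the chunk, flag completion."""
        self.dma_ram = self.work(self.dma_ram)
        self.status_done = True


def scatter_gather(chunks, work):
    """Send one chunk per slave address, let each slave run, read back."""
    slaves = [SlaveMicro(address, work) for address in range(len(chunks))]
    # Scatter: the address auto-increments from one slave to the next.
    for slave, chunk in zip(slaves, chunks):
        slave.dma_write(chunk)
    # Each slave is interrupted once its whole chunk has arrived.
    for slave in slaves:
        slave.on_dma_interrupt()
    # Gather: the master polls the status flag, then reads the results.
    results = []
    for slave in slaves:
        if not slave.status_done:
            raise RuntimeError(f"slave {slave.address} not finished")
        results.append(slave.dma_ram)
    return results


if __name__ == "__main__":
    squares = scatter_gather([[1, 2], [3, 4], [5, 6]],
                             lambda chunk: [x * x for x in chunk])
    print(squares)
```

On real hardware the slaves would run concurrently; the simulation runs them one after another, which keeps the ordering of the steps visible.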
    
    1. Fuchs says:
      
      December 21, 2016 at 12:56 am
      
      Hmm I haven’t used the PIC parallel bus for more than direct communication with a display, so I have no idea about its features. But if the slave mode can be set to reply to a specific address request, that would probably work. For some reason, I didn’t even consider an address bus XD
      
Darren says:

December 20, 2016 at 7:39 am

I don’t get why people who otherwise seem quite intelligent seem incapable of turning their phone the right way. https://www.youtube.com/watch?v=Bt9zSfinwFA

1. ScottV says:
  
  December 20, 2016 at 7:48 am
  
  +1, Darren! I think sites should start rejecting videos that are not recorded horizontally. How many grade school Christmas plays have you seen from this season with everybody’s “kiddos” that were recorded vertically?
  
  1. JB says:
    
    December 20, 2016 at 9:36 am
    
    “I think sites should start rejecting videos that are not recorded horizontally. ”
    
    This!
    
    1. opless says:
      
      December 21, 2016 at 12:09 pm
      
      This.
      
Morofry says:

December 20, 2016 at 9:13 am

For a higher speed bus, could he not use the DMA channels to interface the UART without much CPU control and daisy chain them?

Mike bradley says:

December 20, 2016 at 12:42 pm

As a learning experiment this is great, but it is by no means efficient; one PIC32 can do this job. Don’t get me wrong, this is neat, and I’m sure the author learned a lot.

Tucson Tom says:

December 20, 2016 at 4:58 pm

A bad idea whose time has come.

Nimajamin says:

December 21, 2016 at 3:38 am

How about putting an FPGA in the middle, implementing lots of I2C interfaces, all attached to a bunch of dual port BRAMs (like the ones within a Spartan 6), and getting the data really flowing between the cores? :) Glue logic.

movax says:

December 21, 2016 at 8:41 am

For this kind of build, a standard message passing interface has proven to work well:
Connect all neighbors (up, down, left, right, maybe also diagonal) with a parallel bus and design the software around that model.
Take a look at the Intel Paragon parallel computer

1. chango says:
  
  December 21, 2016 at 9:07 pm
  
  6 links give you the ability to do a 3D torus network.
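As a rough illustration of the six-link idea, this Python sketch lists the neighbors of a node in a 3D torus, where each axis wraps around at the edges. The function name `torus_neighbors` and the 4 x 4 x 5 shape are assumptions made for the example (that shape happens to give 80 nodes); nothing here describes how any of the boards mentioned are actually wired.

```python
def torus_neighbors(node, dims):
    """Return the six neighbors of node=(x, y, z) in a torus of size dims.

    Each axis wraps around, so every node has one link in each direction
    along x, y and z. Assumes every dimension is at least 3 so that the
    six neighbors are distinct.
    """
    x, y, z = node
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z), ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z), (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz), (x, y, (z - 1) % nz),
    ]


if __name__ == "__main__":
    shape = (4, 4, 5)  # hypothetical layout: 4 * 4 * 5 = 80 nodes
    for neighbor in torus_neighbors((0, 0, 0), shape):
        print(neighbor)
```

Because of the wraparound, corner nodes have the same number of links as interior ones, so every board can run identical routing code.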
  