Several Raspberry Pi Picos connected to each other

Raspberry Pi Pico Parallel Mandelbrot Computation

The Mandelbrot set is — when visualized with some colors — an interesting shape with infinite detail. While the patterns are immediately obvious to the human eye, anyone who’s run one can tell you that they’re pretty computationally expensive to produce. Fortunately, as with many things in graphics, rendering the Mandelbrot set can be easily parallelized.

That’s what [rak277] and [ir93] demonstrate in their RP2040-based finals project. Computron, as they call it, is a network of Raspberry Pi Picos that work together to compute a visualization of the Mandelbrot set and show it on a VGA display. The Computron is made of two or more “math units” and one “projection unit”. The math units communicate over a shared I²C bus with the projection unit to first divide the workload and then compute their share of the work.

This project shows both the strengths and limitations of parallel computation. It makes use of multiple math units on a highly parallelizable workload, but as more math units are added there are diminishing performance gains due to the increased communications load on the network, which [rak277] and [ir93] suspect to be the current bottleneck in the Computron.

If you’re fresh out of Pi Picos, and don’t mind waiting awhile, you could always crank out a Mandelbrot set on your trusty Atari 800 in BASIC.

PicoCray - Raspberry Pi Pico Cluster

Parallel Computing On The PicoCray RP2040 Cluster

[ExtremeElectronics] cleverly demonstrates that if one Raspberry Pi Pico is good, then nine must be awesome.  The PicoCray project connects multiple Raspberry Pi Pico microcontroller modules into a parallel architecture leveraging an I2C bus to communicate between nodes.

The same PicoCray code runs on all nodes, but a grounded pin on one of the Pico modules indicates that it is to operate as the controller node.  All of the remaining nodes operate as processor nodes.  Each processor node implements a random back-off technique to request an address from the controller on the shared bus. After waiting a random amount of time, a processor will check if the bus is being used.  If the bus is in use, the processor will go back to waiting.  If the bus is not in use, the processor can request an address from the controller.

Once a processor node has an address, it can be sent tasks from the controller node.  In the example application, these tasks involve computing elements of the Mandelbrot Set. The particular elements to be computed in a given task are allocated by the controller node which then later collects the results from each processor node and aggregates the results for display.

The name for this project is inspired by Seymore Cray. Our Father of the Supercomputer biography tells his story including why the Cray-1 Supercomputer was referred to as “the world’s most expensive loveseat.” For even more Cray-1 inspiration, check out this Raspberry Pi Zero Cluster.

Parsing PNGs Differently

There are millions of tiny bugs all around us, in everything from our desktop applications to the appliances in the kitchen. Hidden, arbitrary conditions that cause unintended outputs and behaviors. There are many ways to find these bugs, but one way we don’t hear about very often is finding a bug in your own code, only to realize someone else made the same mistake. For example, [David Buchanan] found a bug in his multi-threaded PNG decoder and realized that the Apple PNG decoder had the same bug.

PNG (Portable Network Graphics) is an image format just like JPEG, WEBP, or TIFF designed to replace GIFs. After a header, the rest of the file is entirely chunks. Each chunk is prepended by a four-letter identifier, with a few chunks being critical chunks. The essential sections are IHDR (the header), IDAT (actual image data), PLTE (the palette information), and IEND (the last chunk in the file). Compression is via the DEFLATE method used in zlib, which is inherently serial. If you’re interested, there’s a convenient poster about the format from a great resource we covered a while back.

Continue reading “Parsing PNGs Differently”

Linux-Fu: Parallel Universe

At some point, you simply run out of processing power. Admittedly, that point keeps getting further and further away, but you can still get there. If you run out of CPU time, the answer might be to add more CPUs. However, sometimes there are other bottlenecks like memory or disk space. However, it is also likely that you have access to multiple computers. Who doesn’t have a few Raspberry Pis sitting around their network? Or maybe a server in the basement? Or even some remote servers “in the cloud.” GNU Parallel is a tool that lets you spread work across multiple tasks either locally to remote machines. In some ways, it is simple, since it looks sort of like xargs but with parallel execution. On the other hand, it has myriad options and configurations that can make it a little daunting to use. Continue reading “Linux-Fu: Parallel Universe”

Neural Nets In The Browser: Why Not?

We keep seeing more and more Tensor Flow neural network projects. We also keep seeing more and more things running in the browser. You don’t have to be Mr. Spock to see this one coming. TensorFire runs neural networks in the browser and claims that WebGL allows it to run as quickly as it would on the user’s desktop computer. The main page is a demo that stylizes images, but if you want more detail you’ll probably want to visit the project page, instead. You might also enjoy the video from one of the creators, [Kevin Kwok], below.

TensorFire has two parts: a low-level language for writing massively parallel WebGL shaders that operate on 4D tensors and a high-level library for importing models from Keras or TensorFlow. The authors claim it will work on any GPU and–in some cases–will be actually faster than running native TensorFlow.

Continue reading “Neural Nets In The Browser: Why Not?”

Using The GPU From JavaScript

Everyone knows that writing programs that exploit the GPU (Graphics Processing Unit) in your computer’s video card requires special arcane tools, right? Well, thanks to [Matthew Saw], [Fazil Sapuan], and [Cheah Eugene], perhaps not. At a hackathon, they turned out a Javascript library that allows you to create “kernel” functions to execute on the GPU of the target system. There’s a demo available with a benchmark which on our machine sped up a 512×512 calculation by well over five times. You can download the library from the same page. There’s also a GitHub page.

The documentation is a bit sparse but readable. You simply define the function you want to execute and the dimensions of the problem. You can specify one, two, or three dimensions, as suits your problem space. When you execute the associated function it will try to run the kernels on your GPU in parallel. If it can’t, it will still get the right answer, just slowly.

Continue reading “Using The GPU From JavaScript”

80-PIC32 Cluster Does Fractals

One way to get around limitations in computing resources is to throw more computers at the problem. That’s why even cheap consumer-grade computers and phones have multiple cores in them. In supercomputing, it is common to have lots of processors with sophisticated sharing mechanisms.

[Henk Verbeek] decided to take 80 inexpensive PIC32 chips and build his own cluster programmed in — of all things — BASIC. The devices talk to each other via I2C. His example application plots fractals on another PIC32-based computer that has a VGA output. You can see a video of the device in action, below.

Continue reading “80-PIC32 Cluster Does Fractals”