Musings On A Good Parallel Computer

Until the late 1990s, the concept of a 3D accelerator card was something generally associated with high-end workstations. Video games and kin would run happily on the CPU in one’s desktop system, with later extensions like MMX, 3DNow!, and SSE providing a significant performance boost for games that supported them. As 3D accelerator cards (now commonly called graphics processing units, or GPUs) became prevalent, they took over almost all SIMD vector tasks, but one thing they’re not good at is being a general-purpose parallel computer. This really ticked off [Raph Levien], and it inspired him to air his grievances.

Although the interaction between CPUs and GPUs has become tighter over the decades, with PCIe in particular being a big improvement over AGP and PCI, GPUs are still terrible at running arbitrary computing tasks, and even PCIe links remain glacial compared to communication within the GPU and CPU dies. With the introduction of asynchronous graphics APIs, this divide became even more pronounced. [Raph]’s proposal is to invert this relationship.

There’s precedent for this already, with Intel’s Larrabee and IBM’s Cell processor merging CPU and GPU characteristics on a single die, though developers struggled with such a new kind of architecture in both cases. Sony’s PlayStation 3 was ultimately forced to add a discrete GPU because of these issues. There is also the DirectStorage API in DirectX, which bypasses the CPU when loading assets from storage, effectively adding CPU features to GPUs.

As [Raph] notes, so-called AI accelerators have these characteristics as well, often with multiple SIMD-capable, CPU-like cores. Maybe the future is Cell after all.

So What Is A Supercomputer Anyway?

Over the decades, many terms have been coined to classify computer systems, usually when they were adopted in new fields or when technological improvements caused significant shifts. While the very first electronic computers were very limited and often not programmable, they would soon morph into something that we’d recognize today as a computer, starting with World War 2’s Colossus and ENIAC, which saw use in cryptanalysis and military weapons programs, respectively.

The first commercial digital electronic computer wouldn’t appear until 1951, however, in the form of the Ferranti Mark 1. These 4.5 ton systems mostly found their way to universities and kin, where they’d find welcome use in engineering, architecture, and scientific calculations. That kind of number-crunching became the focus of new computer systems, making them effectively the equivalent of a scientific calculator. Until the invention of the transistor, the idea of a computer being anything but a hulking, room-sized monstrosity was preposterous.

A few decades later, more computing power could be crammed into less space than ever before, including ever-higher-density storage. Computers were even found in toys, and amidst a whirlwind of mini-, micro-, super-, home-, minisuper-, and mainframe computer systems, one could be excused for asking the question: what even is a supercomputer?

Continue reading “So What Is A Supercomputer Anyway?”

Raspberry Pi Pico Parallel Mandelbrot Computation

The Mandelbrot set, when visualized with some colors, is an interesting shape with infinite detail. While the patterns are immediately obvious to the human eye, anyone who’s rendered one can tell you that they’re pretty computationally expensive to produce. Fortunately, as with many things in graphics, rendering the Mandelbrot set can be easily parallelized.

That’s what [rak277] and [ir93] demonstrate in their RP2040-based final project. Computron, as they call it, is a network of Raspberry Pi Picos that work together to compute a visualization of the Mandelbrot set and show it on a VGA display. The Computron is made of two or more “math units” and one “projection unit”. The math units communicate with the projection unit over a shared I²C bus to first divide the workload and then compute their share of the work.
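The reason the work divides so cleanly is that the escape-time iteration for each pixel is completely independent of every other pixel. Here is a minimal Python sketch of that row-splitting idea, purely an illustration rather than the Computron’s actual firmware; the frame size, iteration cap, and worker count are made-up values.

```python
from concurrent.futures import ProcessPoolExecutor

WIDTH, HEIGHT, MAX_ITER = 320, 240, 64  # hypothetical frame size and iteration cap

def escape_time(cx, cy):
    """Iterate z = z^2 + c and count steps until |z| exceeds 2."""
    zx = zy = 0.0
    for i in range(MAX_ITER):
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
        if zx * zx + zy * zy > 4.0:
            return i
    return MAX_ITER

def render_rows(rows):
    """Compute one worker's share of the image as (row, iteration counts) pairs."""
    out = []
    for y in rows:
        cy = -1.2 + 2.4 * y / HEIGHT
        out.append((y, [escape_time(-2.0 + 3.0 * x / WIDTH, cy) for x in range(WIDTH)]))
    return out

if __name__ == "__main__":
    workers = 4  # stand-in for the number of "math units"
    shares = [range(i, HEIGHT, workers) for i in range(workers)]  # interleave the rows
    with ProcessPoolExecutor(max_workers=workers) as pool:
        strips = list(pool.map(render_rows, shares))
    # A "projection unit" would now reassemble the rows and push pixels out to VGA.
    print(sum(len(strip) for strip in strips), "rows computed across", workers, "workers")
```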

This project shows both the strengths and limitations of parallel computation. It makes use of multiple math units on a highly parallelizable workload, but as more math units are added there are diminishing performance gains due to the increased communications load on the network, which [rak277] and [ir93] suspect to be the current bottleneck in the Computron.

If you’re fresh out of Pi Picos and don’t mind waiting a while, you could always crank out a Mandelbrot set on your trusty Atari 800 in BASIC.

Parallel Computing On The PicoCray RP2040 Cluster

[ExtremeElectronics] cleverly demonstrates that if one Raspberry Pi Pico is good, then nine must be awesome. The PicoCray project connects multiple Raspberry Pi Pico microcontroller modules into a parallel architecture, leveraging an I²C bus to communicate between nodes.

The same PicoCray code runs on all nodes, but a grounded pin on one of the Pico modules indicates that it is to operate as the controller node.  All of the remaining nodes operate as processor nodes.  Each processor node implements a random back-off technique to request an address from the controller on the shared bus. After waiting a random amount of time, a processor will check if the bus is being used.  If the bus is in use, the processor will go back to waiting.  If the bus is not in use, the processor can request an address from the controller.
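The arbitration is simple enough to sketch in a few lines. Here is a rough Python illustration of the random back-off idea, not the PicoCray firmware itself; bus_busy() and request_address() are made-up stand-ins for the real I²C traffic.

```python
import random
import time

def bus_busy():
    """Stand-in: the real node would sample the shared I2C bus for ongoing traffic."""
    return random.random() < 0.3  # pretend the bus is busy about 30% of the time

def request_address():
    """Stand-in: the real node would ask the controller for an address over I2C."""
    return random.randint(0x10, 0x70)

def acquire_address(max_wait_ms=50):
    """Random back-off: wait a random time, retry while the bus is busy."""
    while True:
        time.sleep(random.randint(1, max_wait_ms) / 1000.0)  # random wait
        if bus_busy():
            continue                  # someone else is talking, so back off again
        return request_address()      # bus looks free, ask the controller

print("Got address:", hex(acquire_address()))
```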

Once a processor node has an address, it can be sent tasks from the controller node.  In the example application, these tasks involve computing elements of the Mandelbrot set. The particular elements to be computed in a given task are allocated by the controller node, which later collects the results from each processor node and aggregates them for display.

The name for this project is inspired by Seymour Cray. Our Father of the Supercomputer biography tells his story, including why the Cray-1 supercomputer was referred to as “the world’s most expensive loveseat.” For even more Cray-1 inspiration, check out this Raspberry Pi Zero Cluster.

Parsing PNGs Differently

There are millions of tiny bugs all around us, in everything from our desktop applications to the appliances in the kitchen: hidden, arbitrary conditions that cause unintended outputs and behaviors. There are many ways to find these bugs, but one we don’t hear about very often is finding a bug in your own code, only to realize someone else made the same mistake. For example, [David Buchanan] found a bug in his multi-threaded PNG decoder and realized that Apple’s PNG decoder had the same bug.

PNG (Portable Network Graphics) is an image format, just like JPEG, WebP, or TIFF, that was designed to replace GIF. After a short header, the rest of the file is entirely chunks. Each chunk is tagged with a four-letter identifier, and a few chunk types are critical. The essential sections are IHDR (the header), IDAT (the actual image data), PLTE (the palette information), and IEND (the last chunk in the file). Compression is via the DEFLATE method used in zlib, which is inherently serial. If you’re interested, there’s a convenient poster about the format from a great resource we covered a while back.
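To get a feel for how approachable that chunk layout is, here is a small, single-threaded Python sketch that walks a PNG’s chunks and verifies their CRCs. It is a toy parser for illustration only, nothing like [David]’s multi-threaded decoder.

```python
import struct
import sys
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the eight-byte header every PNG starts with

def walk_chunks(path):
    """Print every chunk's type, length, and whether its CRC checks out."""
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            raise ValueError("not a PNG file")
        while True:
            head = f.read(8)
            if len(head) < 8:
                break                                    # ran off the end of the file
            length, ctype = struct.unpack(">I4s", head)  # 4-byte length, 4-letter type
            data = f.read(length)
            (crc,) = struct.unpack(">I", f.read(4))
            ok = zlib.crc32(ctype + data) == crc
            print(f"{ctype.decode('ascii')}: {length} bytes, CRC {'ok' if ok else 'BAD'}")
            if ctype == b"IEND":
                break                                    # IEND is always the last chunk

if __name__ == "__main__":
    walk_chunks(sys.argv[1])
```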

Continue reading “Parsing PNGs Differently”

Linux-Fu: Parallel Universe

At some point, you simply run out of processing power. Admittedly, that point keeps getting further and further away, but you can still get there. If you run out of CPU time, the answer might be to add more CPUs. Sometimes, though, there are other bottlenecks like memory or disk space. It is also likely that you have access to multiple computers. Who doesn’t have a few Raspberry Pis sitting around their network? Or maybe a server in the basement? Or even some remote servers “in the cloud”? GNU Parallel is a tool that lets you spread work across multiple tasks, either locally or on remote machines. In some ways, it is simple, since it looks sort of like xargs but with parallel execution. On the other hand, it has myriad options and configurations that can make it a little daunting to use. Continue reading “Linux-Fu: Parallel Universe”

Neural Nets In The Browser: Why Not?

We keep seeing more and more TensorFlow neural network projects. We also keep seeing more and more things running in the browser. You don’t have to be Mr. Spock to see this one coming. TensorFire runs neural networks in the browser and claims that WebGL allows it to run as quickly as it would on the user’s desktop computer. The main page is a demo that stylizes images, but if you want more detail you’ll probably want to visit the project page instead. You might also enjoy the video from one of the creators, [Kevin Kwok], below.

TensorFire has two parts: a low-level language for writing massively parallel WebGL shaders that operate on 4D tensors, and a high-level library for importing models from Keras or TensorFlow. The authors claim it will work on any GPU and, in some cases, will actually be faster than running native TensorFlow.

Continue reading “Neural Nets In The Browser: Why Not?”