The five picos on two breadboards and the results of image convolution.

PentaPico: A Pi Pico Cluster For Image Convolution

Here’s something fun. Our hacker [Willow Cunningham] has sent us a copy of their homework. This is their final project for the “ECE 574: Cluster Computing” course at the University of Maine, Orono.

It was enjoyable taking a good look at everything in this project. The project is a “cluster” of five Raspberry Pi Pico microcontrollers: one head node acts as the leader, and four compute nodes work on the tasks it hands out. The software for both types of node is written in C. The head node is connected to a workstation via USB 1.1, allowing the system to be controlled with a Python script.

The cluster is configured to process an embarrassingly parallel image convolution. The input image is copied onto the head node via USB, and the head node then divvies it up and distributes it to the n compute nodes over I2C, one node at a time. Results are given for n = {1, 2, 4} compute nodes.
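If you haven’t met the term before, “embarrassingly parallel” just means the work splits into independent chunks that need no coordination between workers. As a rough illustration only (this is not [Willow]’s code; the 3×3 box-blur kernel, the row-band split, and the NumPy usage are all our own), dividing an image among four workers for convolution might look like this in Python:

import numpy as np

def convolve2d(img, kernel):
    # Naive "valid" convolution (really cross-correlation, which is the same
    # thing for a symmetric kernel); the output shrinks by kernel size minus one.
    kh, kw = kernel.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return out

def split_rows(img, n, halo=1):
    # Cut the image into n horizontal bands, each with "halo" extra rows so
    # the convolution results line up cleanly at the seams.
    h = img.shape[0]
    edges = np.linspace(0, h, n + 1, dtype=int)
    return [img[max(a - halo, 0):min(b + halo, h)] for a, b in zip(edges, edges[1:])]

image = np.random.rand(64, 64)
kernel = np.ones((3, 3)) / 9.0                    # simple box blur
padded = np.pad(image, 1)                         # pad once on the "head node"
bands = split_rows(padded, n=4)                   # one band per "compute node"
results = [convolve2d(b, kernel) for b in bands]  # each band is processed independently
stitched = np.vstack(results)                     # the head node reassembles the output
assert np.allclose(stitched, convolve2d(padded, kernel))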

It turns out that the work of distributing the data dwarfs the compute by three orders of magnitude, so the whole system actually gets slower the more nodes you add. But we’re not going to hold that against anyone. This was a fascinating investigation, and we were impressed by [Willow]’s technical chops. It was a complicated project with diverse hardware and software challenges, and they’ve done a great job making it all work and reporting the results, slowdown and all, in the best scientific tradition.
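To see why that happens, consider a back-of-the-envelope model: the image has to cross the shared I2C bus serially no matter how many nodes there are, so the transfer term stays fixed while the (already tiny) compute term shrinks, and any per-node bookkeeping grows with n. The numbers below are invented purely to show the shape of the curve, not taken from [Willow]’s measurements:

# Invented timings, only to illustrate the trend; not figures from the report.
t_transfer = 1.0    # total time to push the image over the shared I2C bus (serial)
t_compute  = 0.001  # total convolution time, roughly three orders of magnitude less
t_overhead = 0.01   # per-node handshaking and bookkeeping

for n in (1, 2, 4):
    total = t_transfer + t_compute / n + t_overhead * n
    print(f"{n} node(s): {total:.4f} s")   # the total rises as n grows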

It was fun reading their journal, in which they chronicled their progress and frustrations during the project. Their final report, in IEEE format, was created using LaTeX and Overleaf; at only six pages, it’s an easy and interesting read.

For anyone interested in cluster tech, be sure to check out the 256-core RISC-V megacluster and a RISC-V supercluster for very low cost.

Make A Secret File Stash In The Slack Space

Disk space is allocated in clusters of a certain size. When a file is written to disk and its size is smaller than the cluster(s) allocated for it, there is an unused portion of varying size between the end of the file’s data and the end of the allocated clusters. This unused space is the slack space; it’s perfectly normal, and [Zachary Parish] had the idea to write a tool to hide data in it.
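The amount of slack per file is easy to estimate: it’s just the gap between the file’s size and the next cluster boundary. Here’s a quick sketch in Python; the 4 KiB cluster size and the file name are our assumptions, not anything taken from [Zachary]’s tool:

import os

CLUSTER_SIZE = 4096  # bytes; a common default, but check your actual filesystem

def slack_bytes(path, cluster_size=CLUSTER_SIZE):
    # Unused bytes between the end of the file and the end of its last cluster.
    size = os.path.getsize(path)
    return 0 if size == 0 else (-size) % cluster_size

# A 5000-byte file on a 4096-byte-cluster filesystem occupies two clusters
# (8192 bytes), leaving 3192 bytes of slack. "decoy.jpg" is just a placeholder name.
print(slack_bytes("decoy.jpg"))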

The demo uses a USB drive, reading and writing data in the slack space of decoy files.

[Zachary]’s tool is written in Python; it can map the available slack space and perform read and write operations on it, treating the disparate locations as a single unified whole in which to store arbitrary files. A little tar and gzip even helps make things more efficient in the process.
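In rough strokes, the packing side of such a tool might look like the sketch below: bundle the secret files with tar, compress the result with gzip, then carve the blob up to fit a list of slack regions. The function names and the (offset, length) region format are our own invention rather than code from slack_hider, and actually writing the chunks to the raw block device (which needs root access) is left out:

import gzip, io, tarfile

def pack(paths):
    # Bundle and compress the files to hide, so one compact blob gets scattered.
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for p in paths:
            tar.add(p)
    return gzip.compress(buf.getvalue())

def scatter(blob, regions):
    # Split the blob across (offset, length) slack regions; returns the list of
    # (offset, chunk) writes that would go to the raw device.
    writes, pos = [], 0
    for offset, length in regions:
        chunk = blob[pos:pos + length]
        if not chunk:
            break
        writes.append((offset, chunk))
        pos += len(chunk)
    if pos < len(blob):
        raise ValueError("not enough slack space for the payload")
    return writes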

There’s a whole demo implemented on Linux using a USB drive with some decoy files to maximize the slack space, and you can watch it in action in the video embedded below. It’s certainly more practical than hiding data in a podcast!

Note that this is just a demo of the concept. The approach does have potential for handling secret data, but [Zachary] points out that, from a serious data forensics point of view, it has a number of shortcomings in its current form. For example, the way the tool currently structures and handles data makes it quite obvious that something is going on in the slack space.

[Zachary] created this a few years ago and has some ideas about how to address those shortcomings and evolve the tool, so if you have ideas of your own or just want to try it out, the slack_hider GitHub repository is where you want to go.

Continue reading “Make A Secret File Stash In The Slack Space”

Printed Rack Holds Pair Of LattePandas In Style

ARM single-board computers like the Raspberry Pi are great for some applications — if you need something that’s energy efficient or can fit into a tight space, they’re tough to beat. But sometimes you’re stuck in the middle: you need more computational muscle than the average SBC can bring to the table, but at the same time, a full-size computer isn’t going to work for you.

Luckily, we now have options such as the LattePanda Mu powered by Intel’s quad-core N100 processor. Put a pair of these modules (with their associated carrier boards) on your desktop, and you’ve got considerable number-crunching capabilities in a relatively small package. Thanks to [Jay Doscher] we’ve got a slick 3D printed rack that can keep them secure and cool, complete with the visual flair that we’ve come to expect from his creations.

Continue reading “Printed Rack Holds Pair Of LattePandas In Style”

Fully 3D Printed Case Is Stacked High With Mini PCs

Over the years we’ve seen no shortage of 3D printed cases designed to hold several Raspberry Pi computers, often with the intent to use them as convenient desktop-sized platforms for experimenting with concepts such as server load balancing and redundancy.

The reason the Pi was always the star of the show is simple enough to explain: it was small and cheap. But while the Pi has only gotten more expensive over the years, x86 machines have gotten smaller and cheaper. Which is how a project like the N100 Obelisk was born.

Continue reading “Fully 3D Printed Case Is Stacked High With Mini PCs”

A RISC-V Supercluster For Very Low Cost

ARM continues to make inroads in the personal computing space thanks to its more modern, streamlined instruction set architecture (ISA) and its reduced power demands compared to x86 machines, but the main reason it keeps spreading is how easy it is to get a license to make chips using the ISA. It’s still not a fully open-source instruction set, though, so if you want something even more easily accessible than ARM you’ll need to find something like these chips running the fully open-source RISC-V ISA, and possibly put them to work in a custom supercluster.

[bitluni] recently acquired a large number of CH32V003 microcontrollers and managed to configure them all to work together in a cluster. The entire array of chips is only $2 (not including all of the other components attached to the board), so a cluster of arbitrary size is well within reach. [bitluni] built a four-layer PCB for this project with an 8-bit bus so the microcontrollers can communicate with each other. Each chip has its own ADC and I/O wired to a set of GPIO pins on the sides of the board, and the build is rounded out with a USB interface for programming and power.

There were a few quirks in getting this supercluster up and running, including some issues with the way the reset and debug pins work on these specific microcontrollers. With bugs like these out of the way, the entire cluster is up and running, and [bitluni] hints that his design could easily be interfaced with even larger RISC-V superclusters. As for a use for this build, sometimes clusters like these are built just to build them, but since the I/O and ADCs are accessible, in theory this cluster could do anything a larger microcontroller might be able to do, only at a much lower price.

Continue reading “A RISC-V Supercluster For Very Low Cost”

Picture of the miniJen structure on a presentation desk

A Jenkins Demo Stand For Modern Times

Once you’re working on large-scale software projects, automation is a lifesaver, and Jenkins is a strong player in open-source automation, be it software builds, automated testing, or deploying onto your servers. Naturally, it has historically been developed with x86 infrastructure in mind, and let’s be fair, x86 is getting old. [poddingue], a hacker and a Jenkins contributor, demonstrates that Jenkins keeps up with the times with a hardware demo stand called miniJen, which runs Jenkins on three non-x86 architectures: armv8 (aarch64), armv7l, and RISC-V.

There are four SBCs of different architectures involved: three acting as Jenkins agents executing tasks and one acting as a controller, all powered by a big desktop PSU from Pine64. The controller has a somewhat beefier CPU for a reason: at FOSDEM, we’ve seen it drive a separate display with a Jenkins dashboard. It’s very much a complete demo for its purpose, and definitely an eye-catcher for FOSDEM attendees passing by the desk! As a bonus, there’s also a fascinating blog post about how [poddingue] got Jenkins running on RISC-V in particular.

Even software demonstrations get better with hardware, and this one no doubt stood out! Looking to build a similar demo, or wondering how it came together? [poddingue] has blog posts on the demo’s structure, a repo with OpenSCAD files, and a trove of videos demonstrating the planning, design, and setup process. As it goes with continuous integration, we’ve generally seen hackers and Jenkins collide when it comes to build failure alerts, from rotating warning lights to stack lights to a Christmas tree; however, we’ve also seen a hacker use it to keep their firmware size under control between code changes. And if you’re wondering what continuous integration holds for you, here’s our hacker-oriented deep dive.

The 13.5 Million Core Computer

Having a dual- or quad-core CPU is not very exotic these days, and CPUs with 12 or even 16 cores aren’t that rare. The Andromeda from Cerebras is a supercomputer with 13.5 million cores. The company claims it is one of the largest AI supercomputers ever built (but not the largest) and can perform 120 petaflops of “dense compute.”

We aren’t sure about the methodology, but they also claim more than one exaflop of “AI computing.” The computer has a fabric backplane that can handle 96.8 terabits per second between nodes. According to a post on Extreme Tech, the core technology is a 3-plane wafer processor, WSE-2. One plane is for communications, one holds 40 GB of static RAM, and the math plane has 850,000 independent cores and 3.4 million floating point units.

The data is sent to the cores and collected by a bank of 64-core AMD EPYC 3 processors. Andromeda is optimized to handle sparse matrix computations. The company claims that the performance scales “almost linearly”; that is, as you double the number of cores used, you roughly halve the total run time.

The machine is available for remote use and cost about $35 million to build. Since it uses 500 kW at peak run times, it isn’t free to operate, either. Extreme Tech notes that the Frontier computer at Oak Ridge National Labs is both larger and more precise, but it cost $600 million, so you’d expect it to be more capable.

Most homebrew “supercomputers” we see are more about learning how to work with clusters than trying to hit this sort of performance. Of course, if you have a modern graphics card, OpenCL and CUDA will let you do some of this too, but on a much smaller scale.