Programming Ada: Designing A Lock-Free Ring Buffer

Ring buffers are incredibly useful data structures that allow data to be written and read continuously, without the producer or consumer having to worry about exactly where in memory the next write or read takes place. Although they present a continuous (ring) buffer via their API, internally they maintain a plain fixed-size buffer, which makes it crucial that read and write operations never interfere with each other. This can be guaranteed in a number of ways, the easiest obviously being a mutual exclusion mechanism like a mutex, but that comes with a severe performance penalty.

A lock-free ring buffer (LFRB) accomplishes the same result without a lock such as a mutex, relying instead on hardware-supported atomic operations. In this article we will be looking at how to design an LFRB in Ada, while comparing and contrasting it with the C++-based LFRB that it was ported from. Although similar in some respects, the Ada version leans on Ada-specific features such as access types and the rendezvous mechanism with task types (‘threads’).
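To give an idea of the moving parts involved, below is a minimal single-producer, single-consumer sketch in C++ (the language the Ada version was ported from). The class and member names are ours for illustration, not those of the article’s implementation.

```cpp
// Minimal single-producer/single-consumer (SPSC) lock-free ring buffer sketch.
// Illustrative only; not the article's actual implementation.
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

template <typename T, std::size_t N>
class SpscRingBuffer {
public:
    // Called only from the producer thread.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) {
            return false;  // buffer full
        }
        data_[head] = value;
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Called only from the consumer thread.
    std::optional<T> pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) {
            return std::nullopt;  // buffer empty
        }
        T value = data_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);
        return value;
    }

private:
    std::array<T, N> data_{};
    std::atomic<std::size_t> head_{0};  // next slot to be written
    std::atomic<std::size_t> tail_{0};  // next slot to be read
};
```

The acquire/release pairs on the two indices are what replace the mutex here: the producer only ever advances `head_` and the consumer only ever advances `tail_`, so neither side can corrupt the other’s bookkeeping.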

Continue reading “Programming Ada: Designing A Lock-Free Ring Buffer”

Building The Unreleased Lemmings Arcade Cabinet From 1991

Back in the early 90s the world was almost graced with an arcade version of Lemmings, but after a few board revisions it was abandoned in 1991. Now the folks over at UK-based [RMC – The Cave] on YouTube have not only managed to get their mitts on a nearly finished prototype board, but have also designed and built a period-appropriate cabinet to go with it. This involved looking at a range of arcade cabinets created by Data East and picking a design that would both accommodate the game’s two-player mode and fit the overall style.

The finished Lemmings arcade cabinet. (Credit: RMC – The Cave, YouTube)

Arcade cabinets came in a wide range of styles and control layouts, largely defined by the game’s requirements, but sometimes with flourishes to distinguish a cabinet from the hundred others in the same arcade.

In this particular case the typical zig-zag (Z-back) style, as used on Data East’s 1993-era Night Slashers cabinet, was found to be a good fit, which mostly left the controls (with two trackballs) and cabinet art to figure out. Fortunately there is plenty of Lemmings art to draw inspiration from, leading to the finished cabinet with the original mainboard, a JAMMA wiring harness with MultiPi JAMMA controller, a 19″ CRT monitor, and other components including the 3D-printed control panel.

With more and more new arcades popping up in the US and elsewhere, perhaps we’ll see these Lemmings arcade cabinets appear there too, especially since the ROMs on the prototype board were dumped for convenient MAME-ing.

Continue reading “Building The Unreleased Lemmings Arcade Cabinet From 1991”

Getting Linux Process List Without Forking Using Just A Bash Script

The ps command is extremely useful when you want some quick information on active system processes (hence the name), especially when its output is piped into grep and kin for filtering. One gotcha is of course that ps doesn’t run in the current shell process, but is forked off into its own process, so what if everything goes wrong and you absolutely need to run ps aux on a system that is completely and utterly out of fresh process IDs to hand out? In that scenario you can fortunately write a shell script that does the same, but all within the same shell, as [Isabella Bosia] did with a Bash shell script.

The how and why are mostly covered in the shell script itself, using detailed comments. Initially the hope was to just read out and parse the contents of /proc/<pid>/status, but that doesn’t have details like CPU%. Getting the desired output thus requires a bit more parsing, as well as a significant amount of cussing in the comments. It may not be entirely practical, as the odds of ending up on a system with zero free PIDs are probably somewhere between zero and NaN, but as an ‘entertaining’ job interview question and an example of all the fun things one can do with shell scripting it’s definitely highly recommended.
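For a rough idea of where that extra parsing comes in (this is not the Bash script itself, which deliberately avoids forking), a CPU% figure is typically derived from the utime, stime and starttime fields of /proc/<pid>/stat together with the system uptime. A hedged sketch in C++, with the field handling written out:

```cpp
// Rough sketch of the CPU% calculation that ps-style tools derive from procfs.
// Illustration of the data involved, not the pure-Bash approach described above.
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <unistd.h>

int main(int argc, char** argv) {
    if (argc < 2) { std::cerr << "usage: cpupct <pid>\n"; return 1; }
    const std::string pid = argv[1];
    const long hz = sysconf(_SC_CLK_TCK);  // kernel clock ticks per second

    double uptime = 0.0;                   // seconds since boot
    std::ifstream("/proc/uptime") >> uptime;

    std::ifstream stat("/proc/" + pid + "/stat");
    std::string line;
    std::getline(stat, line);
    // Field 2 (the process name) can contain spaces, so parse after the last ')'.
    std::istringstream rest(line.substr(line.rfind(')') + 2));
    std::vector<std::string> f;
    for (std::string tok; rest >> tok;) f.push_back(tok);
    if (f.size() < 20) { std::cerr << "could not parse stat for " << pid << "\n"; return 1; }

    // Counting from the state field: f[11] = utime (field 14), f[12] = stime
    // (field 15), f[19] = starttime (field 22), all expressed in clock ticks.
    const double used  = (std::stod(f[11]) + std::stod(f[12])) / hz;
    const double alive = uptime - std::stod(f[19]) / hz;
    std::cout << "CPU%: " << (alive > 0 ? 100.0 * used / alive : 0.0) << "\n";
    return 0;
}
```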

A Look At The Intel N100 Radxa X4 SBC

Recently Radxa released the X4, an SBC containing not only an N100 x86_64 SoC but also an RP2040 MCU connected to a Raspberry Pi-style double pin header. The Intel N100 is one of a range of Alder Lake-N SoCs built on Intel’s Gracemont efficiency cores, which Intel pitches as delivering roughly the performance of the 2015-era Skylake core; the same cores also serve as the ‘efficiency’ cores in Intel’s hybrid desktop CPUs. Being x86-based, the Radxa X4 can run Linux, Windows and other OSes from either NVMe (PCIe 3.0 x4) or eMMC storage. After getting his hands on one of these SBCs, [Bret] couldn’t wait to take a gander at what it can do.

Installing Windows 11 and Debian 12 on a 500 GB NVMe (2230) SSD fitted to the X4 board worked pretty much as expected on an x86 system, with just some drivers for the onboard Intel 2.5 Gbit Ethernet and WiFi missing, depending on the OS, but these were easily obtained from the Intel site. The board comes with an RTC battery and a full-featured AMI BIOS, as well as up to 16 GB of LPDDR5 RAM.

Using the system with the Radxa PoE+ HAT via the 2.5 Gbit Ethernet port also worked a treat once a quality PoE switch was used, even with the N100’s power level raised from the default 6 W to 15 W. The RP2040 MCU on the mainboard is connected to the SoC via both USB 2.0 and UART, according to the board schematic. This means that all of the Raspberry Pi-style pins can be accessed from the N100, making it in many ways a more functional SBC than the Raspberry Pi 5, with a similar power envelope and cost picture.

At $80 USD before shipping for the 8 GB (no eMMC) version that [Bret] looked at, one might ask whether an N100-based mini PC could be competitive, though features like PoE+ and the integrated RPi-compatible header are definite selling points.

The BiVACOR Total Artificial Heart: A Maglev Bridge To Life

The BiVACOR TAH hooked up, with CTO Daniel Timms in the background. (Credit: BiVACOR)

Outside of the brain, the heart is probably the organ that you miss the most when it ceases to function correctly. Unfortunately, as we cannot grow custom replacement hearts yet, we have to keep heart patients alive long enough for them to receive a donor heart. Yet despite the heart being essentially a blood pump, engineering even a short-term artificial replacement has been a struggle for many decades. A new contender has now arrived in the BiVACOR TAH (total artificial heart), which just had the first prototype implanted in a human patient.

Unlike the typical membrane-based pumps, the BiVACOR TAH is a rotary pump that uses an impeller-based design with magnetic levitation replacing bearings, theoretically minimizing damage to the blood. This design should also mean a significant flow rate, enough even for an exercising adult. Naturally, this TAH is only being tested as a bridge-to-transplant solution, for patients with a failing heart who do not qualify for a ventricular assist device. This may give more heart patients a chance at that donor heart transplant, even if a TAH suitable for destination therapy could save many more lives.

The harsh reality is that the number of donor hearts decreases each year while demand increases, leading to unconventional approaches like xenotransplantation using specially bred pigs as donors, as well as therapeutic cloning to grow a new heart from the patient’s own cells. Having a universal TAH that could be left in place (destination therapy) for decades would offer a solid option next to the latter, but remains elusive, as shown by the lack of progress with TAHs like the ReinHeart despite a promising 2014 paper in a bovine model.

Hopefully before long we’ll figure out a reliable way to fix this ‘just a blood pump’ in our bodies, regardless of whether it’s a biological or mechanical solution.

AMD Returns To 1996 With Zen 5’s Two-Block Ahead Branch Predictor

An interesting finding in fields like computer science is that much of what is advertised as new and innovative was actually pilfered from old research papers submitted to ACM and others. This is not necessarily a bad thing, as many such ideas were simply not practical at the time. Case in point: the new branch predictor in AMD’s Zen 5 CPU architecture, whose two-block ahead design is based on an idea described a few decades ago. The details are laid out by [George Cozma] and [Camacho] in a recent article, which follows on an interview that [George] did with AMD’s [Mike Clark].

The 1996 ACM paper by [André Seznec] and colleagues, titled “Multiple-block ahead branch predictors”, is a good starting point before diving into [George]’s article, as it helps to make sense of many of the details. The reason for improving branch prediction in CPUs is fairly self-evident: today’s heavily pipelined, superscalar CPUs rely heavily on branch prediction and speculative execution to get around the glacial speeds of system memory once past the CPU’s speediest caches. While predicting the next instruction block after a branch is commonly done already, the two-block ahead approach also predicts the instruction block that follows the first predicted one.

Perhaps unsurprisingly, the multi-block ahead prediction by itself isn’t the hard part; making it all fit in the hardware is. As described in the paper by [Seznec] et al., the relevant components are now dual-ported, allowing for three prediction windows. Theoretically this should result in a significant boost in IPC, and could mean that more CPU manufacturers will be looking at adding such multi-block branch prediction to their designs. We will just have to see how Zen 5 performs once released into the wild.
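To make the idea a bit more concrete, here is a toy software model of the concept (our illustration, not AMD’s or the paper’s actual hardware design): for each fetch block the predictor hands back guesses for the next two blocks, so the front end can start fetching both at once.

```cpp
// Toy model of "two-block ahead" prediction: for a given fetch-block address,
// the predictor returns guesses for the next TWO fetch blocks. Purely
// illustrative; the real Zen 5 / Seznec designs are hardware structures.
#include <cstdint>
#include <unordered_map>

class TwoBlockAheadPredictor {
public:
    struct Prediction {
        uint64_t next_block;        // block predicted to follow the current one
        uint64_t block_after_next;  // block predicted to follow *that* one
    };

    // Look up both predictions for the block currently being fetched.
    Prediction predict(uint64_t current_block) const {
        auto it = table_.find(current_block);
        if (it != table_.end()) return it->second;
        // No history yet: fall back to sequential fetch.
        return {current_block + kBlockSize, current_block + 2 * kBlockSize};
    }

    // On retirement, record the actually observed sequence of three blocks.
    void update(uint64_t block, uint64_t next, uint64_t after_next) {
        table_[block] = {next, after_next};
    }

private:
    static constexpr uint64_t kBlockSize = 64;  // assumed fetch-block size
    std::unordered_map<uint64_t, Prediction> table_;
};
```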

Analyzing Feature Learning In Artificial Neural Networks And Neural Collapse

Artificial Neural Networks (ANNs) are commonly used for machine vision purposes, where they are tasked with object recognition. This is accomplished by taking a multi-layer network and using a training data set to configure the weights associated with each ‘neuron’. Due to the complexity of these ANNs for non-trivial data sets, it’s often hard to make heads or tails of what the network is actually matching in a given (non-training) input. In a March 2024 study (preprint) published in Science, [A. Radhakrishnan] and colleagues present an approach to elucidate and diagnose this mystery somewhat, using what they call the average gradient outer product (AGOP).

Defined as the uncentered covariance matrix of the ANN’s input-output gradients, averaged over the training dataset, this property provides information on which features of the data set the network uses for its predictions. These turn out to be strongly correlated with repetitive information, such as the presence of eyes when recognizing whether lipstick is being worn, or star patterns in a car and truck data set, rather than anything to do with the (highly variable) vehicles themselves. None of this was perhaps too surprising, but a number of the same researchers used the same AGOP to elucidate the mechanism behind neural collapse (NC) in ANNs.
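Written out (our notation, not necessarily the paper’s exact conventions), with $J_f(x) = \partial f(x)/\partial x$ the network’s input-output Jacobian and $x_1, \dots, x_n$ the training samples, the AGOP is simply

$$\mathrm{AGOP}(f) = \frac{1}{n} \sum_{i=1}^{n} J_f(x_i)^{\top} J_f(x_i),$$

an uncentered covariance that grows along the input directions the trained network is most sensitive to.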

NC occurs when an overparametrized ANN is trained well past the point where it fits the training data. In the preprint paper by [D. Beaglehole] et al. the AGOP is used to provide evidence for the mechanism behind NC during feature learning. Perhaps the biggest takeaway from these papers is that while ANNs can be useful, they’re also incredibly complex and poorly understood. The more we learn about their properties, the more appropriately we can use them.