The Shady School

We can understand why shaderacademy.com chose that name over “the shady school,” but whatever they call it, if you want to brush up on graphics programming with GPUs, it might be just what you need.

The website offers challenges that task you with drawing various 2D and 3D graphics using code in your browser. Of course, this presupposes your browser has WebGPU enabled, which currently rules out Firefox and Safari. It looks like you can do some exercises without WebGPU, but the cool ones will require a Chromium-based browser.

You can search by level of difficulty, so maybe start with “Intro” and try “the fragment shader.” You’ll notice they already provide some code for you, along with a bit of explanation. The page also shows a picture of what you should draw next to what you actually drew, and scores you with a percentage based on how closely the two match. There’s also a visual diff that highlights where your picture differs from the reference.

We admit that one is pretty simple. Consider moving on to “Easy” with options like “two images blend,” for example. There are problems at every level of difficulty. Although there is a section for compute shaders, no challenges seem to be available there yet. Too bad, because that’s what we find most interesting. If you prefer a different approach, there are other tutorials out there.

Screenshot of audio noise graph

Whispers From The Void, Transcribed With AI

‘Hearing voices’ doesn’t have to be worrisome, for instance when software-defined radio (SDR) happens to be your hobby. Pulling voices from the ether and decoding them can eat up quite a bit of your time and attention, so [theckid] came up with a nifty solution: RadioTranscriptor. It’s a homebrew Python script that captures SDR audio and transcribes it using OpenAI’s Whisper model, running on your GPU if available. It’s lean and geeky, and helps you hear ‘the voice in the noise’ without actively listening yourself.

This tool goes beyond basic listening and recording. RadioTranscriptor combines SDR, voice activity detection (VAD), and deep learning. It resamples 48 kHz audio to 16 kHz in real time, keeps a rolling buffer, and only transcribes actual voice detected from the air. It continuously writes to a daily log, so you can comb through yesterday’s signal hauntings while new findings are being logged. It offers GPU support with CUDA, with fallback to CPU.
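
To make the moving parts concrete, here’s a minimal sketch of that capture-resample-gate-transcribe loop in Python. To be clear, this is not [theckid]’s actual code: it assumes your SDR software pipes audio to the default sound input, stands in a crude RMS energy gate for the real VAD, and the Whisper model size is an arbitrary choice.

```python
# Minimal sketch, not RadioTranscriptor itself: capture 48 kHz audio,
# resample to 16 kHz, skip silence, transcribe speech, append to a daily log.
import datetime
import numpy as np
import sounddevice as sd
import torch
import whisper
from scipy.signal import resample_poly

device = "cuda" if torch.cuda.is_available() else "cpu"  # GPU with CPU fallback
model = whisper.load_model("small", device=device)

CHUNK_SECONDS = 5   # size of each capture window
RMS_GATE = 0.01     # crude energy threshold standing in for a real VAD

with sd.InputStream(samplerate=48_000, channels=1, dtype="float32") as stream:
    while True:
        audio, _ = stream.read(48_000 * CHUNK_SECONDS)
        audio = resample_poly(audio[:, 0], up=1, down=3)  # 48 kHz -> 16 kHz
        if np.sqrt(np.mean(audio**2)) < RMS_GATE:
            continue                                      # nothing voice-like heard
        text = model.transcribe(audio.astype(np.float32))["text"].strip()
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        with open(f"radio-{datetime.date.today()}.log", "a") as log:
            log.write(f"{stamp} {text}\n")                # one log file per day
```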

It sure has its quirks, too: ghost logs and duplicate words – but it’s dead useful and hackable to your liking. Want to change the model, tweak the threshold, or add speaker detection? The code is here to fork and extend. And why not go the extra mile and turn it into art?

ATtiny85 as fan controller

An ATtiny GPU Fan Controller That Sticks

When your GPU fan goes rogue with an unholy screech, you either shell out for a new one or you go full hacker mode. Well, [ashafq] did the latter. The result is a delightfully nerdy fan controller powered by an ATtiny85 and governed by a DS18B20 temperature sensor. We all know a silent workstation is golden, and there’s no fun in throwing money at an off-the-shelf solution. [ashafq]’s custom build transforms a whiny Radeon RX 550 into a cool, quiet operator. Best of all: it’s built from bits likely already in your junk drawer.

To challenge himself a bit, [ashafq] rolled his own temperature-triggered PWM logic, speaking the 1-Wire protocol to the sensor on an ATtiny85, all without libraries or bloated firmware. The fan’s speed only ramps up when the GPU gets toasty, just like it should. It’s efficient and clever, and that makes it a fine hack. The build is based around a scavenged 12 V fan. He could have 3D printed a mount, but decided to stick it onto the card with double-sided tape. MacGyver would approve.
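
For a feel of the control logic, here’s the kind of temperature-to-duty-cycle mapping such a controller might use, sketched in Python rather than the bare-metal firmware of the real build; the thresholds are assumptions borrowed from the article’s idle and load figures, not [ashafq]’s actual values.

```python
def fan_duty(temp_c: float) -> int:
    """Map GPU temperature to an 8-bit PWM duty cycle (0-255).

    Illustrative only: the real build does this on an ATtiny85 after
    reading a DS18B20 over 1-Wire; the thresholds here are assumed.
    """
    T_LOW, T_HIGH = 40.0, 60.0   # assumed ramp range (idle/load temps)
    if temp_c <= T_LOW:
        return 0                 # cool enough: fan off, total silence
    if temp_c >= T_HIGH:
        return 255               # toasty: full speed
    # linear ramp between the two thresholds
    return round(255 * (temp_c - T_LOW) / (T_HIGH - T_LOW))
```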

The results don’t lie: idle temps at 40 °C, load peaking at 60 °C. Quieter than stock, smarter than stock, and way cheaper too. The double-sided tape may not last, but that leaves room for improvement. If you want to have a go at it yourself, read the full write-up and feel inspired to build your own. Hackaday.io is ready whenever you want to document your take on it.

Modifying fans is a tradition around here. Does it always take a processor? Nope.

How To Train A New Voice For Piper With Only A Single Phrase

[Cal Bryant] hacked together a home automation system years ago, which more recently uses Piper TTS (text-to-speech) voices for various undisclosed purposes. Not satisfied with the robotic-sounding standard voices available, [Cal] set about an experiment to fine-tune the Piper TTS AI voice model, using a clone of a single phrase created by a commercial TTS voice as the starting point.

Before the release of Piper TTS in 2023, existing free-to-use TTS systems such as espeak and Festival sounded robotic and flat. Piper delivered much more natural-sounding output without requiring massive resources to run. To change the voice style, the Piper AI model can either be retrained from scratch or, with less effort, fine-tuned. In the latter case, the first problem to solve was how to generate the necessary volume of training phrases to drive the fine-tuning of Piper’s AI model. This was solved using a heavyweight AI model, ChatterBox, which is capable of so-called zero-shot voice cloning: producing a new voice from a single reference sample. Check out the Chatterbox demo here.

As the loss function gets smaller, the model’s accuracy gets better

Training began with a corpus of test phrases in text format, chosen to ensure decent coverage of everyday English. [Cal] used ChatterBox to clone the voice from a single test phrase generated by a ‘mystery TTS system’ and synthesized 1,300 test phrases in this new voice. This audio set served as training data to fine-tune the Piper AI model on the lashed-up GPU rig.
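
A sketch of what that dataset-generation step could look like, assuming the chatterbox-tts package; the file names and corpus are placeholders rather than [Cal]’s actual setup.

```python
# Hypothetical sketch: zero-shot clone one reference phrase, then synthesize
# the whole text corpus in that voice to build a fine-tuning dataset.
import torchaudio
from chatterbox.tts import ChatterboxTTS

model = ChatterboxTTS.from_pretrained(device="cuda")

with open("corpus.txt") as f:                    # ~1,300 phrases, one per line
    phrases = [line.strip() for line in f if line.strip()]

for i, phrase in enumerate(phrases):
    # audio_prompt_path points at the single cloned reference phrase
    wav = model.generate(phrase, audio_prompt_path="reference_phrase.wav")
    torchaudio.save(f"dataset/{i:04d}.wav", wav, model.sr)
```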

To verify accuracy, [Cal] used OpenAI’s Whisper software to transcribe the audio back to text and compared the result with the original text corpus. To sidestep issues with punctuation and the differences between US and UK English, both texts were converted into phonemes using espeak-ng, resulting in a 98% phrase-matching accuracy.
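
Here’s a hedged sketch of that verification pass, assuming the openai-whisper package and the phonemizer library’s espeak-ng backend; paths and the model size are illustrative.

```python
# Transcribe generated audio back to text, then compare in phoneme space so
# punctuation and US/UK spelling differences don't count as mismatches.
import whisper
from phonemizer import phonemize

model = whisper.load_model("base")

def phrase_matches(wav_path: str, expected: str) -> bool:
    heard = model.transcribe(wav_path)["text"]
    as_phonemes = lambda t: phonemize(t, language="en-us", backend="espeak").strip()
    return as_phonemes(heard) == as_phonemes(expected)
```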

After down-sampling the training set using SoX, it was ready for the Piper TTS training system. Despite all the preparation, running the software felt anticlimactic. A few inconsistencies in the dataset necessitated the removal of some data points, and after five days of training, with the rig parked outside in the shade over heat concerns, TensorBoard indicated that the model’s loss function was converging. That’s AI-speak for: the model was tuned and ready for action! We think it sounds pretty slick.
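
Backing up a step, the SoX down-sampling pass might look something like the snippet below; the 22,050 Hz target is an assumption based on the rate Piper voices are commonly trained at, not [Cal]’s documented invocation.

```python
# Hypothetical batch down-sample of the dataset using the sox CLI.
import glob, os, subprocess

os.makedirs("dataset_22k", exist_ok=True)
for src in glob.glob("dataset/*.wav"):
    dst = os.path.join("dataset_22k", os.path.basename(src))
    subprocess.run(["sox", src, "-r", "22050", dst], check=True)
```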

If all this new-fangled AI speech synthesis is too complex and, well, a bit creepy for you, may we offer a more 1980s solution to making stuff talk? Finally, most people take the ability to speak for granted, until they can no longer do so. Here’s a team using cutting-edge AI to give people back that ability.

A Crypto Miner Takes The Straight And Narrow

As it stands, cryptocurrency largely seems to be a fad of the previous decade, at least as far as the technology goes. During that time, many PC users couldn’t get reasonably priced graphics cards, since most of them were going into mining rigs. Nowadays, by contrast, any shortages are because the cards are being used to turn the Internet into an AI-fueled wasteland. Nonetheless, there is a lot of leftover mining hardware from the previous decade, and unlike the modern AI tools getting crammed into everything we own, this dated hardware is actually still useful. [Zendrael] demonstrates this by turning an old mining rig into a media server.

The mining rig is essentially nothing more than a motherboard with a large number of PCIe slots, each designed for a GPU. PCIe slots can do many other things, though, so [Zendrael] put a terabyte solid-state drive into all but one of them using NVMe-to-PCIe adapters. The final slot still hosts a GPU, since the computer is being converted to a media server, and this allows it to handle various encoding tasks server-side. Even with only 4 GB of memory, the machine in its new configuration is more than capable of running Debian and spinning up all of the software needed for a modern media server, like Jellyfin, Nextcloud, and Transmission.

With many people abandoning miners as their value declines over time, it’s possible to find a lot of hardware like this that’s ready to be put to work on something new and useful. Hopefully all of the GPUs and other hardware being put to use in AI today will find a similarly useful future, but until then we’ll note that you don’t need super powerful hardware to run some of those models on your own.


Using Integer Addition To Approximate Float Multiplication

Once the domain of esoteric scientific and business computing, floating point calculations are now practically everywhere. From video games to large language models and kin, it would seem that a processor without floating point capabilities is pretty much a brick at this point. Yet the truth is that integer-based approximations can be good enough to hit the required accuracy. One example is approximating floating point multiplication with integer addition, which [Malte Skarupke] recently had a poke at, based on the integer-addition-only LLM approach suggested by [Hongyin Luo] and [Wei Sun].

As for the way this works, it does pretty much what it says on the tin: add the bit patterns of the two floating point inputs as integers, then subtract a fixed offset to correct the now-doubled exponent bias. That adjustment is what gets you close to the answer, but as the article and its comments illustrate, there are plenty of issues and edge cases to concern yourself with. These include under- and overflow, but also specific floating point inputs such as signs, zeroes, and NaNs.
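
Here’s a minimal sketch of the trick in Python for positive, normal 32-bit floats; signs, zeroes, subnormals, infinities, and NaNs are exactly the edge cases mentioned above, and this toy version doesn’t handle them.

```python
# Approximate float multiplication by adding the IEEE 754 bit patterns.
import struct

def f2i(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def i2f(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b & 0xFFFFFFFF))[0]

BIAS = 127 << 23  # 0x3F800000: the exponent bias, counted twice after adding

def approx_mul(a: float, b: float) -> float:
    # Adding the bit patterns sums the exponents (bias counted twice, hence
    # the subtraction) and adds the mantissa fractions, which approximates
    # their product: (1 + x)(1 + y) ~= 1 + x + y, with x*y as the error term.
    return i2f(f2i(a) + f2i(b) - BIAS)

print(approx_mul(3.0, 2.0))  # 6.0 -- exact for these inputs
print(approx_mul(1.5, 1.5))  # 2.0 vs. the true 2.25 (worst-case mantissas)
```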

Unlike in scientific calculations, where even minor inaccuracies tend to propagate and cause much larger errors down the line, graphics and LLMs do not care that much about floating point precision, so the roughly 7.5% error of the integer approach is good enough. The question is whether it’s truly more efficient, as the paper suggests, rather than a fallback as seen with e.g. integer-only audio decoders for platforms without an FPU.

Since one of the nice things about FP-focused vector processors like GPUs and their derivatives (tensor, ‘neural’, etc.) is that they can churn through a lot of data quite efficiently, shifting this work onto the ALUs of a CPU and expecting (energy) improvements seems quite optimistic.

Musings On A Good Parallel Computer

Until the late 1990s, the concept of a 3D accelerator card was something generally associated with high-end workstations. Video games and kin would run happily on the CPU in one’s desktop system, with later extensions like MMX, 3DNow!, and SSE providing a significant performance boost for games that supported them. As 3D accelerator cards (colloquially called graphics processing units, or GPUs) became prevalent, they took over almost all SIMD vector tasks, but one thing they’re still not good at is being a general-purpose parallel computer. This ticked [Raph Levien] off enough to inspire him to air his grievances.

Although the interaction between CPUs and GPUs has become tighter over the decades, with PCIe in particular being a big improvement over AGP and PCI, GPUs are still terrible at running arbitrary computing tasks, and even PCIe links remain glacial compared to communication within the GPU or CPU die. With the introduction of asynchronous graphics APIs, this divide has become even more pronounced. [Raph]’s proposal is to invert this relationship.

There’s precedent for this already, with Intel’s Larrabee and IBM’s Cell processor merging CPU and GPU characteristics on a single die, though developers struggled with such a new kind of architecture, and Sony’s PlayStation 3 was ultimately forced to add a discrete GPU because of these issues. There is also the DirectStorage API in DirectX, which bypasses the CPU when loading assets from storage, effectively adding CPU features to GPUs.

As [Raph] notes, so-called AI accelerators also have these characteristics, often with multiple SIMD-capable, CPU-like cores. Maybe the future is Cell after all.