A Crypto Miner Takes The Straight And Narrow

As it stands, cryptocurrency largely seems to be a fad of the previous decade, at least as far as technology goes. During that time, many PC users couldn’t get reasonably priced graphics cards since most of them were going into these miners. In contrast, nowadays any shortages are because they’re being used to turn the Internet into an AI-fueled wasteland. But nonetheless, there is a lot of leftover mining hardware from the previous decade and unlike the modern AI tools getting crammed into everything we own, this dated hardware is actually still useful. [Zendrael] demonstrates this by turning an old mining rig into a media server.

The mining rig is essentially nothing more than a motherboard with a large number of PCIe slots, each originally meant for a GPU. Those slots can do plenty of other things, though, so [Zendrael] fills all but one of them with terabyte solid state drives on NVMe-to-PCIe adapters. The final slot still hosts a GPU since the computer is being converted to a media server, which lets it handle transcoding on the server side. Even with only 4 GB of memory, the machine in its new configuration is more than capable of running Debian and spinning up all of the software needed for a modern media server, like Jellyfin, Nextcloud, and Transmission.

With many people abandoning miners as their value declines over time, it’s possible to find a lot of hardware like this that’s ready to be put to work on something new and useful. Hopefully all of the GPUs and other hardware being put to use in AI today will find a similarly useful future, but until then we’ll note that you don’t need super powerful hardware to run some of those models on your own.


Using Integer Addition To Approximate Float Multiplication

Once the domain of esoteric scientific and business computing, floating point calculations are now practically everywhere. From video games to large language models and kin, it would seem that a processor without floating point capabilities is pretty much a brick at this point. Yet the truth is that integer-based approximations can be good enough to hit the required accuracy. One example is approximating floating point multiplication with integer addition, which [Malte Skarupke] recently had a poke at, based on the integer-addition-only LLM approach suggested by [Hongyin Luo] and [Wei Sun].

As for the way this works, it does pretty much what it says on the tin: add the two floating point inputs as integer values, then adjust the exponent. This adjustment is what gets you close to the answer, but as the article and the comments on it illustrate, there are plenty of issues and edge cases to concern yourself with. These include under- and overflow, but also certain specific floating point inputs.
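If you want a feel for the basic bit trick, here is a minimal sketch in Python. It is not the code from the article or the paper (both add further correction terms and deal with the edge cases mentioned above), just the core idea of adding the raw bit patterns and removing the doubled exponent bias:

```python
import numpy as np

def approx_mul(a, b):
    """Approximate a * b for positive, normal float32 values by adding
    the raw bit patterns and subtracting the bit pattern of 1.0f,
    which removes the doubled exponent bias."""
    ia = np.float32(a).view(np.uint32)
    ib = np.float32(b).view(np.uint32)
    one = np.uint32(0x3F800000)            # bit pattern of 1.0f (127 << 23)
    return (ia + ib - one).view(np.float32)

print(approx_mul(3.0, 7.0), 3.0 * 7.0)     # ~20.0 vs 21.0, roughly 5% off
```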

Unlike in scientific calculations, where even minor inaccuracies tend to propagate and cause much larger errors down the line, graphics and LLMs do not care that much about floating point precision, so the roughly 7.5% error of the integer approach is good enough. The question is whether it’s truly more efficient, as the paper suggests, rather than just a fallback, as seen with e.g. integer-only audio decoders for platforms without an FPU.

One of the nice things about FP-focused vector processors like GPUs and their derivatives (tensor, ‘neural’, and so on) is that they can churn through a lot of data quite efficiently, so shifting this work onto a CPU’s ALU and expecting (energy) improvements seems rather optimistic.

Musings On A Good Parallel Computer

Until the late 1990s, the concept of a 3D accelerator card was something generally associated with high-end workstations. Video games and kin would run happily on the CPU in one’s desktop system, with later extensions like MMX, 3DNow!, and SSE providing a significant performance boost for games that supported them. As 3D accelerator cards (colloquially called graphics processing units, or GPUs) became prevalent, they took over almost all SIMD vector tasks, but one thing they’re not good at is being a general-purpose parallel computer. This really ticked [Raph Levien] off and inspired him to air his grievances.

Although the interaction between CPUs and GPUs has become tighter over the decades, with PCIe in particular being a big improvement over AGP and PCI, GPUs are still terrible at running arbitrary computing tasks, and even PCIe links are glacial compared to communication within the GPU and CPU dies. With the introduction of asynchronous graphics APIs, this divide became even more pronounced. [Raph]’s proposal is to invert this relationship.

There’s precedent for this already: Intel’s Larrabee and IBM’s Cell processor both merged CPU and GPU characteristics on a single die, though developers struggled with such a new kind of architecture, and Sony’s PlayStation 3 was forced to add a separate GPU due to these issues. There is also the DirectStorage API in DirectX, which bypasses the CPU when loading assets from storage, effectively adding CPU features to GPUs.

As [Raph] notes, so-called AI accelerators also have these characteristics, often with multiple SIMD-capable, CPU-like cores. Maybe the future is Cell after all.

Import GPU: Python Programming With CUDA

Every few years or so, a development in computing results in a sea change and a need for specialized workers to take advantage of the new technology. Whether that’s COBOL in the 60s and 70s, HTML in the 90s, or SQL in the past decade or so, there’s always something new to learn in the computing world. The introduction of graphics processing units (GPUs) for general-purpose computing is perhaps the most important recent development for computing, and if you want to develop some new Python skills to take advantage of this modern technology, take a look at this introduction to CUDA, which allows developers to use Nvidia GPUs for general-purpose computing.

Of course, CUDA is a proprietary platform and requires one of Nvidia’s supported graphics cards to run, but assuming that barrier to entry is met, it’s not too much more effort to use it for non-graphics tasks. The guide takes a closer look at the open-source library PyTorch, which allows a Python developer to quickly get up to speed with the features of CUDA that make it so appealing to researchers and developers in artificial intelligence, machine learning, big data, and other frontiers in computer science. The guide describes how threads are created, how they move through the GPU and work together with other threads, how memory can be managed on both the CPU and GPU, how CUDA kernels are written, and how everything else is handled, largely through the lens of Python.
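As a taste of what that looks like in practice, here is a minimal sketch of the CPU-to-GPU round trip, assuming a CUDA-enabled build of PyTorch and an Nvidia card; the tensor sizes and names are purely illustrative and not taken from the guide:

```python
import torch

# Use the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tensors created with device="cuda" live in GPU memory from the start.
a = torch.randn(1_000_000, device=device)
b = torch.randn(1_000_000, device=device)

# The multiply and the reduction run as CUDA kernels on the GPU;
# .item() copies the final scalar result back to host memory.
dot = (a * b).sum().item()

print(f"running on {device}, dot product = {dot:.3f}")
```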

Getting started with something like this is almost a requirement to stay relevant in the fast-paced realm of computer science, as machine learning has taken center stage in almost everything related to computers these days. It’s worth noting that, strictly speaking, an Nvidia GPU is not required for GPU programming like this; AMD has its own GPU computing platform, ROCm, but despite being open source it still trails Nvidia in adoption and arguably in performance as well. Some other learning tools for GPU programming we’ve seen in the past include this puzzle-based tool, which illustrates some of the specific problems GPUs excel at.

Portal 2 running under Asahi Linux, with the Asahi Linux logo watermarked in the corner.

Asahi Linux Brings Better Gaming To Apple Silicon

For those of you longing for better gaming on an Apple Silicon device, Asahi Linux is here to help.

While Apple’s own CPUs are relatively new kids on the block, they’ve still been around for four years now, giving hackers ample time to dissect their innards. The team behind Asahi Linux has now brought us “the only conformant OpenGL®, OpenCL™, and Vulkan® drivers” for Apple’s M1 and M2.

The emulation overhead of the system means that most games will need at least 16 GB of RAM to run. Many games are playable, but newer titles can’t yet hit 60 frames per second. The developers are currently focused on “correctness” and hope to improve performance in future updates. Many indie titles are reported to already be running at full speed, though.

You can hear more about some of the fiddly bits of how to “tessellate with arcane compute shaders” in the video below. Don’t worry, it’s only 40 minutes of the nine-hour video, and it should start right at the presentation by GPU dev [Alyssa Rosenzweig].

If you want to see some of how Linux on Apple Silicon started or some of the previous work on hacking the M1 GPU, we have you covered.


C64 Gets A Graphics Upgrade Courtesy Of Your Favorite Piano Manufacturer

The Commodore 64 was quite a machine in its time, though a modern assessment would say that it’s severely lacking in the graphical department. [Vossi] has whipped up a bit of an upgrade for the C64 and C128, in the form of a graphics expansion card running Yamaha hardware.

As you might expect, the expansion is designed to fit neatly into a C64 cartridge slot. The card runs the Yamaha V9958, the video display processor known for its appearance in MSX2+ computers. In this case, it’s paired with a healthy 128 kB of video RAM so it can really do its thing. The V9958 has an analog RGB output that can be set for PAL or NTSC operation, and it can manage resolutions up to 512×212, or even 512×424 interlaced. Naturally, it needs to be hooked up directly to a compatible screen, like a Commodore 1084 or one with a SCART input. [Vossi] took the time to create some demos of the chip’s capabilities, drawing various graphics in a way that the C64 couldn’t readily achieve on its own.

It’s a build that almost feels like it’s from an alternate universe, one where Yamaha decided to whip up a third-party graphics upgrade for the C64. That didn’t happen, but stranger team-ups have occurred over the years.

[Thanks to Stephen Walters for the tip!]

Learn GPU Programming With Simple Puzzles

Have you wanted to get into GPU programming with CUDA, but found the usual textbooks and guides a bit too intense? Well, help is at hand in the form of a series of increasingly difficult programming ‘puzzles’ created by [Sasha Rush]. The first part of the simplification is the use of the excellent Numba Python JIT compiler, which allows easy-to-understand Python code to be compiled into GPU machine code. Working through the puzzles is even easier if you use the linked Google Colab as your programming environment, launching you straight into a Jupyter notebook with the puzzles laid out. You can use your own GPU if you have one, but that route isn’t covered in detail.

The puzzles start from the assumption that you know nothing at all about GPU programming, which is totally the case for some of us! What’s really nice is the way the result of each program run is displayed, showing graphically how data are read from and written to the input and output arrays you’re working with. Each essential concept of CUDA programming is introduced one at a time with a real programming example, making it a breeze to follow along. Just make sure you don’t watch the video below all the way through the first time, as [Sasha] explains all the solutions in it!
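To give a flavor of the style, here is a minimal sketch along the lines of the earliest puzzles, written against Numba’s CUDA target; the kernel and names are illustrative rather than lifted from [Sasha Rush]’s notebook, and it assumes a CUDA-capable GPU with Numba installed:

```python
import numpy as np
from numba import cuda

@cuda.jit
def add_ten(inp, out):
    # One GPU thread per array element; the guard keeps any surplus
    # threads in the block from writing out of bounds.
    i = cuda.threadIdx.x
    if i < inp.shape[0]:
        out[i] = inp[i] + 10

inp = np.arange(8, dtype=np.float32)
out = np.zeros_like(inp)
add_ten[1, 8](inp, out)   # launch one block of eight threads
print(out)                # [10. 11. 12. 13. 14. 15. 16. 17.]
```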

Confused about why you’d want to do this? Then perhaps check out our guide to CUDA first. We know what you’re thinking: what about non-Nvidia hardware? Well, there’s SCALE for that! Finally, once you understand CUDA, why not have a play with WebGPU?
