How To Train A New Voice For Piper With Only A Single Phrase

July 9, 2025 by Dave Rowntree 7 Comments

[Cal Bryant] hacked together a home automation system years ago, which more recently utilizes Piper TTS (text-to-speech) voices for various undisclosed purposes. Not satisfied with the robotic-sounding standard voices available, [Cal] set about an experiment to fine-tune the Piper TTS AI voice model using a clone of a single phrase created by a commercial TTS voice as a starting point.

Before the release of Piper TTS in 2023, existing free-to-use TTS systems such as espeak and Festival sounded robotic and flat. Piper delivered much more natural-sounding output, without requiring massive resources to run. To change the voice style, the Piper AI model can be either retrained from scratch or fine-tuned with less effort. In the latter case, the problem to be solved first was how to generate the necessary volume of training phrases to run the fine-tuning of Piper’s AI model. This was solved using a heavyweight AI model, ChatterBox, which is capable of so-called zero-shot training. Check out the Chatterbox demo here.

As the loss function gets smaller, the model’s accuracy gets better

Training began with a corpus of test phrases in text format to ensure decent coverage of everyday English. [Cal] used ChatterBox to clone audio from a single test phrase generated by a ‘mystery TTS system’ and created 1,300 test phrases from this new voice. This audio set served as training data to fine-tune the Piper AI model on the lashed-up GPU rig.

To verify accuracy, [Cal] used OpenAI’s Whisper software to transcribe the audio back to text, in order to compare with the original text corpus. To overcome issues with punctuation and differences between US and UK English, the text was converted into phonemes using espeak-ng, resulting in a 98% phrase matching accuracy.

After down-sampling the training set using SoX, it was ready for the Piper TTS training system. Despite all the preparation, running the software felt anticlimactic. A few inconsistencies in the dataset necessitated the removal of some data points. After five days of training parked outside in the shade due to concerns about heat, TensorBoard indicated that the model’s loss function was converging. That’s AI-speak for: the model was tuned and ready for action! We think it sounds pretty slick.

If all this new-fangled AI speech synthesis is too complex and, well, a bit creepy for you, may we offer a more 1980s solution to making stuff talk? Finally, most people take the ability to speak for granted, until they can no longer do so. Here’s a team using cutting-edge AI to give people back that ability.

Import GPU: Python Programming With CUDA

February 25, 2025 by Bryan Cockfield 17 Comments

Every few years or so, a development in computing results in a sea change and a need for specialized workers to take advantage of the new technology. Whether that’s COBOL in the 60s and 70s, HTML in the 90s, or SQL in the past decade or so, there’s always something new to learn in the computing world. The introduction of graphics processing units (GPUs) for general-purpose computing is perhaps the most important recent development for computing, and if you want to develop some new Python skills to take advantage of the modern technology take a look at this introduction to CUDA which allows developers to use Nvidia GPUs for general-purpose computing.

Of course CUDA is a proprietary platform and requires one of Nvidia’s supported graphics cards to run, but assuming that barrier to entry is met it’s not too much more effort to use it for non-graphics tasks. The guide takes a closer look at the open-source library PyTorch which allows a Python developer to quickly get up-to-speed with the features of CUDA that make it so appealing to researchers and developers in artificial intelligence, machine learning, big data, and other frontiers in computer science. The guide describes how threads are created, how they travel along within the GPU and work together with other threads, how memory can be managed both on the CPU and GPU, creating CUDA kernels, and managing everything else involved largely through the lens of Python.

Getting started with something like this is almost a requirement to stay relevant in the fast-paced realm of computer science, as machine learning has taken center stage with almost everything related to computers these days. It’s worth noting that strictly speaking, an Nvidia GPU is not required for GPU programming like this; AMD has a GPU computing platform called ROCm but despite it being open-source is still behind Nvidia in adoption rates and arguably in performance as well. Some other learning tools for GPU programming we’ve seen in the past include this puzzle-based tool which illustrates some of the specific problems GPUs excel at.

This Week In Security: Browser Exploits, Play Protect, And Turn ON Your Firewall!

October 20, 2023 by Jonathan Bennett 5 Comments

Google Chrome has done a lot of work on JavaScript performance, pushing the V8 engine to more and more impressive feats. Recently, that optimization has one more piece, the Maglev compiler, which sits between Sparkplug and TurboFan, as a mid-tier optimization step. With a Just In Time (JIT) system, the time saving of code optimization steps has to be carefully weighed against the time costs, and Maglev is another tool in that endless hunt for speed. And with anything this complicated, there’s the occasional flaw found in the system. And of course, because we’re talking about it here, it’s a security vulnerability that results in Remote Code Execution (RCE).

The trick is to use Maglev’s optimization against it. Set up a pair of classes, such that B extends A. Calling new B() results in an attempt to use the constructor from A. Which works, because the compiler checks to make sure that the constructors match before doing so. There’s another way to call a constructor in JS, something like Reflect.construct(B, [], Array);. This calls the B constructor, but indicates that the constructor should return an Array object. You may notice, there’s no array in the A class below. Tricking the compiler into using the parent class constructor in this fashion results in the array being uninitialized, and whatever happens to be in memory will set the length of the array. Continue reading “This Week In Security: Browser Exploits, Play Protect, And Turn ON Your Firewall!” →

The Hello World Of GPT?

April 10, 2023 by Al Williams 73 Comments

Someone wants to learn about Arduino programming. Do you suggest they blink an LED first? Or should they go straight for a 3D laser scanner with galvos, a time-of-flight sensor, and multiple networking options? Most of us need to start with the blinking light and move forward from there. So what if you want to learn about the latest wave of GPT — generative pre-trained transformer — programs? Do you start with a language model that looks at thousands of possible tokens in large contexts? Or should you start with something simple? We think you should start simple, and [Andrej Karpathy] agrees. He has a workbook that makes a tiny GPT that can predict the next bit in a sequence. It isn’t any more practical than a blinking LED, but it is a manageable place to start.

The simple example starts with a vocabulary of two. In other words, characters are 1 or 0. It also uses a context size of 3, so it will look at 3 bits and use that to infer the 4th bit. To further simplify things, the examples assume you will always get a fixed-size sequence of tokens, in this case, eight tokens. Then it builds a little from there.

Continue reading “The Hello World Of GPT?” →

This Week In Security: Lastpass Takeaway, Bitcoin Loss, And PyTorch

January 6, 2023 by Jonathan Bennett 10 Comments

We mentioned the LastPass story in closing a couple weeks ago, but details were still a bit scarce. The hope was that LastPass would release more transparent information about what happened, and how many accounts were accessed. Unfortunately it looks like the December 22nd news release is all we’re going to get. For LastPass users, it’s time to make some decisions.

To recap, an attacker used information from the August 2022 breach to target a LastPass Employee with a social engineering ploy. This succeeded, and the attacker managed to access LastPass backups, specifically a customer account database and customer vaults. There has been no official word of how many users’ data were included, but the indication is that it was the entire dataset. And to make matters worse, the encrypted vault is only partially encrypted. Saved URLs were exposed as plain-text to the attacker, though usernames and passwords are still encrypted using your master password.

So what should a LastPass user do now? It depends. We can assume that whoever has the LastPass vault data is currently throwing every password list available at it. If you used a weak password — derived from words in any language or previously compromised — then it’s time to change all of your passwords that were in the vault. They are burned. Continue reading “This Week In Security: Lastpass Takeaway, Bitcoin Loss, And PyTorch” →

Edging Ahead When Learning On The Edge

June 21, 2022 by Matthew Carlson 10 Comments

“With the power of edge AI in the palm of your hand, your business will be unstoppable.”

That’s what the marketing seems to read like for artificial intelligence companies. Everyone seems to have cloud-scale AI-powered business intelligence analytics at the edge. While sounding impressive, we’re not convinced that marketing mumbo jumbo means anything. But what does AI on edge devices look like these days?

Being on the edge just means that the actual AI evaluation and maybe even fine-tuning runs locally on a user’s device rather than in some cloud environment. This is a double win, both for the business and for the user. Privacy can more easily be preserved as less information is transmitted back to a central location. Additionally, the AI can work in scenarios where a server somewhere might not be accessible or provide a response quickly enough.

Google and Apple have their own AI libraries, ML Kit and Core ML, respectively. There are tools to convert Tensorflow, PyTorch, XGBoost, and LibSVM models into formats that CoreML and ML Kit understand. But other solutions try to provide a platform-agnostic layer for training and evaluation. We’ve also previously covered Tensorflow Lite (TFL), a trimmed-down version of Tensorflow, which has matured considerably since 2017.

For this article, we’ll be looking at PyTorch Live (PTL), a slimmed-down framework for adding PyTorch models to smartphones. Unlike TFL (which can run on RPi and in a browser), PTL is focused entirely on Android and iOS and offers tight integration. It uses a react-native backed environment which means that it is heavily geared towards the node.js world.

Continue reading “Edging Ahead When Learning On The Edge” →

Putting Perseverance Rover’s View Into Satellite View Context

March 30, 2021 by Roger Cheng 3 Comments

It’s always fun to look over aerial and satellite maps of places we know, seeing a perspective different from our usual ground level view. We lose that context when it’s a place we don’t know by heart. Such as, say, Mars. So [Matthew Earl] sought to give Perseverance rover’s landing video some context by projecting onto orbital imagery from ESA’s Mars Express. The resulting video (embedded below the break) is a fun watch alongside the technical writeup Reprojecting the Perseverance landing footage onto satellite imagery.

Some telemetry of rover position and orientation were transmitted live during the landing process, with the rest recorded and downloaded later. Surprisingly, none of that information was used for this project, which was based entirely on video pixels. This makes the results even more impressive and the techniques more widely applicable to other projects. The foundational piece is SIFT (Scale Invariant Feature Transform), which is one of many tools in the OpenCV toolbox. SIFT found correlations between Perseverance’s video frames and Mars Express orbital image, feeding into a processing pipeline written in Python for results rendered in Blender.

While many elements of this project sound enticing for applications in robot vision, there are a few challenges touched upon in the “Final Touches” section of the writeup. The falling heatshield interfered with automated tracking, implying this process will need help to properly understand dynamically changing environments. Furthermore, it does not seem to run fast enough for a robot’s real-time needs. But at first glance, these problems are not fundamental. They merely await some motivated people to tackle in the future.

This process bears some superficial similarities to projection mapping, which is a category of projects we’ve featured on these pages. Except everything is reversed (camera instead of video projector, etc.) making the math an entirely different can of worms. But if projection mapping sounds more to your interest, here is a starting point.

[via Dr. Tanya Harrison @TanyaOfMars]

Continue reading “Putting Perseverance Rover’s View Into Satellite View Context” →