Volume Controller Rejects Skeuomorphism, Embraces The Physical

The volume slider on our virtual desktops is a skeuomorphic callback to the volume sliders on professional audio equipment on actual, physical desktops. [Maker Vibe] decided that this skeuomorphism was so last century, and made himself a physical audio control box for his PC.

Since he has three audio outputs he needs to consider, the peripheral he creates could conceivably be called a fader. It certainly has that look, anyway: each output is controlled by a volume slider — connected to a linear potentiometer — and a mute button. Seeing a linear potentiometer used for volume control threw us for a second, until we remembered this was for the computer’s volume control, not an actual volume control circuit. The computer’s volume slider already does the logarithmic conversion. A Seeed Studio Xiao ESP32S3 lives at the heart of this thing, emulating a Bluetooth gamepad using a library by LemmingDev. A trio of LEDs round out the electronics to provide an indicator for which audio channels are muted or active.
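
To see why a linear pot is fine here, consider a minimal sketch of the two stages involved. The 12-bit reading range, the -60 dB floor, and both function names are our assumptions for illustration; we don't know the exact curve Windows or Voicemeeter applies, nor how [Maker Vibe]'s firmware scales its readings.

```python
ADC_MAX = 4095  # assumption: a 12-bit reading from the slider


def slider_position(raw, adc_max=ADC_MAX):
    """Map a raw linear-pot reading to a position between 0.0 and 1.0."""
    return min(max(raw / adc_max, 0.0), 1.0)


def position_to_gain(position, floor_db=-60.0):
    """Hypothetical log taper: the position moves linearly through decibels.

    Returns the amplitude multiplier (1.0 at full volume).
    """
    gain_db = floor_db * (1.0 - position)
    return 10 ** (gain_db / 20.0)
```

The hardware only has to report the first value; the second step, or something like it, happens in software on the PC side, which is why no logarithmic pot is needed.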

Those Bluetooth signals are interpreted by a Python script feeding software called Voicemeeter Banana, because [Maker Vibe] uses Windows, and Redmond’s finest operating system doesn’t expose audio controls in an easily accessible way. Voicemeeter Banana (and its attendant Python script) takes care of telling Windows what to do.

The whole setup lives on [Maker Vibe]’s desk in a handsome 3D printed box. He used a Cricut vinyl cutter to cut out masks so he could airbrush different colours onto the print after sanding down the layer lines. That’s another one for the archive of how to make front panels.

If volume sliders aren’t doing it for you, perhaps you’d prefer to control your audio with a conductor’s baton. 

How To Train A New Voice For Piper With Only A Single Phrase

[Cal Bryant] hacked together a home automation system years ago, which more recently utilizes Piper TTS (text-to-speech) voices for various undisclosed purposes. Not satisfied with the robotic-sounding standard voices available, [Cal] set about an experiment to fine-tune the Piper TTS AI voice model using a clone of a single phrase created by a commercial TTS voice as a starting point.

Before the release of Piper TTS in 2023, existing free-to-use TTS systems such as espeak and Festival sounded robotic and flat. Piper delivered much more natural-sounding output, without requiring massive resources to run. To change the voice style, the Piper AI model can be either retrained from scratch or fine-tuned with less effort. In the latter case, the problem to be solved first was how to generate the necessary volume of training phrases to run the fine-tuning of Piper’s AI model. This was solved using a heavyweight AI model, Chatterbox, which is capable of so-called zero-shot voice cloning. Check out the Chatterbox demo here.

As the loss function gets smaller, the model’s accuracy gets better

Training began with a corpus of test phrases in text format to ensure decent coverage of everyday English. [Cal] used Chatterbox to clone audio from a single test phrase generated by a ‘mystery TTS system’ and created 1,300 test phrases from this new voice. This audio set served as training data to fine-tune the Piper AI model on the lashed-up GPU rig.

To verify accuracy, [Cal] used OpenAI’s Whisper software to transcribe the audio back to text, in order to compare with the original text corpus. To overcome issues with punctuation and differences between US and UK English, the text was converted into phonemes using espeak-ng, resulting in a 98% phrase matching accuracy.
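
The phoneme comparison can be sketched roughly as follows. This is our own illustration, not [Cal]'s script: the function names and the `en-gb` voice choice are assumptions, and it expects the transcripts (from Whisper or anything else) to already exist as text. It shells out to the `espeak-ng` command-line tool, so that needs to be installed for the default converter to work.

```python
import subprocess


def espeak_phonemes(text, voice="en-gb"):
    """Return espeak-ng's IPA rendering of `text` (needs espeak-ng on PATH)."""
    result = subprocess.run(
        ["espeak-ng", "-q", "--ipa", "-v", voice, text],
        capture_output=True,
        text=True,
        check=True,
    )
    # Collapse line breaks and repeated spaces so the comparison is stable.
    return " ".join(result.stdout.split())


def match_rate(originals, transcripts, to_phonemes=espeak_phonemes):
    """Fraction of phrases whose phonemes match between the two lists."""
    if not originals:
        raise ValueError("need at least one phrase to compare")
    matches = sum(
        to_phonemes(original) == to_phonemes(transcript)
        for original, transcript in zip(originals, transcripts)
    )
    return matches / len(originals)
```

Comparing phonemes rather than raw strings means that differences in punctuation or spelling variants that sound the same no longer count as mismatches, which is the point of the espeak-ng step.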

After down-sampling the training set using SoX, it was ready for the Piper TTS training system. Despite all the preparation, running the software felt anticlimactic. A few inconsistencies in the dataset necessitated the removal of some data points. After five days of training, with the rig parked outside in the shade due to concerns about heat, TensorBoard indicated that the model’s loss function was converging. That’s AI-speak for: the model was tuned and ready for action! We think it sounds pretty slick.

If all this new-fangled AI speech synthesis is too complex and, well, a bit creepy for you, may we offer a more 1980s solution to making stuff talk? Finally, most people take the ability to speak for granted, until they can no longer do so. Here’s a team using cutting-edge AI to give people back that ability.

No Tension For Tensors?

We always enjoy [FloatHeadPhysics] explaining any math or physics topic. We don’t know if he’s acting or not, but he seems genuinely excited about every topic he covers, and it is infectious. He also has entertaining imaginary conversations with people like Feynman and Einstein. His recent video on tensors begins by showing the vector form of Ohm’s law, making it even more interesting. Check out the video below.

If you ever thought you could use fewer numbers for many tensor calculations, [FloatHeadPhysics] had the same idea. Luckily, imaginary Feynman explains why this isn’t right, and the answer shows the fundamental reason people use tensors.
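
For reference, here is the textbook version of the idea, written in our own notation rather than taken from the video. The vector form of Ohm's law relates current density to electric field through a single conductivity; in a material that conducts differently in different directions, that single number becomes a 3×3 set of components. Exactly which example the video works through is our assumption.

```latex
% Isotropic material: one number, current flows parallel to the field
\mathbf{J} = \sigma \, \mathbf{E}

% Anisotropic material: nine components, current need not be parallel to the field
J_i = \sum_{j=1}^{3} \sigma_{ij} \, E_j , \qquad i = 1, 2, 3
```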

FLOSS Weekly Episode 840: End-of-10; Not Just Some Guy In A Van

This week Jonathan chats with Joseph P. De Veaugh-Geiss about KDE’s eco initiative and the End of 10 campaign! Is Open Source really a win for environmentalism? How does the End of 10 campaign tie in? And what does PewDiePie have to do with it? Watch to find out!

Dithering With Quantization To Smooth Things Over

It should probably come as no surprise to anyone that the images which we look at every day – whether printed or on a display – are simply illusions. That cat picture isn’t actually a cat, but rather a collection of dots that when looked at from far enough away tricks our brain into thinking that we are indeed looking at a two-dimensional cat and happily fills in the blanks. These dots can use the full CMYK color model for prints, RGB(A) for digital images or a limited color space including greyscale.

Perhaps more interesting is the use of dithering to further trick the mind into seeing things that aren’t truly there by adding noise. Simply put, dithering is the process of adding noise to reduce quantization error, which in images shows up as artefacts like color banding. Dithering is also used in digital audio, for similar reasons. Part of the process of going from an analog signal to a digital one involves throwing away data that falls outside the sampling rate and quantization depth.

By adding dithering noise these quantization errors are smoothed out, with the final effect depending on the dithering algorithm used.
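
A minimal sketch shows the effect with the crudest possible method, plain random dither. It is not one of the error-diffusion algorithms such as Floyd-Steinberg; the function names and the choice of a 30% grey squeezed into one bit are ours, purely for illustration.

```python
import random


def quantize(value, levels):
    """Round a value in [0.0, 1.0] to the nearest of `levels` evenly spaced steps."""
    step = 1.0 / (levels - 1)
    return round(value / step) * step


def quantize_with_dither(value, levels, rng):
    """Add uniform noise of up to half a step before quantizing, then clamp."""
    step = 1.0 / (levels - 1)
    noisy = value + rng.uniform(-0.5, 0.5) * step
    return min(1.0, max(0.0, quantize(noisy, levels)))


rng = random.Random(0)  # seeded so the run is repeatable
value = 0.3   # a grey that black-and-white output cannot represent directly
levels = 2    # 1-bit output: black (0.0) or white (1.0)

plain = [quantize(value, levels) for _ in range(10_000)]
dithered = [quantize_with_dither(value, levels, rng) for _ in range(10_000)]

plain_mean = sum(plain) / len(plain)           # every pixel comes out black
dithered_mean = sum(dithered) / len(dithered)  # average lands close to the input
```

Without dither, every pixel of the grey patch rounds to black and the tone is lost. With dither, roughly three pixels in ten come out white, so from far enough away the patch averages back to something near the original grey.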

Kids Vs Computers: Chisanbop Remembered

If you are a certain age, you probably remember the ads and publicity around Chisanbop — the supposed ancient art of Korean finger math. Was it Korean? Sort of. Was it faster than a calculator? Sort of. [Chris Staecker] offers a great look at Chisanbop, not just how to do it, but also how it became such a significant cultural phenomenon. Take a look at the video below. Long, but worth it.

Technically, the idea is fairly simple. Your right-hand thumb is worth 5, and each finger is worth 1. So to identify 8, you hold down your thumb and the first three digits. The left hand has the same arrangement, but everything is worth ten times the right hand, so the thumb is 50, and each digit is worth 10.
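
The encoding described above is easy to capture in a few lines. This is just a sketch with names of our own invention; it says which thumb and how many fingers to press for any number from 0 to 99, and reads the hands back again.

```python
def hand(digit):
    """Split a digit 0-9 into a thumb (worth 5) and fingers (worth 1 each)."""
    thumb, fingers = divmod(digit, 5)
    return {"thumb": thumb == 1, "fingers": fingers}


def to_fingers(n):
    """Left hand carries the tens, right hand carries the ones."""
    if not 0 <= n <= 99:
        raise ValueError("two hands only cover 0 to 99")
    tens, ones = divmod(n, 10)
    return {"left": hand(tens), "right": hand(ones)}


def from_fingers(hands):
    """Read a pair of hands back into a number."""
    def value(h):
        return 5 * h["thumb"] + h["fingers"]
    return 10 * value(hands["left"]) + value(hands["right"])
```

For example, `to_fingers(8)` leaves the left hand untouched and presses the right thumb plus three fingers, matching the description above.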

With a little work, it is easy to count and add using this method. Subtraction is just the reverse. As you might expect, multiplication is just repeated addition. But the real story here isn’t how to do Chisanbop. It is more the story of how a Korean immigrant’s system went viral decades before the advent of social media.

You can argue that this is a shortcut that hurts math understanding. Or, you could argue the reverse. However, the truth is that this was around the time the calculator became widely available. Math education would shift from focusing on getting the right answer to understanding the underlying concepts. In a world where adding ten 6-digit numbers is easy with a $5 device, being able to do it with your fingers isn’t necessarily a valuable skill.

If you enjoy unconventional math methods, you may appreciate peasant multiplication.

Crunching The News For Fun And Little Profit

Do you ever look at the news, and wonder about the process behind the news cycle? I did, and for the last couple of decades it’s been the subject of one of my projects. The Raspberry Pi on my shelf runs my word trend analysis tool for news content, and since my journey from curious geek to having my own large corpus analysis system has taken twenty years, it’s worth a second look.

How Career Turmoil Led To A Two-Decade Project

A hanging sign surrounded by ornate metalwork, with the legend "Cyder house".
This is very much a minority spelling. Colin Smith, CC BY-SA 2.0.

In the middle of the 2000s I had come out of the dotcom crash mostly intact, and was working for a small web shop. When they went bust I was casting around as one does, and spent a while as a Google quality rater while I looked for a new permie job. These teams are employed by the search giant through temporary employment agencies, and in loose terms their job is to be the trained monkeys against whom the algorithm is tested. The algorithm chose X, and if the humans also chose X, the algorithm is probably getting it right. Being a quality rater is not in any way a high-profile job, but with the big shiny G on my CV I soon found myself in demand from web companies seeking some white-hat search engine marketing expertise. What I learned mirrored my lesson from a decade earlier in the CD-ROM business, that on the web as in any other electronic publishing medium, good content well presented has priority over any black-hat tricks.

But what makes good content? Forget an obsession with stuffing bogus keywords in the text, and instead talk about the right things, and do it authoritatively. What are the right things in this context? If you are covering a subject, you need to do so using the right language; that which the majority uses rather than language only you use. I can think of a bunch of examples which I probably shouldn’t talk about, but an example close to home for me comes in cider. In the UK, cider is a fermented alcoholic drink made from apples, and as a craft cidermaker of many years’ standing I have a good grasp of its vocabulary. The accepted spelling is “Cider”, but there’s an alternate spelling of “Cyder” used by some commercial producers of the drink. It doesn’t take long to realise that online, hardly anyone uses cyder with a Y, and thus pages concentrating on that word will do less well than those talking about cider.

A graph of the word football versus the word soccer in British news.
We Brits rarely use the word “soccer” unless there’s a story about the Club World Cup in America.

I started to build software to analyse language around a given topic, with the aim of discerning the metaphorical cider from the cyder. It was a great surprise a few years later to discover that I had invented for myself the already-existing field of computational linguistics, something that would have saved me a lot of time had I known about it when I began. I was taking a corpus of text and computing the frequencies and collocates (words that appear alongside each other) of the words within it, and from that I could quickly see which wording mattered around a subject, and which didn’t. This led seamlessly to an interest in what the same process would look like for news data with a time axis added, so I created a version which harvested its corpus from RSS feeds. Thus began my decades-long project.
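
For the curious, word frequencies and collocates can be computed in a few lines of Python. This is a minimal sketch and not the software described here: the tokenizer, the two-word window, and the tiny sample corpus are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens, target, window=2):
    """Count the words that appear within `window` tokens of `target`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == target:
            start = max(0, i - window)
            counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts


corpus = (
    "Craft cider is made from apples. Good cider takes time. "
    "Cyder is a rare spelling of cider."
)
tokens = tokenize(corpus)
freq = Counter(tokens)                   # how often each word appears
near_cider = collocates(tokens, "cider")  # which words keep it company
```

On this toy corpus, "cider" appears three times against once for "cyder", which is the same kind of signal a larger corpus gives about which wording matters.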
