A circuit board in the shape of a business card is shown. The circuitry is confined to the left side of the board, and the rest is used for text.

(Neural) Networking With A Business Card

A PCB business card is a great way for electrical engineers to impress employers with their design skills, but the software they run can be just as impressive as the card itself. As a programmer with an interest in embedded machine learning, [Dave McKinnon] wanted a card that showcased his skills, so he designed one that runs voice recognition.

[Dave] specifically wanted to run a neural network on his card, but needed to make it small enough to run on a microcontroller. Voice recognition looked like a good fit for this, since audio can be represented with relatively little data, a microphone is cheap and easy to add to a circuit board, and there was already an example of someone running such a voice recognition network on an Arduino. To fit the neural network into 46 kB, it only distinguishes the words “one” through “nine,” and displays its guess on an LED seven-segment display. [Dave] first prototyped the system with an Arduino, then designed the circuit board around an RP2040.

Continue reading “(Neural) Networking With A Business Card”

Christmas Comes Early With AI Santa Demo

With only two hundred odd days ’til Christmas, you just know we’re already feeling the season’s magic. Well, maybe not, but [Sean Dubois] has decided to give us a head start with this WebRTC demo built into a Santa stuffie.

The details are a little bit sparse (hopefully he finishes the documentation on GitHub by the time this goes out) but the project is really neat. Hardware-wise, it’s an audio-enabled ESP32-S3 dev board living inside Santa, running the OpenAI’s OpenRealtime Embedded SDK (as implemented by ExpressIf), with some customization by [Sean]. Looks like the audio is going through the newest version of LibPeer and the heavy lifting is all happening in the cloud, as you’d expect with this SDK. (A key is required, but hey! It’s all open source; if you have an AI that can do the job locally-hosted, you can probably figure out how to connect to it instead.)

This speech-to-speech AI doesn’t need to emulate Santa Claus, of course; you can prime the AI with any instructions you’d like. If you want to delight children, though, its hard to beat the Jolly Old Elf, and you certainly have time to get it ready for Christmas. Thanks to [Sean] for sending in the tip.

If you like this project but want to avoid paying OpenAI API fees, here’s a speech-to-text model to get you started.We covered this AI speech generator last year to handle the talky bit. If you put them together and make your own Santa Claus (or perhaps something more seasonal to this time of year), don’t forget to drop us a tip!

Be Careful What You Ask For: Voice Control

We get it. We also watched Star Trek and thought how cool it would be to talk to our computer. From Kirk setting a self-destruct sequence, to Scotty talking into a mouse, or Picard ordering Earl Grey, we intuitively know that talking to a computer is better than typing, right? Well, computers talking back and forth to us is no longer science fiction, and maybe we aren’t as happy about it as we thought we’d be.

We weren’t able to pinpoint the first talking computer in fiction. Asimov and van Vogt had talking computers in the 1940s. “I, Robot” by Eando Binder, and not the more famous Asimov story, had a fully speaking robot in 1939. You could argue that “The Machine” in E. M. Forster’s “The Machine Stops” was probably speaking — the text is a little vague — and that was in 1909. The robot from Metropolis (1927) spoke after transforming, but you could argue that doesn’t count.

Meanwhile, In Real Life

In real life, computers weren’t as quick to speak. Before the middle of the twentieth century, machine-generated speech was an oddity. In 1779, a mechanical contrivance by Wolfgang von Kempelen, famous for the mechanical Turk chess-playing automaton, could form simple words. By 1939, Bell Labs could do even better speech synthesis electronically but with a human operator. It didn’t sound very good, as you can see in the video below, but it was certainly expressive.

Continue reading “Be Careful What You Ask For: Voice Control”

CH32V003 Provides Ultra Cheap Speech Recognition

Speech recognition was once the stuff of science fiction, but it’s now possible with relatively modest hardware. Just how modest, you ask? How about a 10 cent microcontroller?

[Brian Smith] has achieved a very basic form of speech recognition on a CH32V003 RISC-V microcontroller. It may only recognize spoken digits, but that it does so at all on such a modest platform is impressive in itself.

For training purposes it enlists the help of a desktop Linux computer, however the recognition process is purely in the ten cent chip. He goes into much detail about how it achieves this on a system without floating point arithmetic, as well as the other shortcomings of such a limited platform.

We’ve become used to thinking of super-cheap chips as of limited use, but the truth is they’re surprisingly more capable than expected. We’re seeing them starting to appear as subsidiary processors on some badges, so it will be interesting to see them proliferate in more projects now their availability problems have eased. Go on – for ten cents, what do you have to lose?

Here’s A Plain C/C++ Implementation Of AI Speech Recognition, So Get Hackin’

[Georgi Gerganov] recently shared a great resource for running high-quality AI-driven speech recognition in a plain C/C++ implementation on a variety of platforms. The automatic speech recognition (ASR) model is fully implemented using only two source files and requires no dependencies. As a result, the high-quality speech recognition doesn’t involve calling remote APIs, and can run locally on different devices in a fairly straightforward manner. The image above shows it running locally on an iPhone 13, but it can do more than that.

Implementing a robust speech transcription that runs locally on a variety of devices is much easier with [Georgi]’s port of OpenAI’s Whisper.
[Georgi]’s work is a port of OpenAI’s Whisper model, a remarkably-robust piece of software that does a truly impressive job of turning human speech into text. Whisper is easy to set up and play with, but this port makes it easier to get the system working in other ways. Having such a lightweight implementation of the model means it can be more easily integrated over a variety of different platforms and projects.

The usual way that OpenAI’s Whisper works is to feed it an audio file, and it spits out a transcription. But [Georgi] shows off something else that might start giving hackers ideas: a simple real-time audio input example.

By using a tool to stream audio and feed it to the system every half-second, one can obtain pretty good (sort of) real-time results! This of course isn’t an ideal method, but the robustness and accuracy of Whisper is such that the results look pretty great nevertheless.

You can watch a quick demo of that in the video just under the page break. If it gives you some ideas, head over to the project’s GitHub repository and get hackin’!

Continue reading “Here’s A Plain C/C++ Implementation Of AI Speech Recognition, So Get Hackin’”

On Getting A Computer’s Attention And Striking Up A Conversation

With the rise in voice-driven virtual assistants over the years, the sight of people talking to various electrical devices in public and in private has become rather commonplace. While such voice-driven interfaces are decidedly useful for a range of situations, they also come with complications. One of these are the trigger phrases or wake words that voice assistants listen to when in standby. Much like in Star Trek, where uttering ‘Computer’ would get the computer’s attention, so do we have our ‘Siri’, ‘Cortana’ and a range of custom trigger phrases that enable the voice interface.

Unlike in Star Trek, however, our virtual assistants do not know when we really desire to interact. Unable to distinguish context, they’ll happily respond to someone on TV mentioning their trigger phrase. This possibly followed by a ludicrous purchase order or other mischief. The realization here is the complexity of voice-based interfaces, while still lacking any sense of self-awareness or intelligence.

Another issue is that the process of voice recognition itself is very resource-intensive, which limits the amount of processing that can be performed on the local device. This usually leads to the voice assistants like Siri, Alexa, Cortana and others processing recorded voices in a data center, with obvious privacy implications.

Continue reading “On Getting A Computer’s Attention And Striking Up A Conversation”

OpenAI Hears You Whisper

Should you wish to try high-quality voice recognition without buying something, good luck. Sure, you can borrow the speech recognition on your phone or coerce some virtual assistants on a Raspberry Pi to handle the processing for you, but those aren’t good for major work that you don’t want to be tied to some closed-source solution. OpenAI has introduced Whisper, which they claim is an open source neural net that “approaches human level robustness and accuracy on English speech recognition.” It appears to work on at least some other languages, too.

If you try the demonstrations, you’ll see that talking fast or with a lovely accent doesn’t seem to affect the results. The post mentions it was trained on 680,000 hours of supervised data. If you were to talk that much to an AI, it would take you 77 years without sleep!

Continue reading “OpenAI Hears You Whisper”