Speech Recognition On An Arduino Nano?

Like most of us, [Peter] had a bit of extra time on his hands during quarantine and decided to take a look back at speech recognition technology in the 1970s. Quickly, he started thinking to himself, “Hmm…I wonder if I could do this with an Arduino Nano?” We’ve all probably had similar thoughts, but [Peter] really put his theory to the test.

The hardware itself is pretty straightforward: an Arduino Nano runs the speech recognition algorithm, and a MAX9814 microphone amplifier captures the voice commands. The beauty of [Peter]’s approach, however, lies in the software. He splits the work between a custom PC program he wrote and the Arduino Nano: the learning is done on the PC, while recognition runs in real time on the Nano, a typical division of labor for machine learning deployed on a microcontroller. To capture sample audio commands, or utterances, [Peter] first had to optimize the Nano’s ADC to get sample rates sufficient for speech processing. With a bit of low-level programming, he achieved a sample rate of 9 ksps, which is plenty fast for speech.

To analyze the utterances, he first divided each sample utterance into 50 ms segments. Think of dividing a single spoken word into its syllables, like analyzing the “se-” in “seven” separately from the “-ven.” 50 ms might be too long or too short to capture each syllable cleanly, but hopefully that gives you a good mental picture of what [Peter]’s program is doing. He then calculated the energy in five different frequency bands for every segment of every utterance. Normally that’s done with a Fourier transform, but the Nano doesn’t have the processing power to compute one in real time, so [Peter] took a different approach: he implemented five digital bandpass filters, letting him compute the energy of the signal in each frequency band directly.
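If you want to picture what that looks like in code, here’s a rough sketch of the segment-and-measure idea in Python/NumPy, the sort of thing you’d run on the PC side; the band edges and filter order are our own illustrative guesses, not [Peter]’s actual values:

import numpy as np
from scipy.signal import butter, lfilter

SAMPLE_RATE = 9000                      # ~9 ksps, as on the Nano
SEGMENT_LEN = int(0.050 * SAMPLE_RATE)  # 50 ms -> 450 samples
BANDS_HZ = [(100, 400), (400, 800), (800, 1500), (1500, 2500), (2500, 4000)]

def band_energies(utterance):
    """Return a (segments x bands) matrix of energies for one utterance."""
    num_segments = len(utterance) // SEGMENT_LEN
    features = np.zeros((num_segments, len(BANDS_HZ)))
    for band, (low, high) in enumerate(BANDS_HZ):
        b, a = butter(2, (low, high), btype="bandpass", fs=SAMPLE_RATE)
        filtered = lfilter(b, a, utterance)           # bandpass the whole utterance
        for seg in range(num_segments):
            chunk = filtered[seg * SEGMENT_LEN:(seg + 1) * SEGMENT_LEN]
            features[seg, band] = np.sum(chunk ** 2)  # energy in this band/segment
    return features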

The energy of each frequency band for every segment is then sent to a PC, where a custom-written program builds “templates” from the sample utterances. The crux of the algorithm is measuring how close the per-band, per-segment energies of an utterance are to each template. The PC program produces a .h file that can be compiled directly into the Nano’s firmware. He uses the example of recognizing the numbers 0-9, but you could swap those commands for “start” or “stop,” for example, if you like.
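As a rough illustration of the template idea (and only that; [Peter]’s actual PC program and scoring are surely more refined), you could average the training examples for each word and then pick whichever template a new utterance lands closest to:

import numpy as np

def build_templates(training_data):
    # training_data maps a label ("zero" ... "nine") to a list of
    # (segments x bands) feature matrices, all trimmed to the same shape.
    return {label: np.mean(np.stack(examples), axis=0)
            for label, examples in training_data.items()}

def classify(features, templates):
    # Return the label whose template is nearest (Euclidean distance).
    return min(templates,
               key=lambda label: np.linalg.norm(features - templates[label]))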

[Peter] admits that you can’t implement the kind of speech recognition on an Arduino Nano that we’ve come to expect from those covert listening devices, but he points out that small, hands-free gadgets like a head-mounted multimeter could benefit from single-word or single-phrase voice commands. And maybe it could put your mind at ease knowing that everything you say isn’t immediately getting beamed into the cloud and handed to our AI overlords. Or maybe we’re all starting to get used to this. Whatever your position on the current state of AI, hopefully you’ve gained some inspiration for your next project.

Death Of The Turing Test In An Age Of Successful AIs

IBM has come up with an automatic debating system called Project Debater that researches a topic, presents an argument, listens to a human rebuttal and formulates its own rebuttal. But does it pass the Turing test? Or does the Turing test matter anymore?

The Turing test was first introduced in 1950, a year often cited as year one for AI research. It asks, “Can machines think?” Today we’re more interested in machines that can intelligently make restaurant recommendations, drive our car along the tedious highway to and from work, or identify the surprising-looking flower we just stumbled upon. These all fit the definition of AI as a machine that can perform a task normally requiring human intelligence. Though as you’ll see below, Turing’s test wasn’t really about intelligence, or even about thinking, but about whether an interrogator could determine a test subject’s sex.


Code Talkers: Programming With Voice

IEEE Spectrum had an interesting post covering several companies trying to sell voice programming interfaces. Not APIs for speech recognition, but replacements for the traditional text editor as a way to write programs.

The companies, Serenade and Talon, have very different styles. Serenade has fairly normal-sounding language, whereas Talon has you use very specific phrases and can even use eye tracking to figure out what you are looking at when you issue a command. There’s also mention of two open-source products (Aenae and Caster) that require you to use a third-party speech engine.

For an example of Talon’s input, imagine you want this line of code in your program:

name=extract_word(m)

You’d say this out loud: “Phrase name op equals snake extract word paren mad.” Not exactly how Star Trek envisioned voice programming.

For accessibility, this might be workable. It is hard for us to imagine a room full of developers all talking to make their computers enter C or Python code. Until we can say, “Computer, build a graphic using the data in file hackaday-27,” we think this is not going to go mainstream.

The actual speech recognition part is pretty much a commodity now. Making a reasonable set of guesses about what people will say and what they mean by it is something else. It seems like this works best when you have a very specific and limited vocabulary, like operating a 3D printer.


Hackaday Links: December 6, 2020

By now you’ve no doubt heard of the sudden but not unexpected demise of the iconic Arecibo radio telescope in Puerto Rico. We have been covering the agonizing end of Arecibo from almost the moment the first cable broke in August to a eulogy, and most recently its final catastrophic collapse this week. That last article contained amazing video of the final collapse, including up-close and personal drone shots of the cable breaking. For a more in-depth analysis of the collapse, it’s hard to beat Scott Manley’s frame-by-frame analysis, which really goes into detail about what happened. Seeing the paint spalling off the cables as they stretch and distort under loads far greater than they were designed for is both terrifying and fascinating.

Exciting news from Australia as the sample return capsule from JAXA’s Hayabusa2 asteroid explorer returned safely to Earth Saturday. We covered Hayabusa2 in our roundup of extraterrestrial excavations a while back, describing how it used both a tantalum bullet and a shaped-charge penetrator to blast regolith from the surface of asteroid 162173 Ryugu. Samples of the debris were hoovered up and hermetically sealed for the long ride back to Earth, which culminated in the fiery re-entry and safe landing in the midst of the Australian outback. Planetary scientists are no doubt eager to get a look inside the capsule and analyze the precious milligrams of space dust. In the meantime, Hayabusa2, with 66 kilograms of propellant remaining, is off on an extended mission to visit more asteroids for the next eleven years or so.

The 2020 Remoticon has been wrapped up for most of a month now, but one thing we noticed was how much everyone seemed to like the Friday evening Bring-a-Hack event that was hosted on Remo. To kind of keep that meetup momentum going and to help everyone slide into the holiday season with a little more cheer, we’re putting together a “Holiday with Hackaday & Tindie” meetup on Tuesday, December 15 at noon Pacific time. The details haven’t been shared yet, but our guess is that this will certainly be a “bring-a-hack friendly” event. We’ll share more details when we get them this week, but for now, hop over to the Remo event page and reserve your spot.

On the Buzzword Bingo scorecard, “Artificial Intelligence” is a square that can almost be checked off by default these days, as companies rush to stretch the definition of the term to fit almost every product in the never-ending search for market share. But even the products that actually have machine learning built in are only as good as the data sets used to train them. That can be a problem for voice-recognition systems; while there are massive databases of utterances in just about every language, the likes of Amazon and Google aren’t too willing to share what they’ve harvested from their smart-speaker-using customer base. What’s the little person to do? Perhaps the People’s Speech database will help. Part of the MLCommons project, it has 86,000 hours of speech data, mostly derived from audiobooks, a clever source indeed since the speech and the text can easily be aligned. The database also pulls audio and the corresponding text from Wikipedia and other random sources around the web. It’s a small dataset compared to whatever the big players are sitting on, to be sure, but it’s a start.

And finally, divers in the Baltic Sea have dredged up a bit of treasure: a Nazi Enigma machine. Divers in Gelting Bay near the border of Germany and Denmark found what appeared to be an old typewriter caught in one of the abandoned fishing nets they were searching for. When they realized what it was — even crusted in 80 years’ worth of corrosion and muck, some keys still look like they’re brand new — they called in archaeologists to take over the recovery. Gelting Bay was the scene of a mass scuttling of U-boats in the final days of World War II, so this Enigma may have been pitched overboard by a commander before pulling the plug on his boat. It’ll take years to restore, but it’ll be quite a museum piece when it’s done.

Control Anything With A Chat Bot

In the world of the Internet of Things, it’s easy enough to get something connected to the Internet. But what should you use to communicate with it and control it? There are many standards and tools available, but the best choice is always to use the tools you have on hand. [Victor] found himself in this situation, and found that the best way to control an Internet-connected car was the Flask server he already had.

The remote-controlled car was originally supposed to come with an Arduino, but the microcontroller was missing on arrival. [Victor] had a Raspberry Pi around and set it up to replace the Arduino. He also took advantage of the Pi’s extra horsepower and wrote a Flask server to control the car, which you interact with as if you were talking to a chat bot. Sending the words “go left/forward” to the Flask server will drive the car accordingly, for example.
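To give a flavor of what a chat-style control endpoint might look like (this is our own minimal sketch, not [Victor]’s code; the route, message format, and command list are all assumptions), a regex-driven Flask server could be as simple as:

import re
from flask import Flask, request, jsonify

app = Flask(__name__)

COMMANDS = {                      # hypothetical phrase-to-action mapping
    r"\bgo left\b": "LEFT",
    r"\bgo right\b": "RIGHT",
    r"\bgo forward\b": "FORWARD",
    r"\b(stop|halt)\b": "STOP",
}

def drive(action):
    # Placeholder for the actual motor control (GPIO, PWM, etc.).
    print("driving:", action)

@app.route("/chat", methods=["POST"])
def chat():
    message = request.get_json(force=True).get("message", "").lower()
    for pattern, action in COMMANDS.items():
        if re.search(pattern, message):
            drive(action)
            return jsonify(reply="OK, going " + action.lower())
    return jsonify(reply="Sorry, I didn't understand that")

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)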

The chat bot itself contains some gems as well, and would be useful for any project that makes use of regular expressions. It also seems to be easily expandable. The project also uses voice commands, and does so by making extensive use of Mozilla’s voice recognition suite. If you want to get deep in the weeds of voice recognition on your own though, you can also explore TensorFlow at your leisure.

Picovoice Puts Smarts Offline In 512K Of Memory

We live in the future. You can ask your personal assistant to turn on the lights, plan your commute, or set your thermostat. If they ever give Alexa sudo, she might be able to make a sandwich. However, you almost always see these devices sending data to some remote server in the sky to do the analysis and processing. There are some advantages to that, but it isn’t great for privacy as several recent news stories have pointed out. It also doesn’t work well when the network or those remote servers crash — another recent news story. But what’s the alternative? If Picovoice has its way, you’ll just do all the speech recognition offline.

Have a look at the video below. There’s an ARM board not too different from several we have lying around in the Hackaday bunker. It is listening for a wake-up phrase and processing audio commands. All in about 512K of memory. The libraries are apparently quite portable, and the Linux and Raspberry Pi versions are already open source. The company says it will make other platforms available in upcoming releases, and claims support for ARM Cortex-M, Cortex-A, Android, Mac, Windows, and WebAssembly.
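To get a feel for the offline approach on the platforms that are already open, here’s a minimal sketch using Picovoice’s Porcupine Python binding (pip install pvporcupine); the access key and the audio-capture helper are assumptions about the current SDK, not the embedded demo in the video:

import pvporcupine

porcupine = pvporcupine.create(
    access_key="YOUR_PICOVOICE_ACCESS_KEY",  # recent SDK versions require a key
    keywords=["picovoice"],                  # one of the built-in wake words
)

def next_audio_frame():
    # Hypothetical helper: return porcupine.frame_length 16-bit samples
    # captured at porcupine.sample_rate (16 kHz) from your microphone.
    raise NotImplementedError

try:
    while True:
        if porcupine.process(next_audio_frame()) >= 0:  # keyword index, or -1
            print("Wake word detected!")
finally:
    porcupine.delete()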


Speech Recognition Without A Voice

The biggest change in Human Computer Interaction over the past few years is the rise of voice assistants. The Siris and Alexas are our HAL 9000s, and soon we’ll be using these assistants to open the garage door. They might just do it this time.

What would happen if you could talk to these voice assistants without saying a word? Would that be telepathy? That’s exactly what [Annie Ho] is doing with Cerebro Voice, a project in this year’s Hackaday Prize.

At its core, the idea behind Cerebro Voice is based on subvocal recognition, a technique that detects electrical signals from the vocal cords and other muscles involved in speaking. These electrical signals are collected by surface EMG devices, then sent to a computer for processing and reconstruction into words. It’s a proven technology, and even NASA is calling it ‘synthetic telepathy’.

The team behind this project is just in the early stages of prototyping this device, and so far they’re using EMG hardware and microphones to train a convolutional neural network that will translate electrical signals into a user’s inner monologue. It’s an amazing project, and one of the best we’ve seen in the Human Computer Interface challenge in this year’s Hackaday Prize.
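Purely to illustrate the shape of the problem (a window of multi-channel EMG samples in, a word class out), a toy 1D convolutional network in PyTorch might look like the sketch below; the channel count, window length, and layer sizes are placeholders, not the Cerebro Voice team’s actual architecture.

import torch
import torch.nn as nn

NUM_EMG_CHANNELS = 4   # assumed number of surface-EMG electrodes
WINDOW_SAMPLES = 500   # assumed samples per classification window
NUM_WORDS = 10         # assumed vocabulary size

class SubvocalNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(NUM_EMG_CHANNELS, 16, kernel_size=7, padding=3),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(32, NUM_WORDS)

    def forward(self, x):  # x: (batch, channels, samples)
        return self.classifier(self.features(x).squeeze(-1))

# e.g. logits = SubvocalNet()(torch.randn(8, NUM_EMG_CHANNELS, WINDOW_SAMPLES))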