Speech Recognition Without A Voice

The biggest change in Human Computer Interaction over the past few years is the rise of voice assistants. The Siris and Alexas are our HAL 9000s, and soon we’ll be using these assistants to open the garage door. They might just do it this time.

What would happen if you could talk to these voice assistants without saying a word? Would that be telepathy? That’s exactly what [Annie Ho] is doing with Cerebro Voice, a project in this year’s Hackaday Prize.

At its core, the idea behind Cerebro Voice is based on subvocal recognition, a technique that detects electrical signals from the vocal cords and other muscles involved in speaking. These electrical signals are collected by surface EMG devices, then sent to a computer for processing and reconstruction into words. It’s a proven technology, and even NASA is calling it ‘synthetic telepathy’.

The team behind this project is just in the early stages of prototyping this device, and so far they’re using EMG hardware and microphones to train a convolutional neural network that will translate electrical signals into a user’s inner monologue. It’s an amazing project, and one of the best we’ve seen in the Human Computer Interface challenge in this year’s Hackaday Prize.

Speech recognition to control a faucet

Talk To The Faucet

Your hands are filthy from working on your latest project and you need to run the water to wash them. But you don’t want to get the taps filthy too. Wouldn’t it be nice if you could just tell them to turn on hot, or cold? Or if the water’s too cold, you could tell them to make it warmer. [Vije Miller] did just that, he added servo motors to his kitchen tap and enlisted an AI to interpret his voice commands.

Look closely at the photo and you can guess that he started with a single-lever type of tap, the kind which can be worked with an elbow, so this project was probably just for fun and judging by his video below, he does have a sense of humor. But the idea is practical for dual taps with rotating knobs. He did realize, however, that in future versions he should move the servo motor openings from the top plate to the bottom instead, to avoid any water getting in. A NodeMCU ESP8266 ESP-12E board serves for communicating with the speech recognition side but other than the name, JacobAI, he’s keeping the speech part to himself. We secretly suspect that he has a friend named Jacob.

However, we can think of a number of options for it such as DeepSpeech and Wit.ai which we covered when talking about natural language phone bots, and the ubiquitous Alexa as used here with another NodeMCU for turning on Christmas tree lights.

Continue reading “Talk To The Faucet”

Make A Natural Language Phone Bot Like Google’s Duplex AI

After seeing how Google’s Duplex AI was able to book a table at a restaurant by fooling a human maître d’ into thinking it was human, I wondered if it might be possible for us mere hackers to pull off the same feat. What could you or I do without Google’s legions of ace AI programmers and racks of neural network training hardware? Let’s look at the ways we can make a natural language bot of our own. As you’ll see, it’s entirely doable.

Continue reading “Make A Natural Language Phone Bot Like Google’s Duplex AI”

Speech Recognition For Linux Gets A Little Closer

It has become commonplace to yell out commands to a little box and have it answer you. However, voice input for the desktop has never really gone mainstream. This is particularly slow for Linux users whose options are shockingly limited, although decent speech support is baked into recent versions of Windows and OS X Yosemite and beyond.

There are four well-known open speech recognition engines: CMU Sphinx, Julius, Kaldi, and the recent release of Mozilla’s DeepSpeech (part of their Common Voice initiative). The trick for Linux users is successfully setting them up and using them in applications. [Michael Sheldon] aims to fix that — at least for DeepSpeech. He’s created an IBus plugin that lets DeepSpeech work with nearly any X application. He’s also provided PPAs that should make it easy to install for Ubuntu or related distributions.

You can see in the video below that it works, although [Michael] admits it is just a starting point. However, the great thing about Open Source is that armed with a working set up, it should be easy for others to contribute and build on the work he’s started.

Continue reading “Speech Recognition For Linux Gets A Little Closer”

Fooling Speech Recognition With Hidden Voice Commands

It’s 2018, and while true hoverboards still elude humanity, some future predictions have come true. It’s now possible to talk to computers, and most of the time they might even understand you. Speech recognition is usually achieved through the use of neural networks to process audio, in a way that some suggest mimics the operation of the human brain. However, as it turns out, they can be easily fooled.

The attack begins with an audio sample, generally of a simple spoken phrase, though music can also be used. The desired text that the computer should hear instead is then fed into an algorithm along with the audio sample. This function returns a low value when the output of the speech recognition system matches the desired attack phrase. The input audio file is gradually modified using the mathematics of gradient descent, creating a result that to a human sounds like one thing, and to a machine, something else entirely.

The audio files are available on the site for your own experimental purposes. In a noisy environment with poor audio coupling between speakers and a Google Pixel, results were poor – OK Google only heard the human phrase, not the encoded attack phrase. Given that the sound quality was poor, and the files were generated with a different speech model, this is not entirely surprising. We’d love to hear the results of your experiments in the comments.

It’s all a part of [Nicholas]’s PhD studies around the strengths and pitfalls of neural networks. It highlights the fact that neural networks don’t always work in the way we think they do. Google’s Inception is susceptible to similar attacks with images, as we’ve seen recently.

[Thanks to Wolfgang for the tip!]

Ten Minute TensorFlow Speech Recognition

Like a lot of people, we’ve been pretty interested in TensorFlow, the Google neural network software. If you want to experiment with using it for speech recognition, you’ll want to check out [Silicon Valley Data Science’s] GitHub repository which promises you a fast setup for a speech recognition demo. It even covers which items you need to install if you are using a CUDA GPU to accelerate processing or if you aren’t.

Another interesting thing is the use of TensorBoard to visualize the resulting neural network. This tool offers up a page in your browser that lets you visualize what’s really going on inside the neural network. There’s also speech data in the repository, so it is practically a one-stop shop for getting started. If you haven’t seen TensorBoard in action, you might enjoy the video from Google, below.

Continue reading “Ten Minute TensorFlow Speech Recognition”

Arduino Clock Is HAL 1000

In the movie 2001: A Space Odyssey, HAL 9000 — the neurotic computer — had a birthday in 1992 (for some reason, in the book it is 1997). In the late 1960s, that date sounded impossibly far away, but now it seems like a distant memory. The only thing is, we are only now starting to get computers with voice I/O that are practical and even they are a far cry from HAL.

[GeraldF6] built an Arduino-based clock. That’s nothing new but thanks to a MOVI board (ok, shield), this clock has voice input and output as you can see in the video below. Unlike most modern speech-enabled devices, the MOVI board (and, thus, the clock) does not use an external server in the cloud or any remote processing at all. On the other hand, the speech quality isn’t what you might expect from any of the modern smartphone assistants that talk. We estimate it might be about 1/9 the power of the HAL 9000.

Continue reading “Arduino Clock Is HAL 1000”