Speech Recognition On An Arduino Nano?

Like most of us, [Peter] had a bit of extra time on his hands during quarantine and decided to take a look back at speech recognition technology in the 1970s. Quickly, he started thinking to himself, “Hmm…I wonder if I could do this with an Arduino Nano?” We’ve all probably had similar thoughts, but [Peter] really put his theory to the test.

The hardware itself is pretty straightforward. There is an Arduino Nano to run the speech recognition algorithm and a MAX9814 microphone amplifier to capture the voice commands. However, the beauty of [Peter’s] approach, lies in his software implementation. [Peter] has a bit of an interplay between a custom PC program he wrote and the Arduino Nano. The learning aspect of his algorithm is done on a PC, but the implementation is done in real-time on the Arduino Nano, a typical approach for really any machine learning algorithm deployed on a microcontroller. To capture sample audio commands, or utterances, [Peter] first had to optimize the Nano’s ADC so he could get sufficient sample rates for speech processing. Doing a bit of low-level programming, he achieved a sample rate of 9ksps, which is plenty fast for audio processing.

To analyze the utterances, he first divided each sample utterance into 50 ms segments. Think of dividing a single spoken word into its different syllables. Like analyzing the “se-” in “seven” separate from the “-ven.” 50 ms might be too long or too short to capture each syllable cleanly, but hopefully, that gives you a good mental picture of what [Peter’s] program is doing. He then calculated the energy of 5 different frequency bands, for every segment of every utterance. Normally that’s done using a Fourier transform, but the Nano doesn’t have enough processing power to compute the Fourier transform in real-time, so Peter tried a different approach. Instead, he implemented 5 sets of digital bandpass filters, allowing him to more easily compute the energy of the signal in each frequency band.

The energy of each frequency band for every segment is then sent to a PC where a custom-written program creates “templates” based on the sample utterances he generates. The crux of his algorithm is comparing how closely the energy of each frequency band for each utterance (and for each segment) is to the template. The PC program produces a .h file that can be compiled directly on the Nano. He uses the example of being able to recognize the numbers 0-9, but you could change those commands to “start” or “stop,” for example, if you would like to.

[Peter] admits that you can’t implement the type of speech recognition on an Arduino Nano that we’ve come to expect from those covert listening devices, but he mentions small, hands-free devices like a head-mounted multimeter could benefit from a single word or single phrase voice command. And maybe it could put your mind at ease knowing everything you say isn’t immediately getting beamed into the cloud and given to our AI overlords. Or maybe we’re all starting to get used to this. Whatever your position is on the current state of AI, hopefully, you’ve gained some inspiration for your next project.

Stay Smarter Than Your Smart Speaker

Smart speakers have always posed a risk to privacy and security — that’s just the price we pay for getting instant answers to life’s urgent and not-so-urgent questions the moment they arise. But it seems that many owners of the 76 million or so smart speakers on the active install list have yet to wake up to the reality that this particular trick of technology requires a microphone that’s always listening. Always. Listening.

With so much of the world’s workforce now working from home due to the global SARS-CoV-2 pandemic, smart speakers have suddenly become a big risk for business, too — especially those where confidential conversations are as common and crucial as coffee.

Imagine the legions of lawyers out there, suddenly thrust from behind their solid-wood doors and forced to set up ramshackle sub rosa sanctuaries in their homes to discuss private matters with their equally out-of-sorts clients. How many of them don’t realize that their smart speaker bristles with invisible thorns, and is even vulnerable to threats outside the house? Given the recent study showing that smart speakers can and do activate accidentally up to 19 times per day, the prevalence of the consumer-constructed surveillance state looms like a huge crisis of confidentiality.

So what are the best practices of confidential work in earshot of these audio-triggered gadgets?

Continue reading “Stay Smarter Than Your Smart Speaker”

Smart Speakers “Accidentally” Listen Up To 19 Times A Day

In the spring of 2018, a couple in Portland, OR reported to a local news station that their Amazon Echo had recorded a conversation without their knowledge, and then sent that recording to someone in their contacts list. As it turned out, the commands Alexa followed came were issued by television dialogue. The whole thing took a sitcom-sized string of coincidences to happen, but it happened. Good thing the conversation was only about hardwood floors.

But of course these smart speakers are listening all the time, at least locally. How else are they going to know that someone uttered one of their wake words, or something close enough? It would sure help a lot if we could change the wake word to something like ‘rutabaga’ or ‘supercalifragilistic’, but they probably have ASICs that are made to listen for a few specific words. On the Echo for example, your only choices are “Alexa”, “Amazon”, “Echo”, or “Computer”.

So how often are smart speakers listening when they shouldn’t? A team of researchers at Boston’s Northeastern University are conducting an ongoing study to determine just how bad the problem really is. They’ve set up an experiment to generate unexpected activation triggers and study them inside and out.

Continue reading “Smart Speakers “Accidentally” Listen Up To 19 Times A Day”

Almond: Open Personal Assistant From Stanford

The current state of virtual personal assistants — Alexa, Cortana, Google, and Siri — leaves something to be desired. The speech recognition is mostly pretty good. However, customization options are very limited. Beyond that, many people are worried about the privacy of their data when using one of these assistants. Stanford Open Virtual Assistant Lab has rolled out Almond, which is open and is reported to have better privacy features.

Like most other virtual assistants, Almond has skills that determine what it can do. You can use Almond in a browser, on a Google phone, or as a command line application. It all lives on GitHub, so if you don’t like something you are free to fix it.

Continue reading “Almond: Open Personal Assistant From Stanford”

Say Hello To This Cortana Hologram

Halo’s Cortana enters the real world with this internet appliance. [Jarem Archer] has built an amazing “holographic” home for Cortana of Halo and Windows fame. The display isn’t really a hologram, it uses the age-old Pepper’s ghost illusion. A monitor reflects onto 3 angled half mirrored panels. This creates a convincing 3D effect. Cortana herself is a 3D model. [Jarem’s] wife provided gave Cortana her moves by walking in front of dual Kinect depth-sensing cameras. This motion capture performance drives the 3D Cortana model on the screen.

The brain behind this hack is the standard Windows 10 Cortana voice assistant. Saying “Hey Cortana” wakes the device up. To make the whole experience more interactive, [Jarem] added a face detection camera to the front of the device. When a face is detected, the Cortana model turns toward the user. Even if several people are watching the device, it would seem as if Cortana was “talking to” one person in the audience.

The cherry on top of this hack is the enclosure. [Jarem] 3D printed a black plastic stage. An Arduino drives RGB LEDs whenever Cortana is activated. The LEDs project a blue glue that works well with the Pepper’s ghost illusion. The result is a project that looks like something Microsoft might have cooked up in one of their research labs.

Continue reading “Say Hello To This Cortana Hologram”

How Has Amazon Managed To Make Hackers Love Alexa?

Our hackspace has acquired an Amazon Dot, courtesy of a member. It mostly seems to be used as a source of background music, but it has also spawned a seemingly never-ending new entertainment in which the hackspace denizens ceaselessly bait their new electronic companion with ever more complex and esoteric requests. From endless rephrasing and careful enunciation of obscure early reggae artists to try to settle a musical argument to hilarious mis-hearing on the part of our silicon friend, the fun never stops. “Alexa, **** off!” it seems results in “I’m sorry, I can’t find a device of that name on this network”.

amazon-dot-always-listeningThat is just the experience of one hackspace, but it evidently does not end there. Every other day it seems that new projects using Alexa pass through the Hackaday timeline, so it looks as though Amazon’s online personal assistant has been something of a hit within our community.

Fair enough, you might say, we’re always early adopters of any new technology. But it’s a development over which I wonder; am I alone in finding it surprising? It’s worth taking a moment to look at the subject.

Continue reading “How Has Amazon Managed To Make Hackers Love Alexa?”

How To Control Siri Through Headphone Wires

Last week saw the revelation that you can control Siri and Google Now from a distance, using high power transmitters and software defined radios. Is this a risk? No, it’s security theatre, the fine art of performing an impractical technical achievement while disclosing these technical vulnerabilities to the media to pad a CV. Like most security vulnerabilities it is very, very cool and enough details have surfaced that this build can be replicated.

The original research paper, published by researchers [Chaouki Kasmi] and [Jose Lopes Esteves] attacks the latest and greatest thing to come to smartphones, voice commands. iPhones and Androids and Windows Phones come with Siri and Google Now and Cortana, and all of these voice services can place phone calls, post something to social media, or launch an application. The trick to this hack is sending audio to the microphone without being heard.

googleThe ubiquitous Apple earbuds have a single wire for a microphone input, and this is the attack vector used by the researchers. With a 50 Watt VHF power amplifier (available for under $100, if you know where to look), a software defined radio with Tx capability ($300), and a highly directional antenna (free clothes hangers with your dry cleaning), a specially crafted radio message can be transmitted to the headphone wire, picked up through the audio in of the phone, and understood by Siri, Cortana, or Google Now.

There is of course a difference between a security vulnerability and a practical and safe security vulnerability. Yes, for under $400 and the right know-how, anyone could perform this technological feat on any cell phone. This feat comes at the cost of discovery; because of the way the earbud cable is arranged, the most efficient frequency varies between 80 and 108 MHz. This means a successful attack would sweep through the band at various frequencies; not exactly precision work. The power required for this attack is also intense – about 25-30 V/m, about the limit for human safety. But in the world of security theatre, someone with a backpack, carrying around a long Yagi antenna, pointing it at people, and having FM radios cut out is expected.

Of course, the countermeasures to this attack are simple: don’t use Siri or Google Now. Leaving Siri enabled on a lock screen is a security risk, and most Androids disable Google Now on the lock screen by default. Of course, any decent set of headphones would have shielding in the cable, making inducing a current in the microphone wire even harder. The researchers are at the limits of what is acceptable for human safety with the stock Apple earbuds. Anything more would be seriously, seriously dumb.