In the spring of 2018, a couple in Portland, OR reported to a local news station that their Amazon Echo had recorded a conversation without their knowledge, and then sent that recording to someone in their contacts list. As it turned out, the commands Alexa followed were issued by television dialogue. The whole thing took a sitcom-sized string of coincidences to happen, but it happened. Good thing the conversation was only about hardwood floors.
But of course these smart speakers are listening all the time, at least locally. How else are they going to know that someone uttered one of their wake words, or something close enough? It would sure help a lot if we could change the wake word to something like ‘rutabaga’ or ‘supercalifragilistic’, but they probably have ASICs that are made to listen for a few specific words. On the Echo for example, your only choices are “Alexa”, “Amazon”, “Echo”, or “Computer”.
So how often are smart speakers listening when they shouldn’t? A team of researchers at Boston’s Northeastern University is conducting an ongoing study to determine just how bad the problem really is. They’ve set up an experiment to generate unexpected activation triggers and study them inside and out.
Sequestered Smart Speakers
The team corralled a group of mainstream smart speakers representing all the major players into a box: four Alexas and one each of her cohorts. We’d love to see them maximize the test subjects by including enough devices of each type to cover all the possible assigned wake words, but that would be pretty expensive.
Then they piped in 125 hours’ worth of audio from TV shows with rapid-fire dialogue using Netflix. The shows they chose are a healthy cross-section of televised entertainment: mostly newer stuff, but some going back a decade or more, and everything from comedy to drama. A video camera trained on the speakers records any lights that indicate a successful activation. There’s also a microphone to pick up anything the devices say in response to the dialogue stream, and a WAP to capture network traffic in and out of the box.
While the results indicate that these devices aren’t constantly recording (phew!), they do tend to wake up quite frequently for short periods of time, up to 19 times in a 24-hour period. The worst offenders were the Apple and Microsoft speakers, both of which activated more often than the others. Not all of the activations were short and sweet, though: both the Microsoft Invoke and the Echo Dot had accidental activations lasting up to 43 seconds. That’s plenty of time to record and/or distribute your late-night 16-digit utterances to the QVC operators, or the secret ingredient in your mother-in-law’s Quiche Lorraine.
Are You Talkin’ To Me?
The researchers saw patterns emerge in the dialogue that caused activations lasting five seconds or longer, but the patterns aren’t terribly surprising. Basically, any phrase starting with a word that contains the ‘ey/ay’ sound (e.g. they/may/pay/sleigh) followed by a hard ‘g’ sound (or anything close to it) will wake up a Google Home Mini set to listen for ‘hey Google’. The other speakers acted the same way when they heard strings that rhyme with their wake word(s).
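To make the rhyming pattern concrete, here is a toy sketch that scores how close a phrase sounds to ‘hey Google’ by comparing phoneme sequences. Everything in it is an assumption for illustration: the hand-written phoneme spellings, the `similarity` and `would_wake` helpers, and the 0.7 threshold are all made up. Real speakers score audio with a trained acoustic model, not text, so treat this as a way to see why near-rhymes land close to the wake phrase rather than as a description of how any actual device works.

```python
# Toy illustration only. The phoneme spellings are hand-written assumptions,
# not taken from a real pronunciation dictionary, and real wake-word
# detectors operate on audio rather than on text.
PHONEMES = {
    "hey": ["HH", "EY"],
    "google": ["G", "UW", "G", "AH", "L"],
    "they": ["DH", "EY"],
    "pay": ["P", "EY"],
    "the": ["DH", "AH"],
    "bill": ["B", "IH", "L"],
    "good": ["G", "UH", "D"],
    "morning": ["M", "AO", "R", "N", "IH", "NG"],
}

# The phrase the hypothetical detector is listening for.
WAKE = PHONEMES["hey"] + PHONEMES["google"]


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            cur.append(min(prev[j] + 1,          # drop a phoneme from a
                           cur[j - 1] + 1,       # insert a phoneme from b
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    return prev[-1]


def similarity(phrase):
    """Score in [0, 1]; 1.0 means the phonemes match the wake phrase exactly."""
    phones = [p for word in phrase.lower().split() for p in PHONEMES[word]]
    dist = edit_distance(phones, WAKE)
    return 1.0 - dist / max(len(phones), len(WAKE))


def would_wake(phrase, threshold=0.7):
    """Hypothetical trigger rule: wake when the phrase is similar enough."""
    return similarity(phrase) >= threshold


if __name__ == "__main__":
    for phrase in ["hey google", "they google", "pay the bill", "good morning"]:
        print(f"{phrase!r}: score={similarity(phrase):.2f} wake={would_wake(phrase)}")
```

In this sketch ‘they google’ differs from the wake phrase by a single phoneme, so it clears the threshold, while ‘pay the bill’ and ‘good morning’ do not. A looser threshold catches more real commands but also more TV dialogue, which is the trade-off the study is probing.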
At present, the group is still studying activations that lead to recordings being uploaded to the cloud. They’re also trying to determine whether human modifiers such as gender, ethnicity, and accent have any impact on the probability of accidental activation.
Just like the humans that designed them, smart speakers occasionally mishear things, including music lyrics and their own names. Whether they are learning from their mistakes remains to be seen.
66 thoughts on “Smart Speakers “Accidentally” Listen Up To 19 Times A Day”
This is (one of the reasons) why none of that garbage is allowed in my home.
Do you have a smart phone?
A smart home?
Serving a different purpose and hence can be manually triggered, unlike the current hardware under test.
Yes, but it’s not Android, iOS or Windows. Running tcpdump/wireshark is astonishingly quiet. It’s also a prepaid SIM (cash) which never connects to Wi-Fi. It’s about as safe as it can get while still retaining usability. I can’t speak to the baseband (nearly all modems require proprietary blobs), but at least I can audit the bulk of the OS. Disabling microphones and sensors via dbus is also pretty handy. I’m not saying it’s perfect, but it’s as close as most people can get.
What do you use? Linux on your phone?
Essentially, yes. Sailfish OS. Most of it is open minus a few closed UX elements. Traditional hierarchy, standard Linux tools readily available. Obviously not for everyone, but I’ve been using it exclusively for several years. Like I said, the baseband is likely a blackbox, but at least I can see what the OS itself is doing or, in this case, not doing.
I hate this utterly defeatist attitude you’re no doubt implying.
Smartphones were tools that useless tools turned into privacy violating & planned obsolescence ~trendy~ lifestyle devices instead of being user intended functions before form.
Yeah, me too. The outcome of this line of reasoning is that we should fix phones too, not that we should be fine with useless AI assistants.
There’s also the value/risk assessment. Smartphones are fairly necessary for most people in today’s world. AI assistants are virtually always used a couple times for a couple months and then never (intentionally) spoken to ever again. They’re stupid. Nobody really wants or needs them. Want to secure your AI assistant? Throw it in the trash, you never needed one for anything anyway. No, controlling some color-changing bulbs doesn’t count. You literally will do that twice for a laugh and then never again. C’mon.
No, i don’t. And no Facebook and no Windows and almost no Google too. Other questions?
Am i paranoid? NO. Just sensible (not sure if this translation is right).
Yeah, and it is turned off when it needs to be.
I’m so woke to the dangers of IOT, I live in a cave without electricity. Take that you demons of modern technology.
And of course you are using your “PC” (Pigeon Cluster) for communication ;)
They must be using IPoAC
Uhhh.. Internet Protocol over Air Conditioning?
Hey guys, we found Richard Stallman.
It’s fine to reject some tech, doesn’t make you a Luddite. Especially when the tech in question is largely a toy and not really used for anything helpful or necessary. There’s no reason why even a very high-tech household needs an Alexa. They’re gimmicks to the core.
It’s even more fine to criticize the execution of technology even though you use it in your daily life. Maybe we should fix the huge problems in IoT—and phones too, while we’re at it. Nah. Let’s make fun of people with concerns instead.
“There’s no reason why even a very high-tech household needs an Alexa. They’re gimmicks to the core.”
And another dream dies.
>>“There’s no reason why even a very high-tech household needs an Alexa.”
Says the guy who has never used one
– or –
the guy who has two fully functioning arms and two fully functioning legs.
I hate ceiling lights, but was required to install a smart bulb in the kitchen ceiling for a family member.
I became an early-adopter, for the very first time, with a smart speaker, because I could see a future that would enable me to replace my crappy X-10 system, even though there were no “smart bulbs” available back in those days. It took years of doing virtually nothing with my smart speaker, before the useful apps and devices came onboard.
Do you warm up before making those leaps? Don’t wanna pull a hammy.
When I was young I used to dream of “the future,” one in which C3PO-like androids existed to do laundry, mow the lawn, make the beds, mop the floors, or what have you. The older I get, the more helpful, I realize, such a “helper” could be.
They aren’t here yet, but they’re coming. And I now realize I will never have one in my home.
The problem is that my vision of the future did not anticipate household computing hardware whose primary purpose is to run back tattling to the mothership every time a confidential word is uttered in the home. If memory serves, Hackaday itself had an article some time back about a guy with a packet sniffer who discovered his “smart” tv was spying on him. It’s disgusting.
And yes…I have a smart phone. I’ve disabled all the voice command crap to the extent possible. I try to be mindful of what I say around it. Cell phones are not allowed in my bedroom.
There IS such a thing as privacy. Unfortunately the tech world has trained an entire generation to not only abandon that idea, but to voluntarily post the most intimate details of their life on social media. I can do nothing but shake my head.
…now you kids get off my lawn!
This is why nerds are the worst people to design a product, they come up with “brilliant” ideas and then they say “my vision of the future did not anticipate…” which only shows why these people should stick to mowing lawns.
The nerds are the ones that understand the problems, so your world would be frightening. Normal people, even if smart and well informed about technology (on a normie level), don’t understand how unsafe and fragile our world is, especially on a local level. Nerds are generally informed enough to see the dangers, in many cases even to the point of being paranoid.
That’s all fine and good, my statement stands, nerds don’t understand human nature and have no place designing products.
Maybe you can tell us all about how the nerds anticipated Facebook and its problems? No, because they did not! It was nerds who brought us unfettered social media and amplified hate.
It was Hollywood that first warned us about the dangers of Facebook in that Star Trek Next Generation episode.
In reply to N: It was nerds who gave us wheels, iron, coffee, steam engines, electricity, radio and the internet. please accept the fact that the world is made of rulers, nerds and consumers. A consumer will never create something, a ruler will never create something. only nerds do.
Nerds invented coffee and iron? huh? You were there when the wheel was invented so you know for sure? double huh? And again what I said stands. maybe nerds did invent electricity but that does not mean that they understand social issues. In fact it is their dedication to their narrow fields that makes them useless as product designers that must accommodate the entire of the human experience.
In reply to N: (Sigh) Social engineering talks at defcon anyone.
Oh and yes hollywood the same hollywood that made Hackers (1995)?
You seem like someone who can barely understand the community they are in.
Your use of language speaks of a skilled arguer/debater so I won’t
even try arguing with you because it will waste my time.
I advise anyone else, don’t waste your time with this person.
The people you should be blaming are corporate dicks, not nerds. That’s where all the psychopaths and grandmother-sellers are, constantly looking for ways to do horrible things, partly to make money but also because they excessively crave risk and novelty because their brains are wrong.
Star Trek TNG was written by, and for, nerds! As was all halfway decent scifi. Not to say all nerd-written scifi is good, the relationship doesn’t go back in that direction.
Half of Facebook’s “problems” are deliberate features and the other half the result of most people being fucking morons. Where are the guilty nerds, in that picture? They’re not on Facebook because they wouldn’t want an account on there.
Alexas are for people like Stephen Fry, loudly self-proclaimed “nerd” but really just a consumer fetishist. He buys expensive toys cos he hasn’t got a wife to tell him to grow up and stop acting like a teenage boy. Instead, he’s married to a teenage boy.
Actual geeks wouldn’t have an Alexa etc in the house. They’re giving the cheaper ones away free with boxes of washing powder now and I still wouldn’t have one. Actual geeks look up the free voice-rec libraries and implement them on Ras Pis or nano-ATX PCs, controlling their house through Nordic radio chips or the like. Maybe ESP chips through Wifi. They might use some other system’s socket and light controllers, but sniff the packets and spoof the signals from their own transmitter. I know cos I’ve read about them all on here.
Sorry, Greenaum, but nobody is innocent, here. The nerds KNOW that their corporate bosses will abuse any kind of power they’re given, and yet continue to give them new powers. You can’t do that twice and remain blameless.
M&Mes: so you want to put out your opinion, but reject in advance any counterargument, because your opponent may be better at arguing than you are. Got it.
so you trust these nerds to find obscure bugs in your software even though they are easily misled
My guy, you’re on hackaday. Everyone here is a nerd.
Nerd wannabe at the very least
N: it’s not that they are easily misled. It’s that they are mercenary – they will work for people whose morals they disagree with, just because they enjoy the work. Nerds want to work on the cool technology and develop new capabilities, even though they can clearly see how these might be misused. They can always say, “my intentions were pure – it was my corporate overlords who twisted it and used it to screw you over.”
Indeed, I’ve played with voice control/assistants, but only from the open-source projects and never letting data out to the net. Unfortunately it’s harder to avoid smartphones and the scumware on them than ever: far too many devices are only controllable by the app, and lots of folk will only communicate with you via platform x/y (so even if it’s possible to use other devices with most platforms, assume everything is at least data-slurped by one company out for a profit).
That said, as there is a limit to how much I give a crap, anything that really needs to be kept secret can be. The rest is like bowel voiding: everyone does it and nobody much cares to learn about somebody else’s, so why be embarrassed or overly worried by it. In short, keep the spyware-laden devices in only the places their benefits outweigh the inevitable profiling and data slurping crap.
“keep the spyware-laden devices in only the places their benefits outweigh the inevitable profiling and data slurping crap.”
How do you perform this benefit weight calculation? What is the unit of measure for “benefit”? Do you limit your data to the present or do you anticipate future software updates with even more insidious profiling? Do you actually understand what is going on inside these devices well enough to be able to make this determination? Do you plan to repeat your calculations in the future or are you done now?
“nobody much cares to learn about somebody else’s”. you really fail to understand human nature, look at social media networks, humans care very much about what other humans are doing. Your neighbors are probably remarking right now on your odd habits that they observe through your windows.
People don’t post to facebook because others want to see it. They post to facebook because they want to tell others what they are doing.
No one really cares about most peoples other posts.
Hard to observe through the perpetually closed curtain, but it’s a valid enough point: within reason, people are nosey. Still, they don’t tend to care about much of what the stupid smart devices could tell them. The criminal element is probably the bigger worry: “Oh, they haven’t been picked up for a few hours by the ‘smart’ thingy and the car/bike/whatever isn’t in the drive, let’s burgle them” type stuff. Something that seems like more effort than it’s worth unless you actually have secrets/valuables you flaunt worth taking…
Indeed the risk/reward is not an easy thing to figure out. But for me, keep it simple: the bulk of the times a voice assistant would actually be useful are while your hands are covered in cooking-related stuff, and unless you have a very strange household you cook in a room you do almost nothing else in.
Sure people might be interested to know how many bran flakes etc I consume.. Or just how much Tea it takes to make me vaguely approach functional in the morning. But none of this matters if it is snooped maybe some advertising conglomerate sell my address to tea subscription services or something. But that has no effect on me but adding some ‘targeted’ junk mail to the pile.
That said, I don’t use, and probably never will, any cloud-based, closed-source type. With the open-source projects with local processing working pretty damn well, and no need for any features of the stupid TVs/Echos that don’t exist on other platforms, why would I sell myself to them (and pay for the privilege)?
“No one really cares about most peoples other posts.”
That is precisely the Facebook business model, their entire company is based on the fact that some people care deeply about the random posts of strangers.
Not the OP, but no… I haven’t one.
“We’d love to see them maximize the test subjects by including enough devices of each type to cover all the possible assigned wake words, but that would be pretty expensive.”
Just borrow some.
Who in their right mind is going to loan their electronic gear to a nerd with a soldering iron?
My friends do it all the time.
By not telling them you have the equivalent of BDSM equipment for electronics.
Also known as lie.
Most smart assistants have been fully forgotten. They just stay plugged in in the corner, but nobody has talked to them for months.
“but they probably have ASICs that are made to listen for a few specific words.”
So the wake words are being processed locally? Is that true for all devices?
Does an open source project exist where you can define your own wake word and once triggered your device _only then_ speaks to the cloud?
Yes, wake word detection is done on-device. Otherwise the audio would have to be constantly streamed to “the cloud” and nobody could pretend anything about privacy. But it’s not custom silicon, only a well-trained neural net. So, changing the wake word is possible, but not easy.
Now I’m curious. Effectively change the wake-word by slowly and methodically altering the enunciation/inflection/tone/timbre working up to different syllables and eventually a completely new word?
Okay, Google ==> Okay, Rutabaga
Use a text-to-speech program and lock them in a box for a week…
It’s a curious idea but I fear the wake words are baked into the neural net and it is not adaptive.
There used to be something like that called snips…
Apparently still about, and just been bought by Sonos.
Smart speaker is a bit of a misnomer of course; no doubt inspired by the marketing department.
In reality these things are smart microphones.
“SMART SPEAKER PLAY EASY LISTENING SMOOTHJAZZ PLAYLIST!!”
Civilization has always been asking for a “Santa Claus” figure who “knows when you are sleeping” and “knows if you’ve been bad or good”. Organized religion is another example of people who actively desire to be subjugated.
Like it or not fellow humans, your inner nature is that you do really want big brother looking over your shoulder. Your history and your culture are not lying to you.
We hope like hell someone IS in control. Because we sure as shit ain’t …..
It’s not the type of people who come to this site who want big brother listening over their shoulders, it’s the type of people who don’t come to sites like this. The type of people who see technical details as someone else’s problem. I for one do not and will never accept what I think is most appropriate to refer to as Neo-Stasi “assistants”.
N for neg right? All your comments are negative. No one wants your entryist marxism here, go away, this is not a political space.
Simply read the ‘permissions’ you agreed to when you allowed your smartphone/tablet to access Google Play Store……
You probably agreed to similar ‘intrusions’ when you opened the box the Smart (Sneaky) Speaker came in.
Totally. And just like RMS you’ll ignore his perfectly true and valid point until it bites you in the ass a decade later.
Meh I give less than zero shits about this. They fixed it so it does not respond after Alexis on Schitts Creek so I am pleased lol.
I went to a product preview of Google Glass years back. The darned thing would never respond to my accent, but it /would/ respond to the utterances of others at the event when they were activating /their/ glasses.
Now I have a recently acquired ‘smart’ tv that I often find playing by itself. Perhaps I need to pierce its eardrum.
I think Joe Kim has very much nailed it before the first paragraph.
Agreed, this is spectacular art for the article.
I wonder if they then tested circular activation – like Cartman in that episode of Southpark.
Just glad these things aren’t using cameras for gesture-based commands…. Or i would seriously have to reconsider my naked yoga and aerobics workouts.