Date: 2020-11-08
Voice assistants, love them or hate them, are becoming more and more commonplace. One problem they face is multiple devices listening in the same place. When a command is given, which device should answer? Researchers at CMU’s Future Interfaces Group, [Karan Ahuja], [Andy Kong], [Mayank Goel], and [Chris Harrison], have an answer: smart assistants should try to infer whether the user is facing the device they want to talk to. They call it direction-of-voice, or DoV.
Currently, smart assistants use a simple race to see which device heard the command first. The reasoning is that the device you are closest to will likely hear it first. However, in situations with echoes, or when you’re equidistant from multiple devices, the outcome can seem arbitrary to a user.
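To make the difference concrete, here is a minimal sketch of the two arbitration strategies. The `Hearing` record, the `facing_prob` score, the 0.5 threshold, and the fallback to the race are all assumptions made for illustration; they are not taken from the researchers' code or from any shipping assistant.

```python
from dataclasses import dataclass


@dataclass
class Hearing:
    device: str
    arrival_ms: float   # when the wake word reached this device
    facing_prob: float  # hypothetical DoV score in [0, 1]


def race_winner(hearings):
    # Today's approach as described above: whoever heard it first answers.
    return min(hearings, key=lambda h: h.arrival_ms)


def dov_winner(hearings, threshold=0.5):
    # DoV-style approach: prefer the device the talker seems to be facing.
    # Falling back to the race when no score clears the threshold is an
    # assumption of this sketch.
    best = max(hearings, key=lambda h: h.facing_prob)
    return best if best.facing_prob >= threshold else race_winner(hearings)


# Two devices hear the wake word almost simultaneously.
hearings = [
    Hearing("kitchen", arrival_ms=11.8, facing_prob=0.12),
    Hearing("living_room", arrival_ms=12.1, facing_prob=0.91),
]

print(race_winner(hearings).device)  # kitchen
print(dov_winner(hearings).device)   # living_room
```

With arrival times only a fraction of a millisecond apart, the race picks the kitchen device more or less by chance, while the facing score picks the device the talker was actually addressing.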
The implementation of DoV uses an Extra-Trees classifier from the Python scikit-learn toolkit. Several other machine learning algorithms were considered, but ultimately efficiency won out and Extra-Trees was selected. Another interesting facet of the research was determining what facing really means. The team had human ‘listeners’ stand in for smart assistants. A ‘talker’ would speak the key phrase while the ‘listener’ judged whether the talker was facing them or not. Based on the resulting definition of facing, the system can determine whether someone is facing the device with 90% accuracy, rising to 93% with per-room calibration.
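For readers who have not used it, the following is a rough sketch of training scikit-learn's `ExtraTreesClassifier` on a binary facing / not-facing problem. The two features here are random synthetic placeholders, not the acoustic features the researchers extract, and the hyperparameters are arbitrary choices for this example, so the accuracy it prints says nothing about DoV itself.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data: label 1 = "facing", 0 = "not facing".
n_samples = 400
y = rng.integers(0, 2, size=n_samples)

# Two placeholder features whose mean shifts with the label.
# Real DoV features would be computed from recorded audio instead.
X = rng.normal(loc=y[:, None] * 2.0, scale=1.0, size=(n_samples, 2))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Extra-Trees: an ensemble of randomized decision trees.
clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")

# Per-class probabilities could feed a device-arbitration step.
facing_prob = clf.predict_proba(X_test[:1])[0, 1]
print(f"P(facing) for one sample: {facing_prob:.2f}")
```

Per-room calibration, as mentioned above, would presumably amount to adding samples recorded in the target room to the training set, though this sketch does not attempt to model that.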
Their algorithm and the data they collected have been open-sourced on GitHub. Perhaps when you’re building your own voice assistant, you can incorporate DoV to improve wake-word accuracy.
Thanks [Karan] for sending this in!
8 thoughts on “Robots Can Finally Answer, Are You Talking To Me?”
But... since the assistant has to recognize the spoken commands anyway, wouldn't it be easier and less error-prone to give the assistant a name and use it in commands directed at it? If someone is watching what is cooking in a pan, for example, and needs to give an order to the (smart?) assistant, they may not be able to stop paying attention to the pan to find and look at the voice assistant.
If I have to look at the machine, it kinda defeats the reason / utility for having it listening.
Yes, there should be a more fluid naming and command environment. Instead of Hey Google Turn On TV….it should be Turn on TV,…..Turn on Living room lights…..Set house AC on……False triggers would happen more, which just shows that our Star Trek/Jetsons home is not there yet. Speaking of false triggers….one time the TV was on and in a comedy show there was a sexual innuendo type joke about self pleasure….. suddenly my Lenovo/Google piped up and said “Sorry I can’t help you with that…”
You both seem to forget the most important bit – politeness.
The device must not react until a “please” is heard.
Otherwise, kids who grew up with these assistants will end up with bad manners and turn into cheeky brats.
Seriously, this command style belongs to the military, not into a civilized home. Think about it. PLEASE.
I agree with you, it would be important too, and could even help the software do its job. The command sequence starts with the name assigned to the device, “Jarvis”, then the command, “turn on the lights”, and then the end-of-command marker: “Please”.
The software would then have two reference points to improve the recognition of commands.
But giving unique names to the assistants is also important. The Jetsons called their robot Rosie, if memory serves. And when you are working, say, on the underside of your car, you don't get out from under it, look directly at your helper/son/friend/whatever and ask for a wrench. You call them by name and ask for said wrench. The same tried and tested approach could work for voice assistants, for the people who want to use them.
DoV should be an additional signal and not heavily weighted by the device. Do you physically turn and face each person when you talk to them? Not usually. Having to face each device to command it would be like Scotty picking up a mouse and saying “Hello computer”. I command my Google-controlled devices as I walk by them. Taking additional time to decide whether I am facing one device or another would either delay the action or cause the device not to act.
I can’t stand cloud dependent, data mining, voice operated clappers.
I think the Flintstones had it right. Semi-intelligent biomechanical helpers are the future. They work for their own self-interest instead of some corporate entity. Just don’t piss them off with your own notions of superiority and self-importance and the world will be a better place.
As a plus, if you can live harmoniously with them, you’re more likely to be socially acceptable to your own species.
voice assistant (a.k.a. dystopian technostate eavesdropping device).
Yeah. Good idea. Let's also fill our houses with always-on cameras. Of course, some people are stupid enough to do such a thing in the name of “security” or “comfort”.