Voice Assistants, love them, or hate them, are becoming more and more commonplace. One problem for voice assistants is the situation of multiple devices listening in the same place. When a command is given, which device should answer? Researchers at CMU’s Future Interfaces Group [Karan Ahuja], [Andy Kong], [Mayank Goel], and [Chris Harrison] have an answer; smart assistants should try to infer if the user is facing the device they want to talk to. They call it direction-of-voice or DoV.
Currently, smart assistants use a simple race to see who heard it first. The reasoning is that the device you are closest to will likely hear it first. However, in situations with echos or when you’re equidistant from multiple devices, the outcome can seem arbitrary to a user.
The implementation of DoV uses an Extra-Trees Classifier from the python sklearn toolkit. Several other machine learning algorithms were considered, but ultimately efficiency won out and Extra-Trees was selected. Another interesting facet of the research was determining what facing really means. The team had humans ‘listeners’ stand in for smart assistants. A ‘talker’ would speak the key phrase while the ‘listener’ determined if the talker was facing them or not. Based on their definition of facing, the system can determine if someone is facing the device with 90% accuracy that rises to 93% with per-room calibration.
Their algorithm as well as the data they collected has been open-sourced on GitHub. Perhaps when you’re building your own voice assistant, you can incorporate DoV to improve wake-word accuracy.
Continue reading “Robots Can Finally Answer, Are You Talking To Me?”
To quote the greatest philosopher of the 20th century: “The future ain’t what it used to be.” Take personal assistants such as Amazon Echo and Google Home. When first predicted by sci-fi writers, the idea of instant access to the sum total of human knowledge with a few utterances seemed like a no-brainer; who wouldn’t want that? But now that such things are a reality, having something listening to you all the time and potentially reporting everything it hears back to some faceless corporate monolith is unnerving, to say the least.
There’s a fix for that, though, with this cone of silence for your smart speaker. Dubbed “Project Alias” by [BjørnKarmann], the device consists of a Raspberry Pi with a couple of microphones and speakers inside a 3D-printed case. The Pi is programmed to emit white noise from its speakers directly into the microphones of the Echo or Home over which it sits, masking out the sounds in the room while simultaneously listening for a hot-word. It then mutes the white noise, plays a clip of either “Hey Google” or “Alexa” to wake the device up, and then business proceeds as usual. The bonus here is that the hot-word is customizable, so that in addition to winning back a measure of privacy, all the [Alexas] in your life can get their names back too. The video below shows people interacting with devices named [Doris], [Marvin], [Petey], and for some reason, [Milkshake].
We really like this idea, and the fact that no modifications are needed to the smart speaker is pretty slick, as is the fact that with a few simple changes to the code and the print files it can be used with any smart speaker. And some degree of privacy from the AI that we know is always listening through these things is no small comfort either.
Continue reading “Win Back Some Privacy With A Cone Of Silence For Your Smart Speaker”