Voice Without Sound

Voice recognition is becoming more and more common, but anyone who’s ever used a smart device can attest that they aren’t exactly foolproof. They can activate seemingly at random, fail to activate when called, or, most annoyingly, completely misunderstand voice commands. Thankfully, researchers from the University of Tokyo are looking to improve the performance of devices like these by attempting to use them without any spoken voice at all.

The project is called SottoVoce and uses an ultrasound imaging probe placed under the user’s jaw to detect internal movements in the speaker’s larynx. The images from the probe are fed into a series of neural networks, trained with hundreds of speech patterns from the researchers themselves. The neural networks then piece together the likely sounds being made and generate an audio waveform, which is played to an unmodified Alexa device. Obviously a few improvements would need to be made to the ultrasound imaging device to make this usable in real-world situations, but it is interesting from a research perspective nonetheless.
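To make the data flow concrete, here is a minimal sketch of a SottoVoce-style two-stage pipeline. Everything here is an assumption for illustration, not the paper's actual architecture: the frame resolution, the number of spectrogram bins, and the stand-in random weights (a real system would use trained convolutional networks and a proper vocoder). It only shows the shape of the problem: ultrasound frames in, audio samples out.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PIXELS = 64 * 64   # flattened ultrasound frame (assumed resolution)
N_MELS = 40          # spectrogram bins per frame (assumed)
HOP = 160            # audio samples per frame (10 ms at 16 kHz, assumed)

# Stage 1: ultrasound frame -> spectrogram frame (stand-in for a trained CNN)
W1 = rng.standard_normal((N_PIXELS, N_MELS)) * 0.01

def frames_to_spectrogram(frames):
    """Map (T, N_PIXELS) ultrasound frames to (T, N_MELS) speech features."""
    return np.maximum(frames @ W1, 0.0)  # ReLU as a placeholder nonlinearity

# Stage 2: spectrogram frames -> waveform (stand-in for a neural vocoder)
W2 = rng.standard_normal((N_MELS, HOP)) * 0.01

def spectrogram_to_waveform(spec):
    """Generate HOP audio samples per frame and join them into one waveform."""
    return (spec @ W2).reshape(-1)

# End to end: 50 frames of (fake) ultrasound video -> ~0.5 s of audio
# that could then be played out loud to a stock smart speaker.
ultrasound = rng.standard_normal((50, N_PIXELS))
audio = spectrogram_to_waveform(frames_to_spectrogram(ultrasound))
print(audio.shape)  # (8000,) == 50 frames * HOP samples
```

The interesting design choice the article highlights is the last step: rather than feeding features into Alexa directly, the system synthesizes actual audio, so the smart speaker needs no modification at all.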

The research paper with all the details is also available (PDF warning). It’s an intriguing approach to improving the performance and quality of voice recognition, especially in situations where the voice may be muffled, non-existent, or buried in background noise. Machine learning like this seems to be one of the more powerful tools for improving speech recognition, as we saw with this robot that can walk across town and order food for you using voice commands only.

14 thoughts on “Voice Without Sound”

  1. A vocabulary word immediately popped into my head when I read this. Fricative. What about the fricatives? Those and other things that are created more in the upper area/lip area. Are they reasonably accurate?

      1. I read an article a few years ago and basically it said people unconsciously form speech silently and that there are decipherable laryngeal movements while thinking. While you may not think you’re doing anything other than thinking, research says otherwise.

  2. Didn’t NASA nail this problem ages ago, even detecting subvocalizations so that the user didn’t need to actually make a sound? The tiny changes in electrical activity in the neck muscles were enough.

  3. For ages (think WWII) the military had throat mics that just used vibrations instead of sound itself. Called a voiceless mic, if I recall correctly. Used for loud aircraft and such. Sometimes they would show up in movies, where people pinch their neck while talking. I played with some surplus ones about 30 years ago. But if you want to explore AI and neural net learning and such, fine.
