Voice recognition is becoming more and more common, but anyone who’s ever used a smart device can attest that they aren’t exactly foolproof. They can activate seemingly at random, fail to activate when called, or, most annoyingly, completely misunderstand voice commands. Thankfully, researchers from the University of Tokyo are looking to improve the performance of devices like these by attempting to use them without any spoken voice at all.
The project is called SottoVoce and uses an ultrasound imaging probe placed under the user’s jaw to detect internal movements in the speaker’s larynx. The imaging generated from the probe is fed into a series of neural networks, trained with hundreds of speech patterns from the researchers themselves. The neural networks then piece together the likely sounds being made and generate an audio waveform which is played to an unmodified Alexa device. Obviously a few improvements would need to be made to the ultrasonic imaging device to make this usable in real-world situations, but it is interesting from a research perspective nonetheless.
The research paper with all the details is also available (PDF warning). It’s an intriguing approach to improving the performance or quality of voice recognition, especially in situations where the voice may be muffled, nonexistent, or buried in background noise. Machine learning like this seems to be one of the more powerful tools for improving speech recognition, as we saw with this robot that can walk across town and order food for you using voice commands only.
Continue reading “Voice Without Sound” →
Those of us who were around in the late 70s and into the 80s might remember the Speak & Spell, a children’s toy with a remarkable text-to-speech synthesizer. While it sounds dated by today’s standards, it was revolutionary for the time and was riding a wave of text-to-speech functionality that was starting to arrive on various computers of the era. While a lot of them used dedicated hardware to perform the speech synthesis, some computers were powerful enough to do this in software, but others were not quite up to the task. The VIC-20 was one of the latter, but thanks to an ESP8266 it has been retroactively given this function.
This project comes to us from [Jan Derogee], a connoisseur of this retrocomputer, and builds on the work by [Earle F. Philhower] who ported the retro speech synthesis software known as SAM from assembly to C which made it possible to run on the ESP8266. Audio playback is handled on the I2S port, but some work needed to be done to get this to work smoothly since this port also handles the communication with the VIC-20. Once this was sorted out, a patch was made to be able to hear the computer’s audio as well as the speech synthesizer’s. Finally, a serial command interface was designed by [Jan] which allows for control of the module.
While not many of us have VIC-20s sitting at home, it’s still an interesting project that shows the broad scope of a small and inexpensive chip like the ESP8266, which would have had a hefty price tag back in the 1980s. If you have other 80s hardware lying around waiting to be put to work, though, take a look at this project which brings new vocabulary words to that old classic Speak & Spell.
Continue reading “Classic 80s Text-To-Speech On Classic 80s Hardware” →
For just about any task you care to name, a Linux-based desktop computer can get the job done using applications that rival or exceed those found on other platforms. However, that doesn’t mean it’s always easy to get it working, and speech recognition is just one of those difficult setups.
A project called Voice2JSON is trying to simplify the use of voice workflows. While it doesn’t provide the actual voice recognition, it does make it easier to get things going and then use speech in a natural way.
The software can integrate with several backends to do offline speech recognition, including CMU’s pocketsphinx, Dan Povey’s Kaldi, Mozilla’s DeepSpeech 0.9, and Kyoto University’s Julius. However, the code is more than just a thin wrapper around these tools. The fast training process produces both a speech recognizer and an intent recognizer. So not only does the system know you said “garage door,” it also understands whether you want to open it or close it.
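That intent-recognition step is what makes the output easy to consume downstream: instead of raw text, your automation code gets a structured event to dispatch on. Here's a minimal Python sketch of the idea; the JSON shape, intent name, and slot names below are illustrative assumptions, not Voice2JSON's exact schema.

```python
import json

# An illustrative intent event, similar in spirit to what an intent
# recognizer emits (field and intent names here are assumptions):
event = json.loads("""
{
  "text": "open the garage door",
  "intent": {"name": "GarageDoor"},
  "slots": {"state": "open"}
}
""")

def handle(event):
    """Dispatch on the recognized intent rather than the raw transcript."""
    intent = event["intent"]["name"]
    slots = event.get("slots", {})
    if intent == "GarageDoor":
        return f"garage door -> {slots.get('state', 'unknown')}"
    return "unrecognized intent"

print(handle(event))  # garage door -> open
```

The win is that the branching logic never has to parse free-form text; training already mapped the many ways of phrasing a command onto a small, fixed set of intents and slots.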
Continue reading “Making Linux Offline Voice Recognition Easier” →
Of the 43 muscles that comprise the human face, only a few are actually important to speaking. And yet replicating the movements of the mouth by mechanical means always seems to end up only partly convincing. Servos and linkages can only approximate the complex motions the lips, cheeks, jaw, and tongue are capable of. Still, there are animatronics out there that make a good go at the job, of which this somewhat creepy mechanical mouth is a fine example.
Why exactly [Will Cogley] felt the need to build a mechanical maw with terrifying and fairly realistic fangs is anyone’s guess. Recalling his lifelike disembodied animatronic heart build, it just seems like he pursues these builds for the challenge of it all. But if you thought the linkages of the heart were complex, wait till you see what’s needed to make this mouth move realistically. [Will] has stuffed this pie hole with nine servos, all working together to move the jaw up and down, push and pull the corners of the mouth, raise and lower the lips, and bounce the tongue around.
It all seems very complex, but [Will] explains that he actually simplified the mechanical design to concentrate more on the software side, which is a text-to-speech movement translator. Text input is translated to phonemes, each of which corresponds to a mouth shape that the servos can create. It’s pretty realistic, although somewhat disturbing, especially when the mouth is placed in an otherwise cuddly stuffed bear that serenades you from the nightstand; check out the second video below for that.
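The phoneme-to-mouth-shape idea can be sketched in a few lines: each phoneme maps to a "viseme," a pose expressed as a set of servo targets. The phoneme labels and servo values below are invented for illustration and are not [Will]'s actual tables.

```python
# Hypothetical viseme table: phoneme -> servo angles (degrees).
VISEMES = {
    "AA":   {"jaw": 80, "lips": 30, "corners": 50},  # open vowel
    "M":    {"jaw": 10, "lips": 90, "corners": 40},  # lips pressed shut
    "F":    {"jaw": 20, "lips": 60, "corners": 45},  # lower lip on teeth
    "REST": {"jaw": 0,  "lips": 50, "corners": 50},  # neutral pose
}

def phonemes_to_keyframes(phonemes):
    """Turn a phoneme sequence into servo keyframes, falling back to
    the neutral pose for any phoneme we don't model."""
    return [VISEMES.get(p, VISEMES["REST"]) for p in phonemes]

# "ma": lips close for M, then the jaw drops for the open vowel.
for frame in phonemes_to_keyframes(["M", "AA"]):
    print(frame)
```

A real implementation would also interpolate between keyframes and time them to the synthesizer's audio, but the core lookup is this simple.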
[Will] has been doing a bang-up job on animatronics lately, from 3D-printed eyeballs to dexterous mechatronic hands. We’re looking forward to whatever he comes up with next — we think.
Continue reading “This Animatronic Mouth Mimics Speech With Servos” →
The field of Augmentative and Alternative Communication (AAC) covers communication methods used by those who are unable to otherwise produce or comprehend spoken or written language. Many will be familiar with the speech synthesizer used by Stephen Hawking as just one such example of AAC technology. [Christina Hunger] is a speech language pathologist, and is intimately familiar with such tools. She decided to use these techniques to teach her dog, Stella, to talk.
[Christina] began her project by implementing a button board which plays various speech samples when pressed. There are plenty of typical words that a dog may wish to use, like beach, park, and ball, as well as words describing concepts, such as where, later, and come. Over time, she has observed Stella using the button board in various ways that she claims indicate a deeper understanding and use of language than would normally be ascribed to a dog.
From the outset, [Christina] has been intentional in her methods, being sure to only demonstrate the use of the board to Stella, rather than simply pressing the buttons for her. The experiment has many similarities to the case of Koko the gorilla, known for learning symbols from American Sign Language. The project is also documented on Instagram, where she films Stella using the device and gives interpretations of the meaning of Stella’s button pressing.
Attempting to communicate on a higher level with animals has long been a mysterious and complex pursuit, one we’re sure to see more of as various technologies continue to improve. We’d love to see a broader scientific study on the use of AAC tools to “talk” to animals; context and interpretation play such a large role in such matters that it’s difficult to truly gauge the level of understanding an animal may actually have, and more research would go a long way toward shedding light on these techniques. Video after the break.
Continue reading “Training A Dog To “Speak” With A Sound Board” →
One of the things that makes us human is our ability to communicate. However, a stroke or other medical impairment can take that ability away without warning. Although Stephen Hawking managed to do great things with a computer-aided voice, it took a lot of patience and technology to get there. Composing an e-mail or an utterance for a speech synthesizer using a tongue stick or by blinking can be quite frustrating since most people can only manage about ten words a minute. Conventional speech averages about 150 words per minute. However, scientists recently reported in the journal Nature that they have successfully decoded brain signals into speech directly, which could open up an entirely new world for people who need assistance communicating.
The tech is still only lab-ready, but they claim to be able to produce mostly intelligible sentences using the technique. Previous efforts have only managed to produce single syllables, not entire sentences.
Continue reading “Scientists Create Speech From Brain Signals” →
At the University of Oxford, [Jen Chesters] conducts therapy sessions with thirty men in a randomized clinical trial to test the effects of tDCS on subjects who stutter. Men are approximately four times as likely to stutter, and the sex variability of the phenomenon is not being tested. In the randomized sessions, neither the men nor [Jen] know whether real current is being applied or a decoy buzzer is being used.
Transcranial Direct Current Stimulation, tDCS, applies a small current to the brain with the intent of exciting or biasing the region below the electrode. A credit-card-sized device applies the current. Typically, tDCS ranges from nine to eighteen volts at two milliamps or less. The power passing through a person’s brain (a few tens of milliwatts at most) is roughly on par with the kind of laser pointer you should not point straight into your eyeball, and is considered “safe,” quotation marks included.
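The laser-pointer comparison checks out with simple back-of-the-envelope math: taking the quoted figures at face value, P = V × I gives an upper bound (the actual voltage dropped across the head will be less than the full supply voltage).

```python
# Upper-bound power estimate for the quoted tDCS figures: P = V * I.
def power_mw(volts, milliamps):
    # volts * milliamps gives milliwatts directly
    return volts * milliamps

low = power_mw(9, 2)    # 18 mW
high = power_mw(18, 2)  # 36 mW
print(f"{low}-{high} mW")
```

Tens of milliwatts is indeed the territory of laser pointers strong enough to damage an eye, hence the scare quotes around “safe.”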
A week after the therapy, conversational fluency and the ability to recite written passages show improvement over the placebo group, which shows none. Six weeks after the therapy, there is still measurable improvement in the ability to read written passages, but sadly, the conversational gains are lost.
Many people are on the fence about tDCS and we urge our citizen scientists to exercise all the caution you would expect when sending current through the brain. Or, just don’t do that.