Speech synthesis is nothing new, but it has gotten better lately. It is about to get even better thanks to DeepMind’s WaveNet project. The Alphabet (or is it Google?) project uses neural networks to analyze audio data and it learns to speak by example. Unlike other text-to-speech systems, WaveNet creates sound one sample at a time and affords surprisingly human-sounding results.
Before you rush to comment “Not a hack!” you should know we are seeing projects pop up on GitHub that use the technology. For example, there is a concrete implementation by [ibab]. [Tomlepaine] has an optimized version. In addition to learning English, they successfully trained it for Mandarin and even to generate music. If you don’t want to build a system out yourself, the original paper has audio files (about midway down) comparing traditional parametric and concatenative voices with the WaveNet voices.
Another interesting project takes the reverse path: teaching WaveNet to convert speech to text. Before you get too excited, though, you might want to note this quote from the README file:
“We’ve trained this model on a single Titan X GPU during 30 hours until 20 epochs and the model stopped at 13.4 ctc loss. If you don’t have a Titan X GPU, reduce batch_size in the train.py file from 16 to 4.”
Last time we checked, you could get a Titan X for a little less than $2,000.
There is a multi-part lecture series on reinforcement learning (the foundation for DeepMind). If you want to tackle a project yourself, that might be a good starting point (the first part appears below).
Continue reading “Talking Neural Nets”
It isn’t easy communicating when you have any form of speech impairment. In such cases, a speech-generating device (SGD) becomes essential to help you talk to the world. When coupled with other ailments that limit body movement, the problem becomes worse. How do you type on a keyboard when you can’t move your hands, and how do you talk when your voice box doesn’t work? Well-known scientist Stephen Hawking has been battling this since 1985. Back then, it took a lot of hardware to build a text entry interface and a text-to-speech device allowing him to communicate.
But [Marquis de Geek] did a quick hack using just a few parts to make a Voice Box that sounds like Stephen Hawking. Using an arcade push button to act as a single button keyboard, an Arduino, a 74HC595 shift register, a 2-line LCD, and the SP0256 hooked to an audio amplifier / speaker, he built the stand-alone speech synthesizer which sounds just like the voice box that Stephen Hawking uses. Although Dr. Hawking’s speech hardware is quite complex, [Marquis de Geek]’s hack shows that it’s possible to have similar results using off the shelf parts for a low cost solution.
There aren’t a lot of those SP0256-AL2 chips around. We found just a couple of retailers with small stock levels, so if you want to make one of these voice boxes, better grab those chips while they last. The character entry is not quick, requiring several button presses to get to the character you want to select. But it makes things easier for someone who cannot move their hands or use all fingers. A lot of kids grew up using Speak and Spell, but the hardware inside that box wasn’t the easiest to hack into. For a demo of [Marquis de Geek]’s homemade Hawking voice box, check the video below.
Continue reading “Making a Homemade Stephen Hawking”
History and [Bil Herd] teach us that Commodore begged, borrowed, or stole the engineers responsible for the Speak & Spell to add voice synthesis to a few of the computers that came after the C64. This didn’t quite work out in practice, but speech synthesis was part of the Commodore scene for a long time. The Votrax Type ‘n Talk was a stand-alone speech synthesizer that plugged into the expansion port of the VIC-20. It was expensive and rare, but a few games supported it. [Jan] realized the state of speech synthesis has improved tremendously over the last 30 years, and decided to give his VIC a voice with the help of a cheap Android phone.
A few VIC-20 games, including [Scott Adams]’ adventure games, worked with the Votrax speech synthesizer by sending phonemes as text over the expansion port. From there, the Votrax would take care of assembling everything into something intelligible, requiring no overhead on the VIC-20. [Jan] realized that since the VIC is just spitting out characters for each phoneme, he could redirect that text to a better, more modern voice synthesizer.
A small Bluetooth module was wired up to the user port on the VIC, and this module was paired with a cheap Android smartphone. The smartphone receives the serial stream from an adventure game, and speaks the descriptions of all the scenes in these classic adventure games.
It’s a unique experience judging from the video, but the same hardware and software can also be added to any program that will run on the VIC-20, C64, and C128. Video below.
Continue reading “An Adventure into Android Makes the VIC-20 Speak”
This is the under-the-hood view of the keyboard for the Voder (Voice Operating Demonstrator), the first electronic device capable of generating continuous human speech. It accomplishes this feat through a series of keys that generate the syllables, plosives, and affricatives normally produced by the human larynx and shaped by the throat and tongue. This week’s film is a picture montage paired with the audio from the demonstration of the Voder at the 1939 World’s Fair.
The Voder was created by one [Homer Dudley] at Bell Laboratories. He did so in conjunction with the Vocoder, which analyzes human-generated speech for encrypted transfer and re-synthesizes it on the other end. [Dudley] spent over 40 years researching speech at Bell Laboratories. His development of both the Voder and the Vocoder was instrumental in the SIGSALY project, which aimed to deliver encrypted voice communication to the theatres of WWII.
Continue reading “Retrotechtacular: The Voder from Bell Labs”
[Aditya] had a project that called for spoken output. He admits that he could have built a PC-based solution, but he found that adding speech by using a microcontroller was not only a cheap and portable alternative, it was also a fun and easy build.
His design uses an ATMega128. Many microcontrollers would work, but his major requirements were PWM generation and plenty of memory to store the file(s). The output is cleaned up in a simple low pass filter before going to the 8Ω speaker.
[Aditya] lays his tracks in WAV format and then compresses them to 8-bit/8kHz. He found a C++ function that converts the track data into huge arrays and then digitizes it. He uses two timers, one to generate the waveform and a second one to time the square wave. [Aditya] has a zip of samples available on his site that will speak the digits 0-9.
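To give a feel for the conversion step, here is a minimal Python sketch that formats unsigned 8-bit samples as C array source text ready to paste into firmware. This is not [Aditya]’s function: the names `samples_to_c_array` and `speech_data` are made up for illustration, and it assumes the audio has already been reduced to 8-bit/8kHz samples.

```python
def samples_to_c_array(samples, name="speech_data", per_line=8):
    """Format unsigned 8-bit audio samples as a C array declaration.

    `name` and `per_line` are illustrative defaults, not taken from the
    original project.
    """
    data = bytes(samples)  # raises ValueError if a value is outside 0..255
    lines = []
    for i in range(0, len(data), per_line):
        chunk = data[i:i + per_line]
        lines.append("    " + ", ".join(str(b) for b in chunk) + ",")
    body = "\n".join(lines)
    return f"const unsigned char {name}[{len(data)}] = {{\n{body}\n}};"


if __name__ == "__main__":
    # Four samples around the 8-bit midpoint (128 represents silence).
    print(samples_to_c_array([128, 200, 128, 56]))
```

At 8kHz and one byte per sample, each second of speech costs about 8,000 bytes of storage, which is why plenty of memory was on the requirements list.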
You know Halloween is coming around when the tweet reading skulls start popping up. [Marc] wanted to bring the Halloween spirit into his workplace, so he built “Yorick”. In case you’re worried, no humans were harmed (or farmed for parts) in the creation of this hack. Yorick started life as an anatomical skull model, the type one might find in a school biology lab. Yorick’s skull provided a perfect enclosure for not one but two brains.
A Raspberry Pi handles his higher brain function. The Pi uses the Twitter API to scan for tweets to @wedurick. Once a tweet is found, it is sent to Google’s translate server. A somewhat well-known method of performing text to speech with Google translate is the next step. The procedure is simple: sending “http://translate.google.com/translate_tts?tl=en&q=hackaday” will return an MP3 file of the audio. To get a British accent, simply change to google.co.uk.
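For the curious, building that request URL for an arbitrary tweet might look like the Python sketch below. The endpoint and the `tl` and `q` parameters come straight from the example above; the function name `build_tts_url` is our own invention, and we make no promise that the service still answers this way. The sketch only constructs the URL and does not fetch anything.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the post; treat its current behavior as unknown.
TTS_ENDPOINT = "http://translate.google.com/translate_tts"


def build_tts_url(text, lang="en", endpoint=TTS_ENDPOINT):
    """Return the text-to-speech request URL for `text`.

    urlencode() escapes spaces and punctuation so that a whole tweet
    can travel safely in the query string.
    """
    query = urlencode({"tl": lang, "q": text})
    return f"{endpoint}?{query}"


if __name__ == "__main__":
    print(build_tts_url("hackaday"))
    print(build_tts_url("alas, poor Yorick"))
```

Passing a different `endpoint` is how you would try the google.co.uk variant mentioned above.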
The Pi pipes the audio to a speaker, and to the analog input pin of an Arduino, which handles Yorick’s lower brain functions. The Arduino polls the audio in a tight loop. An average of the last 3 samples is computed and mapped to a servo position. This results in an amazingly realistic and automatic mouth movement. We think this is the best part of the hack.
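The averaging trick is simple enough to sketch in a few lines. The real thing runs on the Arduino; this Python version only illustrates the logic, and the numbers in it are our assumptions rather than [Marc]’s values: a 10-bit reading range of 0 to 1023 and a jaw servo that sweeps from 0 (closed) to 90 degrees (open).

```python
from collections import deque


def mouth_angles(samples, window=3, adc_max=1023, closed=0, open_=90):
    """Yield one servo angle per audio reading.

    Each angle is the average of the most recent `window` readings,
    scaled linearly from 0..adc_max onto closed..open_ degrees.
    All default values here are assumptions for illustration.
    """
    recent = deque(maxlen=window)  # old readings fall off automatically
    for sample in samples:
        recent.append(sample)
        average = sum(recent) / len(recent)
        yield round(closed + (open_ - closed) * average / adc_max)


if __name__ == "__main__":
    # A burst of loud audio after silence: the jaw opens over a few readings.
    print(list(mouth_angles([0, 1023, 1023, 1023])))
```

Averaging over a short window keeps the jaw from chattering on every individual sample while still following the overall loudness of the speech.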
It wouldn’t be fair for [Marc] to keep the fruits of his labors to himself, so Yorick now has his own Livestream channel. Click past the break to hear Yorick’s opinion on the Hack A Day comments section! Have we mentioned that we love pandering?
Continue reading “Alas, Poor Yorick! I Tweeted Him”
Back in 1991, a young [Backwoods Engineer] and his new wife went to a Valentine’s Day get-together. One of the conditions of the shindig was having the guys make – not buy – a Valentine’s Day card. Go big or go home, he thought, and after a few days he had a talking Valentine’s Day card that would become one of his wife’s most treasured possessions.
The early 90s were a different time; in case you haven’t been made to feel very old yet today, 1991 is closer to 1970 than 2013 is to 1991. Likewise, the circuitry inside this heartfelt talking token of appreciation bears more resemblance to something from a 1970s electronics magazine than an Arduino project of today.
The project is powered by an old Intel MCS-48 microcontroller attached to one of the old speech synthesis chips Radio Shack used to sell. These are, in turn, connected to a programmable logic chip and a masked ROM that translates English words into phonemes for the speech synthesizer.
The entire device is constructed on a hacked up piece of perf board and a few wire wrap sockets; sturdy construction, even if the battery compartment has been replaced a few times.
As for what the talking valentine says? “OK! Hello, I am a Talking Valentine Card. ‘Love Is A Many-Splendored Thing’ and in this case also needs batteries!” You can check that out after the break.
Continue reading “Speech synthesizing valentine from 1991”