Vintage Speech Synthesizer Croons The Oldies

If you listened to the National Weather Service Weather Radio in the US about 25 years ago, you’ll no doubt remember [Perfect Paul], one of the synthesized voices used to read current conditions and weather forecasts. The voice came from a DECtalk DTC01, a not inexpensive voice synthesizer first made in 1984 that also gave voice to [Stephen Hawking] for many years.

Long obsolete, the DECtalk boxes have a devoted following among hobbyists who like to stretch what the device can do. Some even like to make it sing, after a fashion, and [Michael] decided that making a DECtalk sing “Xanadu”, the theme song from the 1980 [Olivia Newton-John] musical extravaganza, was a good idea. Whether it actually was is debatable, and we’ll take exception to having that particular ditty stuck in our head as a result, but we don’t judge except on the merits of the hack.

It’s actually easy if you have a DECtalk; the song is a straight ASCII file with remarkably concise instructions on which phonemes the box needs to generate. With inflection, tone, and timing instructions mixed in, the text file looks almost completely unlike English while still somehow being readable. The DECtalk accepts the file over RS-232, which would be easy enough to do with a modern computer, but [Michael] upped his game a bit by using a TRS-80 Model 100 computer as a serial terminal. The synthesized song is in the video below, with the original included for reference by those who didn’t endure the late disco-era glory days.
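For anyone who would rather use a modern computer than a Model 100, streaming a text file to a serial device can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not [Michael]’s setup: the function name `send_ascii_file`, the CR/LF line ending, and the optional per-line delay are all hypothetical choices, and `port` is any object with a `write(bytes)` method, such as a pyserial `Serial` instance or, as here, an in-memory buffer.

```python
import io
import time


def send_ascii_file(text, port, line_delay=0.0):
    """Write `text` to `port` one line at a time and return the bytes sent.

    `port` is anything with a write(bytes) method that returns a byte count.
    The CR/LF terminator and the pacing delay are assumptions, not
    documented DECtalk requirements.
    """
    sent = 0
    for line in text.splitlines():
        # encode() raises UnicodeEncodeError on any non-ASCII character,
        # so a stray curly quote stops the transfer instead of being sent.
        sent += port.write(line.encode("ascii") + b"\r\n")
        if line_delay:
            time.sleep(line_delay)  # crude pacing for a slow device
    return sent


# Stand-in for a real serial port so the sketch runs without hardware.
fake_port = io.BytesIO()
count = send_ascii_file("first line\nsecond line", fake_port)
print(count, fake_port.getvalue())
```

Passing the port in as a parameter keeps the sketch testable; swapping the buffer for a real serial connection should not require touching the function itself.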

DECtalks seem pretty rare in the wild, so we appreciate this glimpse at what they can do. There are other retro speech synthesizer hacks, though: the simulated walnut goodness of the Votrax and the MicroVox come to mind, as does the venerable TI Speak and Spell.

Scientists Create Speech From Brain Signals

One of the things that makes us human is our ability to communicate. However, a stroke or other medical impairment can take that ability away without warning. Although Stephen Hawking managed to do great things with a computer-aided voice, it took a lot of patience and technology to get there. Composing an e-mail or an utterance for a speech synthesizer using a tongue stick or by blinking can be quite frustrating since most people can only manage about ten words a minute. Conventional speech averages about 150 words per minute. However, scientists recently reported in the journal Nature that they have successfully decoded brain signals into speech directly, which could open up an entirely new world for people who need assistance communicating.
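To put those two rates side by side, here is a small back-of-the-envelope calculation. The 10 and 150 words-per-minute figures come from the text above; the 15-word sentence length is a hypothetical value picked only for illustration. At those rates the sentence takes 90 seconds to compose with an assistive device versus 6 seconds to speak.

```python
def seconds_to_say(word_count, words_per_minute):
    """Time in seconds to produce `word_count` words at a steady rate."""
    return word_count * 60 / words_per_minute


ASSISTED_WPM = 10     # "about ten words a minute" (from the text)
SPOKEN_WPM = 150      # conventional speech (from the text)
SENTENCE_WORDS = 15   # hypothetical sentence length

print(seconds_to_say(SENTENCE_WORDS, ASSISTED_WPM))
print(seconds_to_say(SENTENCE_WORDS, SPOKEN_WPM))
print(SPOKEN_WPM / ASSISTED_WPM)  # how many times faster speech is
```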

The tech is still only lab-ready, but they claim to be able to produce mostly intelligible sentences using the technique. Previous efforts have only managed to produce single syllables, not entire sentences.

Google’s Duplex AI Has Conversation Indistinguishable From Human’s

First Google gradually improved its WaveNet text-to-speech neural network to the point where it sounds almost perfectly human. Then they introduced Smart Reply, which suggests possible replies to your emails. So it’s no surprise that they’ve announced an enhancement for Google Assistant called Duplex, which can have phone conversations for you.

What is surprising is how well it works, as you can hear below. The first is Duplex calling to book an appointment at a hair salon, and the second is it making a reservation at a restaurant.

Note that this reverses the roles when talking to a computer on the phone. The computer is the customer who calls the business, and the human is on the business side. The goal of the computer is to book a hair appointment or reserve a table at a restaurant. The computer has to know how to carry out a conversation with the human without the human knowing that they’re talking to a computer. It’s for communicating with all those businesses which don’t have online booking systems but instead use human operators on the phone.

Not knowing that they’re talking to a computer, the human will therefore speak as they would with another human, with all the pauses, “hmm”s and “ah”s, changes of speed, dropped words, and even changes of context in mid-sentence. There’s also the problem of multiple meanings for a phrase. The “four” in “Ok for four” can mean 4 pm or four people.

The component which decides what to say is a recurrent neural network (RNN) trained on many anonymized phone calls. The input is: the audio, the output from Google’s automatic speech recognition (ASR) software, and context such as the conversation’s history and the parameters of the conversation (e.g. book places at a restaurant, for how many, when), and more.
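To see why conversation history matters as an input, consider the “Ok for four” ambiguity. The toy function below is a hand-written rule, emphatically not Duplex’s RNN; the name `interpret_number` and the `last_topic` state are hypothetical, invented only to show how the same word resolves differently depending on what was just asked.

```python
def interpret_number(number, last_topic):
    """Resolve a bare number using what the caller last asked about.

    `last_topic` is a hypothetical piece of conversation state:
    "time" if the previous exchange was about the hour,
    "party_size" if it was about how many people are coming.
    """
    if last_topic == "time":
        return {"time": f"{number} pm"}
    if last_topic == "party_size":
        return {"party_size": number}
    return {"unresolved": number}  # not enough context to decide


print(interpret_number(4, "time"))
print(interpret_number(4, "party_size"))
```

A learned model does not use explicit rules like these, but feeding it the history gives it the same information the rules rely on.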

Producing the speech is done using Google’s text-to-speech technologies, WaveNet and Tacotron. “Hmm”s and “ah”s are inserted for a more natural sound. Timing is also taken into account. “Hello?” gets an immediate response. But they introduce latency when responding to more complex questions since replying too soon would sound unnatural.

There are limitations though. If it decides it can’t complete a task then it hands the conversation over to a human operator. Also, Duplex can’t handle a general conversation. Instead, multiple instances are trained on different domains. So this isn’t the singularity which we’ve talked about before. But if you’re tired of talking to computers at businesses, maybe this will provide a little payback by having the computer talk to the business instead.

On a more serious note, would you want to know if the person you were speaking to was in fact a computer? Perhaps Google should preface each conversation with “Hi! This is Google Assistant calling.” And even knowing that, would you want to have a human conversation with a computer, knowing that its “um”s were artificial? This may save time for the person on whose behalf the call is made, but the person being called may wish the computer would be a little more computer-like and speak more efficiently. Let us know your thoughts in the comments below. Or just check out the following Google I/O ’18 keynote presentation video where all this was announced.

DIY Text-to-Speech With Raspberry Pi

We can almost count on our eyesight to fail with age, maybe even past the point of correction. It’s a pretty big flaw if you ask us. So, how can a person with aging eyes hope to continue reading the printed word?

There are plenty of commercial document readers available that convert text to speech, but they’re expensive. Most require a smartphone and/or an internet connection. That might not be as big of an issue for future generations of failing eyes, but we’re not there yet. In the meantime, we have small, cheap computers and plenty of open source software to turn them into document readers.

[rgrokett] built a RaspPi text reader to help an aging parent maintain their independence. In the process, he made a good soup-to-nuts guide to building one. It couldn’t be easier to use—just place the document under the camera and push the button. A Python script makes the Pi take a picture of the text. Then it uses Tesseract OCR to convert the image to plain text, and runs the text through a speech synthesis engine which reads it aloud. The reader is on as long as it’s plugged in, so it’s ready to work at the push of a button. We can probably all appreciate such a low-hassle design. Be sure to check out the demo after the break.
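The capture, OCR, and speak steps described above can be sketched as a short Python function. This is not [rgrokett]’s script: only Tesseract is named in the write-up, so the `raspistill` camera command and the `espeak` speech engine are assumptions standing in for whatever his build uses, and the function name and the injectable `run` parameter are hypothetical. Button handling is left out entirely.

```python
import subprocess
from pathlib import Path


def read_page_aloud(workdir, run=subprocess.run):
    """Photograph a page, OCR it, speak the result, and return the text.

    `run` defaults to subprocess.run; passing a stand-in makes the
    pipeline testable on a machine with no camera or speaker.
    """
    image = Path(workdir) / "page.jpg"

    # 1. Take a picture (assumed tool: the legacy raspistill utility).
    run(["raspistill", "-o", str(image)], check=True)

    # 2. OCR: with "stdout" as the output base, tesseract prints the text.
    ocr = run(["tesseract", str(image), "stdout"],
              check=True, capture_output=True, text=True)
    text = ocr.stdout.strip()

    # 3. Speak (assumed engine: espeak reading text from standard input).
    if text:
        run(["espeak", "--stdin"], input=text, text=True, check=True)
    return text
```

Skipping the speech step when the OCR comes back empty avoids an awkward silence-then-error when the page is blank or out of frame.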

If you wanted to use this to read books, you’d still have to turn the pages yourself. Here’s a BrickPi reader that solves that one.

Quick Hack Helps ALS Patient Communicate

A diagnosis of amyotrophic lateral sclerosis, or ALS, is devastating. Outlier cases like [Stephen Hawking] notwithstanding, most ALS patients die within four years or so of their diagnosis, after having endured the progressive loss of muscle control that robs them of their ability to walk, to swallow, and even to speak.

Rather than see a friend’s father locked in by his ALS, [Ricardo Andere de Mello] decided to help out by building a one-finger interface to a [Hawking]-esque voice synthesizer on the cheap. Working mainly with what hardware he had on hand, his system lets his friend’s dad flick a finger to operate off-the-shelf assistive communication software running on a laptop. The sensor is an accelerometer velcroed to a fingertip; when a movement threshold is passed, an Arduino sends the laptop an F12 keypress, which is all that’s needed to operate the software. You can watch it in action in the video after the break.
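The trigger logic is simple enough to sketch. The function below is an illustration in Python rather than [Ricardo]’s Arduino code: it assumes “movement” means the change between consecutive accelerometer readings, and the refractory period that suppresses repeat triggers is an added assumption, as are the names `detect_flicks`, `threshold`, and `refractory`. Each returned index marks where a keypress would be sent.

```python
def detect_flicks(samples, threshold, refractory=5):
    """Return the sample indices at which a keypress would fire.

    A trigger fires when the reading jumps by more than `threshold`
    from the previous sample. After a trigger, the next `refractory`
    samples are ignored so one flick produces one keypress.
    """
    triggers = []
    cooldown = 0
    previous = None
    for index, value in enumerate(samples):
        if cooldown > 0:
            cooldown -= 1
        elif previous is not None and abs(value - previous) > threshold:
            triggers.append(index)
            cooldown = refractory
        previous = value
    return triggers


# Hypothetical readings: two flicks separated by stillness.
readings = [0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 0, 6, 0]
print(detect_flicks(readings, threshold=3, refractory=3))
```

Without the refractory period, the finger settling back after a flick would register as a second movement and fire a second keypress.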

Hats off to [Ricardo] for pitching in and making a difference without breaking the bank. This isn’t the first expedient speech synthesizer we’ve seen for ALS patients; this one does it with just three chips, including voice synthesis.

MicroVox Puts The 80’s Back Into Your Computer’s Voice

[Monta Elkins] got it in his mind that he wanted to try out an old-style speech synthesizer with the SC-01 (or SC-01A) chip, one that uses phonemes to produce speech. After searching online he found a MicroVox text-to-speech synthesizer from the 1980s based around the chip, and after putting together a makeshift serial cable, he connected it up to an Arduino Uno and tried it out. It has that 8-bit artificial voice that many of us remember fondly and is fairly understandable.

The SC-01, and then the SC-01A, were made by Votrax International, Inc. In addition to the MicroVox, the SC-01 and SC-01A were used in the Heath Hero robot, the VS-100 synthesizer add-on for TRS-80s, various arcade games such as Q*bert and Krull, and in a variety of other products. The chip’s input selects which phonemes to play, and where it shines is in producing good transitions between them, resulting in decent speech that is much better than you’d get if you just played the phonemes one after the other.

The MicroVox has a 25-pin RS-232 serial port as well as a parallel port and a speaker jack. In addition to the SC-01A, it has a 6502 under the hood. [Monta] was lucky to also receive the manual, and what a manual it is! In addition to a list of the supported phonemes and words, it also contains the schematics, parts list and details for the serial port, which alone would make for fun reading. We really liked the taped-in note seen in this screenshot. It has a hand-written note that says “Factory Corrected 10/18/82”.

In the video below, [Monta] finds the datasheet for the serial port’s input buffer chip online and verifies the voltage levels. Next he opens up the case and uses DIP switches to set baud rate, data bits, parity, stop bits and so on. After hooking up the speakers, putting together a makeshift cable for RX, TX and ground, and writing a little Arduino code, he sends it text and out comes the speech.
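Those DIP switch settings determine how long each character takes on the wire, since every character is framed by a start bit, an optional parity bit, and one or more stop bits. The sketch below works through that arithmetic; the function names are hypothetical and the 300 baud figure in the example is purely illustrative, not a setting taken from [Monta]’s video.

```python
def bits_per_character(data_bits=8, parity="N", stop_bits=1):
    """Total bits on the wire for one asynchronous serial character."""
    parity_bits = 0 if parity.upper() == "N" else 1
    return 1 + data_bits + parity_bits + stop_bits  # leading 1 = start bit


def seconds_to_send(text, baud, **framing):
    """Transmission time for `text`, assuming one bit per baud."""
    return len(text) * bits_per_character(**framing) / baud


print(bits_per_character())                       # 8N1 framing
print(bits_per_character(data_bits=7, parity="E"))  # 7E1 framing
print(seconds_to_send("HELLO\r", baud=300))
```

Whatever values are chosen, the sending side has to be configured with the same framing, or the receiver will see garbage.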

Arduino Clock Is HAL 1000

In the movie 2001: A Space Odyssey, HAL 9000 — the neurotic computer — had a birthday in 1992 (for some reason, in the book it is 1997). In the late 1960s, that date sounded impossibly far away, but now it seems like a distant memory. The only thing is, we are only now starting to get computers with voice I/O that are practical and even they are a far cry from HAL.

[GeraldF6] built an Arduino-based clock. That’s nothing new but thanks to a MOVI board (ok, shield), this clock has voice input and output as you can see in the video below. Unlike most modern speech-enabled devices, the MOVI board (and, thus, the clock) does not use an external server in the cloud or any remote processing at all. On the other hand, the speech quality isn’t what you might expect from any of the modern smartphone assistants that talk. We estimate it might be about 1/9 the power of the HAL 9000.