DIY Text-to-Speech With Raspberry Pi

We can almost count on our eyesight to fail with age, maybe even past the point of correction. It’s a pretty big flaw if you ask us. So, how can a person with aging eyes hope to continue reading the printed word?

There are plenty of commercial document readers available that convert text to speech, but they’re expensive. Most require a smartphone and/or an internet connection. That might not be as big an issue for future generations of failing eyes, but we’re not there yet. In the meantime, we have small, cheap computers and plenty of open source software to turn them into document readers.

[rgrokett] built a RaspPi text reader to help an aging parent maintain their independence. In the process, he made a good soup-to-nuts guide to building one. It couldn’t be easier to use: just place the document under the camera and push the button. A Python script makes the Pi take a picture of the text. Then it uses Tesseract OCR to convert the image to plain text, and runs the text through a speech synthesis engine, which reads it aloud. The reader is on as long as it’s plugged in, so it’s ready to work at the push of a button. We can probably all appreciate such a low-hassle design. Be sure to check out the demo after the break.
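The capture, OCR, and speech steps map naturally onto a few command-line tools. The sketch below is not [rgrokett]’s script. It assumes the legacy `raspistill` camera tool, the `tesseract` CLI (which accepts `stdout` as its output base), and `espeak` as the speech engine, since the engine isn’t named here. The `run` parameter lets you swap in a fake for testing.

```python
import subprocess
from typing import Callable, List

# Assumed tools: raspistill (legacy Pi camera CLI), tesseract, espeak.
def capture_command(image_path: str) -> List[str]:
    """Take a still photo of the page with the Pi camera."""
    return ["raspistill", "-o", image_path]

def ocr_command(image_path: str) -> List[str]:
    """Run Tesseract and write the recognized text to standard output."""
    return ["tesseract", image_path, "stdout"]

def speak_command(text: str) -> List[str]:
    """Read the text aloud (espeak is an assumption; any TTS CLI would do)."""
    return ["espeak", text]

def read_page(image_path: str = "/tmp/page.jpg",
              run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Photograph, OCR, and speak one page; returns the recognized text."""
    run(capture_command(image_path), check=True)
    result = run(ocr_command(image_path), check=True,
                 capture_output=True, text=True)
    text = " ".join(result.stdout.split())  # collapse line breaks for speech
    if text:
        run(speak_command(text), check=True)
    return text
```

On the Pi, `read_page()` could be hooked to the push button, for example through gpiozero’s `Button.when_pressed`.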

If you wanted to use this to read books, you’d still have to turn the pages yourself. Here’s a BrickPi reader that solves that one.


Stephen Hawking Forecasts The Weather

Stephen Hawking, although unable to speak himself, is immediately recognizable by his voice, which is generated by a computer speech synthesizer. What may come as a surprise to some is that a voice very much like his is still available today in the Emic2 text-to-speech module, ready for whatever text-to-speech project you are working on. As a great example of this, [TegwynTwmffat] has built a weather forecasting station that uses an Emic2 voice module to provide audible weather alerts.

Besides the unique voice, the weather center is a high-quality build on its own. An Arduino Mega 2560 equipped with a GPRS module pulls weather information once an hour. After the voice module was constructed (which seems like a project in itself), it’s relatively straightforward to pass the information from the Arduino over to the module and have it start announcing the weather. It can even be programmed to sing the weather to you!
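Driving the Emic2 amounts to writing short ASCII commands over a serial port. The helper below builds the bytes for a spoken weather report. It is a sketch based on our reading of the Emic 2 documentation, assuming that `S<text>` speaks a line, `N<n>` selects one of the built-in voices, and each command ends with a newline; check the module’s datasheet before relying on it. The report wording and the `temp_c`/`rain_mm` fields are hypothetical. On the Arduino you would write the same bytes to the module’s serial port, waiting for its `:` prompt before each command.

```python
from typing import Dict, List

def emic_command(cmd: str, arg: str = "") -> bytes:
    """Format one Emic 2 command: command letter, optional argument, newline."""
    # A newline ends the command, so flatten any line breaks in the argument.
    clean = " ".join(arg.split())
    return f"{cmd}{clean}\n".encode("ascii", errors="ignore")

def weather_report(reading: Dict[str, float]) -> str:
    """Turn a hypothetical hourly reading into a sentence to speak."""
    return (f"The temperature is {reading['temp_c']:.0f} degrees Celsius, "
            f"with {reading['rain_mm']:.1f} millimeters of rain expected.")

def announce(reading: Dict[str, float], voice: int = 0) -> List[bytes]:
    """Commands to send in order: select a voice, then speak the report.

    Send them one at a time, waiting for the module's ':' prompt in between.
    """
    return [emic_command("N", str(voice)),
            emic_command("S", weather_report(reading))]
```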

All of the code that [TegwynTwmffat] used to build this is available on the project site if you’re curious about building your own Emic2 voice system. It’s also worth noting that GPRS is available to pretty much anyone and is a relatively simple way to pull data such as weather information; with the right equipment and licensing, you could even use it to roll out your own private cell phone network.

Talking Neural Nets

Speech synthesis is nothing new, but it has gotten better lately. It is about to get even better thanks to DeepMind’s WaveNet project. The Alphabet (or is it Google?) project uses neural networks to analyze audio data, and it learns to speak by example. Unlike other text-to-speech systems, WaveNet creates sound one sample at a time and produces surprisingly human-sounding results.
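Generating “one sample at a time” means the network predicts a probability distribution over the next audio sample given the samples before it, draws a sample, appends it, and repeats. The WaveNet paper quantizes audio to 256 levels with mu-law companding and uses dilated causal convolutions whose dilations double from 1 to 512, giving each such block a receptive field of 1,024 samples. The sketch below shows only that outer loop; `predict` is a hypothetical stand-in for the trained network, not anything from DeepMind’s code.

```python
import math
import random
from typing import Callable, List, Sequence

MU = 255  # 8-bit mu-law: 256 quantization levels, as in the WaveNet paper

def mu_law_encode(x: float, mu: int = MU) -> int:
    """Compand a sample in [-1, 1] and quantize it to an integer in [0, mu]."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((y + 1) / 2 * mu))

def mu_law_decode(q: int, mu: int = MU) -> float:
    """Invert mu_law_encode back to an approximate sample in [-1, 1]."""
    y = 2 * q / mu - 1
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)

def receptive_field(dilations: Sequence[int], kernel_size: int = 2) -> int:
    """Number of past samples a stack of dilated causal convolutions can see."""
    return (kernel_size - 1) * sum(dilations) + 1

def generate(predict: Callable[[Sequence[int]], List[float]], n_samples: int,
             context: int, seed: int = 0) -> List[int]:
    """Autoregressive loop: predict a distribution, sample, append, repeat."""
    rng = random.Random(seed)
    history = [mu_law_encode(0.0)]  # seed the sequence with silence
    for _ in range(n_samples):
        probs = predict(history[-context:])  # only the receptive field matters
        history.append(rng.choices(range(len(probs)), weights=probs)[0])
    return history[1:]
```

With the paper’s dilations, `receptive_field([2 ** i for i in range(10)])` is 1024. Because every output sample needs a full network evaluation, naive generation is slow: one second of 16 kHz audio means 16,000 evaluations.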

Before you rush to comment “Not a hack!”, you should know that we are seeing projects pop up on GitHub that use the technology. For example, there is a concrete implementation by [ibab], and [Tomlepaine] has an optimized version. In addition to English, DeepMind successfully trained WaveNet on Mandarin and even used it to generate music. If you don’t want to build a system yourself, the original paper has audio files (about midway down) comparing traditional parametric and concatenative voices with the WaveNet voices.

Another interesting project takes the reverse path: teaching WaveNet to convert speech to text. Before you get too excited, though, you might want to note this quote from the README file:

“We’ve trained this model on a single Titan X GPU during 30 hours until 20 epochs and the model stopped at 13.4 ctc loss. If you don’t have a Titan X GPU, reduce batch_size in the train.py file from 16 to 4.”

Last time we checked, you could get a Titan X for a little less than $2,000.

There is a multi-part lecture series on reinforcement learning (the foundation for DeepMind). If you wanted to tackle a project yourself, that might be a good starting point (the first part appears below).


A DIY, Visual Alexa

Talking to computers is all the rage right now. We are accustomed to using voice to communicate with each other, so that makes sense. However, there’s a distinct difference between talking to a human over a phone line and conversing face-to-face. You get a lot of visual cues in person compared to talking over a phone or radio.

Today, most voice-enabled systems are like talking to a computer over the phone. It gets the job done, but you don’t always get the most benefit. To that end, [Youness] decided to marry an OLED display to his Alexa to give visual feedback about Alexa’s current state. It is a work in progress, but you can see two incarnations of the idea in the videos below.

A Raspberry Pi provides the horsepower and the display. A Python program connects to the Alexa Voice Service (AVS) to understand what to do. AVS provides several interfaces for building voice-enabled applications:

  • Speech Recognition/Synthesis – Understands and generates speech.
  • Alerts – Sets and manages timers and alarms.
  • AudioPlayer – Manages audio playback.
  • PlaybackController – Manages the playback queue.
  • Speaker – Controls volume.
  • System – Provides client information to AVS.
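One way to drive the display is to map incoming AVS directives to on-screen states. AVS directives arrive as JSON whose `header` carries a `namespace` (roughly one of the interfaces above, such as `SpeechSynthesizer` or `Speaker`) and a `name` (such as `Speak` or `SetVolume`). The sketch below assumes you already have each directive’s JSON text; the display state labels and the mapping itself are hypothetical, not [Youness]’s code.

```python
import json
from typing import Dict, Tuple

# Hypothetical OLED states, keyed by the directive's (namespace, name).
DISPLAY_STATES: Dict[Tuple[str, str], str] = {
    ("SpeechRecognizer", "ExpectSpeech"): "listening",
    ("SpeechSynthesizer", "Speak"): "speaking",
    ("AudioPlayer", "Play"): "playing",
    ("Alerts", "SetAlert"): "alarm set",
    ("Speaker", "SetVolume"): "volume",
}

def display_state(message: str, default: str = "idle") -> str:
    """Pick an OLED state from one AVS directive encoded as JSON."""
    header = json.loads(message).get("directive", {}).get("header", {})
    key = (header.get("namespace", ""), header.get("name", ""))
    return DISPLAY_STATES.get(key, default)
```

Anything the table doesn’t recognize falls back to `idle`, so new directive types can’t leave the display stuck in a stale state.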

We’ve seen AVS used to create an Echo clone (in a retro case, though). We also recently looked at the Google speech API on the Raspberry Pi.


Raspberry Pi Want A Cracker?

If you watch the original Star Trek, you’ll notice that the computers on board the Enterprise don’t look much like our computers (unless you count the little 3.5-inch floppies that looked pretty close to the real thing). Then again, the Enterprise didn’t need keyboards and screens, since the computers did a pretty good job of listening and speaking to humans.

We aren’t quite to the point where you can just ask the computer some fuzzy open-ended question like Captain Kirk did, but we do have things like Echo, Siri, and Google Now that do a fair job of listening to you and replying. In fact, Google provides an API that can do speech recognition and generation. [Giulio] used some common Python libraries to add speech I/O to a Raspberry Pi.
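The libraries [Giulio] used aren’t named here, so the sketch below assumes two common choices: the SpeechRecognition package, whose `recognize_google` call sends audio to Google’s web speech service, and gTTS for synthesis, with the `mpg123` player for output. Both third-party imports sit inside their functions so the keyword-matching part can be tried on its own. The canned replies in `REPLIES` are hypothetical.

```python
import re
import subprocess
from typing import Dict

# Hypothetical canned replies keyed by a word to listen for.
REPLIES: Dict[str, str] = {
    "hello": "Hello there.",
    "time": "Sorry, I do not have a clock yet.",
}

def reply_for(heard: str) -> str:
    """Pick a reply by looking for a known keyword in the recognized text."""
    words = re.findall(r"[a-z']+", heard.lower())
    for keyword, reply in REPLIES.items():
        if keyword in words:
            return reply
    return "I did not understand that."

def listen() -> str:
    """Record one utterance and send it to Google's recognizer (needs network)."""
    import speech_recognition as sr  # third-party: SpeechRecognition (+ PyAudio)
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio)

def say(text: str, path: str = "/tmp/reply.mp3") -> None:
    """Synthesize speech with gTTS (needs network) and play it with mpg123."""
    from gtts import gTTS  # third-party: gTTS
    gTTS(text=text, lang="en").save(path)
    subprocess.run(["mpg123", "-q", path], check=True)
```

On a Pi with a USB microphone and speaker attached, `say(reply_for(listen()))` runs one listen-and-answer round trip.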


Talking Star Trek

Speech generation and recognition have come a long way. It wasn’t that long ago that we were in a breakfast place and endured 30 minutes of a teenage girl screaming “CALL JUSTIN TAYLOR!” into her phone repeatedly, with no results. Now speech on phones is good enough that you might never use the keyboard unless you want privacy. Every time we ask Google or Siri a question and get an answer, it makes us feel like we are living in Star Trek.

[Smcameron] probably feels the same way. He’s been working on a Star Trek-inspired bridge simulator called “Space Nerds in Space” for some time. He decided to test the current state of Linux speech support by adding spoken commands and responses to it. You can see the results in the video below.


Mobile Text Reader With OCR And Text To Speech

There are devices out there that will magnify text using fancy cameras and displays, devices that will convert text to Braille, and text-to-speech software has been around for thirty years. For his entry into our Raspberry Pi Zero contest, [Markus] decided to combine all these ideas into a simple device that will turn the printed word into speech.

The impetus for [Markus]’ project came to him in the form of a group of blind computer science students. These students were enrolled in a specialized program that relied on mobile Braille terminals, OCR, and oral exams so they could study the same material as everyone else. [Markus] wanted to produce something similar, using simple text-to-speech software instead of a complicated Braille display.

The physical design of [Markus]’ project is uniquely functional – a hand-held device with a camera up front, a Pi in the middle, and a speaker and headphone jack on the back. The hand grip includes a large battery and a trigger for telling the Pi to read a few words aloud.

The software is built around SnapPicam, which already includes much of the needed functionality. OCR is largely a solved problem with Tesseract, and text-to-speech is easy with Festival.
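Gluing Tesseract to Festival takes only a few lines. The sketch below is not [Markus]’ code; it calls the `tesseract` CLI (writing to `stdout`) and pipes the result into `festival --tts`, which reads text from standard input. The `clean_ocr_text` step is an assumed extra: raw OCR output keeps the page’s line breaks and end-of-line hyphens, which make the speech choppy.

```python
import re
import subprocess
from typing import Callable

def clean_ocr_text(raw: str) -> str:
    """Tidy Tesseract output so the synthesizer reads it smoothly."""
    text = re.sub(r"-\n(?=\w)", "", raw)  # rejoin words hyphenated across lines
    return " ".join(text.split())         # collapse line breaks and extra spaces

def speak_image(image_path: str,
                run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """OCR an image with Tesseract, then pipe the text into Festival's TTS mode."""
    ocr = run(["tesseract", image_path, "stdout"], check=True,
              capture_output=True, text=True)
    text = clean_ocr_text(ocr.stdout)
    if text:
        run(["festival", "--tts"], input=text, check=True, text=True)
    return text
```

The hyphen rule is deliberately simple: it will also merge a genuine compound such as “well-known” if the line happens to break at its hyphen, which is usually an acceptable trade for smoother speech.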

Although [Markus] is just plugging a few existing software modules together, he’s come up with a device that is certainly unique and could be exceptionally useful to anyone with a vision impairment.



The Raspberry Pi Zero contest is presented by Hackaday and Adafruit. Prizes include Raspberry Pi Zeros from Adafruit and gift cards to The Hackaday Store!