Magic Mirror Tirelessly Indulges Children’s Curiosity

May 27, 2018 by Donald Papp 6 Comments

[pepelepoisson]’s Miroir Magique (“Magic Mirror”) is an interesting take on the smart mirror concept; it’s intended to be a playful, interactive learning tool for kids who are at an age where language and interactivity are deeply interesting to them, but whose ceaseless demands for examples of spelling and writing can be equally exhausting. Inspiration came from his own five-year-old, who can neither read nor write but nevertheless has a bottomless fascination with the writing and spelling of words, phrases, and numbers.

The magic is all in the simple interface. Magic Mirror waits for activation (a simple pass of the hand over a sensor) then shows that it is listening. Anything it hears, it then displays on the screen and reads back to the user. From an application perspective it’s fairly simple, but what’s interesting is the use of speech-to-text and text-to-speech functions not as a means to an end, but as an end in themselves. A mirror in more ways than one, it listens and repeats back, while writing out what it hears at the same time. For its intended audience of curious children fascinated by the written and spoken aspects of language, it’s part interactive toy and part learning tool.
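The listen-and-repeat cycle described above can be sketched in a few lines of Python. This is only an illustration of the flow, not [pepelepoisson]’s code: the function `mirror_cycle` and its four callables (`wait_for_gesture`, `listen`, `display`, `speak`) are hypothetical placeholders for whatever gesture sensor, speech-to-text, screen, and text-to-speech components a build actually uses.

```python
def mirror_cycle(wait_for_gesture, listen, display, speak):
    """Run one activation: wait for a hand pass, listen, then show and read back.

    All four arguments are hypothetical callables supplied by the caller:
      wait_for_gesture() blocks until the sensor is triggered,
      listen() returns the recognized text (or an empty string),
      display(text) puts text on the screen,
      speak(text) reads text aloud.
    """
    wait_for_gesture()
    display("Listening...")  # show the user that the mirror is listening
    heard = listen()
    if heard:
        display(heard)  # write out what was heard
        speak(heard)    # and repeat it back
    return heard
```

Passing the hardware-facing pieces in as arguments keeps the loop itself trivial, which matches the article’s point that the application logic is simple and the interest lies in the speech functions themselves.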

As in most smart mirror projects, the technological elements are all hidden; the screen is behind a one-way mirror, speakers are out of sight, and the only inputs are a gesture sensor and a microphone embedded into the frame. Thus equipped, the mirror can tirelessly humor even the most demanding of curious children.

[pepelepoisson] explains some of the technical aspects on the project page (English translation link here) and all the code and build details are available (in French) on the project’s GitHub repository. Embedded below is a demonstration of the Magic Mirror, first in French then switching to English.


Control Alexa Echo From Anywhere In The World

March 9, 2017 by Anool Mahidharia 16 Comments

If you are not within earshot of your Alexa Echo, Dot or Tap device and need to command it from anywhere in the world, you’d most likely use the handy mobile app or web interface to control it. If, for some strange reason, you’d rather use voice commands from anywhere in the world, you can still do so with apps such as Alexa Listens or Reverb, among many others. We’d be the first ones to call these out and say “It’s not a hack”. But [pat dhens]’s approach is above reproach! He has posted details on how to Remote Control the Alexa Echo from Anywhere in the World. The short version of the hack: he’s using a Raspberry Pi with a speaker attached to it, which commands his Alexa Tap using a text-to-speech converter program.

The long version is short as well. The user connects through a VPN, such as OpenVPN, to the home network where the Alexa device is located, then uses VNC to connect to the Raspberry Pi and access its shell. Finally, the user issues a text command, which is converted to speech by the ‘festival’ program on the Raspberry Pi. The output goes to an external speaker via the Raspberry Pi’s 3.5 mm audio out jack. And that’s all there is to it. You’ve just issued a voice command to your Alexa from across the world.
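The last step, turning typed text into speech on the Pi, might look something like the Python sketch below. It is not [pat dhens]’s script: the helper names `build_festival_command` and `speak` are hypothetical, and the sketch assumes `festival` is installed on the Pi and invoked in its `--tts` mode, which reads text from standard input.

```python
import subprocess


def build_festival_command():
    """Return the argument list for festival's text-to-speech mode."""
    return ["festival", "--tts"]


def speak(text, runner=subprocess.run):
    """Pipe `text` into festival so it is spoken through the Pi's audio output.

    `runner` defaults to subprocess.run; it is injectable (an assumption of
    this sketch) so the function can be tried without festival installed.
    """
    if not text.strip():
        raise ValueError("nothing to say")
    return runner(build_festival_command(), input=text.encode("utf-8"), check=True)
```

From the shell reached over VNC, calling `speak("Alexa, what time is it?")` would then play the phrase through the speaker sitting next to the Alexa device.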

Maybe it will save your vocal cords from damage due to excessive hollering, we guess. He’s even made a short video to prove that it works. Now all it needs is a microphone to listen to Alexa, convert speech-to-text, and then transmit it back to you across the world to complete the cycle.

We’re not sure, but he thinks this hack will lead him to world domination. Good luck with that.


Talking Neural Nets

December 3, 2016 by Al Williams 30 Comments

Speech synthesis is nothing new, but it has gotten better lately. It is about to get even better thanks to DeepMind’s WaveNet project. The Alphabet (or is it Google?) project uses neural networks to analyze audio data and it learns to speak by example. Unlike other text-to-speech systems, WaveNet creates sound one sample at a time and affords surprisingly human-sounding results.
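To make “one sample at a time” concrete, here is a toy autoregressive generation loop in Python. It only illustrates the generation pattern and is not WaveNet: `toy_next_sample_distribution` is a hypothetical stand-in for a trained network, and the constants `LEVELS` and `CONTEXT` are illustrative assumptions.

```python
import random

LEVELS = 256  # assumed number of quantized amplitude levels
CONTEXT = 4   # assumed number of past samples the toy "model" looks at


def toy_next_sample_distribution(history):
    """Hypothetical stand-in for a trained network.

    Returns one probability per quantization level, peaked near the average
    of the recent samples so the generated signal drifts smoothly.
    """
    center = sum(history) / len(history) if history else LEVELS // 2
    weights = [1.0 / (1.0 + abs(level - center)) for level in range(LEVELS)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples, seed=0):
    """Generate a signal one sample at a time, feeding each output back in."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_samples):
        context = samples[-CONTEXT:]                   # most recent outputs
        probs = toy_next_sample_distribution(context)  # predict the next sample
        next_sample = rng.choices(range(LEVELS), weights=probs, k=1)[0]
        samples.append(next_sample)                    # becomes future context
    return samples
```

The key point is the feedback loop: every new sample is drawn from a distribution conditioned on the samples already produced, so each one requires its own prediction step.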

Before you rush to comment “Not a hack!” you should know we are seeing projects pop up on GitHub that use the technology. For example, there is a concrete implementation by [ibab]. [Tomlepaine] has an optimized version. In addition to teaching it English, the DeepMind team successfully trained it on Mandarin and even got it to generate music. If you don’t want to build a system yourself, the original paper has audio files (about midway down) comparing traditional parametric and concatenative voices with the WaveNet voices.

Another interesting project is the reverse path: teaching WaveNet to convert speech to text. Before you get too excited, though, you might want to note this quote from the README file:

“We’ve trained this model on a single Titan X GPU during 30 hours until 20 epochs and the model stopped at 13.4 ctc loss. If you don’t have a Titan X GPU, reduce batch_size in the train.py file from 16 to 4.”

Last time we checked, you could get a Titan X for a little less than $2,000.

There is a multi-part lecture series on reinforcement learning (the foundation for DeepMind). If you want to tackle a project yourself, that might be a good starting point (the first part appears below).


How To Upgrade Jasper’s Voice Recognition With AT&T’s Speech-to-Text API

June 7, 2014 by Rick Osgood 9 Comments


Jasper is an open-source platform for developing always-on voice-controlled applications — you talk and your electronics listen! It’s designed to run on a Raspberry Pi. [Zach] has been playing around with it and wasn’t satisfied with Jasper’s built-in speech-to-text recognition system. He decided to take the advice of the Jasper development team and modify the system to use AT&T’s speech-to-text engine.

The built-in system works, but it has limitations. Mainly, you have to specify exactly which keywords you want Jasper to look out for. This can be problematic if you aren’t sure what the user is going to say. It can also cause problems when there are many possibilities for what the user might say. For example, if the user is going to say a number between one and one hundred, you don’t want to have to type out all one hundred numbers into the voice recognition system in order to make it work.
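To get a feel for how large a fixed vocabulary becomes, the Python sketch below spells out the numbers from one to one hundred. It is purely illustrative: `number_word` and `KEYWORDS` are hypothetical names, and nothing here reflects how Jasper actually stores or formats its keyword list.

```python
ONES = [
    "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
    "seventeen", "eighteen", "nineteen",
]
TENS = ["twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]


def number_word(n):
    """Spell out an integer from 1 to 100 in English words."""
    if not 1 <= n <= 100:
        raise ValueError("only 1..100 are supported in this sketch")
    if n == 100:
        return "one hundred"
    if n < 20:
        return ONES[n - 1]
    tens, ones = divmod(n, 10)
    word = TENS[tens - 2]
    return word if ones == 0 else f"{word} {ONES[ones - 1]}"


# Hypothetical keyword list a fixed-vocabulary recognizer would need.
KEYWORDS = [number_word(n) for n in range(1, 101)]
```

Even when generated rather than typed, every phrase still has to be enumerated up front, which is the limitation an open-vocabulary engine avoids.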

The Jasper FAQ does recommend using AT&T’s speech-to-text engine in this situation, but this has its own downsides. You are limited to only one request per second and it’s also slower to recognize the speech. [Zach] was just fine with these restrictions but he couldn’t find much information online about how to modify Jasper to make the AT&T engine work. Now that he’s gotten it functional, he has shared his work to make it easier for others.
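A one-request-per-second limit is easy to respect on the client side with a small throttle. The Python sketch below is an assumption-laden illustration, not part of Jasper, [Zach]’s scripts, or AT&T’s API: the `Throttle` class is hypothetical, and the one-second default simply mirrors the limit mentioned above.

```python
import time


class Throttle:
    """Space successive calls to wait() at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable (an assumption of this sketch)
        # so the logic can be exercised without real delays.
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous permitted request

    def wait(self):
        """Block until a new request is allowed, then record its time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` immediately before each speech-to-text request would keep a chatty application under the limit, at the cost of the extra latency the article already mentions.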

The modification first requires that you have an AT&T developer account. Once that’s set up, you need to make some changes to Jasper’s mic.py module. That’s the only part of Jasper’s core that must be changed, and it’s only a few lines of code. Outside of that, there are a couple of other Python scripts that need to be added. We won’t go into the finer details here since [Zach] goes into great detail on his own page, including the complete scripts. If you are interested in using the AT&T module with your Jasper installation, be sure to check out [Zach’s] work. He will likely save you a lot of time.

PhoneTag Helps You Read Your Voicemail

September 15, 2009 by Chris Gilmer 14 Comments

Have you ever been too busy to check in with your voicemail service? PhoneTag might have the solution for you.

Some of us have done it before: let voicemails pile up when we know nothing urgent or important is coming down the pipe. Wouldn’t it be much simpler and more convenient if those voicemails played by our rules? PhoneTag is a speech-to-text service that converts a voicemail into text and sends it via email or SMS, which you can then read through and reference at will. The accuracy of this type of service is usually pretty good, but some interpretation is required, as spoken words can be mistranscribed depending on the clarity of the call. On the security side of things, we tend to be a little hesitant about personal and business voicemails running through an extra service. PhoneTag does state that they use some kind of “special algorithm” that will guarantee voicemails are secure and private.

While there is a free trial period, this service is going to cost you. You can sign up for anything from a per-message price of $0.35 to an unlimited plan at $29.95/month. You are going to have to do your own calculations here to see if this is the best way to go, but this will save you from using your monthly minutes for checking the voicemails in your mailbox. As alternatives, Google Voice offers the same service for free and SpinVox charges a fee per use.