AI is currently popular, so [Chris Lam] figured he’d stimulate some interest in amateur radio by using it to pull call signs from radio signals processed using SDR. As you’ll see, the AI did just okay, so [Chris] augmented it with an algorithm developed for comparing gene sequences.
His experiment was simple enough. He picked up a Baofeng handheld radio transceiver to transmit messages containing a call sign and some speech. He then used a 0.5 meter antenna to receive it and a little connecting hardware and a NooElec SDR dongle to get it into his laptop. There he used SDRSharp to process the messages and output a WAV file. He then passed that on to the AI, Google’s Cloud Speech-to-Text service, to convert it to text.
Despite speaking his words one at a time and making an effort to pronounce them clearly, the result wasn’t great. In his example, only the first two words of the call sign and actual message were correct. Perhaps if the AI had been trained on actual off-air conversations with background noise, it would have done better. It’s not quite the same issue, but we’re reminded of those MIT researchers who fooled Google’s Inception image recognizer into thinking that a turtle was a gun.
Rather than train his own AI, [Chris’s] clever solution was to turn to the Smith-Waterman algorithm. This is the same algorithm used for finding similar nucleic acid sequences when analyzing genes. It allowed him to use a list of correct call signs to find the best match for what the AI did come up with. As you can see in the video below, it got the call signs right.
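To make the matching step concrete, here is a minimal sketch of Smith-Waterman local alignment used to pick the closest entry from a list of known call signs. It is not [Chris]’s code: the scoring values, the helper names (normalize, smith_waterman, best_call_sign), and the sample call signs and transcript are all made up for illustration.

```python
def normalize(text):
    """Lower-case and collapse whitespace so spacing and case don't affect scores."""
    return " ".join(text.lower().split())


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between strings a and b.

    The scoring values are illustrative assumptions, not tuned parameters.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_call_sign(transcript, known_call_signs):
    """Pick the known call sign that aligns best with the noisy transcript."""
    text = normalize(transcript)
    return max(known_call_signs, key=lambda cs: smith_waterman(text, normalize(cs)))


# Hypothetical call signs, spelled out phonetically as they would be spoken.
KNOWN = [
    "victor romeo two alpha bravo charlie",
    "victor romeo two x-ray yankee zulu",
    "whiskey one alpha whiskey",
]

# Hypothetical speech-to-text output with a few mis-heard words.
guess = best_call_sign("victor romeo to alfa bravo charlie testing", KNOWN)
```

Because the alignment is local, extra words before or after the call sign in the transcript do not count against a candidate; only the best-matching stretch is scored.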
There was a time when the average person was worried about the government or big corporations listening in on their every word. It was a quaint era, full of whimsy and superstition. Today, a good many of us are paying for the privilege of having constantly listening microphones in multiple rooms of our house, largely so we can avoid having to use our hands to turn the lights on and off. Amazing what a couple of years and a strong advertising push can do.
So if we’re going to be funneling everything we say to one or more of our corporate overlords anyway, why not make it fun? For example, check out this speech-to-image necklace developed by [Stephanie Nemeth]. As you speak, the necklace listens in and finds (usually) relevant images to display. Conceptually this could be used as an assistive communication technology, but we’re cool with it being a meme display device for now.
Hardware-wise, the necklace is just a Raspberry Pi 3, a USB microphone, and a HyperPixel 4.0 touch screen. The Pi Zero would arguably be the better choice for hanging around your neck, but [Stephanie] notes that there are some compatibility issues with Node.js on the Zero’s ARMv6 processor. She details a workaround, but says there’s no guarantee it will work with her code.
[pepelepoisson]’s Miroir Magique (“Magic Mirror”) is an interesting take on the smart mirror concept; it’s intended to be a playful, interactive learning tool for kids who are at an age where language and interactivity are deeply interesting to them, but whose ceaseless demands for examples of spelling and writing can be equally exhausting. Inspiration came from his own five-year-old, who can neither read nor write but nevertheless has a bottomless fascination with the writing and spelling of words, phrases, and numbers.
The magic is all in the simple interface. Magic Mirror waits for activation (a simple pass of the hand over a sensor) then shows that it is listening. Anything it hears, it then displays on the screen and reads back to the user. From an application perspective it’s fairly simple, but what’s interesting is the use of speech-to-text and text-to-speech functions not as a means to an end, but as an end in themselves. A mirror in more ways than one, it listens and repeats back, while writing out what it hears at the same time. For its intended audience of curious children fascinated by the written and spoken aspects of language, it’s part interactive toy and part learning tool.
Like most smart mirror projects, the technological elements are all hidden; the screen is behind a one-way mirror, speakers are out of sight, and the only inputs are a gesture sensor and a microphone embedded into the frame. Thus equipped, the mirror can tirelessly humor even the most demanding of curious children.
[pepelepoisson] explains some of the technical aspects on the project page (English translation link here) and all the code and build details are available (in French) on the project’s GitHub repository. Embedded below is a demonstration of the Magic Mirror, first in French then switching to English.
If you are not within earshot of your Alexa Echo, Dot or Tap device and need to command it from anywhere in the world, you’d most likely use the handy mobile app or web interface to control it. If, for some strange reason, you’d rather use voice commands from anywhere in the world, you can still do it using apps such as Alexa Listens or Reverb, among many others. We’d be the first ones to call these out and say “It’s not a hack”. But [pat dhens]’s approach is above reproach! He has posted details on how to Remote Control the Alexa Echo from Anywhere in the World. The short version of the hack: he’s using a Raspberry Pi with a speaker attached to it, which commands his Alexa Tap using a text-to-speech converter program.
The long version is short as well. The user uses a VPN, such as OpenVPN, to log in to the home network where the Alexa device is located, then uses VNC to connect to the Raspberry Pi and access its shell. Finally, the user issues a text command, which is converted to speech by the ‘festival’ program on the Raspberry Pi. The output goes to an external speaker via the Raspberry Pi’s 3.5 mm audio out jack. And that’s all there is to it. You’ve just issued a voice command to your Alexa from across the world.
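The last step, turning typed text into speech on the Pi, could be scripted along these lines. This is a rough sketch rather than [pat dhens]’s actual setup: it assumes festival is installed and on the PATH, and the speak helper and its run parameter are hypothetical names introduced here.

```python
import subprocess


def speak(text, run=subprocess.run):
    """Pipe text to festival, which reads it aloud when started with --tts.

    `run` is injectable only so the helper can be exercised without festival
    installed; in normal use the default subprocess.run is what you want.
    """
    return run(["festival", "--tts"], input=text.encode("utf-8"), check=True)
```

From the shell session on the Pi, calling speak("Alexa, what time is it?") would then play the phrase through whatever speaker is plugged into the audio jack.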
Maybe it will save your vocal cords from damage due to excessive hollering, we guess. He’s even made a short video to prove that it works. Now all it needs is a microphone to listen to Alexa, convert speech-to-text, and then transmit it back to you across the world to complete the cycle.
We’re not sure, but he thinks this hack will lead him to world domination. Good luck with that.
Speech synthesis is nothing new, but it has gotten better lately. It is about to get even better thanks to DeepMind’s WaveNet project. The Alphabet (or is it Google?) project uses neural networks to analyze audio data and it learns to speak by example. Unlike other text-to-speech systems, WaveNet creates sound one sample at a time and affords surprisingly human-sounding results.
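To show what “one sample at a time” means in practice, here is a toy autoregressive loop in which every new sample is computed from the samples already generated. It is only a sketch of the generation pattern: WaveNet uses a trained neural network to predict each sample, whereas the fixed two-tap rule below is a stand-in chosen because its output is easy to check. All names in it (generate, make_sine_predictor) are made up for illustration.

```python
import math


def generate(predict_next, seed, n_samples, context=2):
    """Grow a waveform one sample at a time from previously generated samples."""
    samples = list(seed)
    while len(samples) < n_samples:
        # Only the most recent `context` samples are shown to the predictor.
        samples.append(predict_next(samples[-context:]))
    return samples


def make_sine_predictor(omega):
    """Stand-in predictor: a fixed recurrence that continues a sine wave.

    A real system would call a trained model here instead.
    """
    k = 2.0 * math.cos(omega)

    def predict_next(recent):
        return k * recent[-1] - recent[-2]

    return predict_next


omega = 2 * math.pi / 16  # one cycle every 16 samples
audio = generate(make_sine_predictor(omega), seed=[0.0, math.sin(omega)], n_samples=32)
```

The loop is inherently sequential, since sample n cannot be computed until sample n-1 exists, which is one reason sample-level generation is computationally expensive.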
Before you rush to comment “Not a hack!” you should know we are seeing projects pop up on GitHub that use the technology. For example, there is a concrete implementation by [ibab]. [Tomlepaine] has an optimized version. In addition to learning English, they successfully trained it for Mandarin and even to generate music. If you don’t want to build a system yourself, the original paper has audio files (about midway down) comparing traditional parametric and concatenative voices with the WaveNet voices.
Another interesting project is the reverse path: teaching WaveNet to convert speech to text. Before you get too excited, though, you might want to note this quote from the README file:
“We’ve trained this model on a single Titan X GPU during 30 hours until 20 epochs and the model stopped at 13.4 ctc loss. If you don’t have a Titan X GPU, reduce batch_size in the train.py file from 16 to 4.”
Last time we checked, you could get a Titan X for a little less than $2,000.
There is a multi-part lecture series on reinforcement learning (the foundation for DeepMind). If you wanted to tackle a project yourself, that might be a good starting point (the first part appears below).
Jasper is an open-source platform for developing always-on voice-controlled applications — you talk and your electronics listen! It’s designed to run on a Raspberry Pi. [Zach] has been playing around with it and wasn’t satisfied with Jasper’s built-in speech-to-text recognition system. He decided to take the advice of the Jasper development team and modify the system to use AT&T’s speech-to-text engine.
The built-in system works, but it has limitations. Mainly, you have to specify exactly which keywords you want Jasper to look out for. This can be problematic if you aren’t sure what the user is going to say. It can also cause problems when there are many possibilities for what the user might say. For example, if the user is going to say a number between one and one hundred, you don’t want to have to type out all one hundred numbers into the voice recognition system in order to make it work.
The Jasper FAQ does recommend using AT&T’s speech-to-text engine in this situation, but this has its own downsides. You are limited to only one request per second and it’s also slower to recognize the speech. [Zach] was just fine with these restrictions, but he couldn’t find much information online about how to modify Jasper to make the AT&T engine work. Now that he’s gotten it functional, he has shared his work to make it easier for others.
The modification first requires that you have an AT&T developer account. Once that’s set up, you need to make some changes to Jasper’s mic.py module. That’s the only part of Jasper’s core that must be changed, and it’s only a few lines of code. Outside of that, there are a couple of other Python scripts that need to be added. We won’t go into the finer details here since [Zach] goes into great detail on his own page, including the complete scripts. If you are interested in using the AT&T module with your Jasper installation, be sure to check out [Zach’s] work. He will likely save you a lot of time.
Have you ever been too busy to check in with your voicemail service? PhoneTag might have the solution for you.
Some of us have done it before: let voicemails pile up when we know nothing urgent or important is coming down the pipe. Wouldn’t it be much simpler and more convenient if those voicemails played by our rules? PhoneTag is a speech-to-text service that converts a voicemail into text and sends it via email or SMS, which you can read through and reference at will. The accuracy of this type of service is usually pretty good, but some interpretation is required, as spoken words can be mistranscribed depending on the clarity of the call. On the security side of things, we tend to be a little hesitant about personal and business voicemails running through an extra service. PhoneTag does state that they use some kind of “special algorithm” that will guarantee voicemails are secure and private.
While there is a free trial period, this service is going to cost you. You can sign up for anything from a per-message price of $0.35 to an unlimited plan at $29.95/month. You are going to have to do your own calculations here to see if this is the best way to go, but this will save you from using your monthly minutes for checking the voicemails in your mailbox. As alternatives, Google Voice offers the same service for free and SpinVox charges a fee per use.
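For the two prices quoted above, the calculation is short enough to script. The sketch below works in whole cents to avoid floating-point rounding; it assumes no other fees or plan tiers, and the function names are ours, not PhoneTag’s.

```python
PER_MESSAGE_CENTS = 35   # $0.35 per transcribed message
UNLIMITED_CENTS = 2995   # $29.95 per month, unlimited messages


def per_message_cost_cents(messages):
    """Monthly cost in cents when paying per message."""
    return messages * PER_MESSAGE_CENTS


def break_even_messages():
    """Smallest monthly message count at which per-message pricing costs more."""
    return UNLIMITED_CENTS // PER_MESSAGE_CENTS + 1


threshold = break_even_messages()
```

With these prices the threshold works out to 86 messages a month: 85 messages cost $29.75 on the per-message plan, while 86 cost $30.10, which is more than the unlimited plan.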