We all need someone to talk to sometimes, and the pandemic has only made matters worse when it comes to the number of people living with anxiety and depression. Exchanging the simplest of pleasantries can make you feel whole again, but the masks make it hard to engage with strangers and judge their emotions, so your big trip to the grocery store can make you feel lonely in a crowd.
So you go back home, still feeling lonely, and maybe you turn on the TV. Watching people interact is probably the next best thing to actual interaction, and it might even make you laugh. But have you ever wished you could talk to the people on TV? With [aniketdhole]’s EMOJO chatbot, you’ll feel as though you’re among friends. And technically you are — all the dialogue is from the TV show Friends.
In Castaway, Tom Hanks didn’t give that volleyball a frowny face, now did he? Nor does he have a dopey grin. Instead, he wears a wry smile that suggests depth of character and a grasp of the dire situation at hand. But now we have emoji, and they do a pretty good job of conveying and evoking emotion. EMOJO is a visual chatbot that uses voice and emoji to make easy, two-way conversation to help chase the loneliness away. It uses a Raspberry Pi and a TFT display to take voice input from a Bluetooth headset, convert it to text, and then respond in kind with both voice and text. It was a finalist in the rethink displays round of the Hackaday Prize, and we can’t wait to see how its character develops. Be sure to check out the demo after the break.
The team behind [8 Bits and a Byte] have built a talking toaster. More accurately, they retrofitted their existing toaster with some hardware components to make it appear to talk and get angry at its users. While the actual toaster functionality isn’t necessary for the build, it certainly allows the project to have a more whimsical vibe.
The project uses a Raspberry Pi 3 and a Google AIY kit, consisting of a HAT, microphone, and speaker. Servos control the movement of the toaster’s eyebrows with the help of the HAT. Some decorative materials in the form of googly eyes and pipe cleaners help bring other features of the talking toaster to life.
The control flow for the chatbot makes use of Google’s speech-to-text for picking up text from audio input, the Dialogflow API to match intent, and Text-to-Speech to pipeline possible answer back to the Raspberry Pi to play over a speaker. They also used Remo.tv to broadcast live updates from the toaster to anyone on an online feed, allowing users in a chatroom to talk directly to Ted.
While Ted’s communications may be quite limited, there’s certainly no limit to the number of interactions he’ll be having online now!
AI is currently popular, so [Chirs Lam] figured he’d stimulate some interest in amateur radio by using it to pull call signs from radio signals processed using SDR. As you’ll see, the AI did just okay so [Chris] augmented it with an algorithm invented for gene sequencing.
His experiment was simple enough. He picked up a Baofeng handheld radio transceiver to transmit messages containing a call sign and some speech. He then used a 0.5 meter antenna to receive it and a little connecting hardware and a NooElec SDR dongle to get it into his laptop. There he used SDRSharp to process the messages and output a WAV file. He then passed that on to the AI, Google’s Cloud Speech-to-Text service, to convert it to text.
Despite speaking his words one at a time and making an effort to pronounce them clearly, the result wasn’t great. In his example, only the first two words of the call sign and actual message were correct. Perhaps if the AI had been trained on actual off-air conversations with background noise, it would have been done better. It’s not quite the same issue, but we’re reminded of those MIT researchers who fooled Google’s Inception image recognizer into thinking that a turtle was a gun.
Rather than train his own AI, [Chris’s] clever solution was to turn to the Smith-Waterman algorithm. This is the same algorithm used for finding similar nucleic acid sequences when analyzing genes. It allowed him to use a list of correct call signs to find the best match for what the AI did come up with. As you can see in the video below, it got the call signs right.
There was a time when the average person was worried about the government or big corporations listening in on their every word. It was a quaint era, full of whimsy and superstition. Today, a good deal of us are paying for the privilege to have constantly listening microphones in multiple rooms of our house, largely so we can avoid having to use our hands to turn the lights on and off. Amazing what a couple years and a strong advertising push can do.
So if we’re going to be funneling everything we say to one or more of our corporate overlords anyway, why not make it fun? For example, check out this speech-to-image necklace developed by [Stephanie Nemeth]. As you speak, the necklace listens in and finds (usually) relevant images to display. Conceptually this could be used as an assistive communication technology, but we’re cool with it being a meme display device for now.
Hardware wise, the necklace is just a Raspberry Pi 3, a USB microphone, and a HyperPixel 4.0 touch screen. The Pi Zero would arguably be the better choice for hanging around your neck, but [Stephanie] notes that there’s some compatibility issues with Node.js on the Zero’s ARM6 processor. She details a workaround, but says there’s no guarantee it will work with her code.
[pepelepoisson]’s Miroir Magique (“Magic Mirror”) is an interesting take on the smart mirror concept; it’s intended to be a playful, interactive learning tool for kids who are at an age where language and interactivity are deeply interesting to them, but whose ceaseless demands for examples of spelling and writing can be equally exhausting. Inspiration came from his own five-year-old, who can neither read nor write but nevertheless has a bottomless fascination with the writing and spelling of words, phrases, and numbers.
The magic is all in the simple interface. Magic Mirror waits for activation (a simple pass of the hand over a sensor) then shows that it is listening. Anything it hears, it then displays on the screen and reads back to the user. From an application perspective it’s fairly simple, but what’s interesting is the use of speech-to-text and text-to-speech functions not as a means to an end, but as an end in themselves. A mirror in more ways than one, it listens and repeats back, while writing out what it hears at the same time. For its intended audience of curious children fascinated by the written and spoken aspects of language, it’s part interactive toy and part learning tool.
Like most smart mirror projects the technological elements are all hidden; the screen is behind a one-way mirror, speakers are out of sight, and the only inputs are a gesture sensor and a microphone embedded into the frame. Thus equipped, the mirror can tirelessly humor even the most demanding of curious children.
[pepelepoisson] explains some of the technical aspects on the project page (English translation link here) and all the code and build details are available (in French) on the project’s GitHub repository. Embedded below is a demonstration of the Magic Mirror, first in French then switching to English.
If you are not within ear-shot of your Alexa Echo, Dot or Tap device and need to command it from anywhere in the world, you’d most likely use the handy mobile app or web interface to control it. For some strange reason, if you’d rather use voice commands from anywhere in the world, you can still do it using apps such as Alexa Listens or Reverb, among many others. We’d be the first ones to call these out and say “It’s not a hack”. But [pat dhens] approach is above reproach! He has posted details on how to Remote Control the Alexa Echo from Anywhere in the World. Short version of the hack — he’s using a Raspberry Pi with a speaker attached to it which commands his Alexa Tap using a text-to-speech converter program.
The long version is short as well. The user uses a VPN, such as OpenVPN, to log in to their home network where the Alexa device is located. Then, use VNC to connect to the Raspberry Pi to access its shell. Finally, the user issues a text command which is converted to speech by the ‘festival‘ program on the Raspberry Pi. The output goes to an external speaker via the Raspberry Pi’s 3.5 mm audio out jack. And that’s all there is to it. You’ve just issued a voice command to your Alexa from across the world.
Maybe it will save your vocal chords from damage due to excessive hollering, we guess. He’s even made a short video to prove that it works. Now all it needs is a microphone to listen to Alexa, convert speech-to-text, and then transmit it back to you across the world to complete the cycle.
We’re not sure, but he thinks this hack will lead him to world domination. Good Luck with that.
Speech synthesis is nothing new, but it has gotten better lately. It is about to get even better thanks to DeepMind’s WaveNet project. The Alphabet (or is it Google?) project uses neural networks to analyze audio data and it learns to speak by example. Unlike other text-to-speech systems, WaveNet creates sound one sample at a time and affords surprisingly human-sounding results.
Before you rush to comment “Not a hack!” you should know we are seeing projects pop up on GitHub that use the technology. For example, there is a concrete implementation by [ibab]. [Tomlepaine] has an optimized version. In addition to learning English, they successfully trained it for Mandarin and even to generate music. If you don’t want to build a system out yourself, the original paper has audio files (about midway down) comparing traditional parametric and concatenative voices with the WaveNet voices.
Another interesting project is the reverse path — teaching WaveNet to convert speech to text. Before you get too excited, though, you might want to note this quote from the read me file:
“We’ve trained this model on a single Titan X GPU during 30 hours until 20 epochs and the model stopped at 13.4 ctc loss. If you don’t have a Titan X GPU, reduce batch_size in the train.py file from 16 to 4.”
Last time we checked, you could get a Titan X for a little less than $2,000.
There is a multi-part lecture series on reinforced learning (the foundation for DeepMind). If you wanted to tackle a project yourself, that might be a good starting point (the first part appears below).