Need A Snack From Across Town? Send Spot!

May 2, 2022 by Dave Rowntree 22 Comments

[Dave Niewinski] clearly knows a thing or two about robots, judging from his YouTube channel. Usually the projects involve robot arms mounted on some sort of wheeled platform, but this time it’s the tune of some pretty famous yellow robot legs, in the shape of spot from Boston Dynamics. The premise is simple — tell the robot what snacks you want, entirely by voice command, and off he goes to fetch. But, we’re not talking about navigating to the fridge in the same room. We’re talking about trotting out the front door, down the street and crossing roads to visit favorite restaurant. Spot will order the snacks and bring them back, fully autonomously.

Spot’s depth cameras provide localized navigation and object avoidance information

Local AI vision system handles avoiding those pesky moving objects

There are multiple things going here, all of which are pretty big computational tasks. Firstly, there is no cloud-based voice control, ala Google voice or Alexa. The robot works on the premise of full autonomy, which means no internet connectivity for any aspect. All voice recognition, voice-to-text, and speech synthesis are performed locally using the NVIDIA Riva GPU-based AI speech SDK, running on the local NVIDIA Jetson AGX Orin carried on Spot’s back. A front-facing webcam supplies the audio feed for this. The voice recognition application listens for the wake phrase, then turns the snack order into text, for later replay when it gets to the destination. Navigation is taken care of with a Microstrain RTK GNSS module, which has all the needed robustness, such as dual antennas, and inertial fallback for those regions with a spotty signal. Navigation is no use out in the real world on its own, which is where Spot’s depth sensor cameras come in. These enable local obstacle avoidance, as per the usual spot behavior we’ve all seen before. But what about crossing the road without getting tens of thousands of dollars of someone else’s hardware crushed by a passing truck? Spot’s onboard streaming cameras are fed into the NVIDIA dash cam net AI platform which enables real-time recognition of moving obstacles such as cars, humans and anything else that might be wandering around and get in the way. All in all a cool project showing the future potential of AI in robotics for important tasks, like fetching me a beer when I most need it, even if it comes from the local corner shop.

We love robots around here. Robots can mow your lawn, navigate inside your house with a little help from invisible QR Codes, even help out with growing your food. The robot-assisted future long promised, may now be looking more like the present.

Continue reading “Need A Snack From Across Town? Send Spot!” →

Display Your Speech In Realtime To Help Lipreaders In The Mask Era

January 23, 2022 by Lewin Day 18 Comments

Masks are all well and good when it comes to reducing the spread of deadly pathogens, but they can make it harder to understand people when they speak. They also make lipreading impossible. [Kevin Lewis] set about building something to help.

The system consists of a small screen that can be worn on the chest or other part of the body, and a lapel microphone to record the wearer’s speech. Using the Deepgram AI speech recognition API running on a Raspberry Pi Zero W, the system decodes the speech and displays it on the Hyperpixel screen.

The API is quite capable, and can be set to only respond to the wearer’s voice, or in a group mode, display speech from multiple people in the area, displaying other voices in another colour. There’s also a translation feature using the iTranslateApp API as well.

It’s a neat tool that could be of great use in conferences or in situations where a quick simple machine translation could majorly ease communication. Video after the break.
Continue reading “Display Your Speech In Realtime To Help Lipreaders In The Mask Era” →

Voice Command Made Mostly Easy

December 27, 2021 by Al Williams 19 Comments

Speech commands are all the rage on everything from digital assistants to cars. Adding it to your own projects is a lot of work, right? Maybe not. [Electronoobs] shows a speech board that lets you easily integrate 255 voice commands via serial communications with a host computer. You can see the review in the video below.

He had actually used a similar board before, but that version was a few years ago, and the new module has, of course, many new features. As of version 3.1, the board can handle 255 commands in a more flexible way than the older versions.

Continue reading “Voice Command Made Mostly Easy” →

Speech Recognition On An Arduino Nano?

May 26, 2021 by Orlando Hoilett 24 Comments

Like most of us, [Peter] had a bit of extra time on his hands during quarantine and decided to take a look back at speech recognition technology in the 1970s. Quickly, he started thinking to himself, “Hmm…I wonder if I could do this with an Arduino Nano?” We’ve all probably had similar thoughts, but [Peter] really put his theory to the test.

The hardware itself is pretty straightforward. There is an Arduino Nano to run the speech recognition algorithm and a MAX9814 microphone amplifier to capture the voice commands. However, the beauty of [Peter’s] approach, lies in his software implementation. [Peter] has a bit of an interplay between a custom PC program he wrote and the Arduino Nano. The learning aspect of his algorithm is done on a PC, but the implementation is done in real-time on the Arduino Nano, a typical approach for really any machine learning algorithm deployed on a microcontroller. To capture sample audio commands, or utterances, [Peter] first had to optimize the Nano’s ADC so he could get sufficient sample rates for speech processing. Doing a bit of low-level programming, he achieved a sample rate of 9ksps, which is plenty fast for audio processing.

To analyze the utterances, he first divided each sample utterance into 50 ms segments. Think of dividing a single spoken word into its different syllables. Like analyzing the “se-” in “seven” separate from the “-ven.” 50 ms might be too long or too short to capture each syllable cleanly, but hopefully, that gives you a good mental picture of what [Peter’s] program is doing. He then calculated the energy of 5 different frequency bands, for every segment of every utterance. Normally that’s done using a Fourier transform, but the Nano doesn’t have enough processing power to compute the Fourier transform in real-time, so Peter tried a different approach. Instead, he implemented 5 sets of digital bandpass filters, allowing him to more easily compute the energy of the signal in each frequency band.

The energy of each frequency band for every segment is then sent to a PC where a custom-written program creates “templates” based on the sample utterances he generates. The crux of his algorithm is comparing how closely the energy of each frequency band for each utterance (and for each segment) is to the template. The PC program produces a .h file that can be compiled directly on the Nano. He uses the example of being able to recognize the numbers 0-9, but you could change those commands to “start” or “stop,” for example, if you would like to.

[Peter] admits that you can’t implement the type of speech recognition on an Arduino Nano that we’ve come to expect from those covert listening devices, but he mentions small, hands-free devices like a head-mounted multimeter could benefit from a single word or single phrase voice command. And maybe it could put your mind at ease knowing everything you say isn’t immediately getting beamed into the cloud and given to our AI overlords. Or maybe we’re all starting to get used to this. Whatever your position is on the current state of AI, hopefully, you’ve gained some inspiration for your next project.

Death Of The Turing Test In An Age Of Successful AIs

April 6, 2021 by Steven Dufresne 98 Comments

IBM has come up with an automatic debating system called Project Debater that researches a topic, presents an argument, listens to a human rebuttal and formulates its own rebuttal. But does it pass the Turing test? Or does the Turing test matter anymore?

The Turing test was first introduced in 1950, often cited as year-one for AI research. It asks, “Can machines think?”. Today we’re more interested in machines that can intelligently make restaurant recommendations, drive our car along the tedious highway to and from work, or identify the surprising looking flower we just stumbled upon. These all fit the definition of AI as a machine that can perform a task normally requiring the intelligence of a human. Though as you’ll see below, Turing’s test wasn’t even for intelligence or even for thinking, but rather to determine a test subject’s sex.

Continue reading “Death Of The Turing Test In An Age Of Successful AIs” →

Code Talkers: Programming With Voice

March 28, 2021 by Al Williams 18 Comments

IEEE Spectrum had an interesting post covering several companies trying to sell voice programming interfaces. Not programming APIs for speech recognition, but the replacement of the traditional text editor to produce programs.

The companies, Serenade and Talon, have very different styles. Serenade has fairly normal-sounding language, whereas Talon has you use very specific phrases and can even use eye tracking to figure out what you are looking at when you issue a command. There’s also mention of two open-source products (Aenae and Caster) that require you to use a third-party speech engine.

For an example of Talon’s input, imagine you want this line of code in your program:

name=extract_word(m)

You’d say this out loud: “Phrase name op equals snake extract word paren mad.” Not exactly how Star Trek envisioned voice programming.

For accessibility, this might be workable. It is hard for us to imagine a room full of developers all talking to make their computers enter C or Python code. Until we can say, “Computer, build a graphic using the data in file hackaday-27,” we think this is not going to go mainstream.

The actual speech recognition part is pretty much a commodity now. Making a reasonable set of guesses about what people will say and what they mean by it is something else. It seems like this works best when you have a very specific and limited vocabulary, like operating a 3D printer.

Hackaday Links: December 6, 2020

December 6, 2020 by Dan Maloney 4 Comments

By now you’ve no doubt heard of the sudden but not unexpected demise of the iconic Arecibo radio telescope in Puerto Rico. We have been covering the agonizing end of Arecibo from almost the moment the first cable broke in August to a eulogy, and most recently its final catastrophic collapse this week. That last article contained amazing video of the final collapse, including up-close and personal drone shots of the cable breaking. For a more in-depth analysis of the collapse, it’s hard to beat Scott Manley’s frame-by-frame analysis, which really goes into detail about what happened. Seeing the paint spalling off the cables as they stretch and distort under loads far greater than they were designed for is both terrifying and fascinating.

Exciting news from Australia as the sample return capsule from JAXA’s Hayabusa2 asteroid explorer returned safely to Earth Saturday. We covered Hayabusa2 in our roundup of extraterrestrial excavations a while back, describing how it used both a tantalum bullet and a shaped-charge penetrator to blast regolith from the surface of asteroid 162173 Ryugu. Samples of the debris were hoovered up and hermetically sealed for the long ride back to Earth, which culminated in the fiery re-entry and safe landing in the midst of the Australian outback. Planetary scientists are no doubt eager to get a look inside the capsule and analyze the precious milligrams of space dust. In the meantime, Hayabusa2, with 66 kilograms of propellant remaining, is off on an extended mission to visit more asteroids for the next eleven years or so.

The 2020 Remoticon has been wrapped up for most of a month now, but one thing we noticed was how much everyone seemed to like the Friday evening Bring-a-Hack event that was hosted on Remo. To kind of keep that meetup momentum going and to help everyone slide into the holiday season with a little more cheer, we’re putting together a “Holiday with Hackaday & Tindie” meetup on Tuesday, December 15 at noon Pacific time. The details haven’t been shared yet, but our guess is that this will certainly be a “bring-a-hack friendly” event. We’ll share more details when we get them this week, but for now, hop over to the Remo event page and reserve your spot.

On the Buzzword Bingo scorecard, “Artificial Intelligence” is a square that can almost be checked off by default these days, as companies rush to stretch the definition of the term to fit almost every product in the neverending search for market share. But even those products that actually have machine learning built into them are only as good as the data sets used to train them. That can be a problem for voice-recognition systems; while there are massive databases of utterances in just about every language, the likes of Amazon and Google aren’t too willing to share what they’ve leveraged from their smart speaker using customer base. What’s the little person to do? Perhaps the People’s Speech database will help. Part of the MLCommons project, it has 86,000 hours of speech data, mostly derived from audiobooks, a clever source indeed since the speech and the text can be easily aligned. The database also pulls audio and the corresponding text from Wikipedia and other random sources around the web. It’s a small dataset, to be sure, but it’s a start.

And finally, divers in the Baltic Sea have dredged up a bit of treasure: a Nazi Enigma machine. Divers in Gelting Bay near the border of Germany and Denmark found what appeared to be an old typewriter caught in one of the abandoned fishing nets they were searching for. When they realized what it was — even crusted in 80-years-worth of corrosion and muck some keys still look like they’re brand new — they called in archaeologists to take over recovery. Gelting Bay was the scene of a mass scuttling of U-boats in the final days of World War II, so this Engima may have been pitched overboard before by a Nazi commander before pulling the plug on his boat. It’ll take years to restore, but it’ll be quite a museum piece when it’s done.