Masks are all well and good when it comes to reducing the spread of deadly pathogens, but they can make it harder to understand people when they speak. They also make lipreading impossible. [Kevin Lewis] set about building something to help.
The system consists of a small screen that can be worn on the chest or other part of the body, and a lapel microphone to record the wearer’s speech. Using the Deepgram AI speech recognition API running on a Raspberry Pi Zero W, the system decodes the speech and displays it on the Hyperpixel screen.
A lot of what we take for granted these days existed only in the realm of science fiction not all that long ago. And perhaps nowhere is this more true than in the field of machine vision. The little bounding box that pops up around everyone’s face when you go to take a picture with your cell phone is a perfect example; it seems so trivial now, but just think about what’s involved in putting that little yellow box on the screen, and how it would not have been plausible just 20 years ago.
Perhaps even more exciting than the development of computer vision systems is their accessibility to anyone, as well as their move into the third dimension. No longer confined to flat images, spatial AI and CV systems seek to extract information from the position of objects relative to others in the scene. It’s a huge leap forward in making machines see like we see and make decisions based on that information.
To help us along the road to incorporating spatial AI into our projects, Erik Kokalj will stop by the Hack Chat. Erik does technical documentation and support at Luxonis, a company working on the edge of spatial AI and computer vision. Join us as we explore the depths of spatial AI.
We have to admit, it was hard not to be insufferably smug this week when Facebook temporarily went dark around the globe. Sick of being stalked by crazy aunts and cousins, I opted out of that little slice of cyber-hell at least a decade ago, so Monday’s outage was no skin off my teeth. But it was nice to see that the world didn’t stop turning. More interesting are the technical postmortems on the outage, particularly this great analysis by the good folks at the University of Nottingham. Dr. Steve Bagley does a great job explaining how Facebook likely pushed a configuration change to the Border Gateway Protocol (BGP) that propagated through the Internet and eventually erased all routes to Facebook’s servers from the DNS system. He also uses a graphical map of routes to show peer-to-peer connections to Facebook dropping one at a time, until their machines were totally isolated. He also offers speculation on why Facebook engineers were denied internal access, sometimes physically, to their own systems.
It may be a couple of decades overdue, but the US Federal Communications Commission finally decided to allow FM voice transmissions on Citizen’s Band radios. It seems odd to be messing around with a radio service whose heyday was in the 1970s, but Cobra, the CB radio manufacturer, petitioned for a rule change to allow frequency modulation in addition to the standard amplitude modulation that’s currently mandatory. It’s hard to say how this will improve the CB user experience, which last time we checked is a horrifying mix of shouting, screaming voices often with a weird echo effect, all put through powerful — and illegal — linear amps that distort the signal beyond intelligibility. We can’t see how a little less static is going to improve that.
Can you steal a car with a Game Boy? Probably not, but car thieves in the UK are using some sort of device hidden in a Game Boy case to boost expensive cars. A group of three men in Yorkshire used the device, which supposedly cost £20,000 ($27,000), to wirelessly defeat the security systems on cars in seconds. They stole cars for garages and driveways to the tune of £180,000 — not a bad return on their investment. It’s not clear how the device works, but we’d love to find out — for science, of course.
There have been tons of stories lately about all the things AI is good for, and all the magical promises it will deliver on given enough time. And it may well, but we’re still early enough in the AI hype curve to take everything we see with a grain of salt. However, one area that bears watching is the ability of AI to help fill in the gaps left when an artist is struck down before completing their work. And perhaps no artist left so much on the table as Ludwig von Beethoven, with his famous unfinished 10th Symphony. When the German composer died, he had left only a few notes on what he wanted to do with the four-movement symphony. But those notes, along with a rich body of other works and deep knowledge of the composer’s creative process, have allowed a team of musicologists and AI experts to complete the 10th Symphony. The article contains a lot of technical detail, both on the musical and the informatics sides. How will it sound? Here’s a preview:
And finally, Captain Kirk is finally getting to space. William Shatner, who played captain — and later admiral — James Tiberius Kirk from the 1960s to the 1990s, will head to space aboard Blue Origin’s New Shepard rocket on Tuesday. At 90 years old, Shatner will edge out Wally Funk, who recently set the record after her Blue Origin flight at the age of 82. It’s interesting that Shatner agreed to go, since he is said to have previously refused the offer of a ride upstairs with Virgin Galactic. Whatever the reason for the change of heart, here’s hoping the flight goes well.
The philosophical question of “What is art?” has an ethereal, transient quality to it. A definition seems to slip away as you get close to an answer. Embracing that quality, [Max Fischer] has created an AI-powered painting that paints a new piece of art at the push of a button. When the button below the screen is pushed, a new image is generated and the old one is forever lost, which in a way, makes the frame a piece of art itself.
The really makes this project stand is the sheer quality of documentation on the GitHub repo. The instructions are incredibly detailed. Everything from setting up the Jetson to building the control box out of half-inch MDF (12mm for the sane part of the world) is laid out with copious pictures. Despite the ease of generating images ahead of time, [Max] took the hard route Hackaday route and did all inference locally and in real-time. To handle the processing requirements, an Nvidia Jetson Xavier NX single-board computer was used. He trained StyleGAN with high-resolution abstract art that gets generated whenever the button below the screen is pushed. To prevent screen burn-in, a PIR was added to turn the screen off when no one is around.
For all their detail, [Max] did leave one thing out of the readme: the AI that generates the art. [Max] hints that he wants others to create their picture frames, but with their own art generation. So what are you waiting for? Go make some art.
Like most of us, [Peter] had a bit of extra time on his hands during quarantine and decided to take a look back at speech recognition technology in the 1970s. Quickly, he started thinking to himself, “Hmm…I wonder if I could do this with an Arduino Nano?” We’ve all probably had similar thoughts, but [Peter] really put his theory to the test.
The hardware itself is pretty straightforward. There is an Arduino Nano to run the speech recognition algorithm and a MAX9814 microphone amplifier to capture the voice commands. However, the beauty of [Peter’s] approach, lies in his software implementation. [Peter] has a bit of an interplay between a custom PC program he wrote and the Arduino Nano. The learning aspect of his algorithm is done on a PC, but the implementation is done in real-time on the Arduino Nano, a typical approach for really any machine learning algorithm deployed on a microcontroller. To capture sample audio commands, or utterances, [Peter] first had to optimize the Nano’s ADC so he could get sufficient sample rates for speech processing. Doing a bit of low-level programming, he achieved a sample rate of 9ksps, which is plenty fast for audio processing.
To analyze the utterances, he first divided each sample utterance into 50 ms segments. Think of dividing a single spoken word into its different syllables. Like analyzing the “se-” in “seven” separate from the “-ven.” 50 ms might be too long or too short to capture each syllable cleanly, but hopefully, that gives you a good mental picture of what [Peter’s] program is doing. He then calculated the energy of 5 different frequency bands, for every segment of every utterance. Normally that’s done using a Fourier transform, but the Nano doesn’t have enough processing power to compute the Fourier transform in real-time, so Peter tried a different approach. Instead, he implemented 5 sets of digital bandpass filters, allowing him to more easily compute the energy of the signal in each frequency band.
The energy of each frequency band for every segment is then sent to a PC where a custom-written program creates “templates” based on the sample utterances he generates. The crux of his algorithm is comparing how closely the energy of each frequency band for each utterance (and for each segment) is to the template. The PC program produces a .h file that can be compiled directly on the Nano. He uses the example of being able to recognize the numbers 0-9, but you could change those commands to “start” or “stop,” for example, if you would like to.
We know you love a good biohack as much as we do, so we thought you would like [Tony’s] brainwave-controlled RC truck. Instead of building his own electroencephalogram (EEG), he thought he would use NeuroSky’s MindWave. EEGs are pretty complex, multi-frequency waves that require some fairly sophisticated circuitry and even more sophisticated signal processing to interpret. So, [Tony] thought it would be nice to off-load a bit of that heavy-lifting, and luckily for him, the MindWave headset is fairly hacker-friendly.
To control the car, he utilized the existing remote control instead of making his own. Like most people, [Tony] thought about hooking up the Arduino pins to the buttons on the remote control, thereby bypassing the physical buttons, but he noticed the buttons were a bit smaller than he was comfortable soldering to and he didn’t want to risk damaging the circuit board. [Tony’s] RC truck has a pistol grip transmitter, which inspired a slightly different approach. He mounted the servo onto the controller’s wheel mechanism, allowing him to control the direction of the truck by rotating the wheel using the servo. He then fashioned another servo onto the transmitter such that the servo could depress the throttle when it rotates. We thought that was a pretty nifty workaround.
We are always envious of the Star Trek Enterprise computers. You can just sort of ask them a hazy question and they will — usually — figure out what you want. Even the automatic doors seemed to know the difference between someone walking into a turbolift versus someone being thrown into the door during a fight. [River] decided to try his new API keys for the private beta of an AI service to generate Linux commands based on a description. How does it work? Watch the video below and find out.
Some examples work fairly well. In response to “email the Rickroll video to Jeff Bezos,” the system produced a curl command and an e-mail to what we assume is the right place. “Find all files in the current directory bigger than 1 GB” works, too.