Voice Without Sound

Voice recognition is becoming more and more common, but anyone who’s ever used a smart device can attest that they aren’t exactly foolproof. They can activate seemingly at random, fail to activate when called, or, most annoyingly, completely misunderstand voice commands. Thankfully, researchers from the University of Tokyo are looking to improve the performance of devices like these by attempting to use them without any spoken voice at all.

The project is called SottoVoce, and it uses an ultrasound imaging probe placed under the user’s jaw to detect internal movements of the speaker’s larynx. The images generated by the probe are fed into a series of neural networks trained on hundreds of speech patterns from the researchers themselves. The neural networks then piece together the likely sounds being made and generate an audio waveform, which is played to an unmodified Alexa device. Obviously a few improvements would need to be made to the ultrasonic imaging device to make this usable in real-world situations, but it is interesting from a research perspective nonetheless.
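The paper has the actual architecture details; purely as a rough illustration of the idea (every layer size and name below is our own invention, not the authors’ design), an image-sequence-to-waveform model might be sketched in PyTorch like so:

```python
# Minimal sketch (NOT the SottoVoce architecture): a small CNN encodes
# each ultrasound frame, and a recurrent network maps the sequence of
# frame embeddings to a chunk of audio waveform per frame.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    def __init__(self, embed_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, embed_dim)

    def forward(self, x):               # x: (batch, 1, H, W)
        h = self.conv(x).flatten(1)     # (batch, 32)
        return self.fc(h)               # (batch, embed_dim)

class UltrasoundToAudio(nn.Module):
    def __init__(self, embed_dim=128, samples_per_frame=256):
        super().__init__()
        self.encoder = FrameEncoder(embed_dim)
        self.rnn = nn.GRU(embed_dim, 256, batch_first=True)
        self.head = nn.Linear(256, samples_per_frame)

    def forward(self, frames):          # frames: (batch, T, 1, H, W)
        b, t = frames.shape[:2]
        emb = self.encoder(frames.flatten(0, 1)).view(b, t, -1)
        h, _ = self.rnn(emb)
        return self.head(h).flatten(1)  # (batch, T * samples_per_frame)
```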

The research paper with all the details is also available (PDF warning). It’s an intriguing approach to improving the performance of voice recognition, especially in situations where the voice may be muffled, non-existent, or buried under background noise. Machine learning like this seems to be one of the more powerful tools for improving speech recognition, as we saw with this robot that can walk across town and order food for you using voice commands only.

Continue reading “Voice Without Sound”

How To Roll Your Own Custom Object Detection Neural Network

Real-time object detection, which uses neural networks and deep learning to rapidly identify and tag objects of interest in a video feed, is a handy feature with great hacker potential. Happily, it’s also possible to make customized CNNs (convolutional neural networks) tailored for one’s own needs, and that process just got easier thanks to some new documentation for the Vizy “AI camera” by Charmed Labs.

[Image: the Raspberry Pi-based Vizy camera]

Charmed Labs has been making hacker-friendly machine vision devices for a long time, and the Vizy camera impressed us mightily when we checked it out last year. Out of the box, Vizy has a perfectly functional object detector application that runs locally on the device and can detect and tag many common everyday objects in real time. But what if that default application doesn’t quite meet one’s project needs? Good news, because it’s possible to create a custom-trained CNN, and that process got a lot more accessible thanks to step-by-step examples of training a model to recognize hands playing rock-paper-scissors.

[Image: a person and a cat with machine-generated tags identifying them]
Default object detection works well, but sometimes one needs custom results.

The basic process is this: Start with a variety of images that show the item of interest, then identify and label that item in each photo. These labeled photos (a “training set”) are then uploaded to Google Colab, where the neural network is trained. The resulting CNN model can then be downloaded and tested to see how well it performs.
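The exact annotation format depends on the labeling tool, but conceptually each training example boils down to an image plus labeled bounding boxes, something like this hypothetical record (all field names here are illustrative, not Vizy’s or Colab’s actual format):

```python
# A hypothetical sketch of one labeled training example: the image,
# its dimensions, and a box around each object of interest.
example = {
    "image": "images/rock_0001.jpg",
    "width": 640,
    "height": 480,
    "objects": [
        {"label": "rock", "xmin": 212, "ymin": 140, "xmax": 388, "ymax": 320},
    ],
}
```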

Of course things rarely work perfectly the first time around, so at this point it’s pretty common for some refinement to be needed to increase accuracy. Luckily there are a number of tools to help do this without creating a new model from scratch, so it’s just a matter of tweaking until things perform acceptably.

Google Colab is free and the resulting CNNs are implemented in the TensorFlow Lite framework, meaning it’s possible to use them elsewhere. So if custom object detection has been holding up a project idea of yours, this might be what gets you over that hump.
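Since the result is an ordinary TensorFlow Lite model, loading it outside of Vizy looks roughly like the following sketch; the model path and the dummy input are placeholders:

```python
import numpy as np
import tensorflow as tf

# Load the downloaded model; "model.tflite" is a placeholder path.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Feed a dummy image shaped and typed to match the model's input tensor.
image = np.zeros(inp["shape"], dtype=inp["dtype"])
interpreter.set_tensor(inp["index"], image)
interpreter.invoke()

detections = interpreter.get_tensor(out["index"])
print(detections)
```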

Render Yourself Invisible To AI With This Adversarial Sweater Of Doom

Ugly sweater season is rapidly approaching, at least here in the Northern Hemisphere. We’ve always been a bit baffled by the tradition of paying top dollar for a loud, obnoxious sweater that gets worn to exactly one social event a year. We don’t judge, of course, but that’s not to say we wouldn’t look a little more favorably on someone’s fashion choice if it were more like this AI-defeating adversarial ugly sweater.

The idea behind this research from the University of Maryland is not, of course, to inform fashion trends, nor is it to create a practical invisibility cloak. It’s really to probe machine learning systems for vulnerabilities by making small changes to the input while watching for changes in the output. In this case, the ML system was a YOLO-based vision system which has little trouble finding humans in an arbitrary image. The adversarial pattern was generated using a large set of training images, some of which contain the objects of interest (in this case, humans). Each time a human is detected, the candidate pattern is rendered over that part of the image, and the detector is re-run to see how much the pattern lowers the detection score; the pattern is then updated and the process repeats. The adversarial pattern eventually improves to the point where it mostly prevents humans from being recognized. Much more detail is available in the research paper (PDF) if you want to dig into the guts of this.
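The paper has the real optimization details; as a minimal sketch of the general adversarial-patch loop, here it is in PyTorch, assuming a differentiable detector stub and a paste_patch helper that are purely hypothetical stand-ins:

```python
import torch

# Minimal sketch of adversarial-patch optimization. `detector` stands in
# for a differentiable YOLO-style model returning per-image person
# confidence scores; `paste_patch` renders the patch onto each detected
# person. Both are assumptions, not the paper's actual code.
def optimize_patch(detector, paste_patch, images, steps=1000, lr=0.01):
    patch = torch.rand(3, 64, 64, requires_grad=True)  # start random
    opt = torch.optim.Adam([patch], lr=lr)
    for _ in range(steps):
        scores = detector(paste_patch(images, patch))  # person confidence
        loss = scores.mean()           # minimize detection confidence
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            patch.clamp_(0, 1)         # keep pixel values printable/valid
    return patch.detach()
```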

The pattern, which looks a little like a bad impressionist painting of people buying pumpkins at a market and bears some resemblance to one we’ve seen before in similar work, is said to work better from different viewing angles. It also makes a spiffy pullover, especially if you’d rather blend in at that Christmas party.

OpenAI Hears You Whisper

Should you wish to try high-quality voice recognition without buying something, good luck. Sure, you can borrow the speech recognition on your phone or coerce some virtual assistants on a Raspberry Pi to handle the processing for you, but those aren’t much good for serious work where you don’t want to be tied to some closed-source solution. OpenAI has introduced Whisper, which they claim is an open source neural net that “approaches human level robustness and accuracy on English speech recognition.” It appears to work on at least some other languages, too.

If you try the demonstrations, you’ll see that talking fast or with a lovely accent doesn’t seem to affect the results. The post mentions it was trained on 680,000 hours of supervised data. If you were to talk that much to an AI, it would take you 77 years without sleep!
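Trying it locally is refreshingly simple: Whisper ships as a pip-installable Python package (openai-whisper), and basic transcription takes only a few lines. The model size and file name below are up to you:

```python
import whisper

# Load one of the released checkpoints; larger models trade speed
# for accuracy ("tiny" through "large").
model = whisper.load_model("base")

# Transcribe any audio file ffmpeg can read; "speech.mp3" is a placeholder.
result = model.transcribe("speech.mp3")
print(result["text"])
```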

Continue reading “OpenAI Hears You Whisper”

[Image: insect class-order-family-genus-species chart with drawn examples]

Neural Network Identifies Insects, Outperforming Humans

There are about one million known species of insects, more than for any other group of living organisms. If you need to determine which species an insect belongs to, things get complicated quickly. In fact, distinguishing between certain closely related species can require a well-trained expert in that particular group, and experts’ time is often better spent on something else. This is where CNNs (convolutional neural networks) come in nowadays, and this paper describes a CNN doing just as well as, if not better than, human experts.
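The paper describes its own models and datasets; purely as a sketch of how one might approach species classification with transfer learning (the directory layout, model choice, and species count below are all our assumptions, not the paper’s setup):

```python
import tensorflow as tf

# Rough sketch: fine-tune a pretrained CNN to classify insect photos
# into NUM_SPECIES classes. Assumes one folder per species under
# "insects/train"; all paths and numbers are placeholders.
NUM_SPECIES = 10
train = tf.keras.utils.image_dataset_from_directory(
    "insects/train", image_size=(224, 224), batch_size=32)

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # train only the new classifier head at first

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNet scaling
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_SPECIES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train, epochs=5)
```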

Continue reading “Neural Network Identifies Insects, Outperforming Humans”

Food Irradiation Detector Doesn’t Use Banana For Scale

How do the potatoes in that sack keep from sprouting on their long trip from the field to the produce section? Why don’t the apples spoil? To an extent, the answer lies in varying amounts of irradiation. Though it sounds awful, irradiation reduces microbial contamination, which improves shelf life. Most people can choose to take it or leave it, but regulators in some countries aren’t overly concerned about the irradiation dosages found in, say, animal feed. So where does that leave non-vegetarians?

If that line of thinking makes you want to Hulk out, you’re not alone. [kutluhan_aktar] decided to build an IoT food irradiation detector in an effort to help small businesses make educated choices about the feed they give to their animals. The device predicts irradiation dosage level using a combination of the food’s weight, color, and emitted ionizing radiation after being exposed to sunlight for an appreciable amount of time. Using this information, [kutluhan_aktar] trained a neural network running on a Beetle ESP32-C3 to detect the dosage and display relevant info on a transparent OLED screen. Primarily, the device predicts whether the dosage falls into the Regulated, Unsafe, or just plain Hazardous category.
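We don’t have [kutluhan_aktar]’s actual training code, but the shape of the idea, a tiny classifier mapping a handful of sensor readings to the three dosage classes, might look something like this Keras sketch (all data, dimensions, and values here are placeholders; a model like this could then be converted to TensorFlow Lite for the ESP32):

```python
import numpy as np
import tensorflow as tf

# Sketch of the idea: a tiny network mapping (weight, color, radiation)
# readings to the three dosage classes mentioned above. The training
# data here is random placeholder; the real project used its own samples.
CLASSES = ["Regulated", "Unsafe", "Hazardous"]
X = np.random.rand(200, 3).astype("float32")  # placeholder features
y = np.random.randint(0, 3, size=200)         # placeholder labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(3,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(len(CLASSES), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=20, verbose=0)

reading = np.array([[0.42, 0.77, 0.13]], dtype="float32")  # hypothetical
print(CLASSES[int(model.predict(reading).argmax())])
```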

[kutluhan_aktar] lets this baby loose on some uncooked pasta in the short demo video after the break. The macaroni is spread across a load cell to detect the weight, while [kutluhan_aktar] uses a handheld sensor to determine the color.

This isn’t the first time we’ve seen AI on the Hackaday menu. Remember when we tried those AI-created recipes?

[Image: a brambling (a small bird), with “BirdNET-Pi” written above it]

Neural Network Identifies Bird Calls, Even On Your Pi

Recently, we stumbled upon the extensive effort that is the BirdNET research platform. BirdNET uses a neural network to identify birds by the sounds they make, and is a joint project between the Cornell Lab of Ornithology and the Chemnitz University of Technology. What strikes us is how impressively featureful and accessible this project is for a variety of applications. No doubt, BirdNET is aiming to become a one-stop shop for identifying birds as they sing.

There are plenty of ways BirdNET can help you. Starting with likely the most popular option among us, there are iOS and Android apps, giving the microphone-enabled “smart” devices in our pockets a feature even the most app-averse hackers can respect. However, the BirdNET team also talks about bringing sound recognition to our browsers, Raspberry Pi and other SBCs, and even microcontrollers. We can’t wait for someone to bring BirdNET to an RP2040! The code’s open-source and the models are freely available; there’s hardly a use case one couldn’t cover with these.

[Image: screenshot of the BirdNET-Pi interface, showing a chart of bird chirp occurrences with a spectrogram below it]
About that Raspberry Pi version! There’s a sister project called BirdNET-Pi, an easy-to-install software package intended for the Raspberry Pi OS. Having equipped your Pi with a USB sound card, you can have it record and analyze 24/7 using a “lite” version of BirdNET. Then you get a web interface you can log into and see bird sounds identified in real time. Not just that: BirdNET-Pi also processes the sounds into spectrograms, keeps them in a database, and can even send you notifications.
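If you’d rather script BirdNET yourself, there’s also a community-maintained Python wrapper, birdnetlib, around the released models. A typical analysis looks roughly like this; the file path, coordinates, and date are placeholders:

```python
from datetime import datetime
from birdnetlib import Recording
from birdnetlib.analyzer import Analyzer

# Analyze one recording with the BirdNET model via the community
# birdnetlib wrapper. Location and date help filter to plausible species.
analyzer = Analyzer()
recording = Recording(
    analyzer, "garden.wav",
    lat=42.44, lon=-76.50,
    date=datetime(2023, 5, 1),
    min_conf=0.25,
)
recording.analyze()
for d in recording.detections:
    print(d["common_name"], d["confidence"])
```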

The BirdNET-Pi project is open, too, of course. Better yet, the BirdNET-Pi team emphasizes that everything stays fully local unless you choose otherwise and decide to share your data with others. Many do make their BirdNET-Pi instances public, and there’s a lovely interactive map that shows bird sounds all across the world!

BirdNET is, undoubtedly, a high-effort project – and a shining example of what a dedicated research team can do with a neural network and an admirable goal in mind. For many of us who feel joy when we hear birds outside, it’s endearing to know that we can plug a USB sound card into our Pi and learn more about them – even if we can’t spot them or recognize them by sight just yet. We’ve covered bird sound recognition on microcontrollers before – also using machine learning.