Fooling Speech Recognition With Hidden Voice Commands

It’s 2018, and while true hoverboards still elude humanity, some future predictions have come true. It’s now possible to talk to computers, and most of the time they might even understand you. Speech recognition is usually achieved through the use of neural networks to process audio, in a way that some suggest mimics the operation of the human brain. However, as it turns out, they can be easily fooled.

The attack begins with an audio sample, generally of a simple spoken phrase, though music can also be used. The desired text that the computer should hear instead is then fed into an algorithm along with the audio sample. This function returns a low value when the output of the speech recognition system matches the desired attack phrase. The input audio file is gradually modified using the mathematics of gradient descent, creating a result that to a human sounds like one thing, and to a machine, something else entirely.

The audio files are available on the site for your own experimental purposes. In a noisy environment with poor audio coupling between speakers and a Google Pixel, results were poor – OK Google only heard the human phrase, not the encoded attack phrase. Given that the sound quality was poor, and the files were generated with a different speech model, this is not entirely surprising. We’d love to hear the results of your experiments in the comments.

It’s all a part of [Nicholas]’s PhD studies around the strengths and pitfalls of neural networks. It highlights the fact that neural networks don’t always work in the way we think they do. Google’s Inception is susceptible to similar attacks with images, as we’ve seen recently.

[Thanks to Wolfgang for the tip!]

Google’s Inception Sees This Turtle as a Gun; Image Recognition Camouflage

The good people at MIT’s Computer Science and Artificial Intelligence Laboratory [CSAIL] have found a way of tricking Google’s InceptionV3 image classifier into seeing a rifle where there actually is a turtle. This is achieved by presenting the classifier with what is called ‘adversary examples’.

Adversary examples are a proven concept for 2D stills. In 2014 [Goodfellow], [Shlens] and [Szegedy] added imperceptible noise to the image of a panda that from then on was classified as gibbon. This method relies on the image being undisturbed and can be overcome by zooming, blurring or rotating the image.

The applicability for real world shenanigans has been seriously limited but this changes everything. This weaponized turtle is a color 3D print that is reliably misclassified by the algorithm from any point of view. To achieve this, some knowledge about the classifier is required to generate misleading input. The image transformations, such as rotation, scaling and skewing but also color corrections and even print errors are added to the input and the result is then optimized to reliably mislead the algorithm. The whole process is documented in [CSAIL]’s paper on the method.

What this amounts to is camouflage from machine vision. Assuming that the method also works the other way around, the possibility of disguising guns (or anything else) as turtles has serious implications for automated security systems.

As this turtle targets the Inception algorithm, it should be able to fool the DIY image recognition talkbox that Hackaday’s own [Steven Dufresne] built.

Thanks to [Adam] for the tip.

Hardware for Deep Neural Networks

In case you didn’t make it to the ISCA (International Society for Computers and their Applications) session this year, you might be interested in a presentation by [Joel Emer] an MIT  professor and scientist for NVIDIA. Along with another MIT professor and two PhD students ([Vivienne Sze], [Yu-Hsin  Chen], and [Tien-Ju Yang]), [Emer’s] presentation covers hardware architectures for deep neural networks.

The presentation covers the background on deep neural networks and basic theory. Then it progresses to deep learning specifics. One interesting graph shows how neural networks are getting better at identifying objects in images every year and as of 2015 can do a better job than a human over a set of test images. However, the real key is using hardware to accelerate the performance of networks.

Hardware acceleration is important for several reasons. For one, many applications have lots of data associated. Also, training can involve many iterations which can take a long time.

Continue reading “Hardware for Deep Neural Networks”

Catastrophic Forgetting: Learning’s Effect on Machine Minds

What if every time you learned something new, you forgot a little of what you knew before? That sort of overwriting doesn’t happen in the human brain, but it does in artificial neural networks. It’s appropriately called catastrophic forgetting. So why are neural networks so successful despite this? How does this affect the future of things like self-driving cars? Just what limit does this put on what neural networks will be able to do, and what’s being done about it?

The way a neural network stores knowledge is by setting the values of weights (the lines in between the neurons in the diagram). That’s what those lines literally are, just numbers assigned to pairs of neurons. They’re analogous to the axons in our brain, the long tendrils that reach out from one neuron to the dendrites of another neuron, where they meet at microscopic gaps called synapses. The value of the weight between two artificial neurons is roughly like the number of axons between biological neurons in the brain.

To understand the problem, and the solutions below, you need to know a little more detail.

Continue reading “Catastrophic Forgetting: Learning’s Effect on Machine Minds”

DIY Raspberry Neural Network Sees All, Recognizes Some

As a fun project I thought I’d put Google’s Inception-v3 neural network on a Raspberry Pi to see how well it does at recognizing objects first hand. It turned out to be not only fun to implement, but also the way I’d implemented it ended up making for loads of fun for everyone I showed it to, mostly folks at hackerspaces and such gatherings. And yes, some of it bordering on pornographic — cheeky hackers.

An added bonus many pointed out is that, once installed, no internet access is required. This is state-of-the-art, standalone object recognition with no big brother knowing what you’ve been up to, unlike with that nosey Alexa.

But will it lead to widespread useful AI? If a neural network can recognize every object around it, will that lead to human-like skills? Read on. Continue reading “DIY Raspberry Neural Network Sees All, Recognizes Some”

From 50s Perceptrons To The Freaky Stuff We’re Doing Today

Things have gotten freaky. A few years ago, Google showed us that neural networks’ dreams are the stuff of nightmares, but more recently we’ve seen them used for giving game character movements that are indistinguishable from that of humans, for creating photorealistic images given only textual descriptions, for providing vision for self-driving cars, and for much more.

Being able to do all this well, and in some cases better than humans, is a recent development. Creating photorealistic images is only a few months old. So how did all this come about?

Continue reading “From 50s Perceptrons To The Freaky Stuff We’re Doing Today”

Neural Networks Walk Better Than Humans for Game Animation

Modern day video games have come a long way from Mario the plumber hopping across the screen. Incredibly intricate environments of games today are part of the lure for new gamers and this experience is brought to life by the characters interacting with the scene. However the illusion of the virtual world is disrupted by unnatural movements of the figures in performing actions such as turning around suddenly or climbing a hill.

To remedy the abrupt movements, [Daniel Holden et. al] recently published a paper (PDF) and a video showing a method to greatly improve the real-time character control mechanism. The proposed system uses a neural network that has been trained using a large data set of walking, jumping and other sequences on various terrains. The key is breaking down the process of bipedal movement and its cyclic behaviour into a series of sub-steps or phases. Each phase translates to a natural posture for the character while moving. The system precomputes the next-phases offline to conserve computational resources at runtime. Then considering user control, previous pose of the character(including joint positions) and terrain geometry, the consequent frame of the animation is computed. The computation is done by a regression network that calculates future position of the joints and a blending function is used for Motion Matching as described in a presentation (PDF) and video by [Simon Clavet]. Continue reading “Neural Networks Walk Better Than Humans for Game Animation”