Twitter: It’s Not The Algorithm’s Fault. It’s Much Worse.

Maybe you heard about the anger surrounding Twitter’s automatic cropping of images. When users submit pictures that are too tall or too wide for the layout, Twitter automatically crops them to roughly a square. Instead of just picking, say, the largest square that’s closest to the center of the image, they use some “algorithm”, likely a neural network, trained to find people’s faces and make sure they’re cropped in.

The problem is that when a too-tall or too-wide image includes two or more people, and they’ve got different colored skin, the crop picks the lighter face. That’s really offensive, and something’s clearly wrong, but what?

A neural network is really just a mathematical equation, with the input variables being in these cases convolutions over the pixels in the image, and training them essentially consists in picking the values for all the coefficients. You do this by applying inputs, seeing how wrong the outputs are, and updating the coefficients to make the answer a little more right. Do this a bazillion times, with a big enough model and dataset, and you can make a machine recognize different breeds of cat.

What went wrong at Twitter? Right now it’s speculation, but my money says it lies with either the training dataset or the coefficient-update step. The problem of including people of all races in the training dataset is so blatantly obvious that we hope that’s not the problem; although getting a representative dataset is hard, it’s known to be hard, and they should be on top of that.

Which means that the issue might be coefficient fitting, and this is where math and culture collide. Imagine that your algorithm just misclassified a cat as an “airplane” or as a “lion”. You need to modify the coefficients so that they move the answer away from this result a bit, and more toward “cat”. Do you move them equally from “airplane” and “lion” or is “airplane” somehow more wrong? To capture this notion of different wrongnesses, you use a loss function that can numerically encapsulate just exactly what it is you want the network to learn, and then you take bigger or smaller steps in the right direction depending on how bad the result was.

Let that sink in for a second. You need a mathematical equation that summarizes what you want the network to learn. (But not how you want it to learn it. That’s the revolutionary quality of applied neural networks.)

Now imagine, as happened to Google, your algorithm fits “gorilla” to the image of a black person. That’s wrong, but it’s categorically differently wrong from simply fitting “airplane” to the same person. How do you write the loss function that incorporates some penalty for racially offensive results? Ideally, you would want them to never happen, so you could imagine trying to identify all possible insults and assigning those outcomes an infinitely large loss. Which is essentially what Google did — their “workaround” was to stop classifying “gorilla” entirely because the loss incurred by misclassifying a person as a gorilla was so large.

This is a fundamental problem with neural networks — they’re only as good as the data and the loss function. These days, the data has become less of a problem, but getting the loss right is a multi-level game, as these neural network trainwrecks demonstrate. And it’s not as easy as writing an equation that isn’t “racist”, whatever that would mean. The loss function is being asked to encapsulate human sensitivities, navigate around them and quantify them, and eventually weigh the slight risk of making a particularly offensive misclassification against not recognizing certain animals at all.

I’m not sure this problem is solvable, even with tremendously large datasets. (There are mathematical proofs that with infinitely large datasets the model will classify everything correctly, so you needn’t worry. But how close are we to infinity? Are asymptotic proofs relevant?)

Anyway, this problem is bigger than algorithms, or even their writers, being “racist”. It may be a fundamental problem of machine learning, and we’re definitely going to see further permutations of the Twitter fiasco in the future as machine classification is being increasingly asked to respect human dignity.

Garbage Can Takes Itself Out

Home automation is a fine goal but typically remains confined to lights, blinds, and other things that are relatively stationary and/or electrical in nature. There is a challenge there to be certain, but to really step up your home automation game you’ll need to think outside the box. This automated garbage can that can take itself out, for example, has all the home automation street cred you’d ever need.

The garbage can moves itself by means of a scooter wheel which has a hub motor inside and is powered by a lithium battery, but the real genius of this project is the electronics controlling everything. A Raspberry Pi Zero W is at the center of the build which controls the motor via a driver board and also receives instructions on when to wheel the garbage can out to the curb from an Nvidia Jetson board. That board is needed because the creator, [Ahad Cove], didn’t want to be bothered to tell his garbage can to take itself out or even schedule it. He instead used machine learning to detect when the garbage truck was headed down the street and instruct the garbage can to roll itself out then.

The only other thing to tie this build together was to get the garage door to open automatically for the garbage can. Luckily, [Ahad]’s garage door opener was already equipped with WiFi and had an available app, unbeknownst to him, which made this a surprisingly easy part of the build. If you have a more rudimentary garage door opener, though, there are plenty of options available to get it on the internets.

Continue reading “Garbage Can Takes Itself Out”

Home Automation At A Glance Using AI Glasses

There was a time when you had to get up from the couch to change the channel on your TV. But then came the remote control, which saved us from having to move our legs. Later still we got electronic assistants from the likes of Amazon and Google which allowed us to command our home electronics with nothing more than our voice, so now we don’t even have to pick up the remote. Ushering in the next era of consumer gelification, [Nick Bild] has created ShAIdes: a pair of AI-enabled glasses that allow you to control devices by looking at them.

Of course on a more serious note, vision-based home automation could be a hugely beneficial assistive technology for those with limited mobility. By simply looking at the device you want to control and waving in its direction, the system knows which appliance to activate. In the video after the break, you can see [Nick] control lamps and his speakers with such ease that it almost looks like magic; a defining trait of any sufficiently advanced technology.

So how does it work? A Raspberry Pi camera module mounted to a pair of sunglasses captures video which is sent down to a NVIDIA Jetson Nano. Here, two separate image classification Convolutional Neural Network (CNN) models are being used to identify objects which can be controlled in the background, and hand gestures in the foreground. When there’s a match for both, the system can fire off the appropriate signal to turn the device on or off. Between the Nano, the camera, and the battery pack to make it all mobile, [Nick] says the hardware cost about $150 to put together.

But really, the hardware is only one small piece of the puzzle in a project like this. Which is why we’re happy to see [Nick] go into such detail about how the software functions, and crucially, how he trained the system. Just the gesture recognition subroutine alone went through nearly 20K images so it could reliably detect an arm extended into the frame.

If controlling your home with a glance and wave isn’t quite mystical enough, you could always add an infrared wand to the mix for that authentic Harry Potter experience.

Continue reading “Home Automation At A Glance Using AI Glasses”

The Cloak Of Invisibility Against Image Recognition

Adversarial attacks are not something new to the world of Deep Networks used for image recognition. However, as the research with Deep Learning grows, more flaws are uncovered. The team at the University of KU Leuven in Belgium have demonstrated how, by simple using a colored photo held near the torso of a man can render him invisible to image recognition systems based on convolutional neural networks.

Convolutional Neural Networks or CNNs are a class of Deep learning networks that reduces the number of computations to be performed by creating hierarchical patterns from simpler and smaller networks. They are becoming the norm for image recognition applications and are being used in the field. In this new paper, the addition of color patches is seen to confuse the image detector YoLo(v2) by adding noise that disrupts the calculations of the CNN. The patch is not random and can be identified using the process defined in the publication.

This attack can be implemented by printing the disruptive pattern on a t-shirt making them invisible to surveillance system detection. You can read the paper[PDF] that outlines the generation of the adversarial patch. Image recognition camouflage that works on Google’s Inception has been documented in the past and we hope to see more such hacks in the future. Its a new world out there where you hacking is colorful as ever.

Continue reading “The Cloak Of Invisibility Against Image Recognition”

Training The Squirrel Terminator

Depending on which hemisphere of the Earth you’re currently reading this from, summer is finally starting to fight its way to the surface. For the more “green” of our readers, that can mean it’s time to start making plans for summer gardening. But as anyone who’s ever planted something edible can tell you, garden pests such as squirrels are fantastically effective at turning all your hard work into a wasteland. Finding ways to keep them away from your crops can be a full-time job, but luckily it’s a job nobody will mind if automation steals from humans.

Kitty gets a pass

[Peter Quinn] writes in to tell us about the elaborate lengths he is going to keep bushy-tailed marauders away from his tomatoes this year. Long term he plans on setting up a non-lethal sentry gun to scare them away, but before he can get to that point he needs to perfect the science of automatically targeting his prey. At the same time, he wants to train the system well enough that it won’t fire on humans or other animals such as cats and birds which might visit his garden.

A Raspberry Pi 3 with a cheap webcam is used to surveil the garden and detect motion. When frames containing motion are detected, they are forwarded to a laptop which has enough horsepower to handle the squirrel detection through Darknet YOLO. [Peter] recognizes this isn’t an ideal architecture for real-time targeting of a sentry turret, but it’s good enough for training the system.

Which incidentally is what [Peter] spends the most time explaining on the project’s Hackaday.io page. From the saga of getting the software environment up and running to determining how many pictures of squirrels in his yard he should provide the software for training, it’s an excellent case study in rolling your own image recognition system. After approximately 18 hours of training, he now has a system which is able to pick squirrels out from the foliage. The next step is hooking up the turret.

We’ve covered other automated turrets here on Hackaday, and we’ve seen automated devices for terrifying squirrels before, but this is the first time we’ve seen the concepts mixed.

Neural Networks… On A Stick!

They probably weren’t inspired by [Jeff Dunham’s] jalapeno on a stick, but Intel have created the Movidius neural compute stick which is in effect a neural network in a USB stick form factor. They don’t rely on the cloud, they require no fan, and you can get one for well under $100. We were interested in [Jeff Johnson’s] use of these sticks with a Pynq-Z1. He also notes that it is a great way to put neural net power on a Raspberry Pi or BeagleBone. He shows us YOLO — an image recognizer — and applies it to an HDMI signal with the processing done on the Movidius. You can see the result in the first video, below.

At first, we thought you might be better off using the Z1’s built-in FPGA to do neural networks. [Jeff] points out that while it is possible, the Z1 has a lower-end device on it, so there isn’t that much FPGA real estate to play with. The stick, then, is a great idea. You can learn more about the device in the second video, below.

Continue reading “Neural Networks… On A Stick!”

One-Pixel Attack Fools Neural Networks

Deep Neural Networks can be pretty good at identifying images — almost as good as they are at attracting Silicon Valley venture capital. But they can also be fairly brittle, and a slew of research projects over the last few years have been working on making the networks’ image classification less likely to be deliberately fooled.

One particular line of attack involves adding particularly-crafted noise to an image that flips some bits in the deep dark heart of the network, and makes it see something else where no human would notice the difference. We got tipped with a YouTube video of a one-pixel attack, embedded below, where changing a single pixel in the image would fool the network. Take that robot overlords!

We can’t tell what these are either..

Or not so fast. Reading the fine-print in the cited paper paints a significantly less gloomy picture for Deep Neural Nets. First, the images in question were 32 pixels by 32 pixels to begin with, so each pixel matters, especially after it’s run through a convolution step with a few-pixel window. The networks they attacked weren’t the sharpest tools in the shed either, with somewhere around a 68% classification success rate. What this means is that the network was unsure to begin with for many of the test images — making it flip from its marginally best (correct) first choice to a second choice shouldn’t be all that hard.

This isn’t to say that this line of research, adversarial training of the networks, is bogus. The idea that making neural nets robust to small changes is important. You don’t want turtles to be misclassified as guns, for instance, or Hackaday’s own Steven Dufresne misclassified as a tobacconist. And you certainly don’t want speech recognition software to be fooled by carefully crafted background noise. But if a claim of “astonishing results” on YouTube seems too good to be true, well, maybe it is.

Thanks [kamathin] for the tip!

Continue reading “One-Pixel Attack Fools Neural Networks”