There was a time when you had to get up from the couch to change the channel on your TV. But then came the remote control, which saved us from having to move our legs. Later still we got electronic assistants from the likes of Amazon and Google which allowed us to command our home electronics with nothing more than our voice, so now we don’t even have to pick up the remote. Ushering in the next era of consumer gelification, [Nick Bild] has created ShAIdes: a pair of AI-enabled glasses that allow you to control devices by looking at them.
Of course on a more serious note, vision-based home automation could be a hugely beneficial assistive technology for those with limited mobility. By simply looking at the device you want to control and waving in its direction, the system knows which appliance to activate. In the video after the break, you can see [Nick] control lamps and his speakers with such ease that it almost looks like magic; a defining trait of any sufficiently advanced technology.
So how does it work? A Raspberry Pi camera module mounted to a pair of sunglasses captures video which is sent down to a NVIDIA Jetson Nano. Here, two separate image classification Convolutional Neural Network (CNN) models are being used to identify objects which can be controlled in the background, and hand gestures in the foreground. When there’s a match for both, the system can fire off the appropriate signal to turn the device on or off. Between the Nano, the camera, and the battery pack to make it all mobile, [Nick] says the hardware cost about $150 to put together.
But really, the hardware is only one small piece of the puzzle in a project like this. Which is why we’re happy to see [Nick] go into such detail about how the software functions, and crucially, how he trained the system. Just the gesture recognition subroutine alone went through nearly 20K images so it could reliably detect an arm extended into the frame.
Convolutional Neural Networks or CNNs are a class of Deep learning networks that reduces the number of computations to be performed by creating hierarchical patterns from simpler and smaller networks. They are becoming the norm for image recognition applications and are being used in the field. In this new paper, the addition of color patches is seen to confuse the image detector YoLo(v2) by adding noise that disrupts the calculations of the CNN. The patch is not random and can be identified using the process defined in the publication.
This attack can be implemented by printing the disruptive pattern on a t-shirt making them invisible to surveillance system detection. You can read the paper[PDF] that outlines the generation of the adversarial patch. Image recognition camouflage that works on Google’s Inception has been documented in the past and we hope to see more such hacks in the future. Its a new world out there where you hacking is colorful as ever.
Depending on which hemisphere of the Earth you’re currently reading this from, summer is finally starting to fight its way to the surface. For the more “green” of our readers, that can mean it’s time to start making plans for summer gardening. But as anyone who’s ever planted something edible can tell you, garden pests such as squirrels are fantastically effective at turning all your hard work into a wasteland. Finding ways to keep them away from your crops can be a full-time job, but luckily it’s a job nobody will mind if automation steals from humans.
[Peter Quinn] writes in to tell us about the elaborate lengths he is going to keep bushy-tailed marauders away from his tomatoes this year. Long term he plans on setting up a non-lethal sentry gun to scare them away, but before he can get to that point he needs to perfect the science of automatically targeting his prey. At the same time, he wants to train the system well enough that it won’t fire on humans or other animals such as cats and birds which might visit his garden.
A Raspberry Pi 3 with a cheap webcam is used to surveil the garden and detect motion. When frames containing motion are detected, they are forwarded to a laptop which has enough horsepower to handle the squirrel detection through Darknet YOLO. [Peter] recognizes this isn’t an ideal architecture for real-time targeting of a sentry turret, but it’s good enough for training the system.
Which incidentally is what [Peter] spends the most time explaining on the project’s Hackaday.io page. From the saga of getting the software environment up and running to determining how many pictures of squirrels in his yard he should provide the software for training, it’s an excellent case study in rolling your own image recognition system. After approximately 18 hours of training, he now has a system which is able to pick squirrels out from the foliage. The next step is hooking up the turret.
They probably weren’t inspired by [Jeff Dunham’s] jalapeno on a stick, but Intel have created the Movidius neural compute stick which is in effect a neural network in a USB stick form factor. They don’t rely on the cloud, they require no fan, and you can get one for well under $100. We were interested in [Jeff Johnson’s] use of these sticks with a Pynq-Z1. He also notes that it is a great way to put neural net power on a Raspberry Pi or BeagleBone. He shows us YOLO — an image recognizer — and applies it to an HDMI signal with the processing done on the Movidius. You can see the result in the first video, below.
At first, we thought you might be better off using the Z1’s built-in FPGA to do neural networks. [Jeff] points out that while it is possible, the Z1 has a lower-end device on it, so there isn’t that much FPGA real estate to play with. The stick, then, is a great idea. You can learn more about the device in the second video, below.
Deep Neural Networks can be pretty good at identifying images — almost as good as they are at attracting Silicon Valley venture capital. But they can also be fairly brittle, and a slew of research projects over the last few years have been working on making the networks’ image classification less likely to be deliberately fooled.
One particular line of attack involves adding particularly-crafted noise to an image that flips some bits in the deep dark heart of the network, and makes it see something else where no human would notice the difference. We got tipped with a YouTube video of a one-pixel attack, embedded below, where changing a single pixel in the image would fool the network. Take that robot overlords!
Or not so fast. Reading the fine-print in the cited paper paints a significantly less gloomy picture for Deep Neural Nets. First, the images in question were 32 pixels by 32 pixels to begin with, so each pixel matters, especially after it’s run through a convolution step with a few-pixel window. The networks they attacked weren’t the sharpest tools in the shed either, with somewhere around a 68% classification success rate. What this means is that the network was unsure to begin with for many of the test images — making it flip from its marginally best (correct) first choice to a second choice shouldn’t be all that hard.
[Douglas] hometown Goshen, Indiana takes the state’s motto ‘The Crossroads of America’ seriously, at least when it comes to trains. The city is the meeting point of three heavily frequented railroad tracks that cross near the center of town, resulting in a car-traffic nightmare. When everybody agrees that a situation is bad, it is time to quantify exactly how bad it is. [Douglas] stepped up for this task and delivered.
He describes himself as cheap, and the gear he used to analyze the railroad traffic at a crossing visible from his home certainly fits the bill: a decades-old webcam, a scratched telephoto lens and a laptop with a damaged hinge.
With the hardware in place, the next step was to write the software to count and time passing trains. Doing this in stable conditions with reasonable equipment would pose no problem to any modern image processing library, but challenged with variable lighting and poor image quality, [Douglas] needed another solution.
Instead of looking for actual trains, [Douglas] decided to watch the crossing signals. His program crops the webcam image and then compares the average brightness of the left and right halves to detect blinking. This rudimentary solution is robust enough to handle low light conditions as well as morning glare and passing cars.
The rest is verifying the data, making it fit for processing, and then combining it with publicly available data on car traffic at the affected intersections to estimate impact. The next council meeting will find [Douglas] well prepared. Traffic issues are a great field for citizen science as shown in Stuttgart earlier. If the idea of bolting old lenses to webcams intrigues you, we got you covered as well.
The good people at MIT’s Computer Science and Artificial Intelligence Laboratory [CSAIL] have found a way of tricking Google’s InceptionV3 image classifier into seeing a rifle where there actually is a turtle. This is achieved by presenting the classifier with what is called ‘adversary examples’.
Adversary examples are a proven concept for 2D stills. In 2014 [Goodfellow], [Shlens] and [Szegedy] added imperceptible noise to the image of a panda that from then on was classified as gibbon. This method relies on the image being undisturbed and can be overcome by zooming, blurring or rotating the image.
The applicability for real world shenanigans has been seriously limited but this changes everything. This weaponized turtle is a color 3D print that is reliably misclassified by the algorithm from any point of view. To achieve this, some knowledge about the classifier is required to generate misleading input. The image transformations, such as rotation, scaling and skewing but also color corrections and even print errors are added to the input and the result is then optimized to reliably mislead the algorithm. The whole process is documented in [CSAIL]’s paper on the method.
What this amounts to is camouflage from machine vision. Assuming that the method also works the other way around, the possibility of disguising guns (or anything else) as turtles has serious implications for automated security systems.
As this turtle targets the Inception algorithm, it should be able to fool the DIY image recognition talkbox that Hackaday’s own [Steven Dufresne] built.