There are about one million known species of insects – more than for any other group of living organisms. If you need to determine which species an insect belongs to, things get complicated quick. In fact, for distinguishing between certain kinds of species, you might need a well-trained expert in that species, and experts’ time is often better spent on something else. This is where CNNs (convolutional neural networks) come in nowadays, and this paper describes a CNN doing just as well if not better than human experts.
Humans weren’t made to sit in front of a computer all day, yet for many of us that’s how we spend a large part of our lives. Of course we all know that it’s important to get up and move around every now and then to stretch our muscles and get our blood flowing, but it’s easy to forget if you’re working towards a deadline. [Victor Sonck] thought he needed some reminders — as well as some not-so-gentle nudging — to get into the habit of doing a quick workout a few times a day.
To this end, he designed a piece of software that would lock his computer’s screen and only unlock it if he performed five push-ups. Locking the screen on his Linux box was as easy as sending a command through the network, but recognizing push-ups was a harder task for which [Victor] decided to employ machine learning. A Raspberry Pi with a webcam attached could do the trick, but the limited processing power of the Pi’s CPU might prove insufficient for processing lots of raw image data.
[Victor] therefore decided on using a Luxonis OAK-1, which is a 4K camera with a built-in machine-learning processor. It can run various kinds of image recognition systems including Blazepose, a pre-trained model that can recognize a person’s pose from an image. The OAK-1 uses this to send out a set of coordinates that describe the position of a person’s head, torso and limbs to the Raspberry Pi through a USB interface. A second machine-learning model running on the Pi then analyzes this dataset to recognize push-ups.
[Victor]’s video (embedded below) is an entertaining introduction into the world of machine-learning systems for video processing, as well as a good hands-on example of a project that results in a useful tool. If you’re interested in learning more about machine learning on small platforms, check out this 2020 Remoticon talk on machine learning on microcontrollers, or this 2019 Supercon talk about implementing machine vision on a Raspberry Pi.
Currently, if you want to use the Autopilot or Self-Driving modes on a Tesla vehicle you need to keep your hands on the wheel at all times. That’s because, ultimately, the human driver is still the responsible party. Tesla is adamant about the fact that functions which allow the car to steer itself within a lane, avoid obstacles, and intelligently adjust its speed to match traffic all constitute a driver assistance system. If somebody figures out how to fool the wheel sensor and take a nap while their shiny new electric car is hurtling down the freeway, they want no part of it.
So it makes sense that the company’s official line regarding the driver-facing camera in the Model 3 and Model Y is that it’s there to record what the driver was doing in the seconds leading up to an impact. As explained in the release notes of the June 2020 firmware update, Tesla owners can opt-in to providing this data:
Help Tesla continue to develop safer vehicles by sharing camera data from your vehicle. This update will allow you to enable the built-in cabin camera above the rearview mirror. If enabled, Tesla will automatically capture images and a short video clip just prior to a collision or safety event to help engineers develop safety features and enhancements in the future.
But [green], who’s spent the last several years poking and prodding at the Tesla’s firmware and self-driving capabilities, recently found some compelling hints that there’s more to the story. As part of the vehicle’s image recognition system, which usually is tasked with picking up other vehicles or pedestrians, they found several interesting classes that don’t seem necessary given the official explanation of what the cabin camera is doing.
If all Tesla wanted was a few seconds of video uploaded to their offices each time one of their vehicles got into an accident, they wouldn’t need to be running image recognition configured to detect distracted drivers against it in real-time. While you could make the argument that this data would be useful to them, there would still be no reason to do it in the vehicle when it could be analyzed as part of the crash investigation. It seems far more likely that Tesla is laying the groundwork for a system that could give the vehicle another way of determining if the driver is paying attention.
Maybe you heard about the anger surrounding Twitter’s automatic cropping of images. When users submit pictures that are too tall or too wide for the layout, Twitter automatically crops them to roughly a square. Instead of just picking, say, the largest square that’s closest to the center of the image, they use some “algorithm”, likely a neural network, trained to find people’s faces and make sure they’re cropped in.
The problem is that when a too-tall or too-wide image includes two or more people, and they’ve got different colored skin, the crop picks the lighter face. That’s really offensive, and something’s clearly wrong, but what?
A neural network is really just a mathematical equation, with the input variables being in these cases convolutions over the pixels in the image, and training them essentially consists in picking the values for all the coefficients. You do this by applying inputs, seeing how wrong the outputs are, and updating the coefficients to make the answer a little more right. Do this a bazillion times, with a big enough model and dataset, and you can make a machine recognize different breeds of cat.
What went wrong at Twitter? Right now it’s speculation, but my money says it lies with either the training dataset or the coefficient-update step. The problem of including people of all races in the training dataset is so blatantly obvious that we hope that’s not the problem; although getting a representative dataset is hard, it’s known to be hard, and they should be on top of that.
Which means that the issue might be coefficient fitting, and this is where math and culture collide. Imagine that your algorithm just misclassified a cat as an “airplane” or as a “lion”. You need to modify the coefficients so that they move the answer away from this result a bit, and more toward “cat”. Do you move them equally from “airplane” and “lion” or is “airplane” somehow more wrong? To capture this notion of different wrongnesses, you use a loss function that can numerically encapsulate just exactly what it is you want the network to learn, and then you take bigger or smaller steps in the right direction depending on how bad the result was.
Let that sink in for a second. You need a mathematical equation that summarizes what you want the network to learn. (But not how you want it to learn it. That’s the revolutionary quality of applied neural networks.)
Now imagine, as happened to Google, your algorithm fits “gorilla” to the image of a black person. That’s wrong, but it’s categorically differently wrong from simply fitting “airplane” to the same person. How do you write the loss function that incorporates some penalty for racially offensive results? Ideally, you would want them to never happen, so you could imagine trying to identify all possible insults and assigning those outcomes an infinitely large loss. Which is essentially what Google did — their “workaround” was to stop classifying “gorilla” entirely because the loss incurred by misclassifying a person as a gorilla was so large.
This is a fundamental problem with neural networks — they’re only as good as the data and the loss function. These days, the data has become less of a problem, but getting the loss right is a multi-level game, as these neural network trainwrecks demonstrate. And it’s not as easy as writing an equation that isn’t “racist”, whatever that would mean. The loss function is being asked to encapsulate human sensitivities, navigate around them and quantify them, and eventually weigh the slight risk of making a particularly offensive misclassification against not recognizing certain animals at all.
I’m not sure this problem is solvable, even with tremendously large datasets. (There are mathematical proofs that with infinitely large datasets the model will classify everything correctly, so you needn’t worry. But how close are we to infinity? Are asymptotic proofs relevant?)
Anyway, this problem is bigger than algorithms, or even their writers, being “racist”. It may be a fundamental problem of machine learning, and we’re definitely going to see further permutations of the Twitter fiasco in the future as machine classification is being increasingly asked to respect human dignity.
Home automation is a fine goal but typically remains confined to lights, blinds, and other things that are relatively stationary and/or electrical in nature. There is a challenge there to be certain, but to really step up your home automation game you’ll need to think outside the box. This automated garbage can that can take itself out, for example, has all the home automation street cred you’d ever need.
The garbage can moves itself by means of a scooter wheel which has a hub motor inside and is powered by a lithium battery, but the real genius of this project is the electronics controlling everything. A Raspberry Pi Zero W is at the center of the build which controls the motor via a driver board and also receives instructions on when to wheel the garbage can out to the curb from an Nvidia Jetson board. That board is needed because the creator, [Ahad Cove], didn’t want to be bothered to tell his garbage can to take itself out or even schedule it. He instead used machine learning to detect when the garbage truck was headed down the street and instruct the garbage can to roll itself out then.
The only other thing to tie this build together was to get the garage door to open automatically for the garbage can. Luckily, [Ahad]’s garage door opener was already equipped with WiFi and had an available app, unbeknownst to him, which made this a surprisingly easy part of the build. If you have a more rudimentary garage door opener, though, there are plenty of options available to get it on the internets.
There was a time when you had to get up from the couch to change the channel on your TV. But then came the remote control, which saved us from having to move our legs. Later still we got electronic assistants from the likes of Amazon and Google which allowed us to command our home electronics with nothing more than our voice, so now we don’t even have to pick up the remote. Ushering in the next era of consumer gelification, [Nick Bild] has created ShAIdes: a pair of AI-enabled glasses that allow you to control devices by looking at them.
Of course on a more serious note, vision-based home automation could be a hugely beneficial assistive technology for those with limited mobility. By simply looking at the device you want to control and waving in its direction, the system knows which appliance to activate. In the video after the break, you can see [Nick] control lamps and his speakers with such ease that it almost looks like magic; a defining trait of any sufficiently advanced technology.
So how does it work? A Raspberry Pi camera module mounted to a pair of sunglasses captures video which is sent down to a NVIDIA Jetson Nano. Here, two separate image classification Convolutional Neural Network (CNN) models are being used to identify objects which can be controlled in the background, and hand gestures in the foreground. When there’s a match for both, the system can fire off the appropriate signal to turn the device on or off. Between the Nano, the camera, and the battery pack to make it all mobile, [Nick] says the hardware cost about $150 to put together.
But really, the hardware is only one small piece of the puzzle in a project like this. Which is why we’re happy to see [Nick] go into such detail about how the software functions, and crucially, how he trained the system. Just the gesture recognition subroutine alone went through nearly 20K images so it could reliably detect an arm extended into the frame.
If controlling your home with a glance and wave isn’t quite mystical enough, you could always add an infrared wand to the mix for that authentic Harry Potter experience.
Adversarial attacks are not something new to the world of Deep Networks used for image recognition. However, as the research with Deep Learning grows, more flaws are uncovered. The team at the University of KU Leuven in Belgium have demonstrated how, by simple using a colored photo held near the torso of a man can render him invisible to image recognition systems based on convolutional neural networks.
Convolutional Neural Networks or CNNs are a class of Deep learning networks that reduces the number of computations to be performed by creating hierarchical patterns from simpler and smaller networks. They are becoming the norm for image recognition applications and are being used in the field. In this new paper, the addition of color patches is seen to confuse the image detector YoLo(v2) by adding noise that disrupts the calculations of the CNN. The patch is not random and can be identified using the process defined in the publication.
This attack can be implemented by printing the disruptive pattern on a t-shirt making them invisible to surveillance system detection. You can read the paper[PDF] that outlines the generation of the adversarial patch. Image recognition camouflage that works on Google’s Inception has been documented in the past and we hope to see more such hacks in the future. Its a new world out there where you hacking is colorful as ever.