There are about one million known species of insects – more than for any other group of living organisms. If you need to determine which species an insect belongs to, things get complicated quickly. In fact, for distinguishing between certain kinds of species, you might need a well-trained expert in that species, and experts’ time is often better spent on something else. This is where CNNs (convolutional neural networks) come in nowadays, and this paper describes a CNN doing just as well as, if not better than, human experts.
Telecommuters: tired of the constant embarrassment of showing up to video conferences wearing nothing but your underwear? Save the humiliation and all those pesky trips down to HR with Safe Meeting, the new system that uses the power of artificial intelligence to turn off your camera if you forget that casual Friday isn’t supposed to be that casual.
The following infomercial is brought to you by [Nick Bild], who says the whole thing is tongue-in-cheek but we sense a certain degree of “necessity is the mother of invention” here. It’s true that the sudden throng of remote-work newbies certainly increases the chance of videoconference mishaps and the resulting mortification, so whatever the impetus, Safe Meeting seems like a great idea. It uses a Pi cam connected to a Jetson Nano to capture images of you during videoconferences, which are conducted over another camera. The stream is classified by a convolutional neural net (CNN) that determines whether it can see your underwear. If it can, it makes a REST API call to the conferencing app to turn off the camera. The video below shows it in action, and that it douses the camera quickly enough to spare your modesty.
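The classify-then-disable loop described above can be sketched roughly as follows. This is a minimal illustration, not [Nick]'s code: `classify` stands in for the CNN, `disable_camera` stands in for the REST call to the conferencing app, and the 0.5 threshold is an assumption.

```python
from typing import Callable, Iterable, Optional


def guard_camera(
    frames: Iterable[object],
    classify: Callable[[object], float],
    disable_camera: Callable[[], None],
    threshold: float = 0.5,  # assumed cut-off, not taken from the project
) -> Optional[int]:
    """Scan frames in order and shut the camera off at the first risky one.

    `classify` is a hypothetical stand-in for the CNN and returns a score
    between 0 and 1. `disable_camera` is a hypothetical stand-in for the
    REST call. Returns the index of the triggering frame, or None.
    """
    for index, frame in enumerate(frames):
        if classify(frame) >= threshold:
            disable_camera()
            return index
    return None


# Stub usage: the "frames" here are just precomputed scores.
calls = []
triggered_at = guard_camera(
    frames=[0.1, 0.2, 0.9, 0.3],
    classify=lambda score: score,
    disable_camera=lambda: calls.append("camera off"),
)
```

Passing the classifier and the shut-off action in as callables keeps the loop testable without a camera, a model, or a running conferencing app.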
We shudder to think about how [Nick] developed an underwear-specific training set, but we applaud him for doing so and coming up with a neat application for machine learning. He’s been doing some fun work in this space lately, from monitoring where surfaces have been touched to a 6502-based gesture recognition system.
Pitching a baseball is about accuracy and speed. A swift ball on target is the goal, allowing the pitcher to strike out the batter. [Nick Bild] created an AI system that can determine a ball’s trajectory in mid-flight, based on a camera feed.
The system uses an NVIDIA Jetson AGX Xavier, fitted with a USB camera running at 100 FPS. A Nerf tennis ball launcher is used to fire a ball towards the batter. Once triggered, the AI uses the camera to capture two successive images of the ball in flight. These images are fed into a convolutional neural network (CNN), and the software determines whether the ball is heading for the strike zone, or moving off-target. It uses this information to light a green or red LED respectively to alert the batter.
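As a rough sketch of that flow, the snippet below pairs two frames into a single input and maps the prediction to an LED color. Stacking the frames as channels is one common way to give a CNN motion information; the write-up does not say whether the project does this, so treat it as an assumption. `predict_strike` is a hypothetical stand-in for the trained network.

```python
from typing import Callable, List, Tuple

Frame = List[List[int]]  # grayscale image as rows of pixel values
Stacked = List[List[Tuple[int, int]]]  # two values per pixel


def stack_frames(first: Frame, second: Frame) -> Stacked:
    """Pair up pixels from two equal-sized frames, one channel per frame."""
    same_rows = len(first) == len(second)
    if not same_rows or any(len(a) != len(b) for a, b in zip(first, second)):
        raise ValueError("frames must have the same dimensions")
    return [list(zip(row_a, row_b)) for row_a, row_b in zip(first, second)]


def led_for_pitch(
    first: Frame,
    second: Frame,
    predict_strike: Callable[[Stacked], bool],
) -> str:
    """Return 'green' for a predicted strike, 'red' otherwise."""
    stacked = stack_frames(first, second)
    return "green" if predict_strike(stacked) else "red"
```

With only two frames captured 10 ms apart at 100 FPS, the network has to infer direction from a very small displacement, which is why both frames are supplied together rather than classified separately.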
While such a system is unlikely to appear in professional baseball anytime soon, it shows the sheer capability of neural network systems to quickly and effectively analyse data in ways simply impossible for mere humans. [Nick]’s future goals involve running the system on faster hardware, and expanding it to determine effects like spin and more accurate positioning within the strike zone.
The world was never black and white – we simply lacked the technology to capture it in full color. Many have experimented with techniques to take black and white images, and colorize them. [Adrian Rosebrock] decided to put an AI on the job, with impressive results.
The method involves training a Convolutional Neural Network (CNN) on a large batch of photos, which have been converted to the Lab colorspace. In this colorspace, images are made up of 3 channels – lightness, a (red-green), and b (blue-yellow). This colorspace is used as it better corresponds to the nature of the human visual system than RGB. The model is then trained such that when given a lightness channel as an input, it can predict the likely a & b channels. These can then be recombined into a colorized image, and converted back to RGB for human consumption.
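The data flow at prediction time can be sketched in a few lines. This is an illustration of the split-predict-recombine idea only: `predict_ab` is a hypothetical stand-in for the trained CNN, images are plain nested lists of (L, a, b) tuples, and the conversion to and from RGB is left out (in practice a library such as OpenCV would handle it).

```python
from typing import Callable, List, Tuple

LChannel = List[List[float]]
ABChannels = List[List[Tuple[float, float]]]
LabImage = List[List[Tuple[float, float, float]]]


def split_lightness(lab_image: LabImage) -> LChannel:
    """Keep only the L (lightness) value of every pixel."""
    return [[pixel[0] for pixel in row] for row in lab_image]


def recombine(lightness: LChannel, ab: ABChannels) -> LabImage:
    """Merge a lightness channel with predicted a and b channels."""
    return [
        [(l, a, b) for l, (a, b) in zip(l_row, ab_row)]
        for l_row, ab_row in zip(lightness, ab)
    ]


def colorize(
    lightness: LChannel,
    predict_ab: Callable[[LChannel], ABChannels],
) -> LabImage:
    """Predict the a and b channels from L, then rebuild the Lab image."""
    return recombine(lightness, predict_ab(lightness))


def neutral_ab(lightness: LChannel) -> ABChannels:
    """Placeholder predictor: a = b = 0 everywhere, i.e. no color at all."""
    return [[(0.0, 0.0) for _ in row] for row in lightness]
```

The placeholder predictor returns a gray image, which is a useful baseline: the input lightness passes through untouched, so whatever a real model predicts affects only the color, never the detail.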
It’s a technique capable of doing a decent job on a wide variety of material. Things such as grass, countryside, and ocean are particularly well dealt with; however, more complicated scenes can suffer from some aberration. Regardless, it’s a useful technique, and far less tedious than manual methods.
Suppose you ran a website releasing many articles per day about various topics, all following a general theme. And suppose that your website allowed for a comments section for discussion on those topics. Unless you are brand new to the Internet, you’ll also imagine that the comments section needs at least a little bit of moderation to filter out spam, off topic, or even toxic comments. If you don’t want to employ any people for this task, you could try this machine learning algorithm instead.
[Ladvien] gives a general overview of how to set up a convolutional neural network (CNN). Such networks can be put to many uses; this one crawls a web page, gathers data, and makes decisions about that data. In this case the task is to identify toxic comments, though the goal is less to forge the sharpest sword in the comment moderator’s armory than to learn more about how CNNs work.
Written in Python, the process outlines the code itself and how it behaves, setting up a small server to host the neural network, and finally creating the webservice. As with any machine learning, you need a reliable dataset to use for training and this one came from Wikipedia comments previously flagged by humans. Trolling nuance is thrown aside, as the example homes in on blatant insults and vulgarity.
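The final step, wrapping a model in a service that returns a verdict, might look something like the sketch below. It is not [Ladvien]'s code: the keyword scorer is a deliberately crude placeholder for the trained CNN, and the blocklist, the 0.5 threshold, and the JSON field names are all assumptions made for illustration.

```python
import json
from typing import Callable

# Placeholder vocabulary; a real model learns this from labeled comments.
BLOCKLIST = {"idiot", "stupid"}


def keyword_score(comment: str) -> float:
    """Stand-in scorer: the fraction of words found in the blocklist."""
    words = [word.strip(".,!?") for word in comment.lower().split()]
    if not words:
        return 0.0
    return sum(word in BLOCKLIST for word in words) / len(words)


def moderate(
    comment: str,
    score_toxicity: Callable[[str], float] = keyword_score,
    threshold: float = 0.5,  # assumed cut-off
) -> str:
    """Score a comment and return the JSON body a web service could send."""
    score = score_toxicity(comment)
    return json.dumps({"score": score, "toxic": score >= threshold})
```

Because the scorer is passed in as a parameter, swapping the placeholder for a real model's prediction function would not change the service layer at all.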
While [Ladvien] notes that his guide isn’t meant to be comprehensive, but rather to fill in some gaps that he noticed in other guides like it, we found it an interesting read. He also mentions that, in theory, this tool could be used to predict the number of comments an article like this very one will attract, based on the language in the article. We’ll leave that one as an academic exercise for now, probably.
There was a time when you had to get up from the couch to change the channel on your TV. But then came the remote control, which saved us from having to move our legs. Later still we got electronic assistants from the likes of Amazon and Google which allowed us to command our home electronics with nothing more than our voice, so now we don’t even have to pick up the remote. Ushering in the next era of consumer gelification, [Nick Bild] has created ShAIdes: a pair of AI-enabled glasses that allow you to control devices by looking at them.
Of course on a more serious note, vision-based home automation could be a hugely beneficial assistive technology for those with limited mobility. By simply looking at the device you want to control and waving in its direction, the system knows which appliance to activate. In the video after the break, you can see [Nick] control lamps and his speakers with such ease that it almost looks like magic; a defining trait of any sufficiently advanced technology.
So how does it work? A Raspberry Pi camera module mounted to a pair of sunglasses captures video which is sent down to an NVIDIA Jetson Nano. Here, two separate image classification Convolutional Neural Network (CNN) models are being used to identify objects which can be controlled in the background, and hand gestures in the foreground. When there’s a match for both, the system can fire off the appropriate signal to turn the device on or off. Between the Nano, the camera, and the battery pack to make it all mobile, [Nick] says the hardware cost about $150 to put together.
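The "match for both" rule can be expressed as a small decision function. The sketch below is an assumption-laden illustration rather than the project's code: the label names, the `arm_extended` gesture label, and the 0.8 confidence floor are all hypothetical, and each prediction is modeled as a (label, confidence) pair.

```python
from typing import Dict, Optional, Set, Tuple

Prediction = Tuple[str, float]  # (label, confidence between 0 and 1)


def decide(
    object_pred: Prediction,
    gesture_pred: Prediction,
    controllable: Set[str],
    trigger_gesture: str = "arm_extended",  # hypothetical label
    min_conf: float = 0.8,  # assumed confidence floor
) -> Optional[str]:
    """Return the device to toggle only if both models agree confidently."""
    object_label, object_conf = object_pred
    gesture_label, gesture_conf = gesture_pred
    if object_label not in controllable or object_conf < min_conf:
        return None
    if gesture_label != trigger_gesture or gesture_conf < min_conf:
        return None
    return object_label


def toggle(states: Dict[str, bool], device: str) -> bool:
    """Flip a device's on/off state and return the new state."""
    states[device] = not states.get(device, False)
    return states[device]
```

Requiring both classifiers to agree is what keeps a lamp from switching every time it merely drifts into view: the object model says what is being looked at, and the gesture model says whether anything should happen.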
But really, the hardware is only one small piece of the puzzle in a project like this. Which is why we’re happy to see [Nick] go into such detail about how the software functions, and crucially, how he trained the system. The gesture recognition subroutine alone went through nearly 20K images so it could reliably detect an arm extended into the frame.
If controlling your home with a glance and wave isn’t quite mystical enough, you could always add an infrared wand to the mix for that authentic Harry Potter experience.
DOOM will forever be remembered as one of the founding games of the entire FPS genre. It also stands as a game which has long been a fertile ground for hackers and modders. [Nick Bild] decided to bring gesture control to id’s classic shooter, courtesy of machine learning.
The setup consists of a Jetson Nano fitted with a camera, which films the player and uses a convolutional neural network to recognise the player’s various gestures. Once recognised, an API request is sent to a laptop playing Doom which simulates the relevant keystrokes. The laptop is hooked up to a projector, creating a large screen which allows the wildly gesturing player to more easily follow the action.
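The hand-off from recognised gesture to simulated keystroke could be sketched like this. The gesture names, key names, and JSON payload shape are all hypothetical, since the write-up does not describe the API; the snippet only builds the request body and leaves the network call out.

```python
import json
from typing import Optional

# Hypothetical mapping from gesture labels to key names.
GESTURE_TO_KEY = {
    "forward": "Up",
    "backward": "Down",
    "turn_left": "Left",
    "turn_right": "Right",
    "shoot": "Control_L",
}


def build_key_request(gesture: str) -> Optional[str]:
    """Return a JSON body naming the key to press, or None if unmapped."""
    key = GESTURE_TO_KEY.get(gesture)
    if key is None:
        return None  # unknown or idle gesture: send nothing
    return json.dumps({"key": key})
```

Because the classifier assigns each frame to exactly one gesture, a design like this can only ever request one key at a time.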
The neural network was trained on 3300 images – 300 per gesture. [Nick] found that training on a larger data set actually gave worse results, as he became less diligent about performing the gestures consistently. This demonstrates that quality matters in training data, not just quantity.
Reports are that the network is fairly reliable, and it appears to work quite well. Unfortunately, playability is limited as it’s not possible to gesture for more than one key at once. Overall though, it serves as a tidy example of how to do gesture recognition with CNNs.
If you’re not convinced by this demonstration, you might be interested to learn that neural networks can also be used to name tomatoes. If you don’t want to roll your own pose detection, check out this selfie drone that uses CMU’s OpenPose library. Video after the break.