Suppose you ran a website releasing many articles per day about various topics, all following a general theme. And suppose that your website allowed for a comments section for discussion on those topics. Unless you are brand new to the Internet, you’ll also imagine that the comments section needs at least a little bit of moderation to filter out spam, off topic, or even toxic comments. If you don’t want to employ any people for this task, you could try this machine learning algorithm instead.
[Ladvien] goes through a general overview of how to set up a convolutional neural network (CNN) which can be programmed to do many things, but this one crawls a web page, gathers data, and also makes decisions regarding that data. In this case, the task is to identify toxic comments but the goal is not to achieve the sharpest sword in the comment moderator’s armory, but to learn more about how CNNs work.
Written in Python, the process outlines the code itself and how it behaves, setting up a small server to host the neural network, and finally creating the webservice. As with any machine learning, you need a reliable dataset to use for training and this one came from Wikipedia comments previously flagged by humans. Trolling nuance is thrown aside, as the example homes in on blatant insults and vulgarity.
While [Ladvien] notes that his guide isn’t meant to be comprehensive, but rather to fill in some gaps that he noticed within other guides like this, we find this to be an interesting read. He also mentioned that, in theory, this tool could be used to predict the number of comments following an article like this very one based on the language in the article. We’ll leave that one as an academic exercise for now, probably.
There was a time when you had to get up from the couch to change the channel on your TV. But then came the remote control, which saved us from having to move our legs. Later still we got electronic assistants from the likes of Amazon and Google which allowed us to command our home electronics with nothing more than our voice, so now we don’t even have to pick up the remote. Ushering in the next era of consumer gelification, [Nick Bild] has created ShAIdes: a pair of AI-enabled glasses that allow you to control devices by looking at them.
Of course on a more serious note, vision-based home automation could be a hugely beneficial assistive technology for those with limited mobility. By simply looking at the device you want to control and waving in its direction, the system knows which appliance to activate. In the video after the break, you can see [Nick] control lamps and his speakers with such ease that it almost looks like magic; a defining trait of any sufficiently advanced technology.
So how does it work? A Raspberry Pi camera module mounted to a pair of sunglasses captures video which is sent down to a NVIDIA Jetson Nano. Here, two separate image classification Convolutional Neural Network (CNN) models are being used to identify objects which can be controlled in the background, and hand gestures in the foreground. When there’s a match for both, the system can fire off the appropriate signal to turn the device on or off. Between the Nano, the camera, and the battery pack to make it all mobile, [Nick] says the hardware cost about $150 to put together.
But really, the hardware is only one small piece of the puzzle in a project like this. Which is why we’re happy to see [Nick] go into such detail about how the software functions, and crucially, how he trained the system. Just the gesture recognition subroutine alone went through nearly 20K images so it could reliably detect an arm extended into the frame.
If controlling your home with a glance and wave isn’t quite mystical enough, you could always add an infrared wand to the mix for that authentic Harry Potter experience.
Continue reading “Home Automation At A Glance Using AI Glasses”
DOOM will forever be remembered as one of the founding games of the entire FPS genre. It also stands as a game which has long been a fertile ground for hackers and modders. [Nick Bild] decided to bring gesture control to iD’s classic shooter, courtesy of machine learning.
The setup consists of a Jetson Nano fitted with a camera, which films the player and uses a convolutional neural network to recognise the player’s various gestures. Once recognised, an API request is sent to a laptop playing Doom which simulates the relevant keystrokes. The laptop is hooked up to a projector, creating a large screen which allows the wildly gesturing player to more easily follow the action.
The neural network was trained on 3300 images – 300 per gesture. [Nick] found that using a larger data set actually performed less well, as he became less diligent in reliably performing the gestures. This demonstrates that quality matters in training networks, as well as quantity.
Reports are that the network is fairly reliable, and it appears to work quite well. Unfortunately, playability is limited as it’s not possible to gesture for more than one key at once. Overall though, it serves as a tidy example of how to do gesture recognition with CNNs.
If you’re not convinced by this demonstration, you might be interested to learn that neural networks can also be used to name tomatoes. If you don’t want to roll your own pose detection, check out this selfie drone that uses CMU’s OpenPose library. Video after the break.
Continue reading “Gesture Controlled Doom”
Adversarial attacks are not something new to the world of Deep Networks used for image recognition. However, as the research with Deep Learning grows, more flaws are uncovered. The team at the University of KU Leuven in Belgium have demonstrated how, by simple using a colored photo held near the torso of a man can render him invisible to image recognition systems based on convolutional neural networks.
Convolutional Neural Networks or CNNs are a class of Deep learning networks that reduces the number of computations to be performed by creating hierarchical patterns from simpler and smaller networks. They are becoming the norm for image recognition applications and are being used in the field. In this new paper, the addition of color patches is seen to confuse the image detector YoLo(v2) by adding noise that disrupts the calculations of the CNN. The patch is not random and can be identified using the process defined in the publication.
This attack can be implemented by printing the disruptive pattern on a t-shirt making them invisible to surveillance system detection. You can read the paper[PDF] that outlines the generation of the adversarial patch. Image recognition camouflage that works on Google’s Inception has been documented in the past and we hope to see more such hacks in the future. Its a new world out there where you hacking is colorful as ever.
Continue reading “The Cloak Of Invisibility Against Image Recognition”
We salute hackers who make technology useful for people in emerging markets. Leigh Johnson joined that select group when she accepted the challenge to build portable machine vision units that work offline and can be deployed for under $100 each. For hardware, a Raspberry Pi with camera plus screen can fit under that cost ceiling, and the software to give it sight is the focus of her 2018 Hackaday Superconference presentation. (Video also embedded below.)
The talk is a very concise 13 minutes, so Leigh flies through definitions of basic terms, before quickly naming TensorFlow and Keras as the tools she used. The time she saved here was spent on explaining what convolutional neural networks are and how they work, just enough to prepare the audience. But all of that is really just background, the meat of the talk is self-contained examples that Leigh has put together and made available online. I love to see that since it means you go beyond just watching and try it out for yourself. Continue reading “Leigh Johnson’s Guide To Machine Vision On Raspberry Pi”
If you live in a bustling city and have anyone over who drives, it can be difficult for them to find parking. Maybe you have an assigned space, but they’re resigned to circling the block with an eagle eye. With those friends in mind, [Adam Geitgey] wrote a Python script that takes the video feed from a web cam and analyzes it frame by frame to figure out when a street parking space opens up. When the glorious moment arrives, he gets a text message via Twilio with a picture of the void.
It sounds complicated, but much of the work has already been done. Cars are a popular target for machine learning, so large data sets with cars already exist. [Adam] didn’t have to train a neural network, either–he found a pre-trained Mask R-CNN model with data for 80 common objects like people, animals, and cars.
The model gives a lot of useful info, including a bounding box for each car with pixel coordinates. Since the boxes overlap, there needs be a way to determine whether there’s really a car in the space, or just the bumpers of other cars. [Adam] used intersection over union to do this, which is conveniently available as a function of the Mask R-CNN model’s library. The function returns a score, so it was just a matter of ignoring low-scoring bounding boxes.
[Adam] purposely made the script adaptable. A few changes here and there, and you could be picking up tennis balls with a robotic collector or analyzing human migration patterns on your block in no time. Or change it up and detect all the cars that run the stop sign by your house.
Thanks for the tip, [foamyguy].