We all seem to intuitively know that a lot of what we do online is not great for our mental health. Hang out on enough social media platforms and you can practically feel the changes your mind inflicts on your body as a result of what you see — the racing heart, the tight facial expression, the clenched fists raised in seething rage. Not on Hackaday, of course — nothing but sweetness and light here.
That’s all highly subjective, of course. If you’d like to quantify your online misery more objectively, take a look at the aptly named BrowZen, a machine learning application by [Nick Bild]. Built around an NVIDIA Jetson Xavier NX and a web camera, BrowZen captures images of the user’s face periodically. The expression on the user’s face is classified using a facial recognition model that has been trained to recognize facial postures related to emotions like anger, surprise, fear, and happiness. The app captures your mood and which website you’re currently looking at and stores the results in a database. Handy charts let you know which sites are best for your state of mind; it’s not much of a surprise that Twitter induces rage while Hackaday pushes [Nick]’s happiness button. See? Sweetness and light.
Telecommuters: tired of the constant embarrassment of showing up to video conferences wearing nothing but your underwear? Save the humiliation and all those pesky trips down to HR with Safe Meeting, the new system that uses the power of artificial intelligence to turn off your camera if you forget that casual Friday isn’t supposed to be that casual.
The following infomercial is brought to you by [Nick Bild], who says the whole thing is tongue-in-cheek but we sense a certain degree of “necessity is the mother of invention” here. It’s true that the sudden throng of remote-work newbies certainly increases the chance of videoconference mishaps and the resulting mortification, so whatever the impetus, Safe Meeting seems like a great idea. It uses a Pi cam connected to a Jetson Nano to capture images of you during videoconferences, which are conducted over another camera. The stream is classified by a convolutional neural net (CNN) that determines whether it can see your underwear. If it can, it makes a REST API call to the conferencing app to turn off the camera. The video below shows it in action, and that it douses the camera quickly enough to spare your modesty.
Ever since we first saw the nightmarish artwork produced by Google DeepDream and the ridiculous faux paintings produced from neural style transfer, we’ve been aware of the ways machine learning can be applied to visual art. With commercially available trained models and automated pipelines for generating images from relatively small training sets, it’s now possible for developers without theoretical knowledge of machine learning to easily generate images, provided they have sufficient access to GPUs. Filmmaker [Kira Bursky] took this a step further, creating a surreal short film that features characters and textures produced from image sets.
She began with about 150 photos of her face, 200 photos of film locations, 4600 photos of past film productions, and 100 drawings as the main datasets.
Using GAN models for nebulas, faces, and skyscrapers in RunwayML, she found the results from training her face set disintegrated, realistic, and painterly. Many of the images continue to evoke aspects of her original face with distortions, although whether that is the model identifying a feature common to skyscrapers and faces or our own bias towards facial recognition is up to the viewer.
On the other hand, the results of training the film set photos on models of faces and bedrooms produced abstract textures and “surreal and eerie faces like a fever dream”. Perhaps, unlike the familiar anchors of facial features, it’s the lack of recognizable characteristics in the transformed images that gives them such a surreal feel.
[Kira] certainly uses these results to her advantage, brainstorming a concept for a short film that revolves around her main character experiencing nightmares. Although her objective was to use her results to convey a series of emotionally striking scenes, the models she uses to produce these scenes are also quite interesting.
She started off by using the MiDaS model, created by a team of researchers from ETH Zurich and Intel, for generating monocular depth maps. The results associated levels inside of an image with their appropriate depth in relation to one another. She also used the MASK R-CNN for masking out the backgrounds in generated faces and combined her generated images in Photoshop to create the main character for her short film.
In order to simulate the character walking, she used the Liquid Warping GAN, a framework for human motion imitation and appearance transfer, created by a team from ShanghaiTech University and Tencent AI Lab. This allowed her to take her original images and synthesize results from reference poses of herself going through the motions of walking by using a 3D body mesh recovery module. Later on, she applied similar techniques for motion tracking on her faces, running them through the First Order Motion Model to simulate different emotions. She went on to join her facial movements with her character using After Effects.
Bringing the results together, she animated a 3D camera blur using the depth map videos to create a less disorienting result by providing anchor points for the viewers and creating a displacement map to heighten the sense of depth and movement within the scenes. In After Effects, she also overlaid dust and film grain effects to give the final result a crisper look. The result is a surprisingly cinematic film entirely made of images and videos generated from machine learning models. With the help of the depth adjustments, it almost looks like something that you might see in a nightmare.
Conventional wisdom holds that the best way to learn a new language is immersion: just throw someone into a situation where they have no choice, and they’ll learn by context. Militaries use immersion language instruction, as do diplomats and journalists, and apparently computers can now use it to teach themselves Morse code.
The blog entry by the delightfully callsigned [Mauri Niininen (AG1LE)] reads like a scientific paper, with good reason: [Mauri] really seems to know a thing or two about machine learning. His method uses curated training data to build a model, namely Morse snippets and their translations, as is the usual approach with such systems. But things take an unexpected turn right from the start, as [Mauri] uses a Tensorflow handwriting recognition implementation to train his model.
Using a few lines of Python, he converts short, known snippets of Morse to a grayscale image that looks a little like a barcode, with the light areas being the dits and dahs and the dark bars being silence. The first training run only resulted in about 36% accuracy, but a subsequent run with shorter snippets ended up being 99.5% accurate. The model was also able to pull Morse out of a signal with -6 dB signal-to-noise ratio, even though it had been trained with a much cleaner signal.
Other Morse decoders use lookup tables to convert sound to text, but it’s important to note that this one doesn’t. By comparing patterns to labels in the training data, it inferred what the characters mean, and essentially taught itself Morse code in about an hour. We find that fascinating, and wonder what other applications this would be good for.
In our opinion, the primary evidence of a properly lived childhood is an enormous box of every conceivable Lego piece, from simple bricks to girders and gears, all with a small town’s worth of minifigs swimming through it. It takes years of birthdays and Christmases to accumulate a Lego collection best measured by the pound, but like anything worth doing, it’s worth overdoing.
But what to do with such a collection? Digging through it to find Just the Right Piece™ can be frustrating, and bringing order to the chaos with manual sorting is just so impractical. How about putting some of those bricks to work with a machine-vision Lego sorter built from Lego?
[Daniel West]’s approach is hardly new – we’ve even featured brick-built Lego sorters before – but we’re impressed by its architecture. First, the mechanical system is amazing. It uses a series of conveyors to transport bricks from a hopper, winnowing the stream down as it goes. The final step is a vibratory feeder that places one piece on a conveyor at a time. Those pass under a camera attached to a Raspberry Pi, where OpenCV does background subtraction from the video stream, applies bounding boxes to the parts, and runs the images through a convolutional neural network (CNN) that’s been trained on a database of every Lego part. Servo-controlled gates then direct the parts into one of 18 bins. See it in action in the video below.
We must admit that we’re not sure what the sorting criteria are, as some bins seem nearly as chaotic as the input mix. Still, we appreciate the fine engineering, and award extra style points for all the Lego goodness.
We’ve gotten to the point where a $35 Raspberry Pi can be a reasonable alternative to a traditional desktop or laptop, and microcontrollers in the Arduino ecosystem are getting powerful enough to handle some remarkably demanding computational jobs. But there’s still one area where microcontrollers seem to be lagging a bit: machine learning. Sure, there are purpose-built edge-computing SBCs, but wouldn’t it be great to be able to run AI models on versatile and ubiquitous MCUs that you can pick up for a couple of bucks?
We’re moving in that direction, and our friends at Adafruit Industries want to stop by the Hack Chat and tell us all about what they’re working on. In addition to Ladyada and PT, we’ll be joined by Meghna Natraj, Daniel Situnayake, and Pete Warden, all from the Google TensorFlow team. If you’ve got any interest in edge computing on small form-factor computers, you won’t want to miss this chat. Join us, ask your questions about TensorFlow Lite and TensorFlow Lite for Microcontrollers, and see what’s possible in machine learning way out on the edge.
Click that speech bubble to the right, and you’ll be taken directly to the Hack Chat group on Hackaday.io. You don’t have to wait until Wednesday; join whenever you want and you can see what the community is talking about.
The setup consists of a Jetson Nano fitted with a camera, which films the player and uses a convolutional neural network to recognise the player’s various gestures. Once recognised, an API request is sent to a laptop playing Doom which simulates the relevant keystrokes. The laptop is hooked up to a projector, creating a large screen which allows the wildly gesturing player to more easily follow the action.
The neural network was trained on 3300 images – 300 per gesture. [Nick] found that using a larger data set actually performed less well, as he became less diligent in reliably performing the gestures. This demonstrates that quality matters in training networks, as well as quantity.
Reports are that the network is fairly reliable, and it appears to work quite well. Unfortunately, playability is limited as it’s not possible to gesture for more than one key at once. Overall though, it serves as a tidy example of how to do gesture recognition with CNNs.