It’s hard to read the headlines today without feeling like the world couldn’t possibly get much worse. And then tomorrow rolls around, and a fresh set of headlines puts the lie to that thought. On a macro level, there’s not much that you can do about that, but on a personal level, illustrating your news feed with mostly wrong, AI-generated images might take the edge off things a little.
Let us explain. [Roy van der Veen] liked the idea of an e-paper display newsfeed, but the crushing weight of the headlines was a little too much to bear. To lighten things up, he decided to employ Stable Diffusion to illustrate his feed, displaying both the headline and a generated image on a 7.3″ Inky 7-color e-paper display. Every five hours, a script running on a Raspberry Pi Zero 2W fetches a headline from a random source — we’re pleased the list includes Hackaday — and composes a prompt for Stable Diffusion based on the headline, adding on a randomly selected prefix and suffix to spice things up. For example, a prompt might look like, “Gothic painting of (Driving a Motor with an Audio Amp Chip). Gloomy, dramatic, stunning, dreamy.” You can imagine the results.
We have to say, from the examples [Roy] shows, the idea pretty much works — sometimes the images are so far off the mark that just figuring out how Stable Diffusion came up with them is enough to soften the blow. We’d have preferred if the news of the floods in Libya had been buffered by a slightly less dismal scene, but finding out that what was thought to be a “ritual mass murder” was really only a yoga class was certainly heartening.
Real-time object detection, which uses neural networks and deep learning to rapidly identify and tag objects of interest in a video feed, is a handy feature with great hacker potential. Happily, it’s also possible to make customized CNNs (convolutional neural networks) tailored for one’s own needs, and that process just got easier thanks to some new documentation for the Vizy “AI camera” by Charmed Labs.
Raspberry Pi-based Vizy camera
Charmed Labs has been making hacker-friendly machine vision devices for a long time, and the Vizy camera impressed us mightily when we checked it out last year. Out of the box, Vizy has a perfectly functional object detector application that runs locally on the device, and can detect and tag many common everyday objects in real time. But what if that default application doesn’t quite meet one’s project needs? Good news, because it’s possible to create a custom-trained CNN, and that process got a lot more accessible thanks to step-by-step examples of training a model to recognize hands doing rock-paper-scissors.
Default object detection works well, but sometimes one needs custom results.
The basic process is this: Start with a variety of images that show the item of interest. Then identify and label the item of interest in each photo. These photos (a “training set”) are then sent to Google Colab, which will be used to generate a neural network. The resulting CNN model can then be downloaded and used, to see how well it performs.
Of course things rarely work perfectly the first time around, so at this point it’s pretty common for some refinement to be needed to increase accuracy. Luckily there are a number of tools to help do this without creating a new model from scratch, so it’s just a matter of tweaking until things perform acceptably.
Google Colab is free and the resulting CNNs are implemented in the TensorFlow Lite framework, meaning it’s possible to use them elsewhere. So if custom object detection has been holding up a project idea of yours, this might be what gets you over that hump.
Ugly sweater season is rapidly approaching, at least here in the Northern Hemisphere. We’ve always been a bit baffled by the tradition of paying top dollar for a loud, obnoxious sweater that gets worn to exactly one social event a year. We don’t judge, of course, but that’s not to say we wouldn’t look a little more favorably on someone’s fashion choice if it were more like this AI-defeating adversarial ugly sweater.
The idea behind this research from the University of Maryland is not, of course, to inform fashion trends, nor is it to create a practical invisibility cloak. It’s really to probe machine learning systems for vulnerabilities by making small changes to the input while watching for changes in the output. In this case, the ML system was a YOLO-based vision system which has little trouble finding humans in an arbitrary image. The adversarial pattern was generated by using a large set of training images, some of which contain the objects of interest — in this case, humans. Each time a human is detected, a random pattern is rendered over the image, and the data is reassessed to see how much the pattern lowers the object’s score. The adversarial pattern eventually improves to the point where it mostly prevents humans from being recognized. Much more detail is available in the research paper (PDF) if you want to dig into the guts of this.
The pattern, which looks a little like a bad impressionist painting of people buying pumpkins at a market and bears some resemblance to one we’ve seen before in similar work, is said to work better from different viewing angles. It also makes a spiffy pullover, especially if you’d rather blend in at that Christmas party.
There are about one million known species of insects – more than for any other group of living organisms. If you need to determine which species an insect belongs to, things get complicated quick. In fact, for distinguishing between certain kinds of species, you might need a well-trained expert in that species, and experts’ time is often better spent on something else. This is where CNNs (convolutional neural networks) come in nowadays, and this paper describes a CNN doing just as well if not better than human experts.
We all seem to intuitively know that a lot of what we do online is not great for our mental health. Hang out on enough social media platforms and you can practically feel the changes your mind inflicts on your body as a result of what you see — the racing heart, the tight facial expression, the clenched fists raised in seething rage. Not on Hackaday, of course — nothing but sweetness and light here.
That’s all highly subjective, of course. If you’d like to quantify your online misery more objectively, take a look at the aptly named BrowZen, a machine learning application by [Nick Bild]. Built around an NVIDIA Jetson Xavier NX and a web camera, BrowZen captures images of the user’s face periodically. The expression on the user’s face is classified using a facial recognition model that has been trained to recognize facial postures related to emotions like anger, surprise, fear, and happiness. The app captures your mood and which website you’re currently looking at and stores the results in a database. Handy charts let you know which sites are best for your state of mind; it’s not much of a surprise that Twitter induces rage while Hackaday pushes [Nick]’s happiness button. See? Sweetness and light.
Telecommuters: tired of the constant embarrassment of showing up to video conferences wearing nothing but your underwear? Save the humiliation and all those pesky trips down to HR with Safe Meeting, the new system that uses the power of artificial intelligence to turn off your camera if you forget that casual Friday isn’t supposed to be that casual.
The following infomercial is brought to you by [Nick Bild], who says the whole thing is tongue-in-cheek but we sense a certain degree of “necessity is the mother of invention” here. It’s true that the sudden throng of remote-work newbies certainly increases the chance of videoconference mishaps and the resulting mortification, so whatever the impetus, Safe Meeting seems like a great idea. It uses a Pi cam connected to a Jetson Nano to capture images of you during videoconferences, which are conducted over another camera. The stream is classified by a convolutional neural net (CNN) that determines whether it can see your underwear. If it can, it makes a REST API call to the conferencing app to turn off the camera. The video below shows it in action, and that it douses the camera quickly enough to spare your modesty.
Ever since we first saw the nightmarish artwork produced by Google DeepDream and the ridiculous faux paintings produced from neural style transfer, we’ve been aware of the ways machine learning can be applied to visual art. With commercially available trained models and automated pipelines for generating images from relatively small training sets, it’s now possible for developers without theoretical knowledge of machine learning to easily generate images, provided they have sufficient access to GPUs. Filmmaker [Kira Bursky] took this a step further, creating a surreal short film that features characters and textures produced from image sets.
She began with about 150 photos of her face, 200 photos of film locations, 4600 photos of past film productions, and 100 drawings as the main datasets.
via [Kira Bursky]Using GAN models for nebulas, faces, and skyscrapers in RunwayML, she found the results from training her face set disintegrated, realistic, and painterly. Many of the images continue to evoke aspects of her original face with distortions, although whether that is the model identifying a feature common to skyscrapers and faces or our own bias towards facial recognition is up to the viewer.
On the other hand, the results of training the film set photos on models of faces and bedrooms produced abstract textures and “surreal and eerie faces like a fever dream”. Perhaps, unlike the familiar anchors of facial features, it’s the lack of recognizable characteristics in the transformed images that gives them such a surreal feel.
[Kira] certainly uses these results to her advantage, brainstorming a concept for a short film that revolves around her main character experiencing nightmares. Although her objective was to use her results to convey a series of emotionally striking scenes, the models she uses to produce these scenes are also quite interesting.
She started off by using the MiDaS model, created by a team of researchers from ETH Zurich and Intel, for generating monocular depth maps. The results associated levels inside of an image with their appropriate depth in relation to one another. She also used the MASK R-CNN for masking out the backgrounds in generated faces and combined her generated images in Photoshop to create the main character for her short film.
via [Vox]In order to simulate the character walking, she used the Liquid Warping GAN, a framework for human motion imitation and appearance transfer, created by a team from ShanghaiTech University and Tencent AI Lab. This allowed her to take her original images and synthesize results from reference poses of herself going through the motions of walking by using a 3D body mesh recovery module. Later on, she applied similar techniques for motion tracking on her faces, running them through the First Order Motion Model to simulate different emotions. She went on to join her facial movements with her character using After Effects.
Bringing the results together, she animated a 3D camera blur using the depth map videos to create a less disorienting result by providing anchor points for the viewers and creating a displacement map to heighten the sense of depth and movement within the scenes. In After Effects, she also overlaid dust and film grain effects to give the final result a crisper look. The result is a surprisingly cinematic film entirely made of images and videos generated from machine learning models. With the help of the depth adjustments, it almost looks like something that you might see in a nightmare.