How The Image-Generating AI Of Stable Diffusion Works

[Jay Alammar] has put up an illustrated guide to how Stable Diffusion works, and the principles in it are perfectly applicable to understanding how similar systems like OpenAI’s Dall-E or Google’s Imagen work under the hood as well. These systems are probably best known for their amazing ability to turn text prompts (e.g. “paradise cosmic beach”) into a matching image. Sometimes. Well, usually, anyway.

‘System’ is an apt term, because Stable Diffusion (like similar systems) is actually made up of many separate components working together to make the magic happen. [Jay]’s illustrated guide really shines here, because it starts at a very high level with only three components (each with its own neural network) and drills down as needed to explain what’s going on at a deeper level, and how it fits into the whole.

Spot any similar shapes and contours between the image and the noise that preceded it? That’s because the image is a result of removing noise from a random visual mess, not building it up from scratch like a human artist would do.

It may surprise some to discover that the image creation part doesn’t work the way a human does. That is to say, it doesn’t begin with a blank canvas and build an image bit by bit from the ground up. It begins with a seed: a bunch of random noise. Noise gets subtracted in a series of steps that leave the result looking less like noise and more like an aesthetically pleasing and (ideally) coherent image. Combine that with the ability to guide noise removal in a way that favors conforming to a text prompt, and one has the bones of a text-to-image generator. There’s a lot more to it of course, and [Jay] goes into considerable detail for those who are interested.
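To make the "subtract noise in a series of steps" idea concrete, here is a toy sketch in Python. It is not how Stable Diffusion is implemented: the real system uses a trained neural network, guided by the text prompt, to predict the noise. Here the names `denoise`, `toy_predict_noise`, and `TARGET` are all made up for illustration, and the "predictor" simply cheats by knowing the answer.

```python
import random

def denoise(noisy, predict_noise, steps=50, step_size=0.1):
    """Repeatedly subtract a fraction of the predicted noise from the image."""
    image = list(noisy)
    for _ in range(steps):
        noise_estimate = predict_noise(image)
        image = [p - step_size * n for p, n in zip(image, noise_estimate)]
    return image

# Hypothetical five-pixel "image" that the toy predictor steers toward.
TARGET = [0.0, 0.25, 0.5, 0.75, 1.0]

def toy_predict_noise(image):
    # Stand-in for the neural network: reports how far each pixel is
    # from the target, which is exactly the noise left to remove.
    return [p - t for p, t in zip(image, TARGET)]

rng = random.Random(0)                          # fixed seed: same noise every run
start = [rng.gauss(0.0, 1.0) for _ in TARGET]   # begin with pure random noise
result = denoise(start, toy_predict_noise)
```

Each pass removes ten percent of the remaining noise, so after fifty passes `result` sits very close to `TARGET`. Changing the seed changes the starting noise, which in a real generator is why the same prompt can yield different images.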

If you’re unfamiliar with Stable Diffusion or art-creating AI in general, it’s one of those fields that is changing so fast that it sometimes feels impossible to keep up. Luckily, our own Matthew Carlson explains all about what it is, and why it matters.

Stable Diffusion can be run locally. There is a fantastic open-source web UI, so there’s no better time to get up to speed and start experimenting!

Laser Zaps Cockroaches Over One Meter

You may have missed this month’s issue of Oriental Insects, in which a project by [Ildar Rakhmatulin] of Heriot-Watt University in Edinburgh caught our attention. [Ildar] led a team of researchers in the development of an AI-controlled laser that neutralizes moving cockroaches at distances of up to 1.2 meters. Noting the various problems with using chemical pesticides for pest control, his team sought out a non-conventional approach.

The heart of the pest controller is a Jetson Nano, which uses OpenCV and Yolo object detection to find the cockroaches and galvanometers to steer the laser beam. Three different lasers were used for testing, allowing the team to evaluate a range of wavelengths, power levels, and spot sizes. Unsurprisingly, the higher-power 1.6 W laser was both the most efficient and the quickest.
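For a feel of what sits between "the detector found a cockroach" and "the galvanometers point the laser", here is a rough Python sketch that turns a bounding box into mirror angles. It is not taken from the project's code. It assumes an ideal pinhole camera with no lens distortion, a laser that shares the camera's viewpoint, and two independent mirror axes, and the function names and parameters are hypothetical.

```python
import math

def box_center(x_min, y_min, x_max, y_max):
    """Center of a detection bounding box, in pixels."""
    return (x_min + x_max) / 2, (y_min + y_max) / 2

def pixel_to_mirror_angles(px, py, width, height, hfov_deg):
    """Convert a pixel position into (pan, tilt) mirror angles in degrees."""
    # Pinhole model: focal length in pixels from the horizontal field of view.
    focal_px = (width / 2) / math.tan(math.radians(hfov_deg) / 2)
    beam_pan = math.degrees(math.atan((px - width / 2) / focal_px))
    beam_tilt = math.degrees(math.atan((py - height / 2) / focal_px))
    # A mirror deflects the beam by twice its own rotation,
    # so each mirror turns through half the desired beam angle.
    return beam_pan / 2, beam_tilt / 2

# Example: a detection near the right edge of a 640x480 frame, 60 degree field of view.
cx, cy = box_center(600, 220, 640, 260)
pan, tilt = pixel_to_mirror_angles(cx, cy, 640, 480, 60)
```

A real rig would need a calibration step to account for the offset between camera and laser, which this sketch ignores.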

The project is on GitHub (here) and the cockroach machine learning image set is available here. But as [Ildar] points out in the conclusion of the report, this is dangerous: it’s suitable for academic research, but it lacks any safety features and isn’t ready for general use. The report is full of cockroach trivia, such as the fact that the average speed of a cockroach is 4.8 km/h, and that they run much faster when being zapped. If you want to experiment with cockroaches yourself, a link is provided to a pet store that sells the German cockroach (Blattella germanica) that was the target of this report.

If this project sounds familiar, that’s because it is an improvement on a previous project we wrote about last year, which used similar techniques to zap mosquitoes.

Continue reading “Laser Zaps Cockroaches Over One Meter”

Self-Driving Laboratories Do Research On Autopilot

Scientific research is a messy business. The road to learning new things and making discoveries is paved with hard labor, tough thinking, and plenty of dead ends. It’s a time-consuming, expensive endeavor, and for every success, there are thousands upon thousands of failures.

It’s a process so inefficient, you would think someone would have automated it already. The concept of the self-driving laboratory aims to do exactly that, and could revolutionize materials research in particular.

Continue reading “Self-Driving Laboratories Do Research On Autopilot”

Tesla’s Dojo Is An Interesting CPU Design

What do you get when you cross a modern super-scalar out-of-order CPU core with more traditional microcontroller aspects such as no virtual memory, no memory cache, and no DDR or PCIe controllers? You get the Tesla Dojo, which Chips and Cheese recently did a deep dive on.

It starts with a comparison to the IBM Cell processors. The Cell of the mid-2000s featured something called SPEs (Synergistic Processing Elements). They were smaller cores focused on vector processing or other specialized types of workloads. They didn’t access the main memory and had to be given tasks by the fully featured CPU. Dojo has 1.25 MB of SRAM that it can use as working memory with five ports, but it has no cache or virtual memory. It uses DMA to get the information it needs via a mesh system. The front end pulls RISC-V-like (heavily MIPS-inspired) instructions into a small instruction cache and decodes eight instructions per cycle.

Continue reading “Tesla’s Dojo Is An Interesting CPU Design”

Truthsayer Uses Facial Recognition To See If You’re Telling The Truth

It’s hard to watch [Mark Zuckerberg]’s 2018 Congressional testimony and not come to the conclusion that he is, at a minimum, quite a bit different than the average person. Of course, having built a multibillion-dollar company that drastically changed everything about the way people communicate is pretty solid evidence of that, but the footage at least made a fun test case for this AI truth-detecting algorithm.

Now, we’re not saying that anyone in these videos was lying, and neither is [Fletcher Heisler]. His algorithm, which analyzes video of a person and uses machine vision to pick up cues that might be associated with the stress of untruthfulness, is far from perfect. But as the first video below shows, it is a lot of fun to see it at work. The idea is to capture data like pulse rate, gaze direction, blink rate, mouth posture, and even hand position and use them as a proxy for lying. The second video, from [Fletcher]’s recent DEFCON talk, has much more detail.

The key to all this is finding human faces in a video — a task that seemed to fail suspiciously frequently when [Zuck] was on camera — using OpenCV and MediaPipe’s Face Mesh. The subject’s pulse is detected by watching for subtle changes in the color of a subject’s cheeks as blood flows through them, which we’ve heard about plenty of times but never before seen presented so clearly and executed so simply. Gaze direction, blinking, and lip compression are fairly easy to detect too. [Fletcher] also threw in the FER library for facial expression recognition, to get an idea of the subject’s mood. Together, these cues form a rough estimate of the subject’s truthiness, which [Fletcher] is quick to point out is just for entertainment purposes and totally shouldn’t be used on your colleagues on the next Zoom call.
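The pulse trick is easier to appreciate with a small example. The Python sketch below is not [Fletcher]’s code. It assumes we already have one average cheek brightness value per video frame, and it simply looks for the strongest rhythm in a plausible heart rate range. The function name `estimate_bpm`, the 40 to 180 bpm search range, and the synthetic test signal are all assumptions for illustration.

```python
import math

def estimate_bpm(samples, fps, min_bpm=40, max_bpm=180):
    """Return the heart rate (beats per minute) that dominates the signal."""
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]      # remove the constant skin tone
    best_bpm, best_power = None, -1.0
    for bpm in range(min_bpm, max_bpm + 1):
        freq = bpm / 60.0                       # candidate frequency in Hz
        # Correlate the signal with a sine and cosine at this frequency.
        re = sum(v * math.cos(2 * math.pi * freq * i / fps)
                 for i, v in enumerate(centered))
        im = sum(v * math.sin(2 * math.pi * freq * i / fps)
                 for i, v in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm

# Synthetic stand-in for real video: 10 s at 30 fps with a faint 72 bpm ripple.
fps = 30
signal = [120 + 0.5 * math.sin(2 * math.pi * (72 / 60) * i / fps)
          for i in range(10 * fps)]
bpm = estimate_bpm(signal, fps)
```

Real footage is far noisier than this clean sine wave, so a practical version would need filtering and a longer observation window.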

Does [Fletcher]’s facial mesh look familiar? It should, since we once watched him twitch his way through a coding interview.

Continue reading “Truthsayer Uses Facial Recognition To See If You’re Telling The Truth”

Blog Title Optimizer Uses AI, But How Well Does It Work?

[Max Woolf] sometimes struggles to create ideal headlines for his blog posts, and decided to apply his experience with machine learning to the problem. He asked: could an AI be trained to optimize his blog titles? It is a fascinating application of natural language processing, and [Max] explains all about what it does and how it works.

The machine learning model [Max] uses is GPT-3, a language model that generates natural-seeming human language and can be fine-tuned for particular tasks. [Max] uses OpenAI’s GPT-3 API (which, by the way, is much easier to experiment with than one might think) and here is the basic workflow for his title optimizer:

  1. The optimizer takes as input a blog post title to optimize.
  2. OpenAI’s pre-trained GPT-3 engine is used to generate six alternate titles.
  3. For each of those alternate titles, a fine-tuned version of GPT-3 is consulted to judge how “good” they are based on custom training data. (“Good” in this context means “similar to titles of successful submissions on Hacker News”, but more on that in a moment.)
  4. Print the results.
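
The four steps above can be sketched in a few lines of Python. This is only an outline of the workflow, not [Max]’s code, and it makes no API calls: `fake_generate` and `fake_score` are hypothetical stand-ins for the pre-trained and fine-tuned GPT-3 models, and the scoring rule is a meaningless placeholder.

```python
def optimize_title(title, generate_alternatives, score_title, n=6):
    """Generate n alternate titles and return (score, title) pairs, best first."""
    candidates = generate_alternatives(title, n)
    scored = [(score_title(candidate), candidate) for candidate in candidates]
    return sorted(scored, reverse=True)

def fake_generate(title, n):
    # Stand-in for step 2: a real version would ask a language model.
    prefixes = ["Why", "How", "Inside", "The Truth About",
                "What Nobody Says About", "A Guide To"]
    return [f"{prefixes[i % len(prefixes)]} {title}" for i in range(n)]

def fake_score(title):
    # Stand-in for step 3: shorter titles score higher, purely as a placeholder.
    return 1.0 / len(title)

results = optimize_title("Blog Title Optimizers", fake_generate, fake_score)
for score, candidate in results:                # step 4: print the results
    print(f"{score:.3f}  {candidate}")
```

Passing the generator and scorer in as functions keeps the workflow separate from whichever model does the actual work.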

Continue reading “Blog Title Optimizer Uses AI, But How Well Does It Work?”

Machine Learning Baby Monitor Prevents The Hunger Games

Newborn babies can be tricky to figure out, especially for first-time parents. Despite the abundance of unsolicited advice proffered by anyone who ever had a baby before — and many who haven’t — most new parents quickly get in sync with the baby’s often ambiguous signals. But [Caleb] took his observations of his newborn a step further and built a machine-learning hungry baby early warning system that’s pretty slick.

Normally, babies are pretty unsubtle about being hungry, and loudly announce their needs to the world. But it turns out that crying is a lagging indicator of hunger, and that there are a host of face, head, and hand cues that precede the wailing. [Caleb] based his system on Google’s MediaPipe library, using his baby monitor’s camera to track such behaviors as lip smacking, pacifier rejection, fist mouthing, and rooting, all signs that someone’s tummy needs filling. By putting together a system to recognize these cues and assign a weight to them, [Caleb] now gets a text before the baby gets to the screaming phase, to the benefit of not only the little nipper but also his sleep-deprived servants. The video below has some priceless bits in it; don’t miss [Baby Caleb] at 5:11 or the hilarious automatic feeder gag at the end.
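The "recognize cues and assign a weight to them" step might look something like the Python sketch below. The cue names come from the description above, but the weights, the threshold, and the function names are invented for illustration and are not [Caleb]’s actual values. Detecting the cues from video is the hard part and is not shown.

```python
# Hypothetical weights: how strongly each cue suggests hunger.
CUE_WEIGHTS = {
    "lip_smacking": 2,
    "pacifier_rejection": 3,
    "fist_mouthing": 3,
    "rooting": 4,
}
ALERT_THRESHOLD = 5  # hypothetical score at which a text would be sent

def hunger_score(observed_cues):
    """Add up the weights of the distinct cues seen recently."""
    return sum(CUE_WEIGHTS[cue] for cue in set(observed_cues))

def should_alert(observed_cues):
    """True when the combined evidence crosses the alert threshold."""
    return hunger_score(observed_cues) >= ALERT_THRESHOLD

print(should_alert(["lip_smacking"]))              # one weak cue on its own
print(should_alert(["rooting", "fist_mouthing"]))  # two stronger cues together
```

Weighting lets several mild signs add up to an alert while a single ambiguous one does not wake anybody up.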

We’ve seen some interesting videos from [Caleb] recently, mostly having to do with his dog’s bathroom habits and getting help cleaning up afterward. We can only guess how those projects will be leveraged when this kid gets a little older and starts potty training.

Continue reading “Machine Learning Baby Monitor Prevents The Hunger Games”