Machine Learning Makes Sure Your LOLs Are Genuine

There was a time not too long ago when “LOL” actually meant something online. If someone went through the trouble of putting LOL into an email or text, you could be sure they were actually LOL-ing while they were typing — it was part of the social compact that made the Internet such a wholesome and inviting place. But no more — LOL has been reduced to a mere punctuation mark, with no guarantee that the sender was actually laughing, chuckling, chortling, or even snickering. What have we become?

To put an end to this madness, [Brian Moore] has come up with the LOL verifier. Like darn near every project we see these days, it uses machine learning, in this case a model built with Edge Impulse. It detects a laugh by comparing audio input against an exhaustive model of [Brian]’s jocular outbursts; he says it took nearly three full minutes to collect the training set. A Teensy 4.1 takes care of HID duties; if a typed “LOL” correlates to some variety of laugh, the initialism is verified with a time and date stamp. If your LOL was judged insincere, well, that’s on you. See what you think of the short video below; we genuinely LOL’d. And while we’re looking forward to a ROTFL verifier, we’re not sure we want to see his take on LMAO.
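The verification step boils down to asking whether a laugh was heard around the moment “LOL” was typed. Here is a minimal sketch of that check, not [Brian]’s actual code: the function name `verify_lol`, the list of laugh timestamps, and the two-second window are all assumptions made for illustration, and what should happen to an unverified LOL is deliberately left out.

```python
from datetime import datetime, timedelta

# Hypothetical tolerance between keystroke and laugh; the real project
# may use a completely different rule.
WINDOW = timedelta(seconds=2)

def verify_lol(typed_at, laugh_times, window=WINDOW):
    """Return a stamped string if any laugh falls within `window` of the keystroke."""
    laughed = any(abs(typed_at - t) <= window for t in laugh_times)
    if laughed:
        return f"LOL (verified {typed_at:%Y-%m-%d %H:%M:%S})"
    return None  # unverified: handling is left to the caller

typed = datetime(2022, 11, 1, 12, 0, 0)
laughs = [datetime(2022, 11, 1, 11, 59, 59)]  # made-up detection time
print(verify_lol(typed, laughs))
```

In a real build the laugh timestamps would come from the audio classifier, and the keystroke time from whatever is handling the keyboard.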

Hats off to [Brian] for his attempt to enforce some kind of standards online. You may recall his earlier attempt to make leaving Zoom calls a little less awkward, which we also appreciate.


Shopping Cart Does The Tedious Work For You

Thanks to modern microcontrollers, basic home automation tasks such as turning lights on and off or opening blinds have become common DIY projects. But with the advent of artificial intelligence and machine learning, the number of tasks that can be offloaded to computers has skyrocketed. This shopping cart, which automates away the checkout lines at grocery stores, certainly fits into this category.

The project was inspired by the cashierless Amazon stores where customers simply walk into a store, grab what they want, and leave. This is made possible by the fact that computers monitor their purchases and charge them automatically, but creator [kutluhan_aktar] wanted to explore a way of doing this without a fleet of sensors and cameras all over a store. By mounting the hardware to a shopping cart instead, the sensors travel with the shopper and monitor what’s placed in the cart instead of what’s taken from a shelf. It’s built around the OpenMV Cam H7, a microcontroller paired with a camera specifically designed for these types of tasks, and the custom circuitry inside the case also includes WiFi connectivity to make sure the shopping cart can report its findings properly.

[kutluhan_aktar] also built the entire software stack from the ground up and trained the model on a set of common products as a proof-of-concept. The idea was to allow smaller stores to operate more efficiently without needing a full suite of Amazon hardware and software backing it up, and this prototype seems to work pretty well to that end. If you want to develop a machine vision project on your own with more common hardware, take a look at this project which uses the Raspberry Pi instead.

On Getting A Computer’s Attention And Striking Up A Conversation

With the rise of voice-driven virtual assistants over the years, the sight of people talking to various electrical devices in public and in private has become rather commonplace. While such voice-driven interfaces are decidedly useful for a range of situations, they also come with complications. One of these is the trigger phrase or wake word that a voice assistant listens for when in standby. Much like in Star Trek, where uttering ‘Computer’ would get the computer’s attention, we have our ‘Siri’, ‘Cortana’ and a range of custom trigger phrases that enable the voice interface.

Unlike in Star Trek, however, our virtual assistants do not know when we really desire to interact. Unable to distinguish context, they’ll happily respond to someone on TV mentioning their trigger phrase, possibly followed by a ludicrous purchase order or other mischief. This highlights how complex voice-based interfaces are, even though they still lack any sense of self-awareness or intelligence.

Another issue is that the process of voice recognition itself is very resource-intensive, which limits the amount of processing that can be performed on the local device. This usually leads to voice assistants like Siri, Alexa, Cortana and others processing recorded voices in a data center, with obvious privacy implications.


How The Image-Generating AI Of Stable Diffusion Works

[Jay Alammar] has put up an illustrated guide to how Stable Diffusion works, and the principles in it are perfectly applicable to understanding how similar systems like OpenAI’s DALL-E or Google’s Imagen work under the hood as well. These systems are probably best known for their amazing ability to turn text prompts (e.g. “paradise cosmic beach”) into a matching image. Sometimes. Well, usually, anyway.

‘System’ is an apt term, because Stable Diffusion (like similar systems) is actually made up of many separate components working together to make the magic happen. [Jay]’s illustrated guide really shines here, because it starts at a very high level with only three components (each with its own neural network) and drills down as needed to explain what’s going on at a deeper level, and how it fits into the whole.

Spot any similar shapes and contours between the image and the noise that preceded it? That’s because the image is a result of removing noise from a random visual mess, not building it up from scratch like a human artist would do.

It may surprise some to discover that the image creation part doesn’t work the way a human does. That is to say, it doesn’t begin with a blank canvas and build an image bit by bit from the ground up. It begins with a seed: a bunch of random noise. Noise gets subtracted in a series of steps that leave the result looking less like noise and more like an aesthetically pleasing and (ideally) coherent image. Combine that with the ability to guide noise removal in a way that favors conforming to a text prompt, and one has the bones of a text-to-image generator. There’s a lot more to it of course, and [Jay] goes into considerable detail for those who are interested.
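To make the “subtract noise in steps” idea concrete, here is a toy sketch, very much not Stable Diffusion itself. In the real thing a trained neural network estimates the noise; the hypothetical `predict_noise` below cheats by peeking at a known target, which loosely stands in for whatever the prompt is steering toward. The names, the three-value “image”, and the step schedule are all assumptions for illustration.

```python
import random

def predict_noise(current, target):
    # Stand-in for the trained network: compute the noise exactly
    # by comparing against a target the real system would never have.
    return [c - t for c, t in zip(current, target)]

def denoise(target, steps=10, seed=0):
    """Start from random noise and remove a share of the predicted noise per step."""
    rng = random.Random(seed)
    image = [rng.gauss(0.0, 1.0) for _ in target]  # the seed: pure noise
    for step in range(steps):
        noise = predict_noise(image, target)
        # Remove 1/10 of the noise, then 1/9 of what is left, and so on,
        # so the final step removes all remaining noise.
        fraction = 1.0 / (steps - step)
        image = [p - fraction * n for p, n in zip(image, noise)]
    return image

target = [0.2, 0.8, 0.5]  # a three-"pixel" image, purely illustrative
result = denoise(target)
print([round(v, 3) for v in result])
```

Changing `seed` changes the starting noise but not where this toy ends up, because the stand-in predictor is perfect. A learned predictor is not, which is part of why different seeds give different images in practice.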

If you’re unfamiliar with Stable Diffusion or art-creating AI in general, it’s one of those fields that is changing so fast that it sometimes feels impossible to keep up. Luckily, our own Matthew Carlson explains all about what it is, and why it matters.

Stable Diffusion can be run locally. There is a fantastic open-source web UI, so there’s no better time to get up to speed and start experimenting!

Render Yourself Invisible To AI With This Adversarial Sweater Of Doom

Ugly sweater season is rapidly approaching, at least here in the Northern Hemisphere. We’ve always been a bit baffled by the tradition of paying top dollar for a loud, obnoxious sweater that gets worn to exactly one social event a year. We don’t judge, of course, but that’s not to say we wouldn’t look a little more favorably on someone’s fashion choice if it were more like this AI-defeating adversarial ugly sweater.

The idea behind this research from the University of Maryland is not, of course, to inform fashion trends, nor is it to create a practical invisibility cloak. It’s really to probe machine learning systems for vulnerabilities by making small changes to the input while watching for changes in the output. In this case, the ML system was a YOLO-based vision system which has little trouble finding humans in an arbitrary image. The adversarial pattern was generated by using a large set of training images, some of which contain the objects of interest — in this case, humans. Each time a human is detected, a random pattern is rendered over the image, and the data is reassessed to see how much the pattern lowers the object’s score. The adversarial pattern eventually improves to the point where it mostly prevents humans from being recognized. Much more detail is available in the research paper (PDF) if you want to dig into the guts of this.
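The try-a-pattern, re-score, keep-what-helps loop can be sketched in a few lines. This is a toy under loud assumptions: `detector_score` is a made-up weighted sum rather than YOLO, images are short lists of numbers, and the search is plain random hill climbing, which is a simpler stand-in and not necessarily the optimization used in the paper.

```python
import random

def detector_score(image, weights):
    """Toy 'person' score in [0, 1]: a clipped weighted sum of pixels."""
    total = sum(p * w for p, w in zip(image, weights))
    return max(0.0, min(1.0, total))

def apply_patch(image, patch):
    """Render the patch over the first len(patch) pixels of the image."""
    return patch + image[len(patch):]

def mean_score(images, patch, weights):
    scores = [detector_score(apply_patch(img, patch), weights) for img in images]
    return sum(scores) / len(scores)

def search_patch(images, weights, patch_len=4, iters=200, seed=0):
    """Random hill climbing: keep a change only if it lowers the mean score."""
    rng = random.Random(seed)
    patch = [rng.random() for _ in range(patch_len)]
    best = mean_score(images, patch, weights)
    for _ in range(iters):
        candidate = list(patch)
        candidate[rng.randrange(patch_len)] = rng.random()
        score = mean_score(images, candidate, weights)
        if score < best:
            patch, best = candidate, score
    return patch, best

rng = random.Random(1)
weights = [0.15] * 8  # hypothetical detector weights
images = [[rng.random() for _ in range(8)] for _ in range(5)]
before = sum(detector_score(img, weights) for img in images) / len(images)
patch, after = search_patch(images, weights)
print(f"mean score before: {before:.2f}, with patch: {after:.2f}")
```

The point of scoring against the whole image set rather than one picture is that the pattern has to work in general, not just on a single lucky frame.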

The pattern, which looks a little like a bad impressionist painting of people buying pumpkins at a market and bears some resemblance to one we’ve seen before in similar work, is said to work better from different viewing angles. It also makes a spiffy pullover, especially if you’d rather blend in at that Christmas party.

RatPack Is A Wearable Fit For A Rodent

Rats are often seen as pests and vermin, but they can also do useful jobs for us, like hunting for landmines. To aid in their work, [kjwu] designed the RatPack, a wearable device that lets these valiant rats communicate with their handlers.

The heart of the build is an ESP32-CAM board, which combines the capable wireless-enabled microcontroller with a small lightweight camera. It’s paired with a TinyML machine learning board, and it’s all wrapped up in a 3D printed enclosure that serves as a backpack to fit African Giant Pouched rats.

The RatPack can provide a live video feed. However, its main purpose is to track the rat’s movements through the use of an accelerometer. This data is then fed to the machine learning subsystem, which analyzes it to detect certain gestures the rats have been trained to make. The idea is that when the rat identifies an object of interest, such as a landmine, it will perform a predetermined gesture. The RatPack would then detect this, and transmit a signal to the rat’s handlers. Given that a rat’s limbs are all on the bottom of its body, this approach is useful. It’s kind of hard to ask a rat to press a button on its own back, after all.
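For a feel of what gesture detection on accelerometer data involves, here is a deliberately crude sketch. It slides a window over (x, y, z) samples and flags the first window whose motion energy crosses a threshold. That threshold test stands in for the trained model, and the function names, window size, and threshold are assumptions, not values from [kjwu]’s project.

```python
def window_energy(samples):
    """Mean squared magnitude of a list of (x, y, z) accelerometer samples."""
    return sum(x * x + y * y + z * z for x, y, z in samples) / len(samples)

def detect_gesture(stream, window=4, threshold=2.0):
    """Return the start index of the first window above threshold, else None."""
    for start in range(len(stream) - window + 1):
        if window_energy(stream[start:start + window]) > threshold:
            return start
    return None

quiet = [(0.0, 0.0, 1.0)] * 6  # at rest: gravity only (illustrative units)
shake = [(2.0, 0.0, 1.0)] * 4  # a burst of sideways motion
print(detect_gesture(quiet + shake))
```

A real classifier would look at the shape of the motion rather than just how big it is, so that a deliberate gesture can be told apart from ordinary scurrying.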

Finding and carefully disposing of unexploded ordnance is a problem facing many societies around the world. We’re lucky in many cases that the rats are helping out with this difficult and dangerous job.

Tesla’s Dojo Is An Interesting CPU Design

What do you get when you cross a modern super-scalar out-of-order CPU core with more traditional microcontroller aspects such as no virtual memory, no memory cache, and no DDR or PCIe controllers? You get the Tesla Dojo, which Chips and Cheese recently did a deep dive on.

It starts with a comparison to IBM’s Cell processor. The Cell of the mid-2000s featured cores called SPEs (Synergistic Processing Elements). They were smaller cores focused on vector processing or other specialized types of workloads. They didn’t access the main memory and had to be given tasks by the fully featured CPU. Dojo has 1.25 MB of SRAM that it can use as working memory with five ports, but it has no cache or virtual memory. It uses DMA to get the information it needs via a mesh system. The front end pulls RISC-V-like (heavily MIPS-inspired) instructions into a small instruction cache and decodes eight instructions per cycle.