Shopping Cart Does The Tedious Work For You

Thanks to modern microcontrollers, basic home automation tasks such as turning lights on and off or opening blinds have become common DIY projects. But with the advent of artificial intelligence and machine learning, the number of tasks that can be offloaded to computers has skyrocketed. This shopping cart that automates away the checkout lines at grocery stores certainly fits into this category.

The project was inspired by the cashierless Amazon stores where customers simply walk into a store, grab what they want, and leave. This is made possible by the fact that computers monitor their purchases and charge them automatically, but creator [kutluhan_aktar] wanted to explore a way of doing this without a fleet of sensors and cameras all over a store. By mounting the hardware to a shopping cart instead, the sensors travel with the shopper and monitor what’s placed in the cart instead of what’s taken from a shelf. It’s built around the OpenMV Cam H7, a microcontroller paired with a camera specifically designed for these types of tasks, and the custom circuitry inside the case also includes WiFi connectivity to make sure the shopping cart can report its findings properly.
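For a sense of what the camera side of that looks like, here's a minimal sketch in OpenMV-style MicroPython, assuming a TensorFlow Lite classifier and label list trained on the store's products (the model file, labels, and threshold are placeholders, not [kutluhan_aktar]'s actual firmware; the WiFi reporting lives on the custom add-on circuitry, so it's just a comment here):

```python
# Minimal OpenMV-style sketch (hypothetical model and labels).
import sensor
import tf  # OpenMV's TensorFlow Lite module

sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.skip_frames(time=2000)  # let the camera settle

labels = ["apple", "cereal", "milk", "soda"]  # placeholder product labels

while True:
    img = sensor.snapshot()
    # Run the classifier over the frame; each result carries one
    # confidence score per label.
    for obj in tf.classify("products.tflite", img):
        scores = list(zip(labels, obj.output()))
        name, conf = max(scores, key=lambda s: s[1])
        if conf > 0.8:  # arbitrary confidence threshold
            print("Added to cart:", name)
            # The real build would hand the item off to the WiFi
            # circuitry here to update the shopper's running total.
```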

[kutluhan_aktar] also built the entire software stack from the ground up and trained the model on a set of common products as a proof-of-concept. The idea was to allow smaller stores to operate more efficiently without needing a full suite of Amazon hardware and software backing it up, and this prototype seems to work pretty well to that end. If you want to develop a machine vision project on your own with more common hardware, take a look at this project which uses the Raspberry Pi instead.

On Getting A Computer’s Attention And Striking Up A Conversation

With the rise of voice-driven virtual assistants over the years, the sight of people talking to various electrical devices in public and in private has become rather commonplace. While such voice-driven interfaces are decidedly useful for a range of situations, they also come with complications. One of these is the trigger phrase, or wake word, that a voice assistant listens for while in standby. Much like in Star Trek, where uttering ‘Computer’ would get the computer’s attention, we have our ‘Siri’, ‘Cortana’ and a range of custom trigger phrases that enable the voice interface.

Unlike in Star Trek, however, our virtual assistants do not know when we really desire to interact. Unable to distinguish context, they’ll happily respond to someone on TV mentioning their trigger phrase, possibly following up with a ludicrous purchase order or other mischief. The realization here is just how complex voice-based interfaces have become, while they still lack any sense of self-awareness or intelligence.

Another issue is that the process of voice recognition itself is very resource-intensive, which limits the amount of processing that can be performed on the local device. This usually leads to voice assistants like Siri, Alexa, Cortana and others processing recorded voices in a data center, with obvious privacy implications.
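To illustrate that split, here's a rough Python sketch of the usual pattern using the `speech_recognition` package: a lightweight offline engine (PocketSphinx) spots the wake word locally, and only after it fires does audio get shipped off to a cloud recognizer. The wake word and overall structure are placeholders for the pattern, not any particular assistant's implementation:

```python
# Rough sketch of local wake-word gating before cloud recognition.
# Requires: pip install SpeechRecognition pocketsphinx
import speech_recognition as sr

WAKE_WORD = "computer"  # placeholder trigger phrase

r = sr.Recognizer()
with sr.Microphone() as source:
    r.adjust_for_ambient_noise(source)
    while True:
        audio = r.listen(source, phrase_time_limit=3)
        try:
            # Cheap, offline keyword spotting stays on-device...
            heard = r.recognize_sphinx(audio).lower()
        except sr.UnknownValueError:
            continue
        if WAKE_WORD in heard:
            print("Listening...")
            command = r.listen(source, phrase_time_limit=5)
            try:
                # ...while the heavyweight recognition (and your voice)
                # goes off to a data center.
                print("You said:", r.recognize_google(command))
            except sr.UnknownValueError:
                print("Sorry, didn't catch that.")
```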

Continue reading “On Getting A Computer’s Attention And Striking Up A Conversation”

How The Image-Generating AI Of Stable Diffusion Works

[Jay Alammar] has put up an illustrated guide to how Stable Diffusion works, and the principles in it are perfectly applicable to understanding how similar systems like OpenAI’s Dall-E or Google’s Imagen work under the hood as well. These systems are probably best known for their amazing ability to turn text prompts (e.g. “paradise cosmic beach”) into a matching image. Sometimes. Well, usually, anyway.

‘System’ is an apt term, because Stable Diffusion (and similar systems) are actually made up of many separate components working together to make the magic happen. [Jay]’s illustrated guide really shines here, because it starts at a very high level with only three components (each with their own neural network) and drills down as needed to explain what’s going on at a deeper level, and how it fits into the whole.

Spot any similar shapes and contours between the image and the noise that preceded it? That’s because the image is a result of removing noise from a random visual mess, not building it up from scratch like a human artist would do.

It may surprise some to discover that the image creation part doesn’t work the way a human does. That is to say, it doesn’t begin with a blank canvas and build an image bit by bit from the ground up. It begins with a seed: a bunch of random noise. Noise gets subtracted in a series of steps that leave the result looking less like noise and more like an aesthetically pleasing and (ideally) coherent image. Combine that with the ability to guide noise removal in a way that favors conforming to a text prompt, and one has the bones of a text-to-image generator. There’s a lot more to it of course, and [Jay] goes into considerable detail for those who are interested.
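Condensed to its skeleton, the denoising loop looks something like the sketch below, loosely following the shape of Hugging Face's `diffusers` components; it assumes a pre-loaded `text_encoder`, `unet`, `vae`, `scheduler`, and tokenized prompts, and glosses over details like latent scaling and scheduler-specific input prep, which vary between implementations:

```python
# Skeleton of the Stable Diffusion denoising loop (illustrative, not
# the exact pipeline). Assumes text_encoder, unet, vae, scheduler and
# tokenized prompts are already loaded.
import torch

guidance_scale = 7.5
text_emb = text_encoder(prompt_tokens)    # embedding of the prompt
uncond_emb = text_encoder(empty_tokens)   # unconditional embedding

# The "seed": pure random noise in latent space, not a blank canvas.
latents = torch.randn(1, 4, 64, 64)

scheduler.set_timesteps(50)
for t in scheduler.timesteps:
    with torch.no_grad():
        # Predict the noise present in the latents, with and without
        # the text prompt's influence.
        noise_cond = unet(latents, t, encoder_hidden_states=text_emb).sample
        noise_uncond = unet(latents, t, encoder_hidden_states=uncond_emb).sample
    # Classifier-free guidance: push harder toward the prompt.
    noise = noise_uncond + guidance_scale * (noise_cond - noise_uncond)
    # Subtract one step's worth of predicted noise.
    latents = scheduler.step(noise, t, latents).prev_sample

image = vae.decode(latents / 0.18215).sample  # latents back to pixels
```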

If you’re unfamiliar with Stable Diffusion or art-creating AI in general, it’s one of those fields that is changing so fast that it sometimes feels impossible to keep up. Luckily, our own Matthew Carlson explains all about what it is, and why it matters.

Stable Diffusion can be run locally. There is a fantastic open-source web UI, so there’s no better time to get up to speed and start experimenting!
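If you'd rather poke at it from Python directly instead of a web UI, the `diffusers` library wraps the whole pipeline in a few lines; the weights download on the first run, and a GPU helps a lot:

```python
# Quick local generation with Hugging Face diffusers.
# Requires: pip install diffusers transformers accelerate
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # or "cpu", slowly

image = pipe("paradise cosmic beach").images[0]
image.save("beach.png")
```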

Render Yourself Invisible To AI With This Adversarial Sweater Of Doom

Ugly sweater season is rapidly approaching, at least here in the Northern Hemisphere. We’ve always been a bit baffled by the tradition of paying top dollar for a loud, obnoxious sweater that gets worn to exactly one social event a year. We don’t judge, of course, but that’s not to say we wouldn’t look a little more favorably on someone’s fashion choice if it were more like this AI-defeating adversarial ugly sweater.

The idea behind this research from the University of Maryland is not, of course, to inform fashion trends, nor is it to create a practical invisibility cloak. It’s really to probe machine learning systems for vulnerabilities by making small changes to the input while watching for changes in the output. In this case, the ML system was a YOLO-based vision system which has little trouble finding humans in an arbitrary image. The adversarial pattern was generated using a large set of training images, some of which contain the objects of interest — in this case, humans. Each time a human is detected, the candidate pattern is rendered over the image, and the detection is reassessed to see how much the pattern lowers the object’s score; the pattern is then tweaked and the process repeated. The adversarial pattern eventually improves to the point where it mostly prevents humans from being recognized. Much more detail is available in the research paper (PDF) if you want to dig into the guts of this.
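In spirit, that loop is ordinary gradient descent, just aimed at the pattern instead of the network weights. Here's a toy PyTorch sketch where `person_score()` and `train_loader` are hypothetical stand-ins for the real YOLO detector's person-confidence output and the training image set, not the paper's actual code:

```python
# Toy adversarial-patch training loop (PyTorch). person_score() and
# train_loader are hypothetical stand-ins, not the paper's code.
import torch

patch = torch.rand(3, 64, 64, requires_grad=True)  # start from random noise
opt = torch.optim.Adam([patch], lr=0.01)

def paste(images, boxes, patch):
    # Render the patch over each detected person region
    # (perspective warps and lighting tricks omitted for brevity).
    out = images.clone()
    for img, (x, y, w, h) in zip(out, boxes):
        p = torch.nn.functional.interpolate(
            patch.unsqueeze(0), size=(h, w)).squeeze(0)
        img[:, y:y + h, x:x + w] = p
    return out

for images, boxes in train_loader:       # images containing people
    patched = paste(images, boxes, patch.clamp(0, 1))
    # How confidently the detector still sees a person -- this is
    # exactly what we want to drive down.
    loss = person_score(patched).mean()
    opt.zero_grad()
    loss.backward()                      # gradients flow into the patch
    opt.step()
```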

The pattern, which looks a little like a bad impressionist painting of people buying pumpkins at a market and bears some resemblance to one we’ve seen before in similar work, is said to work better from different viewing angles. It also makes a spiffy pullover, especially if you’d rather blend in at that Christmas party.

RatPack Is A Wearable Fit For A Rodent

Rats are often seen as pests and vermin, but they can also do useful jobs for us, like hunting for landmines. To aid in their work, [kjwu] designed the RatPack, a wearable device that lets these valiant rats communicate with their handlers.

The heart of the build is an ESP32-CAM board, which combines a capable wireless-enabled microcontroller with a small, lightweight camera. It’s paired with a TinyML machine learning board, and it’s all wrapped up in a 3D printed enclosure that serves as a backpack sized for African giant pouched rats.

The RatPack can provide a live video feed, but its main purpose is to track the rat’s movements with an accelerometer. That data is fed to the machine learning subsystem, which analyzes it to detect certain gestures the rats have been trained to make. The idea is that when the rat identifies an object of interest, such as a landmine, it will perform a predetermined gesture. The RatPack detects this and transmits a signal to the rat’s handlers. Given that a rat’s limbs are all on the bottom of its body, a gesture-based approach makes sense; it’s kind of hard to ask a rat to press a button on its own back, after all.
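The gesture-spotting part boils down to classifying short windows of accelerometer samples. Here's a simplified Python sketch of the idea using a TensorFlow Lite model; the model file, gesture labels, window size, and `accel_stream()` generator are all placeholders, not [kjwu]'s actual pipeline:

```python
# Simplified gesture spotting from accelerometer windows (illustrative).
# Assumes a small TFLite model trained on labeled gesture recordings.
import numpy as np
import tflite_runtime.interpreter as tflite

GESTURES = ["none", "spin", "freeze"]  # placeholder labels
WINDOW = 100  # samples per window, e.g. one second at 100 Hz

interp = tflite.Interpreter(model_path="gestures.tflite")
interp.allocate_tensors()
inp = interp.get_input_details()[0]
out = interp.get_output_details()[0]

def classify(window):
    # window: (WINDOW, 3) array of x/y/z accelerometer readings
    x = window.astype(np.float32).reshape(inp["shape"])
    interp.set_tensor(inp["index"], x)
    interp.invoke()
    scores = interp.get_tensor(out["index"])[0]
    return GESTURES[int(np.argmax(scores))], float(np.max(scores))

buf = []
for sample in accel_stream():  # hypothetical (x, y, z) reading generator
    buf.append(sample)
    if len(buf) == WINDOW:
        gesture, conf = classify(np.array(buf))
        if gesture != "none" and conf > 0.9:
            print("Notify handlers:", gesture)  # e.g. radio packet to base
        buf = buf[WINDOW // 2:]  # slide the window by half
```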

Finding and carefully disposing of unexploded ordnance is a problem facing many societies around the world. We’re lucky in many cases that the rats are helping out with this difficult and dangerous job.

Tesla’s Dojo Is An Interesting CPU Design

What do you get when you cross a modern superscalar out-of-order CPU core with more traditional microcontroller traits such as no virtual memory, no memory cache, and no DDR or PCIe controllers? You get the Tesla Dojo, which Chips and Cheese recently did a deep dive on.

It starts with a comparison to the IBM Cell processors. The Cell of the mid-2000s featured SPEs (Synergistic Processing Elements): smaller cores focused on vector processing and other specialized workloads. They couldn’t access main memory directly and had to be handed tasks by the fully featured CPU. Dojo has 1.25 MB of SRAM with five ports that it can use as working memory, but it has no data cache or virtual memory. It uses DMA over a mesh system to get the data it needs. The front end pulls RISC-V-like (heavily MIPS-inspired) instructions into a small instruction cache and decodes eight instructions per cycle. Continue reading “Tesla’s Dojo Is An Interesting CPU Design”
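Before you click through: the practical upshot of "no cache, just SRAM and DMA" is that software has to stage its own data movement, classically by double-buffering, kicking off a DMA for the next tile while crunching the current one. A toy Python model of that pattern (the `dma_start`, `dma_wait`, and `compute` calls are hypothetical stand-ins, not Dojo's actual API):

```python
# Toy model of double-buffered DMA into a scratchpad. Illustrative only;
# dma_start/dma_wait/compute are hypothetical, not Dojo's API.
def process_tiles(tiles):
    buf = [bytearray(64 * 1024), bytearray(64 * 1024)]  # two SRAM halves
    pending = dma_start(tiles[0], buf[0])  # prefetch the first tile
    for i in range(len(tiles)):
        dma_wait(pending)  # current tile is now sitting in SRAM
        if i + 1 < len(tiles):
            # Fetch the next tile into the *other* buffer while we
            # compute on this one -- the overlap a data cache would
            # otherwise give you for free.
            pending = dma_start(tiles[i + 1], buf[(i + 1) % 2])
        compute(buf[i % 2])
```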

Machine Learning App Remembers Names So You Don’t Have To

Depending on your point of view, real-life conversations with strangers can either be refreshing or terrifying. Some of us are glib and at ease in new social situations, while others are sure that the slightest flub will haunt them forever. And perhaps chief among these conversational faux pas is forgetting the name of the person who just introduced themselves a few seconds before.

Rather than commit himself to a jail of shame on such occasions, [Caleb] fought back with this only slightly creepy name-recalling smartphone app. The non-zero creep factor comes from the fact that, as [Caleb] points out, the app crosses lines that most of us would find unacceptable if Google or Amazon did it — like listening to your every conversation. It does this not to direct ads to you based on your conversations, but to fish out the name of your interlocutor from the natural flow of the conversation.

It turns out to be a tricky problem, even with the help of named-entity recognition (NER), which basically looks for the names of things in natural text. Apache OpenNLP, the NER library used here, works well at pulling out names, but figuring out whether they’re part of an introduction or just a bit of gossip about a third party is where [Caleb] put the bulk of his coding effort. That, and trying to make the whole thing at least a little privacy-respecting. See the video below for a demo.
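Apache OpenNLP is a Java library, but the two-stage idea translates to a few lines of Python with spaCy standing in for the NER, plus a crude check that the name actually arrived as part of an introduction. The patterns below are illustrative and far simpler than what the app needs in practice:

```python
# Name extraction + naive introduction detection (spaCy standing in
# for the app's Apache OpenNLP NER; patterns are illustrative only).
import re
import spacy

nlp = spacy.load("en_core_web_sm")

INTRO_PATTERNS = [
    r"\bmy name is\b",
    r"\bi'?m\b",
    r"\bcall me\b",
]

def introduced_name(utterance):
    doc = nlp(utterance)
    names = [ent.text for ent in doc.ents if ent.label_ == "PERSON"]
    # A name alone isn't enough -- "Dave got a new scope" is gossip
    # about a third party, not an introduction.
    is_intro = any(re.search(p, utterance.lower()) for p in INTRO_PATTERNS)
    return names[0] if (names and is_intro) else None

print(introduced_name("Hi, my name is Alice."))      # -> Alice
print(introduced_name("Did you hear about Alice?"))  # -> None
```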

To be sure, this doesn’t do much more than a simple ‘remind me of your name again?’ would, but without the embarrassment. It’s still pretty cool though, and we’re especially jazzed to learn about NER and the tons of applications for it. Those are projects for a future day, though. We’re just glad to see that [Caleb] has moved on from monitoring the bodily functions of his dog and his kid. At least for now.

Continue reading “Machine Learning App Remembers Names So You Don’t Have To”