Need To Pick Objects Out Of Images? Segment Anything Does Exactly That

Segment Anything, from Facebook Research, tackles something that most people who have dabbled in computer vision have found daunting: reliably figuring out which pixels in an image belong to an object. Making that easier is the goal of the Segment Anything Model (SAM), just released under the Apache 2.0 license.

The online demo has a bank of examples, but also works with uploaded images.

The results look fantastic, and there’s an interactive demo available where you can play with the different ways SAM works. You can pick out objects by pointing and clicking on an image, or let SAM segment an image automatically. It’s frankly impressive to see SAM make masking out the different objects in an image look so effortless. What makes this possible is machine learning: the model behind the system has been trained on a huge dataset of high-quality images and masks, which makes it very effective at what it does.
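If you’d rather poke at it from Python than from the demo page, the model weights and a small package are on GitHub. Here’s a minimal sketch of prompting SAM with a single click; the checkpoint file name is the ViT-H weights Facebook distributes, while the image path and click coordinates are just placeholders:

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load the model -- the checkpoint is assumed to be downloaded already
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SAM wants an RGB uint8 array; OpenCV loads BGR, so convert
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# One foreground click at (x, y); label 1 means "this point is on the object"
point = np.array([[640, 360]])
label = np.array([1])
masks, scores, _ = predictor.predict(point_coords=point, point_labels=label)

# Keep the highest-scoring mask and cut the object out of the frame
best = masks[scores.argmax()]
cutout = (image * best[..., None]).astype(np.uint8)
cv2.imwrite("cutout.png", cv2.cvtColor(cutout, cv2.COLOR_RGB2BGR))
```

That single-point prompt is the same interaction the demo’s point-and-click mode exposes; boxes and automatic whole-image segmentation work through the same predictor.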

Continue reading “Need To Pick Objects Out Of Images? Segment Anything Does Exactly That”

My Glasses Hear Everything I’m Not Saying!

There was a time when, if you saw someone walking down the street talking to no one, they were probably crazy. Now you have to look for a Bluetooth headset. But soon they may just be quietly talking to their glasses. Cornell University researchers have EchoSpeech, which uses sonar-like sensors in a pair of glasses to watch your lips and mouth move. From that data, they can figure out what you are saying, even if you don’t really say it out loud. You can see a video of the glasses below.

There are a few advantages to a method like this. For one thing, you can speak commands even in places where you can’t talk out loud to a microphone. There have been HAL 9000-like attempts to read lips with cameras, but cameras are power-hungry and video tends to be data-intensive.

Continue reading “My Glasses Hear Everything I’m Not Saying!”

New Raspberry Pi Camera With Global Shutter

Raspberry Pi has just introduced a new camera module in the high-quality camera format. For the same $50 price you would shell out for the HQ camera, you get roughly eight times fewer pixels. But this is a global shutter camera, and if you need a global shutter, there’s just no substitute. That’s a big deal for the Raspberry Pi ecosystem.

Global vs Rolling

Most cameras out there today use CMOS sensors in rolling shutter mode. That means that the sensor starts in the upper left corner and rasters along, reading out exposure values from each row before moving down to the next row, and then starting up at the top again. The benefit is simpler CMOS design, but the downside is that none of the pixels are exposed or read at the same instant.
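If that’s hard to picture, the effect is easy to fake in software. The toy NumPy sketch below is illustrative only, not how any real sensor works internally: it “exposes” each row of the frame one readout interval later than the row above while an object moves, which is exactly how a rolling shutter turns fast horizontal motion into a slanted smear, whereas a global shutter samples every row at the same instant and keeps the object square:

```python
import numpy as np

HEIGHT, WIDTH = 120, 160
SPEED = 1.5          # pixels the object moves per row-readout interval

def scene_at(t):
    """A white 30x30 square sliding right across a black frame at time t."""
    frame = np.zeros((HEIGHT, WIDTH), dtype=np.uint8)
    x = int(20 + SPEED * t)
    frame[45:75, x:x + 30] = 255
    return frame

# Global shutter: every row sampled from the same instant
global_frame = scene_at(0)

# Rolling shutter: row r is sampled at time r, so the moving square shears
rolling_frame = np.zeros((HEIGHT, WIDTH), dtype=np.uint8)
for row in range(HEIGHT):
    rolling_frame[row] = scene_at(row)[row]

# Compare how far the square's left edge drifts between top and bottom rows
top = np.argmax(rolling_frame[46] > 0)
bottom = np.argmax(rolling_frame[74] > 0)
print(f"left edge at top row: {top}, at bottom row: {bottom} (shear = {bottom - top}px)")
```

With a real camera the same geometry shows up as bent propeller blades and jello-wobble video, which is why machine-vision and motion-tracking work tends to demand a global shutter.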

Continue reading “New Raspberry Pi Camera With Global Shutter”


Hackaday Links: January 29, 2023

We’ve been told for ages that “the robots are coming for our jobs!” It’s true that we’ve seen demonstrations of robots capable of everything from burger flipping to bricklaying, and that’s certainly alarming for anyone employed in such trades. But now it looks like AI has set its sights on the white-collar world, with the announcement that ChatGPT has managed a passing grade on a Wharton MBA exam.

For those not in the know, the University of Pennsylvania’s Wharton School of Business is in the major league of business schools; earning a Master’s in Business Administration from that august institution is no mean feat, and is likely to put the budding executive on a ballistic career trajectory. So the fact that ChatGPT could pass the exam is significant. But before you worry about a world in which our best and brightest business leaders are replaced with soulless automatons, relax. The exam presented to ChatGPT was just the final exam for one course, Operations Management, so it’s not like it aced everything an MBA is expected to know, and it took a lot of hints from a human helper to get it that far. It also reportedly made a lot of simple math mistakes, so maybe a Wharton MBA isn’t that much of a big deal after all.

Continue reading “Hackaday Links: January 29, 2023”

Shopping Cart Does The Tedious Work For You

Thanks to modern microcontrollers, basic home automation tasks such as turning lights on and off, opening blinds, and other simple chores have become common DIY projects. But with the advent of artificial intelligence and machine learning, the number of tasks that can be offloaded to computers has skyrocketed. This shopping cart that automates away the checkout lines at grocery stores certainly fits into this category.

The project was inspired by the cashierless Amazon stores where customers simply walk into a store, grab what they want, and leave. That works because computers monitor their purchases and charge them automatically, but creator [kutluhan_aktar] wanted to explore a way of doing this without a fleet of sensors and cameras all over the store. By mounting the hardware to a shopping cart instead, the sensors travel with the shopper and monitor what’s placed in the cart rather than what’s taken from a shelf. It’s built around the OpenMV Cam H7, a microcontroller paired with a camera specifically designed for these types of tasks, and the custom circuitry inside the case includes WiFi connectivity so the shopping cart can report its findings properly.
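The writeup has the full build, but the heart of the OpenMV side is a short capture-and-classify loop. Here’s a minimal MicroPython-style sketch using the standard OpenMV sensor API; the classify_product() and report_item() helpers are hypothetical stand-ins for the project’s trained model and WiFi reporting, not names from the actual code:

```python
import sensor
import time

# Standard OpenMV camera setup: RGB frames at QVGA resolution
sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.skip_frames(time=2000)   # let exposure settle

clock = time.clock()

def classify_product(img):
    """Hypothetical stand-in for the trained product-recognition model."""
    return None  # would return a (label, confidence) tuple

def report_item(label):
    """Hypothetical stand-in for sending the detected item over WiFi."""
    pass

while True:
    clock.tick()
    img = sensor.snapshot()          # grab a frame from the cart-mounted camera
    result = classify_product(img)
    if result:
        label, confidence = result
        if confidence > 0.8:         # only log confident detections
            report_item(label)
    print(clock.fps())
```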

[kutluhan_aktar] also built the entire software stack from the ground up and trained the model on a set of common products as a proof-of-concept. The idea was to allow smaller stores to operate more efficiently without needing a full suite of Amazon hardware and software backing it up, and this prototype seems to work pretty well to that end. If you want to develop a machine vision project on your own with more common hardware, take a look at this project which uses the Raspberry Pi instead.


Hackaday Links: September 4, 2022

Say what you will about Tesla, but there’s little doubt that the electric vehicle maker inspires a certain degree of fanaticism in owners. We’re used to the ones who can’t stop going on about neck-snapping acceleration and a sci-fi interior. But the ones we didn’t see coming are those who feel their cars are so bad that they need to stage a hunger strike to get the attention of Tesla. The strike is being organized by a group of Tesla owners in Norway, who on their website enumerate a long list of grievances, including design defects, manufacturing issues, quality control problems, and customer service complaints. It’s not clear how many people are in the group, although we assume at least 18, as that’s the number of Tesla cars they used to spell out “HELP” in a parking lot. It’s also not clear how or even if the group is really off their feed, or if this is just a stunt to get the attention of Tesla honcho and notorious social media gadfly Elon Musk.

Continue reading “Hackaday Links: September 4, 2022”

Truthsayer Uses Facial Recognition To See If You’re Telling The Truth

It’s hard to watch [Mark Zuckerberg]’s 2018 Congressional testimony and not come to the conclusion that he is, at a minimum, quite a bit different than the average person. Of course, having built a multibillion-dollar company that drastically changed everything about the way people communicate is pretty solid evidence of that, but the footage at least made a fun test case for this AI truth-detecting algorithm.

Now, we’re not saying that anyone in these videos was lying, and neither is [Fletcher Heisler]. His algorithm, which analyzes video of a person and uses machine vision to pick up cues that might be associated with the stress of untruthfulness, is far from perfect. But as the first video below shows, it is a lot of fun to see it at work. The idea is to capture data like pulse rate, gaze direction, blink rate, mouth posture, and even hand position and use them as a proxy for lying. The second video, from [Fletcher]’s recent DEFCON talk, has much more detail.

The key to all this is finding human faces in a video — a task that seemed to fail suspiciously frequently when [Zuck] was on camera — using OpenCV and MediaPipe’s Face Mesh. The subject’s pulse is detected by watching for subtle changes in the color of their cheeks as blood flows through them, which we’ve heard about plenty of times but never before seen presented so clearly and executed so simply. Gaze direction, blinking, and lip compression are fairly easy to detect too. [Fletcher] also threw in the FER library for facial expression recognition, to get an idea of the subject’s mood. Together, these cues form a rough estimate of the subject’s truthiness, which [Fletcher] is quick to point out is just for entertainment purposes and totally shouldn’t be used on your colleagues on the next Zoom call.
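Both of those building blocks are surprisingly accessible from Python. The sketch below is a rough approximation of the approach, not [Fletcher]’s actual code: it runs MediaPipe’s Face Mesh on a webcam feed and logs the mean green-channel value over a small cheek patch, which is the raw signal a camera-based pulse estimate starts from. Landmark index 50 is assumed here to sit roughly on the cheek, and camera index 0 is just the default webcam:

```python
import cv2
import mediapipe as mp

mp_face_mesh = mp.solutions.face_mesh
CHEEK_LANDMARK = 50   # assumed rough cheek point on the 468-point mesh; adjust as needed
PATCH = 10            # half-size of the sampled cheek patch in pixels

cap = cv2.VideoCapture(0)
green_signal = []     # mean cheek brightness per frame; the pulse shows up as a ripple here

with mp_face_mesh.FaceMesh(max_num_faces=1, refine_landmarks=True) as face_mesh:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break

        # MediaPipe wants RGB; OpenCV delivers BGR
        results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_face_landmarks:
            h, w = frame.shape[:2]
            lm = results.multi_face_landmarks[0].landmark[CHEEK_LANDMARK]
            cx, cy = int(lm.x * w), int(lm.y * h)

            # Average the green channel over a small cheek patch -- blood flow
            # modulates this value slightly on every heartbeat
            patch = frame[cy - PATCH:cy + PATCH, cx - PATCH:cx + PATCH, 1]
            if patch.size:
                green_signal.append(float(patch.mean()))
            cv2.circle(frame, (cx, cy), 4, (0, 255, 0), -1)

        cv2.imshow("cheek tracker", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
# Band-passing green_signal around 0.8-3 Hz and taking an FFT peak would give a pulse estimate.
```

Blink rate and gaze direction fall out of the same mesh by comparing eyelid and iris landmark positions frame to frame, which is why a single face-landmark model carries so much of the workload here.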

Does [Fletcher]’s facial mesh look familiar? It should, since we once watched him twitch his way through a coding interview.

Continue reading “Truthsayer Uses Facial Recognition To See If You’re Telling The Truth”