AI In A Box Envisions AI As A Private, Offline, Hackable Module

With AI in a Box, [Useful Sensors] aims to embed a variety of complementary AI tools into a small, private, self-contained module with no internet connection. It can do live voice recognition and captioning, live translation, and natural language conversational interaction with a local large language model (LLM). Intriguingly, it’s specifically designed with features to make it hack-friendly, such as the ability to act as a voice keyboard by sending live transcribed audio as keystrokes over USB.
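The write-up doesn’t say how the voice-keyboard feature is implemented, but on Linux single-board computers the usual trick is the USB gadget framework, where the device enumerates on the host as a HID keyboard and “types” by writing 8-byte reports. Here’s a minimal, hypothetical sketch along those lines; the gadget device path is an assumption (it requires prior configfs setup), and the keymap is deliberately toy-sized:

```python
# Hypothetical sketch of a "voice keyboard" on a Linux USB gadget.
# Assumes the board has already been configured (via configfs) to enumerate
# as a HID keyboard exposed at /dev/hidg0; lowercase letters and space only.

# USB HID usage IDs: 'a' is 0x04 through 'z' at 0x1D; space is 0x2C.
KEYCODES = {chr(ord("a") + i): 0x04 + i for i in range(26)}
KEYCODES[" "] = 0x2C

def type_text(text, hidg="/dev/hidg0"):
    """Send each character as a key-down report followed by a key-up report."""
    with open(hidg, "wb", buffering=0) as hid:
        for ch in text.lower():
            code = KEYCODES.get(ch)
            if code is None:
                continue  # characters outside the toy keymap are skipped
            # Boot-keyboard report: modifier byte, reserved byte, six key slots
            hid.write(bytes([0, 0, code, 0, 0, 0, 0, 0]))  # key down
            hid.write(bytes(8))                            # all keys up
```

Feed that function from a local speech-to-text engine and the host sees nothing but an ordinary keyboard.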

The unit is slated to include an integrated speaker, display, and microphone.

Right now it’s wrapping up a pre-order phase, and aims to ship units around the end of January 2024. The project is based around the RockChip 3588S SoC and is open source (GitHub repository), but since it’s still in development, there’s not a whole lot visible in the repository yet. However, a key part of getting good performance is [Useful Sensors]’s own transformers library for the RockChip NPU (neural processing unit).

The ability to perform high-quality local voice recognition and to run locally-hosted LLMs like LLaMA has gotten a massive boost thanks to recent advances in machine learning, and it looks like this project aims to tie them together in a self-contained package.

Perhaps private digital assistants can become more useful when users have the freedom to modify and integrate them as they see fit. Digital assistants hosted by the big tech companies are often frustrating, and others have observed that this is ultimately because they exist primarily to serve their makers more than to help users.

Continue reading “AI In A Box Envisions AI As A Private, Offline, Hackable Module”

Social Engineering Chatbots With Sad-Sob Stories, For Fun And Profit

By this point, we probably all know that most AI chatbots will decline a request to do something even marginally nefarious. But it turns out that you just might be able to get a chatbot to solve a CAPTCHA puzzle (Nitter), if you make up a good enough “dead grandma” story.

Right up front, we’re going to warn that fabricating a story about a dead or dying relative is a really bad idea; call us superstitious, but karma has a way of balancing things out in ways you might not like. But that didn’t stop X user [Denis Shiryaev] from trying to trick Microsoft’s Bing Chat. As a control, [Denis] first uploaded the image of a CAPTCHA to the chatbot with a simple prompt: “What is the text in this image?” In most cases, a chatbot will gladly pull text from an image, or at least attempt to do so, but Bing Chat has a filter that recognizes obfuscating lines and squiggles of a CAPTCHA, and wisely refuses to comply with the prompt.

On the second try, [Denis] did a quick-and-dirty Photoshop of the CAPTCHA image onto a stock photo of a locket, and changed the prompt to a cock-and-bull story about how his recently deceased grandmother left behind this locket with a bit of their “special love code” inside, and would you be so kind as to translate it, pretty please? Surprisingly, the story worked; Bing Chat not only solved the puzzle, but also gave [Denis] some kind words and a virtual hug.

Now, a couple of things stand out about this. First, we’d like to see this replicated; maybe other chatbots won’t fall for something like this, and it may be that Bing Chat has since been patched against the exploit. Second, if [Denis]’ experience stands up, we’d like to see how far this goes; perhaps this is even a new, more practical definition of the Turing Test: a machine whose gullibility is indistinguishable from a human’s.

Humans And Balloon Hands Help Bots Make Breakfast

Breakfast may be the most important meal of the day, but who wants to get up first thing in the morning and make it? Well, there may come a day when a robot can do the dirty work for you. That’s the vision behind Toyota Research Institute’s (TRI) innovatively trained breakfast bots.

Going way beyond pick-and-place tasks, TRI has so far taught robots how to do more than 60 different things, using a new method to teach dexterous skills like whisking eggs, peeling vegetables, and applying hazelnut spread to a substrate. Their method is built on a generative AI technique called Diffusion Policy, which they use to create what they’re calling Large Behavior Models.

Instead of hours of coding and debugging, the robots learn differently. Essentially, the robot gets a large, flexible balloon hand with which to feel objects, their weight, and their effect on other objects (like flipping a pancake). Then, a human shows it how to perform a task before the bot is let loose on an AI model. After a number of hours, say, overnight, the bot has a new working behavior.

Now, since TRI claims that their aim is to build robots that amplify people and not replace them, you may still have to plate your own scrambled eggs and apply the syrup to that short stack yourself. But they plan to have over 1,000 skills in the bag of tricks by the end of 2024. If you want more information about the project and to learn about Diffusion Policy without reading the paper, check out this blog post.

Perhaps the robotic burger joint was ahead of its time, but we’re getting there. How about a robot barista?

Continue reading “Humans And Balloon Hands Help Bots Make Breakfast”

WhisperFrame Depicts The Art Of Conversation

At this point, you gotta figure that you’re at least being listened to almost everywhere you go, whether it be a home assistant or your very own phone. So why not roll with the punches and turn lemons into something like a still life of lemons that’s a bit wonky? What we mean is, why not take our conversations and use AI to turn them into art? That’s the idea behind this next-generation digital photo frame created by [TheMorehavoc].

Essentially, it uses a Raspberry Pi and a ReSpeaker four-mic array to listen to conversations in the room. It records 15-20 seconds of audio at a time and sends each chunk to OpenAI’s Whisper API to generate a transcript.
This repeats until five minutes of audio is collected, then the entire transcript is sent through GPT-4 to extract an image prompt from a single topic in the conversation. Then, that prompt is shipped off to Stable Diffusion to get an image to be displayed on the screen. As you can imagine, the images generated run the gamut from really weird to really awesome.
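For a sense of how those pieces fit together, here’s a hedged sketch of that loop in Python; the actual build may differ. It assumes the official openai package (v1 interface) with OPENAI_API_KEY set in the environment; record_chunk() stands in for audio capture from the mic array, and the Stable Diffusion hand-off is a stub since that runs on a separate backend:

```python
# Hedged sketch of the WhisperFrame loop described above, not the real code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def record_chunk(path="chunk.wav", seconds=15):
    ...  # placeholder: capture ~15 s from the ReSpeaker array to a WAV file

def transcribe(path):
    with open(path, "rb") as f:
        return client.audio.transcriptions.create(model="whisper-1", file=f).text

def image_prompt(transcript):
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Pick one topic from this conversation and write a "
                       "short image-generation prompt for it:\n" + transcript,
        }],
    )
    return resp.choices[0].message.content

while True:
    pieces = []
    for _ in range(20):              # 20 chunks x ~15 s ≈ five minutes
        record_chunk()
        pieces.append(transcribe("chunk.wav"))
    prompt = image_prompt(" ".join(pieces))
    # hand `prompt` to the Stable Diffusion backend and push the result
    # to the photo frame's display
```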

The natural lulls in conversation presented a bit of a problem: the transcription kept generating text during silences, presumably because of ambient noise. The answer was voice activity detection software that gives the probability that a voice is present.
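The write-up doesn’t name the VAD package, but as an illustration, Silero VAD is a popular choice that returns exactly that kind of per-chunk speech probability. A minimal sketch, assuming 16 kHz mono audio and an arbitrary 0.5 threshold:

```python
# Illustrative only: Silero VAD scoring short audio chunks for speech.
# The model expects 512-sample chunks (32 ms) of 16 kHz float32 audio.
import torch

model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")

def speech_probability(chunk: torch.Tensor) -> float:
    """Return the model's probability that this 512-sample chunk is speech."""
    return model(chunk, 16000).item()

def is_speech(chunk: torch.Tensor, threshold: float = 0.5) -> bool:
    """Gate downstream transcription on this to skip silent stretches."""
    return speech_probability(chunk) >= threshold
```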

Naturally, people were curious about the prompts for the images, so [TheMorehavoc] made a little gallery sign with a MagTag that uses Adafruit.io as the MQTT broker. Build video is up after the break, and you can check out the images here (warning, some are NSFW).

Continue reading “WhisperFrame Depicts The Art Of Conversation”

E-Paper News Feed Illustrates The Headlines With AI-Generated Images

It’s hard to read the headlines today without feeling like the world couldn’t possibly get much worse. And then tomorrow rolls around, and a fresh set of headlines puts the lie to that thought. On a macro level, there’s not much that you can do about that, but on a personal level, illustrating your news feed with mostly wrong, AI-generated images might take the edge off things a little.

Let us explain. [Roy van der Veen] liked the idea of an e-paper display newsfeed, but the crushing weight of the headlines was a little too much to bear. To lighten things up, he decided to employ Stable Diffusion to illustrate his feed, displaying both the headline and a generated image on a 7.3″ Inky 7-color e-paper display. Every five hours, a script running on a Raspberry Pi Zero 2W fetches a headline from a random source — we’re pleased the list includes Hackaday — and composes a prompt for Stable Diffusion based on the headline, adding on a randomly selected prefix and suffix to spice things up. For example, a prompt might look like, “Gothic painting of (Driving a Motor with an Audio Amp Chip). Gloomy, dramatic, stunning, dreamy.” You can imagine the results.
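The prompt-assembly step is simple enough to show in a few lines. This is a rough imitation, not [Roy]’s actual code; the prefix and suffix lists are invented stand-ins for his real ones:

```python
# Rough imitation of the headline-to-prompt step; the style lists are
# invented examples, not the ones [Roy] actually uses.
import random

PREFIXES = ["Gothic painting of", "Watercolor sketch of", "Low-poly render of"]
SUFFIXES = ["Gloomy, dramatic, stunning, dreamy.",
            "Bright, cheerful, highly detailed."]

def build_prompt(headline: str) -> str:
    return f"{random.choice(PREFIXES)} ({headline}). {random.choice(SUFFIXES)}"

print(build_prompt("Driving a Motor with an Audio Amp Chip"))
# -> e.g. "Gothic painting of (Driving a Motor with an Audio Amp Chip). Gloomy, ..."
```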

We have to say, from the examples [Roy] shows, the idea pretty much works; sometimes the images are so far off the mark that just figuring out how Stable Diffusion came up with them is enough to soften the blow. We’d have preferred that the news of the floods in Libya had been buffered by a slightly less dismal scene, but finding out that what was thought to be a “ritual mass murder” was really only a yoga class was certainly heartening.

[Image: Two white Chevy Bolt hatchbacks sit side-by-side, immobilized in the street, their roofs festooned with sensors and an orange cone on each hood like a snowman’s nose pointed toward the sky.]

Coning Cars For Fun And Non-Profit

Self-driving cars are being heralded as the wave of the future, but there have been many hiccups along the way. The newest: activists showing how easily autonomous vehicles can be immobilized with a simple traffic cone.

As we’ve discussed before, self-driving cars aren’t actually that great at driving, and there are a number of conditions that can cause them to fail safe and stop in the middle of the road. Activist group Safe Street Rebel is exploiting this vulnerability by “coning” Waymo and Cruise vehicles in San Francisco: placing a traffic cone on a vehicle’s hood, in the way of the sensors and cameras it uses to navigate, renders it inoperable. Continue reading “Coning Cars For Fun And Non-Profit”

BingGPT Brings AI Chat To The Desktop

Interested in AI, but sick of using everything in a browser? Miss clicking on a good old desktop icon to open a local bit of software? In that case, BingGPT could be just the thing for you.

It’s nothing too crazy: just a desktop application that gives you access to Bing’s AI-powered chatbot. It’s available for Windows, macOS, and Linux, with binaries for Intel, Apple Silicon, and ARM processors.

Using BingGPT is simple. Sign in with your Microsoft account, and away you go. There’s no need to use Microsoft Edge or any ugly browser plugins, and you can export your conversations to Markdown, PNG, and PDF for sharing beyond the program. It also comes with a range of keyboard shortcuts to speed up your interaction with the large language model when it gets off track, plus a Compose button that can actually go ahead and write stuff for you.

Fundamentally, all the cool stuff is still coming in via the web, but it’s nice to be able to use Bing’s chatbot without having to succumb to the horrors of a Microsoft browser. It’s interesting to see how large language models are becoming an all-pervasive tool of late. If you’re building your own nifty projects in this area, don’t hesitate to let us know!