Humans And Balloon Hands Help Bots Make Breakfast

Breakfast may be the most important meal of the day, but who wants to get up first thing in the morning and make it? Well, there may come a day when a robot can do the dirty work for you. At least, that’s the vision behind Toyota Research Institute’s (TRI) innovatively trained breakfast bots.

Going way beyond pick-and-place tasks, TRI has so far taught robots to do more than 60 different things, using a new method for teaching dexterous skills like whisking eggs, peeling vegetables, and applying hazelnut spread to a substrate. Their method is built on a generative AI technique called Diffusion Policy, which they use to create what they’re calling Large Behavior Models.

Instead of hours of coding and debugging, the robots learn differently. Essentially, each robot gets a large, flexible balloon hand with which to feel objects, their weight, and their effect on other objects (like flipping a pancake). Then a human demonstrates the task before the bot is let loose on an AI model. After a number of hours, say overnight, the bot has a new working behavior.
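For the curious, the core trick in Diffusion Policy is training a network to predict (and thus remove) noise added to demonstrated actions, conditioned on what the robot observes. Here’s a toy sketch of that training loop; the dimensions, network, and noise schedule are all invented for illustration and bear no resemblance to TRI’s actual models:

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM = 32, 7  # hypothetical observation/action sizes

# Tiny noise-prediction network: given an observation, a noisy action, and a
# timestep, predict the noise that was mixed into the action.
net = nn.Sequential(
    nn.Linear(OBS_DIM + ACT_DIM + 1, 256), nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)

def train_step(obs, act):
    """One denoising step on a batch of demonstrated (observation, action) pairs."""
    t = torch.rand(obs.shape[0], 1)        # random point on the noise schedule
    noise = torch.randn_like(act)
    noisy_act = (1 - t) * act + t * noise  # toy linear noising schedule
    pred = net(torch.cat([obs, noisy_act, t], dim=-1))
    loss = ((pred - noise) ** 2).mean()    # score-matching-style objective
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Stand-in "demonstrations"; in the real pipeline these come from human
# teleoperation with haptic feedback through those balloon hands.
for _ in range(3):
    print(train_step(torch.randn(64, OBS_DIM), torch.randn(64, ACT_DIM)))
```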

Now, since TRI claims that their aim is to build robots that amplify people and not replace them, you may still have to plate your own scrambled eggs and apply the syrup to that short stack yourself. But they plan to have over 1,000 skills in the bag of tricks by the end of 2024. If you want more information about the project and to learn about Diffusion Policy without reading the paper, check out this blog post.

Perhaps the robotic burger joint was ahead of its time, but we’re getting there. How about a robot barista?

Continue reading “Humans And Balloon Hands Help Bots Make Breakfast”

WhisperFrame Depicts The Art Of Conversation

At this point, you gotta figure that you’re at least being listened to almost everywhere you go, whether by a home assistant or your very own phone. So why not roll with the punches and turn lemons into something like a still life of lemons that’s a bit wonky? What we mean is, why not take our conversations and use AI to turn them into art? That’s the idea behind this next-generation digital photo frame created by [TheMorehavoc].

Essentially, it uses a Raspberry Pi and a ReSpeaker four-mic array to listen to conversations in the room. It records 15-20 seconds of audio at a time and sends it to the OpenAI Whisper API to generate a transcript. This repeats until five minutes of audio have been collected, at which point the entire transcript is sent through GPT-4 to extract an image prompt from a single topic in the conversation. That prompt is then shipped off to Stable Diffusion to get an image to display on the screen. As you can imagine, the images generated run the gamut from really weird to really awesome.
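The chain is short enough to sketch in code. Something like the following captures the shape of it, assuming the official openai client and a local Stable Diffusion model via diffusers; the model names, prompt wording, and file paths are placeholders rather than [TheMorehavoc]’s actual code:

```python
from openai import OpenAI
from diffusers import StableDiffusionPipeline
import torch

client = OpenAI()  # reads OPENAI_API_KEY from the environment
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # stand-in for whatever SD backend the frame really uses

def transcribe(wav_path: str) -> str:
    """Ship a recorded chunk off to the Whisper API for a transcript."""
    with open(wav_path, "rb") as f:
        return client.audio.transcriptions.create(model="whisper-1", file=f).text

def extract_prompt(transcript: str) -> str:
    """Ask GPT-4 to boil five minutes of chatter down to one image prompt."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content":
                   "Pick one topic from this conversation and write a short "
                   "image-generation prompt about it:\n" + transcript}],
    )
    return resp.choices[0].message.content

prompt = extract_prompt(transcribe("last_five_minutes.wav"))
pipe(prompt).images[0].save("frame.png")  # image for the display to show
```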

The natural lulls in conversation presented a bit of a problem, in that transcription kept running during silences, presumably picking up ambient noise. The answer was voice activity detection (VAD) software, which gives a probability that a voice is present.
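The build video doesn’t name the exact library, so as a stand-in, here’s a sketch using the common webrtcvad package. Its per-frame decisions are binary rather than a probability, but averaging them over a clip serves the same gating purpose:

```python
import webrtcvad

vad = webrtcvad.Vad(2)  # aggressiveness 0-3; higher filters more noise
SAMPLE_RATE = 16000
FRAME_MS = 30
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit mono PCM

def speech_fraction(pcm: bytes) -> float:
    """Fraction of 30 ms frames the VAD flags as speech."""
    frames = [pcm[i:i + FRAME_BYTES]
              for i in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES)]
    if not frames:
        return 0.0
    return sum(vad.is_speech(f, SAMPLE_RATE) for f in frames) / len(frames)

# Only bother calling Whisper when someone is probably talking, e.g.:
# if speech_fraction(chunk) > 0.3: transcribe(chunk)
```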

Naturally, people were curious about the prompts for the images, so [TheMorehavoc] made a little gallery sign with a MagTag that uses Adafruit.io as the MQTT broker. Build video is up after the break, and you can check out the images here (warning, some are NSFW).
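Publishing the current prompt out to that sign is only a few lines with paho-mqtt; here’s a hedged sketch with placeholder credentials and a made-up feed name, which may well differ from [TheMorehavoc]’s setup:

```python
import paho.mqtt.publish as publish

AIO_USER = "your_adafruit_io_username"          # placeholder
AIO_KEY = "your_adafruit_io_key"                # placeholder
FEED = f"{AIO_USER}/feeds/whisperframe-prompt"  # hypothetical feed name

def publish_prompt(prompt: str):
    """Push the latest image prompt; the MagTag subscribes to the same feed."""
    publish.single(FEED, prompt, hostname="io.adafruit.com", port=1883,
                   auth={"username": AIO_USER, "password": AIO_KEY})

publish_prompt("Gothic painting of a yoga class. Gloomy, dramatic.")
```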

Continue reading “WhisperFrame Depicts The Art Of Conversation”

E-Paper News Feed Illustrates The Headlines With AI-Generated Images

It’s hard to read the headlines today without feeling like the world couldn’t possibly get much worse. And then tomorrow rolls around, and a fresh set of headlines puts the lie to that thought. On a macro level, there’s not much that you can do about that, but on a personal level, illustrating your news feed with mostly wrong, AI-generated images might take the edge off things a little.

Let us explain. [Roy van der Veen] liked the idea of an e-paper newsfeed display, but the crushing weight of the headlines was a little too much to bear. To lighten things up, he decided to employ Stable Diffusion to illustrate his feed, displaying both the headline and a generated image on a seven-color 7.3″ Inky e-paper display. Every five hours, a script running on a Raspberry Pi Zero 2 W fetches a headline from a random source — we’re pleased the list includes Hackaday — and composes a prompt for Stable Diffusion based on the headline, adding a randomly selected prefix and suffix to spice things up. For example, a prompt might look like, “Gothic painting of (Driving a Motor with an Audio Amp Chip). Gloomy, dramatic, stunning, dreamy.” You can imagine the results.
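That prompt-composition step is simple enough to sketch. Here’s one way it could look, assuming a feedparser-based fetch; the prefix and suffix lists below are invented (apart from the example above), and [Roy]’s actual sources and lists are his own:

```python
import random
import feedparser

FEEDS = ["https://hackaday.com/feed/"]  # one of several sources in the build
PREFIXES = ["Gothic painting of", "Watercolor sketch of", "Low-poly render of"]
SUFFIXES = ["Gloomy, dramatic, stunning, dreamy.", "Bright, cheerful, detailed."]

def compose_prompt() -> str:
    """Grab a headline from a random feed and dress it up for Stable Diffusion."""
    feed = feedparser.parse(random.choice(FEEDS))
    headline = feed.entries[0].title
    return f"{random.choice(PREFIXES)} ({headline}). {random.choice(SUFFIXES)}"

print(compose_prompt())
```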

We have to say, from the examples [Roy] shows, the idea pretty much works — sometimes the images are so far off the mark that just figuring out how Stable Diffusion came up with them is enough to soften the blow. We’d have preferred if the news of the floods in Libya had been buffered by a slightly less dismal scene, but finding out that what was thought to be a “ritual mass murder” was really only a yoga class was certainly heartening.

Two white Chevy Bolt hatchbacks sit side-by-side, immobilized in the street, their roofs festooned with sensors and an orange cone on their hoods like a snowman's nose pointed toward the sky.

Coning Cars For Fun And Non-Profit

Self-driving cars are being heralded as the wave of the future, but there have been many hiccups along the way. The newest comes from activists showing how easy autonomous vehicles are to disable with a simple traffic cone.

As we’ve discussed before, self-driving cars aren’t actually that great at driving, and a number of conditions can cause them to fail safe and stop in the middle of the road. Activist group Safe Street Rebel is exploiting this vulnerability by “coning” Waymo and Cruise vehicles in San Francisco. Placing a traffic cone on a vehicle’s hood, blocking the view of the sensors and cameras it uses to navigate the streets, renders the vehicle inoperable.

Continue reading “Coning Cars For Fun And Non-Profit”

BingGPT Brings AI Chat To The Desktop

Interested in AI, but sick of using everything in a browser? Miss clicking on a good old desktop icon to open a local bit of software? In that case, BingGPT could be just the thing for you.

It’s nothing too crazy—just a desktop application that gives you access to Bing’s AI-powered chatbot. It’s available on a range of platforms, from Windows to macOS and Linux, with binaries for Intel, Apple Silicon, and ARM processors.

Using BingGPT is simple. Sign in with your Microsoft account, and away you go. There’s no need for Microsoft Edge or any ugly browser plugins, and you can export your conversations to Markdown, PNG, and PDF for sharing beyond the program. It also comes with a range of keyboard shortcuts to speed things along when the large language model gets off track, and there’s a Compose button that will actually go ahead and write stuff for you.

Fundamentally, all the cool stuff is still coming in via the web, but it’s nice to be able to use Bing’s chatbot without having to succumb to the horrors of a Microsoft browser. It’s interesting to see how large language models are becoming an all-pervasive tool of late. If you’re building your own nifty projects in this area, don’t hesitate to let us know!

Programming A Poker Game With GPT Help

Although ChatGPT generated a huge amount of hype about completely replacing white-collar workers when it was first released to the public, the general consensus now is that it won’t outright replace anyone just yet; rather, people who know how to use it as a tool will replace those who don’t. Getting started with it is not too hard, either, but you’ll of course need a project to work on to familiarize yourself with the tool. [Volos Projects] gave himself the challenge of writing a poker game using ChatGPT not as the opposing player, but as a co-designer, in order to learn more about it as an assistant.

The poker game is built on an ESP32 board with a built-in AMOLED screen. Five buttons are wired to the microcontroller to let the player select which cards to discard and which to keep. The bet for each hand can be raised or lowered, much like on the tabletop poker games often seen in bars and restaurants. To program it, though, ChatGPT was used to help design the code every step of the way: first describing the overall goal, then building each function one by one, like shuffling the deck, dealing the hand, and replacing discarded cards (the kind of logic sketched below).
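The real project runs on the ESP32, but the card-handling functions translate to a few lines of Python for the sake of illustration; the names and details here are ours, not lifted from [Volos Projects]’s code:

```python
import random

RANKS = "23456789TJQKA"
SUITS = "cdhs"

def shuffled_deck():
    """Build a 52-card deck and shuffle it in place."""
    deck = [r + s for r in RANKS for s in SUITS]
    random.shuffle(deck)
    return deck

def deal(deck, n=5):
    """Deal n cards off the top of the deck."""
    return [deck.pop() for _ in range(n)]

def replace_cards(hand, deck, discard_indices):
    """Swap out the cards the player chose to discard with fresh ones."""
    for i in discard_indices:
        hand[i] = deck.pop()
    return hand

deck = shuffled_deck()
hand = deal(deck)
hand = replace_cards(hand, deck, [0, 2])  # say the player held buttons 1 and 3
print(hand)
```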

For anyone who hasn’t yet explored using ChatGPT to help design their programming projects, this effort goes a long way toward showing just how useful a tool it can be. More complex tasks do take a little knowledge on the part of the user, though, because ChatGPT can often turn out nonsense or factually inaccurate information, but at least in a programming environment you’ll generally find out quickly when that happens. It’s not just a useful tool for writing programs, either. It can accomplish a lot of ancillary tasks related to programming as well, even if it’s not writing the code directly.

Thanks to [Peter] for the tip!

Continue reading “Programming A Poker Game With GPT Help”

Here’s Why GPUs Are Deep Learning’s Best Friend

If you’re curious about how fancy graphics cards actually work, and why they are so well-suited to AI-type applications, take a few minutes to read [Tim Dettmers]’ explanation of why this is so. It’s not a terribly long read, but while it does get technical, there are also car analogies, so there’s something for everyone!

He starts off by saying that most people know that GPUs are scarily efficient at matrix multiplication and convolution, but what really makes them most useful is their ability to work with large amounts of memory very efficiently.
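You can put a number on that claim with a roofline-style back-of-envelope: compare a kernel’s arithmetic intensity (FLOPs per byte of data moved) against how many FLOPs the hardware can afford per byte of bandwidth. The hardware figures below are illustrative round numbers of our own, not from [Tim Dettmers]’ post:

```python
def matmul_intensity(n: int, bytes_per_value: int = 2) -> float:
    """FLOPs per byte for an N x N matmul, ignoring caches and tiling."""
    flops = 2 * n**3                          # one multiply-add per (i, j, k)
    bytes_moved = 3 * n**2 * bytes_per_value  # read A and B, write C
    return flops / bytes_moved

peak_flops = 100e12    # ~100 TFLOPS of tensor-core compute (illustrative)
peak_bandwidth = 1e12  # ~1 TB/s of memory bandwidth (illustrative)
hardware_ratio = peak_flops / peak_bandwidth  # FLOPs affordable per byte fetched

for n in (64, 512, 4096):
    ai = matmul_intensity(n)
    bound = "bandwidth-bound" if ai < hardware_ratio else "compute-bound"
    print(f"N={n}: {ai:.0f} FLOPs/byte vs hardware {hardware_ratio:.0f} -> {bound}")
```

Small operations fall well short of the hardware ratio, which is why keeping data moving, rather than raw arithmetic, ends up being the main job.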

Essentially, a CPU is a latency-optimized device while GPUs are bandwidth-optimized devices. If a CPU is a race car, a GPU is a cargo truck. The main job in deep learning is to fetch and move cargo (memory, actually) around. Both devices can do this job, but in different ways. A race car moves quickly, but can’t carry much. A truck is slower, but far better at moving a lot at once.

Continue reading “Here’s Why GPUs Are Deep Learning’s Best Friend”