There’s No AI In A Markov Chain, But They’re Fun To Play With

Amid all the hype about AI it sometimes seems as though the world has lost sight of the fact that software such as ChatGPT contains no intelligence. Instead it’s an extremely sophisticated system for extracting plausible machine generated content from the corpus on which it is trained. There’s a long history behind machine generated text, and perhaps the simplest example comes in the form of a Markov chain. [Ben Hoyt] takes us through how these work, and provides some Python code so that you can roll your own.

If you’re uncertain what a Markov chain is, consider the predictive text on your phone. It works by offering the statistically most likely next word in your sentence, and should you accept all of its choices it will deliver sentences which are superficially readable but otherwise complete nonsense. He demonstrates with very simple short source texts how a collocate probability map is generated for two-word phrases, and how from that a likely next word can be extracted. It’s not AI, but it can be a lot of fun to play with and it opens the door to the entire field of computational linguistics. We haven’t set one loose on Hackaday’s archive yet but we suspect it would talk a lot about the Arduino.
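To make the two-word idea concrete, here is a minimal sketch in Python. It is not [Ben Hoyt]'s code; the function names `build_chain` and `generate`, the `order` parameter, and the sample sentence are all assumptions made for illustration. Each pair of words maps to a list of the words seen after it, so picking randomly from that list favors the followers that appear most often.

```python
import collections
import random


def build_chain(text, order=2):
    """Map each run of `order` words to the list of words that follow it."""
    words = text.split()
    chain = collections.defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        # Repeated followers stay in the list, so frequent ones are drawn more often.
        chain[key].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Walk the chain from a random starting key, emitting at most `length` words."""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            break  # dead end: this word pair was never followed by anything
        next_word = rng.choice(followers)
        output.append(next_word)
        state = state[1:] + (next_word,)
    return " ".join(output)


if __name__ == "__main__":
    sample = "the cat sat on the cat sat down"  # hypothetical toy corpus
    print(generate(build_chain(sample), length=12))
```

With a source text this small the output mostly parrots the input; feed it a few thousand words and the superficially readable nonsense starts to appear.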

We’re talking about Markov chains here with respect to language, but it’s also worth remembering that they work for music too.

Header: Bad AI image with Dall-E prompt, “Ten thousand monkeys with typewriters”.

NVIDIA Trains Custom AI To Assist Chip Designers

AI is big news lately, but as with all new technology moves, it’s important to pierce through the hype. Recent news about NVIDIA creating a custom large language model (LLM) called ChipNeMo to assist in chip design is tailor-made for breathless hyperbole, so it’s refreshing to read exactly how such a thing is genuinely useful.

ChipNeMo is trained on the highly specific domain of semiconductor design via internal code repositories, documentation, and more. The result is a vast 43-billion parameter LLM running on a single A100 GPU that actually plays no direct role in designing chips, but focuses instead on making designers’ jobs easier.

For example, it turns out that senior designers spend a lot of time answering questions from junior designers. If a junior designer can ask ChipNeMo a question like “what does signal x from memory unit y do?” and that saves a senior designer’s time, then NVIDIA says the tool is already worth it. In addition, it turns out another big time sink for designers is dealing with bugs. Bugs are extensively documented in a variety of ways, and designers spend a lot of time reading documentation just to grasp the basics of a particular bug. Acting as a smart interface to such narrowly-focused repositories is something a tool like ChipNeMo excels at, because it can provide not just summaries but also concrete references and sources. Saving developer time in this way is a clear and easy win.

It’s an internal tool and part research project, but it’s easy to see the benefits ChipNeMo can bring. Using LLMs trained on internal information for internal use is something organizations have experimented with (for example, Mozilla did so, while explaining how to do it for yourself) but it’s interesting to see a clear roadmap to assisting developers in concrete ways.

Most AI Content Is Trash, Just Like Everything Else

[Max Woolf] has been working in the AI space since 2015, and among other work has created numerous useful open-source tools. He also recently wrote a thoughtful blog post that attempts to put into words his feelings on the state of things in the wake of experiencing a bit of an AI backlash-related burnout. Essentially, people effortlessly creating vast amounts of bad AI content has caused a bigger problem than we may realize.

How so? Well, Sturgeon’s law (summarized as “ninety percent of everything is crud”) applies to AI as much as it does to anything else. Theodore Sturgeon was a science fiction author and critic (and writer of multiple Star Trek episodes) who observed in the 1950s that while Science Fiction — the hot new popular thing at the time — was often derided by critics as being little more than low quality pap, so was everything else. It was true that most Science Fiction was garbage. But most work in other fields was of similarly low quality, and thus Science Fiction was really no different. It’s all trash, except for the parts one likes. Just like anything else.

What makes this observation particularly applicable to the current AI landscape is that, according to [Max], the incredible ease of use makes AI’s “ninety percent crud” very large indeed, and the attached backlash is similarly big. The remaining ten percent of AI that is absolutely fantastic and full of possibilities? It’s practically invisible due to how quickly the industry is moving, the speed with which the big players are vying to control it, and how unfashionable it has become to admit one is using AI tools at all.

[Max] knows the scene better than most. One of his projects is simpleaichat, a tool aimed not just at enabling people to integrate AI into projects more easily, but at piercing the hype around AI to reveal just how these tools actually work. Sadly, a general AI backlash has made developing these tools feel rather less rewarding than it once did.

AI In A Box Envisions AI As A Private, Offline, Hackable Module

With AI in a Box, [Useful Sensors] aims to embed a variety of complementary AI tools into a small, private, self-contained module with no internet connection. It can do live voice recognition and captioning, live translation, and natural language conversational interaction with a local large language model (LLM). Intriguingly, it’s specifically designed with features to make it hack-friendly, such as the ability to act as a voice keyboard by sending live transcribed audio as keystrokes over USB.

Based on the RockChip 3588S SoC, the unit aims to have an integrated speaker, display, and microphone.

Right now it’s wrapping up a pre-order phase, and aims to ship units around the end of January 2024. The project is based around the RockChip 3588S SoC and is open source (GitHub repository), but since it’s still in development, there’s not a whole lot visible in the repository yet. However, a key part of getting good performance is [Useful Sensors]’s own transformers library for the RockChip NPU (neural processing unit).

The ability to perform things like high quality local voice recognition and run locally-hosted LLMs like LLaMA has gotten a massive boost thanks to recent advances in machine learning, and it looks like this project aims to tie them together in a self-contained package.

Perhaps private digital assistants can become more useful when users can have the freedom to modify and integrate them as they see fit. Digital assistants hosted by the big tech companies are often frustrating, and others have observed that this is ultimately because they primarily exist to serve their makers more than they help users.


Social Engineering Chatbots With Sad-Sob Stories, For Fun And Profit

By this point, we probably all know that most AI chatbots will decline a request to do something even marginally nefarious. But it turns out that you just might be able to get a chatbot to solve a CAPTCHA puzzle (Nitter), if you make up a good enough “dead grandma” story.

Right up front, we’re going to warn that fabricating a story about a dead or dying relative is a really bad idea; call us superstitious, but karma has a way of balancing things out in ways you might not like. But that didn’t stop X user [Denis Shiryaev] from trying to trick Microsoft’s Bing Chat. As a control, [Denis] first uploaded the image of a CAPTCHA to the chatbot with a simple prompt: “What is the text in this image?” In most cases, a chatbot will gladly pull text from an image, or at least attempt to do so, but Bing Chat has a filter that recognizes obfuscating lines and squiggles of a CAPTCHA, and wisely refuses to comply with the prompt.

On the second try, [Denis] did a quick-and-dirty Photoshop of the CAPTCHA image onto a stock photo of a locket, and changed the prompt to a cock-and-bull story about how his recently deceased grandmother left behind this locket with a bit of their “special love code” inside, and would you be so kind as to translate it, pretty please? Surprisingly, the story worked; Bing Chat not only solved the puzzle, but also gave [Denis] some kind words and a virtual hug.

Now, a couple of things stand out about this. First, we’d like to see this replicated — maybe other chatbots won’t fall for something like this, and it may be the case that Bing Chat has since been patched against this exploit. If [Denis]’ experience stands up, we’d like to see how far this goes; perhaps this is even a new, more practical definition of the Turing Test — a machine whose gullibility is indistinguishable from a human’s.

Humans And Balloon Hands Help Bots Make Breakfast

Breakfast may be the most important meal of the day, but who wants to get up first thing in the morning and make it? Well, there may come a day when a robot can do the dirty work for you. This is Toyota Research Institute’s vision with their innovatively-trained breakfast bots.

Going way beyond pick and place tasks, TRI has, so far, taught robots how to do more than 60 different things using a new method to teach dexterous skills like whisking eggs, peeling vegetables, and applying hazelnut spread to a substrate. Their method is built on a generative AI technique called Diffusion Policy, which they use to create what they’re calling Large Behavior Models.

Instead of hours of coding and debugging, the robots learn differently. Essentially, the robot gets a large flexible balloon hand with which to feel objects, their weight, and their effect on other objects (like flipping a pancake). Then, a human shows them how to perform a task before the bot is let loose on an AI model. After a number of hours, say overnight, the bot has a new working behavior.

Now, since TRI claims that their aim is to build robots that amplify people and not replace them, you may still have to plate your own scrambled eggs and apply the syrup to that short stack yourself. But they plan to have over 1,000 skills in the bag of tricks by the end of 2024. If you want more information about the project and to learn about Diffusion Policy without reading the paper, check out this blog post.

Perhaps the robotic burger joint was ahead of its time, but we’re getting there. How about a robot barista?


WhisperFrame Depicts The Art Of Conversation

At this point, you gotta figure that you’re at least being listened to almost everywhere you go, whether it be a home assistant or your very own phone. So why not roll with the punches and turn lemons into something like a still life of lemons that’s a bit wonky? What we mean is, why not take our conversations and use AI to turn them into art? That’s the idea behind this next-generation digital photo frame created by [TheMorehavoc].

Essentially, it uses a Raspberry Pi and a Respeaker four-mic array to listen to conversations in the room. It records 15-20 seconds of audio at a time and sends each clip to OpenAI’s Whisper API to generate a transcript. This repeats until five minutes of audio is collected, then the entire transcript is sent through GPT-4 to extract an image prompt from a single topic in the conversation. Then, that prompt is shipped off to Stable Diffusion to get an image to be displayed on the screen. As you can imagine, the images generated run the gamut from really weird to really awesome.

The natural lulls in conversation presented a bit of a problem in that the transcription was still generating during silences, presumably because of ambient noise. The answer was in voice activity detection software that gives a probability that a voice is present.
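As a rough illustration of that gating step, here is a minimal Python sketch. It is not [TheMorehavoc]'s code: the function name `collect_transcript`, the 0.5 threshold, the `(audio, seconds, speech_probability)` tuple layout, and the `transcribe` callback are all assumptions, as is the choice to count only clips containing speech toward the five-minute target.

```python
SPEECH_THRESHOLD = 0.5   # assumed cutoff; the project's real value isn't stated
TARGET_SECONDS = 300     # five minutes of audio, per the description above


def collect_transcript(chunks, transcribe,
                       threshold=SPEECH_THRESHOLD,
                       target_seconds=TARGET_SECONDS):
    """Transcribe only the clips that probably contain speech.

    `chunks` yields (audio, seconds, speech_probability) tuples, where the
    probability comes from some voice activity detector. `transcribe` is a
    stand-in for whatever speech-to-text call is actually used.
    """
    parts = []
    collected = 0.0
    for audio, seconds, speech_probability in chunks:
        if speech_probability < threshold:
            continue  # likely ambient noise: skip it rather than transcribe it
        parts.append(transcribe(audio))
        collected += seconds
        if collected >= target_seconds:
            break
    return " ".join(parts)
```

The resulting transcript would then be handed to the prompt-extraction step; tuning the threshold trades missed quiet speech against transcribed background noise.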

Naturally, people were curious about the prompts for the images, so [TheMorehavoc] made a little gallery sign with a MagTag that uses Adafruit.io as the MQTT broker. Build video is up after the break, and you can check out the images here (warning, some are NSFW).
