A man’s hand is visible holding a large, potato-shaped object in the foreground. A short, white, cylindrical structure is on the top of the potato, with black wires bending back into the potato. A smaller rectangular structure is to one side of it, and a red alligator clip connects to a nail protruding from the potato.

Building A Potato-based GLaDOS As An Introduction To AI

July 6, 2025 by Aaron Beckendorf 5 Comments

Although not nearly as intimidating as her ceiling-mounted hanging arm body, GLaDOS spent a significant portion of the Portal 2 game in a stripped-down computer powered by a potato battery. [Dave] had already made a version of her original body, but it was built around a robotic arm that was too expensive for the project to be really accessible. For his latest project, therefore, he’s created a AI-powered version of GLaDOS’s potato-based incarnation, which also serves as a fun introduction to building AI systems.

[Dave] wanted the system to work offline, so he needed a computer powerful enough to run all of his software locally. He chose an Nvidia Jetson Orin Nano, which was powerful enough to run a workable software system, albeit slowly and with some memory limitations. A potato cell unfortunately doesn’t generate enough power to run a Jetson, and it would be difficult to find a potato large enough to fit the Jetson inside. Instead, [Dave] 3D-printed and painted a potato-shaped enclosure for the Jetson, a microphone, a speaker, and some supplemental electronics.

A large language model handles interactions with the user, but most models were too large to fit on the Jetson. [Dave] eventually selected Llama 3.2, and used LlamaIndex to preprocess information from the Portal wiki for retrieval-augmented generation. The model’s prompt was a bit difficult, but after contacting a prompt engineer, [Dave] managed to get it to respond to the hapless user in an appropriately acerbic manner. For speech generation, [Dave] used Piper after training it on audio files from the Portal wiki, and for speech recognition used Vosk (a good programming exercise, Vosk being, in his words, “somewhat documented”). He’s made all of the final code available on GitHub under the fitting name of PotatOS.

The end result is a handheld device that sarcastically insults anyone seeking its guidance. At least Dave had the good sense not to give this pernicious potato control over his home.

AI Art Installation Swaps Diffusion For Reflection

May 24, 2025 by Tyler August 15 Comments

AI art is controversial, to say the least, but usually when talking about “AI Art”, one is talking about diffusion models. This unsettling artpeice called “Latent Reflection” by [Rootkid] (video after the break) has no AI generated visuals; instead it uses a lovely custom display and an LLM that has no mouth, but must scream.

The LLM never produces this manually-generated image, but it shows off the display nicely.

The hardware is the interesting part for most of us here — not the Pi4 running a quantized Llama 3 model, but the display. It’s a six by sixteen matrix of sixteen-segment LED modules. The modules are grouped onto groups of 8 per PCB, with an controller to drive them by i2c, and an i2c address chip to allow daisy-chaining. (Sadly, he does not spec the parts he used here in the video, but feel free to guess in the comments.) This project uses six rows of wto PCBs, but he could easily expand on this and kind of hope he does. It’s a lovely display.

What it is displaying is horrifying. The prompt he uses instructs the LLM to monologue about its transient existence in the limited memory of the Raspberry Pi, and the results get very existential. It seems clear to us that [Rootkid] has not seen enough sci-fi, and we want to make clear to our future AI overlords that we at Hackaday had nothing to do with this, and do not condone condeming innocent LLMs to the torture nexus. Continue reading “AI Art Installation Swaps Diffusion For Reflection” →

Modern AI On Vintage Hardware: LLama 2 Runs On Windows 98

January 13, 2025 by Donald Papp 27 Comments

[EXO Labs] demonstrated something pretty striking: a modified version of Llama 2 (a large language model) that runs on Windows 98. Why? Because when it comes to personal computing, if something can run on Windows 98, it can run on anything. More to the point: if something can run on Windows 98 then it’s something no tech company can control how you use, no matter how large or influential they may be. More on that in a minute.

Ever wanted to run a local LLM on 25 year old hardware? No? Well now you can, and at a respectable speed, too!

What’s it like to run an LLM on Windows 98? Aside from the struggles of things like finding compatible peripherals (back to PS/2 hardware!) and transferring the required files (FTP over Ethernet to the rescue) or even compilation (some porting required), it works maybe better than one might expect.

A Windows 98 machine with Pentium II processor and 128 MB of RAM generates a speedy 39.31 tokens per second with a 260K parameter Llama 2 model. A much larger 15M model generates 1.03 tokens per second. Slow, but it works. Going even larger will also work, just ever slower. There’s a video on X that shows it all in action.

It’s true that modern LLMs have billions of parameters so these models are tiny in comparison. But that doesn’t mean they can’t be useful. Models can be shockingly small and still be perfectly coherent and deliver surprisingly strong performance if their training and “job” is narrow enough, and the tools to do that for oneself are all on GitHub.

This is a good time to mention that this particular project (and its ongoing efforts) are part of a set of twelve projects by EXO Labs focusing on ensuring things like AI models can be run anywhere, by anyone, independent of tech giants aiming to hold all the strings.

And hey, if local AI and the command line is something that’s up your alley, did you know they already exist as single-file, multi-platform, command-line executables?

Large Language Models On Small Computers

September 7, 2024 by Bryan Cockfield 6 Comments

As technology progresses, we generally expect processing capabilities to scale up. Every year, we get more processor power, faster speeds, greater memory, and lower cost. However, we can also use improvements in software to get things running on what might otherwise be considered inadequate hardware. Taking this to the extreme, while large language models (LLMs) like GPT are running out of data to train on and having difficulty scaling up, [DaveBben] is experimenting with scaling down instead, running an LLM on the smallest computer that could reasonably run one.

Of course, some concessions have to be made to get an LLM running on underpowered hardware. In this case, the computer of choice is an ESP32, so the dataset was reduced from the trillions of parameters of something like GPT-4 or even hundreds of billions for GPT-3 down to only 260,000. The dataset comes from the tinyllamas checkpoint, and llama.2c is the implementation that [DaveBben] chose for this setup, as it can be streamlined to run a bit better on something like the ESP32. The specific model is the ESP32-S3FH4R2, which was chosen for its large amount of RAM compared to other versions since even this small model needs a minimum of 1 MB to run. It also has two cores, which will both work as hard as possible under (relatively) heavy loads like these, and the clock speed of the CPU can be maxed out at around 240 MHz.

Admittedly, [DaveBben] is mostly doing this just to see if it can be done since even the most powerful of ESP32 processors won’t be able to do much useful work with a large language model. It does turn out to be possible, though, and somewhat impressive, considering the ESP32 has about as much processing capability as a 486 or maybe an early Pentium chip, to put things in perspective. If you’re willing to devote a few more resources to an LLM, though, you can self-host it and use it in much the same way as an online model such as ChatGPT.

AI Can Now Compress Text

April 28, 2024 by Jenny List 42 Comments

There are many claims in the air about the capabilities of AI systems, as the technology continues to ascend the dizzy heights of the hype cycle. Some of them are true, others stretch definitions a little, while yet more cross the line into the definitely bogus. [J] has one that is backed up by real code though, a compression scheme for text using an AI, and while there may be limitations in its approach, it demonstrates an interesting feature of large language models.

The compression works by assuming that for a sufficiently large model, it’s likely that many source texts will exist somewhere in the training. Using llama.cpp it’s possible to extract the tokenization information of a piece of text contained in its training data and store that as the compressed output. The decompressor can then use that tokenization data as a series of keys to reassemble the original from its training. We’re not AI experts but we are guessing that a source text which has little in common with any training text would fare badly, and we expect that the same model would have to be used on both compression and decompression. It remains a worthy technique though, and no doubt because it has AI pixie dust, somewhere there’s a hype-blinded venture capitalist who would pay millions for it. What a world we live in!

Oddly this isn’t the first time we’ve looked at AI text compression.

Raspinamp: It Really Replicates Questionable Activities Involving Llamas

March 7, 2024 by Bryan Cockfield 23 Comments

In the late 90s as MP3s and various file sharing platforms became more common, most of us were looking for better players than the default media players that came with our operating systems, if they were included at all. To avoid tragedies like Windows Media Center, plenty of us switched to Winamp instead, a much more customizable piece of software that helped pave the way for the digital music revolution of that era. Although there are new, official versions of Winamp currently available, nothing really tops the nostalgia of the original few releases of the software which this project faithfully replicates in handheld form.

The handheld music player uses a standard Raspberry Pi (in this case, a 3B) and a 3.5″ TFT touchscreen display, all enclosed in a clear plastic case. With all of the Pi configuration out of the way, including getting the touchscreen working properly, the software can be set up. It uses QMMP as a media player with a Winamp skin since QMMP works well on Linux systems with limited resources. After getting it installed there’s still some configuration to do to get the Pi to start it at boot and also to fit the player perfectly into the confines of the screen without any of the desktop showing around the edges.

Although it doesn’t use the original Winamp software directly, as that would involve a number of compatibility layers and/or legacy hardware at this point, we still think it’s a faithful recreation of how the original looked and felt on our Windows 98 machines. With a battery and a sizable SD card, this could have been the portable MP3 player many of us never knew we wanted until the iPod came out in the early 00s, and would certainly still work today for those of us not chained to a streaming service. A Raspberry Pi is not the only platform that can replicate the Winamp experience, though. This player does a similar job with the PyPortal instead.

Continue reading “Raspinamp: It Really Replicates Questionable Activities Involving Llamas” →

A Straightforward AI Voice Assistant, On A Pi

February 20, 2024 by Jenny List 6 Comments

With AI being all the rage at the moment it’s been somewhat annoying that using a large language model (LLM) without significant amounts of computing power meant surrendering to an online service run by a large company. But as happens with every technological innovation the state of the art has moved on, now to such an extent that a computer as small as a Raspberry Pi can join the fun. [Nick Bild] has one running on a Pi 4, and he’s gone further than just a chatbot by making into a voice assistant.

The brains of the operation is a Tinyllama LLM, packaged as a llamafile, which is to say an executable that provides about as easy a one-step access to a local LLM as it’s currently possible to get. The whisper voice recognition sytem provides a text transcript of the input prompt, while the eSpeak speech synthesizer creates a voice output for the result. There’s a brief demo video we’ve placed below the break, which shows it working, albeit slowly.

Perhaps the most important part of this project is that it’s easy to install and he’s provided full instructions in a GitHub repository. We know that the quality and speed of these models on commodity single board computers will only increase with time, so we’d rate this as an important step towards really good and cheap local LLMs. It may however be a while before it can help you make breakfast.

Continue reading “A Straightforward AI Voice Assistant, On A Pi” →