USB Stick Hides Large Language Model

Large language models (LLMs) are all the rage in the generative AI world these days, with the truly large ones like GPT, LLaMA, and others using tens or even hundreds of billions of parameters to churn out their text-based responses. These typically require glacier-melting amounts of computing hardware, but the “large” in “large language models” doesn’t really have to be all that large for the model to be functional and useful. LLMs designed for limited hardware or consumer-grade PCs are available now as well, but [Binh] wanted something even smaller and more portable, so he put an LLM on a USB stick.

This USB stick isn’t just a jump drive with a bit of memory on it, though. Inside the custom 3D-printed case is a Raspberry Pi Zero W running llama.cpp, a lightweight, high-performance inference engine for LLaMA-family models. Getting it onto this Pi wasn’t straightforward at all, though: the latest version of llama.cpp targets ARMv8, while this particular Pi runs the older ARMv6 instruction set. That meant [Binh] had to modify the source code to strip out the optimizations meant for more modern ARM machines, but after a week’s worth of effort he finally had the model running on the older Raspberry Pi.

Getting the model to run was just one part of this project. The rest of the build was about making sure the LLM could run on any computer without drivers and be relatively simple to use. By setting up the USB stick as a composite device that presents a filesystem to the host computer, all a user has to do to interact with the LLM is create an empty text file on the drive, and the LLM will automatically fill the file with generated text. While it’s not blindingly fast, [Binh] believes this is the first plug-and-play USB-based LLM, and we’d have to agree. It’s not the least powerful computer to ever run an LLM, though; that honor goes to this project, which manages to cram one onto an ESP32.
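For the curious, the Pi-side logic of such a build can be sketched in a few lines of Python: watch the shared folder for freshly created, empty text files and fill each one with the model’s output. Everything here (the mount point, the llama.cpp binary name and flags, and treating the filename as the prompt) is an assumption for illustration rather than [Binh]’s actual code, and a real USB mass-storage gadget adds complications this sketch glosses over, such as the host owning the exported filesystem while it is mounted.

```python
import subprocess
import time
from pathlib import Path

# Hypothetical directory backing the USB mass-storage gadget (assumption).
WATCH_DIR = Path("/mnt/usb_share")

# Assumed llama.cpp command-line invocation; the real binary name, model file,
# and flags depend on the actual build.
LLAMA_CMD = ["./llama-cli", "-m", "tiny-model.gguf", "-n", "128", "-p"]

def generate(prompt: str) -> str:
    """Run the (assumed) llama.cpp CLI and return whatever it prints."""
    result = subprocess.run(LLAMA_CMD + [prompt], capture_output=True, text=True)
    return result.stdout

def main() -> None:
    while True:
        for txt in WATCH_DIR.glob("*.txt"):
            if txt.stat().st_size == 0:            # only touch files the user left empty
                prompt = txt.stem                  # treat the filename as the prompt (assumption)
                txt.write_text(generate(prompt))   # fill the file with generated text
        time.sleep(1)                              # poll once per second

if __name__ == "__main__":
    main()
```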



Hackaday Links: February 9, 2025

January 9 ended up being a very expensive day for a Culver City, California man after he pleaded guilty to recklessly operating a drone during the height of the Pacific Palisades wildfire. We covered this story a bit when it happened (second item): the drone struck and damaged the leading edge of a Canadian “Super Scooper” plane that was fighting the fire. Peter Tripp Akemann, 56, admitted to taking the opportunity to go to the top of a parking garage in Santa Monica and launching his drone to get a better view of the action to the northwest. Unfortunately, the drone got about 2,500 meters away, far beyond visual range and, as it turns out, directly in the path of the planes refilling their tanks by skimming along the waters off Malibu. The agreement between Akemann and federal prosecutors calls for a guilty plea along with full restitution to the government of Quebec, which owns the damaged plane, plus the costs of repair. Akemann will need to write a check for $65,169 and perform 150 hours of community service related to the relief effort for the fire’s victims. Expensive, yes, but probably better than the year in federal prison such an offense could have earned him.


More Details On Why DeepSeek Is A Big Deal

The DeepSeek large language models (LLMs) have been making headlines lately, and for more than one reason. IEEE Spectrum has an article that sums everything up very nicely.

We shared the way DeepSeek made a splash when it came onto the AI scene not long ago, and this is a good opportunity to go into a few more details of why this has been such a big deal.

For one thing, DeepSeek (there are actually two flavors, -V3 and -R1, more on them in a moment) punches well above its weight. DeepSeek is the product of an innovative development process, and it is freely available to use or modify. It is also indirectly highlighting the way companies in this space like to label their LLM offerings as “open” or “free” while stopping well short of actually making them open source.

The DeepSeek-V3 LLM was developed in China and reportedly cost less than 6 million USD to train. This was possible thanks to DualPipe, a highly optimized and scalable training method developed to work around the limitations imposed by export restrictions on Nvidia hardware. Details are in the technical paper for DeepSeek-V3.

There’s also DeepSeek-R1, a chain-of-thought “reasoning” model which handily exposes its thought process, enclosed within easily-parsed <think> and </think> pseudo-tags included in its responses. A model like this takes an iterative, step-by-step approach to formulating responses, and it benefits from prompts that provide a clear goal for the LLM to aim for. The way DeepSeek-R1 was created was itself novel. Its training started with supervised fine-tuning (SFT), an intensive human-led process used as a “cold start”, which eventually handed off to a more automated reinforcement learning (RL) process with a rules-based reward system. The result avoided the problems that come from relying too heavily on RL while minimizing the human effort of SFT. Technical details on the process of training DeepSeek-R1 are here.
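Since the reasoning is delimited by plain pseudo-tags, splitting a DeepSeek-R1 style response into its chain-of-thought and its final answer takes only a couple of lines. Here’s a minimal sketch; the response string is made up purely for illustration.

```python
import re

# A response shaped like DeepSeek-R1 output (contents invented for illustration).
response = (
    "<think>The user wants the list sorted. Comparing the numbers gives 1, 3, 8.</think>"
    "Sorted: 1, 3, 8"
)

# Pull out the chain-of-thought, then strip it to leave just the visible answer.
reasoning = re.findall(r"<think>(.*?)</think>", response, flags=re.DOTALL)
answer = re.sub(r"<think>.*?</think>", "", response, flags=re.DOTALL).strip()

print("reasoning:", reasoning[0])
print("answer:", answer)
```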

DeepSeek-V3 and -R1 are freely available in the sense that one can access the full-powered models online or via an app, or download distilled models for local use on more limited hardware. They are free and open as in accessible, but not open source, because not everything needed to replicate the work is actually released. As with most LLMs, the training data and the actual training code are not available.

What has been released, and is making waves of its own, is the technical detail of how the researchers produced what they did, and that means there are efforts underway to build an actually open source version. Keep an eye out for Open-R1!

Modern AI On Vintage Hardware: Llama 2 Runs On Windows 98

[EXO Labs] demonstrated something pretty striking: a modified version of Llama 2 (a large language model) that runs on Windows 98. Why? Because when it comes to personal computing, if something can run on Windows 98, it can run on anything. More to the point: if something can run on Windows 98, then no tech company can control how you use it, no matter how large or influential they may be. More on that in a minute.

Ever wanted to run a local LLM on 25-year-old hardware? No? Well, now you can, and at a respectable speed, too!

What’s it like to run an LLM on Windows 98? Aside from the struggles of finding compatible peripherals (back to PS/2 hardware!), transferring the required files (FTP over Ethernet to the rescue), and compilation (some porting required), it works better than one might expect.

A Windows 98 machine with a Pentium II processor and 128 MB of RAM generates a speedy 39.31 tokens per second with a 260K-parameter Llama 2 model. A much larger 15M-parameter model generates 1.03 tokens per second. Slow, but it works. Going even larger will also work, just ever slower. There’s a video on X that shows it all in action.
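To put those numbers in perspective, here’s the back-of-the-envelope arithmetic for a 100-token reply (the reply length is just an arbitrary example):

```python
# Reported throughput from the demo, in tokens per second.
rates = {"260K-parameter model": 39.31, "15M-parameter model": 1.03}

reply_tokens = 100  # arbitrary example reply length

for name, tokens_per_second in rates.items():
    seconds = reply_tokens / tokens_per_second
    print(f"{name}: {reply_tokens} tokens in roughly {seconds:.0f} seconds")

# Roughly 3 seconds for the 260K model and 97 seconds for the 15M model.
```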

It’s true that modern LLMs have billions of parameters, so these models are tiny in comparison. But that doesn’t mean they can’t be useful. Models can be shockingly small and still be perfectly coherent and deliver surprisingly strong performance, provided their training and “job” are narrow enough, and the tools to do that for oneself are all on GitHub.

This is a good time to mention that this particular project (and its ongoing efforts) is part of a set of twelve projects by EXO Labs focused on ensuring that things like AI models can be run anywhere, by anyone, independent of tech giants aiming to hold all the strings.

And hey, if local AI on the command line is up your alley, did you know LLMs already exist as single-file, multi-platform, command-line executables?


Hackaday Links: December 15, 2024

It looks like we won’t have Cruise to kick around in this space anymore with the news that General Motors is pulling the plug on its woe-beset robotaxi project. Cruise, which GM acquired in 2016, fielded autonomous vehicles in various test markets, but the fleet racked up enough high-profile mishaps (first item) for California regulators to shut down test programs in the state last year. The inevitable layoffs ensued, and GM is now killing off its efforts to build robotaxis to concentrate on incorporating the Cruise technology into its “Super Cruise” suite of driver-assistance features for its full line of cars and trucks. We feel like this might be a tacit admission that surmounting the problems of fully autonomous driving is just too hard a nut to crack profitably with current technology, since Super Cruise uses eye-tracking cameras to make sure the driver is paying attention to the road ahead when automation features are engaged. Basically, GM is admitting there still needs to be meat in the seat, at least for now.


An Animated Walkthrough Of How Large Language Models Work

If you wonder how Large Language Models (LLMs) work and aren’t afraid of getting a bit technical, don’t miss [Brendan Bycroft]’s LLM Visualization. It is an interactive, step-by-step walk-through of a GPT large language model, complete with an animated 3D block diagram of everything going on under the hood. Check it out!

nano-gpt has only around 85,000 parameters, but the operating principles are all the same as for larger models.

The demonstration walks through a simple task and shows every step. The task is this: using the nano-gpt model, take a sequence of six letters and put them into alphabetical order.

A GPT model is a highly complex prediction engine, so the whole process begins with tokenizing the input (breaking up words and assigning numerical values to the chunks) and ends with choosing an appropriate output from a list of probabilities. There are of course many more steps in between, and different ways to adjust the model’s behavior. All of these are made quite clear by [Brendan]’s process breakdown.
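The two ends of that pipeline, tokenizing the input and picking an output from a list of probabilities, are simple enough to sketch in a few lines; the transformer layers in between are what the visualization spends most of its time on. The character-level vocabulary and the logit values below are invented purely for illustration.

```python
import math
import random

# Toy character-level vocabulary, in the spirit of the letter-sorting demo (invented here).
vocab = {ch: i for i, ch in enumerate("ABCDEF")}

def tokenize(text: str) -> list[int]:
    """Break the input into chunks (single letters here) and map each to a token ID."""
    return [vocab[ch] for ch in text]

def softmax(logits: list[float]) -> list[float]:
    """Turn the model's raw output scores into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

print(tokenize("CABFED"))                  # -> [2, 0, 1, 5, 4, 3]

# The model ends by emitting one score per vocabulary entry; the next token is
# chosen from the resulting probabilities (here by weighted sampling).
logits = [0.1, 2.3, -1.0, 0.4, 0.0, 1.1]   # made-up scores
probs = softmax(logits)
next_token = random.choices(range(len(probs)), weights=probs)[0]
print(probs, next_token)
```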

We’ve previously covered “how LLMs work, explained without math”, which eschews the gritty technical details in favor of focusing on functionality, but it’s also nice to see an approach like this one, which embraces the technical elements of exactly what is going on.

We’ve also seen a much higher-level peek at how a modern AI model like Anthropic’s Claude works when it processes requests, extracting human-understandable concepts that illustrate what’s going on under the hood.

Playing Chess Against LLMs And The Mystery Of Instruct Models

At first glance, trying to play chess against a large language model (LLM) seems like a daft idea, as its weighted nodes have, at most, been trained on some chess-adjacent texts. It has no concept of board state, stratagems, or even what a ‘rook’ or ‘knight’ piece is. This daftness is indeed demonstrated by [Dynomight] in a recent blog post (Substack version), where the Stockfish chess engine is pitted against a range of LLMs, from a small Llama model to GPT-3.5. Although the outcomes (see featured image) are largely as you’d expect, there is one surprise: the gpt-3.5-turbo-instruct model, which proves quite capable of giving Stockfish a run for its money, albeit at Stockfish’s lower settings.

Each model was given the same query, telling it to be a chess grandmaster, to use standard notation, and to choose its next move. The stark difference between the instruct model and the others calls for investigation. OpenAI describes the instruct model as an ‘InstructGPT 3.5 class model’, which leads us to this page on OpenAI’s site and an associated 2022 paper that describes how InstructGPT is effectively the standard GPT LLM heavily fine-tuned using human feedback.
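As an illustration of what such a setup involves, here’s a minimal sketch of the prompt-and-validate loop using the python-chess library. The prompt wording and the stubbed-out LLM call are stand-ins, not [Dynomight]’s actual code.

```python
import chess  # pip install python-chess

def ask_llm(prompt: str) -> str:
    """Stand-in for the real LLM call; imagine it returns a move in standard notation."""
    return "e5"  # placeholder reply

board = chess.Board()
moves_so_far = ["e4"]
board.push_san("e4")                  # the engine's (or opening) move

# Build a query along the lines described above: grandmaster persona, standard
# notation, choose the next move (wording is illustrative, not the original).
prompt = (
    "You are a chess grandmaster. Reply with your next move in standard "
    "algebraic notation only.\nMoves so far: " + " ".join(moves_so_far)
)

reply = ask_llm(prompt).strip()
try:
    board.push_san(reply)             # python-chess rejects illegal or malformed moves
    print("LLM played:", reply)
except ValueError:
    print("LLM suggested an illegal move:", reply)
```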
