Nanochat Lets You Build Your Own Hackable LLM

Few people know LLMs (Large Language Models) as thoroughly as [Andrej Karpathy], and luckily for us all he expresses that in useful open-source projects. His latest is nanochat, which he bills as a way to create “the best ChatGPT $100 can buy”.

What is it, exactly? nanochat in a minimal and hackable software project — encapsulated in a single speedrun.sh script — for creating a simple ChatGPT clone from scratch, including web interface. The codebase is about 8,000 lines of clean, readable code with minimal dependencies, making every single part of the process accessible to be tampered with.

An accessible, end-to-end codebase for creating a simple ChatGPT clone makes every part of the process hackable.

The $100 is the cost of doing the computational grunt work of creating the model, which takes about 4 hours on a single NVIDIA 8XH100 GPU node. The result is a 1.9 billion parameter micro-model, trained on some 38 billion tokens from an open dataset. This model is, as [Andrej] describes in his announcement on X, a “little ChatGPT clone you can sort of talk to, and which can write stories/poems, answer simple questions.” A walk-through of what that whole process looks like makes it as easy as possible to get started.

Unsurprisingly, a mere $100 doesn’t create a meaningful competitor to modern commercial offerings. However, significant improvements can be had by scaling up the process. A $1,000 version (detailed here) is far more coherent and capable; able to solve simple math or coding problems and take multiple-choice tests.

[Andrej Karpathy]’s work lends itself well to modification and experimentation, and we’re sure this tool will be no exception. His past work includes a method of training a GPT-2 LLM using only pure C code, and years ago we saw his work on a character-based Recurrent Neural Network (mis)used to generate baroque music by cleverly representing MIDI events as text.

Your LLM Won’t Stop Lying Any Time Soon

Researchers call it “hallucination”; you might more accurately refer to it as confabulation, hornswaggle, hogwash, or just plain BS. Anyone who has used an LLM has encountered it; some people seem to find it behind every prompt, while others dismiss it as an occasional annoyance, but nobody claims it doesn’t happen. A recent paper by researchers at OpenAI (PDF) tries to drill down a bit deeper into just why that happens, and if anything can be done.

Spoiler alert: not really. Not unless we completely re-think the way we’re training these models, anyway. The analogy used in the conclusion is to an undergraduate in an exam room. Every right answer is going to get a point, but wrong answers aren’t penalized– so why the heck not guess? You might not pass an exam that way going in blind, but if you have studied (i.e., sucked up the entire internet without permission for training data) then you might get a few extra points. For an LLM’s training, like a student’s final grade, every point scored on the exam is a good point. Continue reading “Your LLM Won’t Stop Lying Any Time Soon”

LLM Dialogue In Animal Crossing Actually Works Very Well

In the original Animal Crossing from 2001, players are able to interact with a huge cast of quirky characters, all with different interests and personalities. But after you’ve played the game for awhile, the scripted interactions can become a bit monotonous. Seeing an opportunity to improve the experience, [josh] decided to put a Large Language Model (LLM) in charge of these interactions. Now when the player chats with other characters in the game, the dialogue is a lot more engaging, relevant, and sometimes just plain funny.

How does one go about hooking a modern LLM into a 24-year-old game built for an entirely offline console? [josh]’s clever approach required a lot of poking about, and did a good job of leveraging some of the game’s built-in features for a seamless result.

Continue reading “LLM Dialogue In Animal Crossing Actually Works Very Well”

Macintosh System 7 Ported To X86 With LLM Help

You can use large language models for all sorts of things these days, from writing terrible college papers to bungling legal cases. Or, you can employ them to more interesting ends, such as porting Macintosh System 7 to the x86 architecture, like [Kelsi Davis] did.

When Apple created the Macintosh lineup in the 1980s, it based the computer around Motorola’s 68K CPU architecture. These 16-bit/32-bit CPUs were plenty capable for the time, but the platform ultimately didn’t have the same expansive future as Intel’s illustrious x86 architecture that underpinned rival IBM-compatible machines.

[Kelsi Davis] decided to port the Macintosh System 7 OS to run on native x86 hardware, which would be challenging enough with full access to the source code. However, she instead performed this task by analyzing and reverse engineering the System 7 binaries with the aid of Ghidra and a large language model. Soon enough, she had the classic System 7 desktop running on QEMU with a fully-functional Finder and the GUI working as expected. [Kelsi] credits the LLM with helping her achieve this feat in just three days, versus what she would expect to be a multi-year effort if working unassisted.

Files are on GitHub for the curious. We love a good port around these parts; we particularly enjoyed these efforts to recreate Portal on the N64. If you’re doing your own advanced tinkering with Macintosh software from yesteryear, don’t hesitate to let us know.

Fully-Local AI Agent Runs On Raspberry Pi, With A Little Patience

[Simone]’s AI assistant, dubbed Max Headbox, is a wakeword-triggered local AI agent capable of following instructions and doing simple tasks. It’s an experiment in many ways, but also a great demonstration not only of what is possible with the kinds of open tools and hardware available to a modern hobbyist, but also a reminder of just how far some of these software tools have come in only a few short years.

Max Headbox is not just a local large language model (LLM) running on Pi hardware; the model is able to make tool calls in a loop, chaining them together to complete tasks. This means the system can break down a spoken instruction (for example, “find the weather report for today and email it to me”) into a series of steps to complete, utilizing software tools as needed throughout the process until the task is finished.

Continue reading “Fully-Local AI Agent Runs On Raspberry Pi, With A Little Patience”

OpenAI Releases Gpt-oss AI Model, Offers Bounty For Vulnerabilities

OpenAI have just released gpt-oss, an AI large language model (LLM) available for local download and offline use licensed under Apache 2.0, and optimized for efficiency on a variety of platforms without compromising performance. This is their first such “open” release, and it’s with a model whose features and capabilities compare favorably to some of their hosted services.

OpenAI have partnered with ollama for the launch which makes onboarding ridiculously easy. ollama is an open source, MIT-licensed project for installing and running local LLMs, but there’s no real tie-in to that platform. The models are available separately: gpt-oss-20b can run within 16 GB of memory, and the larger and more capable gpt-oss-120b requires 80 GB. OpenAI claims the smaller model is comparable to their own hosted o3-mini “reasoning” model, and the larger model outperforms it. Both support features like tool use (such as web browsing) and more.

LLMs that can be downloaded and used offline are nothing new, but a couple things make this model release a bit different from others. One is that while OpenAI have released open models such as Whisper (a highly capable speech-to-text model), this is actually the first LLM they have released in such a way.

The other notable thing is this release coincides with a bounty challenge for finding novel flaws and vulnerabilities in gpt-oss-20b. Does ruining such a model hold more appeal to you than running it? If so, good news because there’s a total of $500,000 to be disbursed. But there’s no time to waste; submissions need to be in by August 26th, 2025.

AI Code Review The Right Way

Do you use a spell checker? We’ll guess you do. Would you use a button that just said “correct all spelling errors in document?” Hopefully not. Your word processor probably doesn’t even offer that as an option. Why? Because a spellchecker will reject things not in its dictionary (like Hackaday, maybe). It may guess the wrong word as the correct word. Of course, it also may miss things like “too” vs. “two.” So why would you just blindly accept AI code review? You wouldn’t, and that’s [Bill Mill’s] point with his recent tool made to help him do better code reviews.

He points out that he ignores most of the suggestions the tool outputs, but that it has saved him from some errors. Like a spellcheck, sometimes you just hit ignore. But at least you don’t have to check every single word.

Continue reading “AI Code Review The Right Way”