Making The Smallest And Dumbest LLM With Extreme Quantization

Turns out that training on Twitch quotes doesn't make an LLM a math genius. (Credit: Codeically, YouTube)
Turns out that training on Twitch quotes doesn’t make an LLM a math genius. (Credit: Codeically, YouTube)

The reason why large language models are called ‘large’ is not because of how smart they are, but as a factor of their sheer size in bytes. At billions of parameters at four bytes each, they pose a serious challenge when it comes to not just their size on disk, but also in RAM, specifically the RAM of your videocard (VRAM). Reducing this immense size, as is done routinely for the smaller pretrained models which one can download for local use, involves quantization. This process is explained and demonstrated by [Codeically], who takes it to its logical extreme: reducing what could be a GB-sized model down to a mere 63 MB by reducing the bits per parameter.

While you can offload a model, i.e. keep only part of it in VRAM and the rest in system RAM, this massively impacts performance. An alternative is to use fewer bits per weight in the model, called ‘compression’, which typically involves reducing 16-bit floating point to 8-bit, reducing memory usage by about 75%. Going lower than this is generally deemed unadvisable.

Using GPT-2 as the base, it was trained with a pile of internet quotes, creating parameters with a very anemic 4-bit integer size. After initially manually zeroing the weights made the output too garbled, the second attempt without the zeroing did somewhat produce usable output before flying off the rails. Yet it did this with a 63 MB model at 78 tokens a second on just the CPU, demonstrating that you can create a pocket-sized chatbot to spout nonsense even without splurging on expensive hardware.

Continue reading “Making The Smallest And Dumbest LLM With Extreme Quantization”

The Lambda Papers: When LISP Got Turned Into A Microprocessor

The physical layout of the SCHEME-78 LISP-based microprocessor by Steele and Sussman. (Source: ACM, Vol 23, Issue 11, 1980)
The physical layout of the SCHEME-78 LISP-based microprocessor by Steele and Sussman. (Source: ACM, Vol 23, Issue 11, 1980)

During the AI research boom of the 1970s, the LISP language – from LISt Processor – saw a major surge in use and development, including many dialects being developed. One of these dialects was Scheme, developed by [Guy L. Steele] and [Gerald Jay Sussman], who wrote a number of articles that were published by the Massachusetts Institute of Technology (MIT) AI Lab as part of the AI Memos. This subset, called the Lambda Papers, cover the ideas from both men about lambda calculus, its application with LISP and ultimately the 1980 paper on the design of a LISP-based microprocessor.

Scheme is notable here because it influenced the development of what would be standardized in 1994 as Common Lisp, which is what can be called ‘modern Lisp’. The idea of creating dedicated LISP machines was not a new one, driven by the processing requirements of AI systems. The mismatch between the S-expressions of LISP and the typical way that assembly uses the CPUs of the era led to the development of CPUs with dedicated hardware support for LISP.

The design described by [Steele] and [Sussman] in their 1980 paper, as featured in the Communications of the ACM, features an instruction set architecture (ISA) that matches the LISP language more closely. As described, it is effectively a hardware-based LISP interpreter, implemented in a VLSI chip, called the SCHEME-78. By moving as much as possible into hardware, obviously performance is much improved. This is somewhat like how today’s AI boom is based around dedicated vector processors that excel at inference, unlike generic CPUs.

During the 1980s LISP machines began to integrate more and more hardware features, with the Symbolics and LMI systems featuring heavily. Later these systems also began to be marketed towards non-AI uses like 3D modelling and computer graphics. As however funding for AI research dried up and commodity hardware began to outpace specialized processors, so too did these systems vanish.

Top image: Symbolics 3620 and LMI Lambda Lisp machines (Credit: Jason Riedy)

Nanochat Lets You Build Your Own Hackable LLM

Few people know LLMs (Large Language Models) as thoroughly as [Andrej Karpathy], and luckily for us all he expresses that in useful open-source projects. His latest is nanochat, which he bills as a way to create “the best ChatGPT $100 can buy”.

What is it, exactly? nanochat in a minimal and hackable software project — encapsulated in a single speedrun.sh script — for creating a simple ChatGPT clone from scratch, including web interface. The codebase is about 8,000 lines of clean, readable code with minimal dependencies, making every single part of the process accessible to be tampered with.

An accessible, end-to-end codebase for creating a simple ChatGPT clone makes every part of the process hackable.

The $100 is the cost of doing the computational grunt work of creating the model, which takes about 4 hours on a single NVIDIA 8XH100 GPU node. The result is a 1.9 billion parameter micro-model, trained on some 38 billion tokens from an open dataset. This model is, as [Andrej] describes in his announcement on X, a “little ChatGPT clone you can sort of talk to, and which can write stories/poems, answer simple questions.” A walk-through of what that whole process looks like makes it as easy as possible to get started.

Unsurprisingly, a mere $100 doesn’t create a meaningful competitor to modern commercial offerings. However, significant improvements can be had by scaling up the process. A $1,000 version (detailed here) is far more coherent and capable; able to solve simple math or coding problems and take multiple-choice tests.

[Andrej Karpathy]’s work lends itself well to modification and experimentation, and we’re sure this tool will be no exception. His past work includes a method of training a GPT-2 LLM using only pure C code, and years ago we saw his work on a character-based Recurrent Neural Network (mis)used to generate baroque music by cleverly representing MIDI events as text.

Your LLM Won’t Stop Lying Any Time Soon

Researchers call it “hallucination”; you might more accurately refer to it as confabulation, hornswaggle, hogwash, or just plain BS. Anyone who has used an LLM has encountered it; some people seem to find it behind every prompt, while others dismiss it as an occasional annoyance, but nobody claims it doesn’t happen. A recent paper by researchers at OpenAI (PDF) tries to drill down a bit deeper into just why that happens, and if anything can be done.

Spoiler alert: not really. Not unless we completely re-think the way we’re training these models, anyway. The analogy used in the conclusion is to an undergraduate in an exam room. Every right answer is going to get a point, but wrong answers aren’t penalized– so why the heck not guess? You might not pass an exam that way going in blind, but if you have studied (i.e., sucked up the entire internet without permission for training data) then you might get a few extra points. For an LLM’s training, like a student’s final grade, every point scored on the exam is a good point. Continue reading “Your LLM Won’t Stop Lying Any Time Soon”

Divining Air Quality With A Cheap Computer Vision Device

There are all kinds of air quality sensors on the market that rely on all kinds of electro-physical effects to detect gases or contaminants and report them back as a value. [lucascreator] has instead been investigating a method of determining air quality that is closer to divination than measurement—using computer vision and a trained AI model.

The system relies on an Unihiker K10—a microcontroller module based around the ESP32-S3 at heart. The chip is running a lightweight convolutional neural network (CNN) trained on 12,000 images of the sky. These images were sourced from a public dataset; they were taken in India and Nepal, and tagged with the relevant Air Quality Index at the time of capture. [lucascreator] used this data to train their model to look at an image taken with a camera attached to the ESP32 and estimate the air quality index based on what it has seen in that existing dataset.

It might sound like a spurious concept, but it does have some value. [lucascreator] cites studies where video data was used for low-cost air quality estimation—not as a replacement for proper measurement, but as an additional data point that could be sourced from existing surveillance infrastructure. Performance of such models has, in some cases, been remarkably accurate.

[lucascreator] is pragmatic about the limitations of their implementation of this concept, noting that their very compact model didn’t always perform the best in terms of determining actual air quality. The concept may have some value, but implementing it on an ESP32 isn’t so easy if you’re looking for supreme accuracy. We’ve featured some other great air quality projects before, though, if you’re looking for other ways to capture this information. Video after the break.

Continue reading “Divining Air Quality With A Cheap Computer Vision Device”

Mesa Project Adds Code Comprehension Requirement After AI Slop Incident

Recently [Faith Ekstrand] announced on Mastodon that Mesa was updating its contributor guide. This follows a recent AI slop incident where someone submitted a massive patch to the Mesa project with the claim that this would improve performance ‘by a few percent’. The catch? The entire patch was generated by ChatGPT, with the submitter becoming somewhat irate when the very patient Mesa developers tried to explain that they’d happily look at the issue after the submitter had condensed the purported ‘improvement’ into a bite-sized patch.

The entire saga is summarized in a recent video by [Brodie Robertson] which highlights both how incredibly friendly the Mesa developers are, and how the use of ChatGPT and kin has made some people with zero programming skills apparently believe that they can now contribute code to OSS projects. Unsurprisingly, the Mesa developers were unable to disabuse this particular individual from that notion, but the diff to the Mesa contributor guide by [Timur Kristóf] should make abundantly clear that someone playing Telephone between a chatbot and OSS project developers is neither desirable nor helpful.

That said, [Brodie] also highlights a recent post by [Daniel Stenberg] of Curl fame, who thanked [Joshua Rogers] for contributing a massive list of potential issues that were found using ‘AI-assisted tools’, as detailed in this blog post by [Joshua]. An important point here is that these ‘AI tools’ are not LLM-based chatbots, but rather tweaked existing tools like static code analyzers with more smarts bolted on. They’re purpose-made tools that still require you to know what you’re doing, but they can be a real asset to a developer, and a heck of a lot more useful to a project like Curl than getting sent fake bug reports by a confabulating chatbot as has happened previously.

Continue reading “Mesa Project Adds Code Comprehension Requirement After AI Slop Incident”

LLM Dialogue In Animal Crossing Actually Works Very Well

In the original Animal Crossing from 2001, players are able to interact with a huge cast of quirky characters, all with different interests and personalities. But after you’ve played the game for awhile, the scripted interactions can become a bit monotonous. Seeing an opportunity to improve the experience, [josh] decided to put a Large Language Model (LLM) in charge of these interactions. Now when the player chats with other characters in the game, the dialogue is a lot more engaging, relevant, and sometimes just plain funny.

How does one go about hooking a modern LLM into a 24-year-old game built for an entirely offline console? [josh]’s clever approach required a lot of poking about, and did a good job of leveraging some of the game’s built-in features for a seamless result.

Continue reading “LLM Dialogue In Animal Crossing Actually Works Very Well”