Graph showing accuracy vs model

Why You Shouldn’t Trade Walter Cronkite For An LLM

Has anyone noticed that news stories have gotten shorter and pithier over the past few decades, sometimes seeming like summaries of what you used to peruse? In spite of that, huge numbers of people are relying on large language model (LLM) “AI” tools to get their news in the form of summaries. According to a study by the BBC and European Broadcasting Union, 47% of people find news summaries helpful. Over a third of Britons say they trust LLM summaries, and they probably ought not to, according to the beeb and co.

It’s a problem we’ve discussed before: as OpenAI researchers themselves admit, hallucinations are unavoidable. This more recent BBC-led study took a microscope to LLM summaries in particular, to find out how often and how badly they were tainted by hallucination.

Not all of those errors were considered a big deal, but in 20% of cases (on average) there were “major issues”, and that figure is more or less independent of which model was being used. If there’s good news here, it’s that those numbers are better than they were when the beeb last performed this exercise earlier in the year. The whole report is worth reading if you’re a toaster-lover interested in the state of the art. (Especially if you want to see if this human-produced summary works better than an LLM-derived one.) If you’re a luddite, by contrast, you can rest easy that your instincts not to trust clanks remain reasonable… for now.

Either way, for the moment, it might be best to restrict the LLM to game dialog, and leave the news to totally-trustworthy humans who never err.

Expert Systems: The Dawn Of AI

We’ll be honest. If you had told us a few decades ago we’d teach computers to do what we want, it would work some of the time, and you wouldn’t really be able to explain or predict exactly what it was going to do, we’d have thought you were crazy. Why not just get a person? But the dream of AI goes back to the earliest days of computers or even further, if you count Samuel Butler’s letter from 1863 musing on machines evolving into life, a theme he would revisit in the 1872 book Erewhon.

Of course, early real-life AI was nothing like what you wanted. Eliza seemed pretty conversational, but you could quickly confuse the program. Hexapawn learned how to play an extremely simplified version of chess, but you could just as easily teach it to lose.

But the real AI work that looked promising was the field of expert systems. Unlike our current AI friends, expert systems were highly predictable. Of course, like any computer program, they could be wrong, but if they were, you could figure out why.
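To make the "you could figure out why" point concrete, here is a minimal sketch of a forward-chaining rule engine, the general style of reasoning many expert systems used. The rules, facts, and function names below are hypothetical and purely illustrative; they are not taken from any particular historical system.

```python
# Toy forward-chaining rule engine. All rules and facts are hypothetical.
RULES = [
    # (rule name, conditions that must all be known, fact to conclude)
    ("R1", {"engine cranks", "no spark"}, "ignition fault"),
    ("R2", {"ignition fault", "coil tests good"}, "replace spark plugs"),
]

def infer(initial_facts):
    """Apply rules until no new facts appear; return the facts and a trace."""
    facts = set(initial_facts)
    trace = []  # records which rule fired and why, in order
    changed = True
    while changed:
        changed = False
        for name, conditions, conclusion in RULES:
            # A rule fires when all its conditions are already known facts.
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                trace.append((name, sorted(conditions), conclusion))
                changed = True
    return facts, trace

facts, trace = infer({"engine cranks", "no spark", "coil tests good"})
for name, conditions, conclusion in trace:
    print(f"{name}: {' and '.join(conditions)} -> {conclusion}")
```

Because every conclusion is recorded alongside the rule and facts that produced it, a wrong answer can be traced back to a specific rule, which is the predictability the paragraph above describes.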

Experts?

As the name implies, expert systems drew from human experts. In theory, a specialized person known as a “knowledge engineer” would work with a human expert to distill his or her knowledge down to an essential form that the computer could handle.

This could range from the simple to the fiendishly complex, and if you think it was hard to do well, you aren’t wrong. Before getting into details, an example will help you follow how it works.

Continue reading “Expert Systems: The Dawn Of AI”

Making The Smallest And Dumbest LLM With Extreme Quantization

Turns out that training on Twitch quotes doesn’t make an LLM a math genius. (Credit: Codeically, YouTube)

Large language models are called ‘large’ not because of how smart they are, but because of their sheer size in bytes. At billions of parameters at four bytes each, they pose a serious challenge when it comes to not just their size on disk, but also in RAM, specifically the RAM of your video card (VRAM). Reducing this immense size, as is done routinely for the smaller pretrained models which one can download for local use, involves quantization. This process is explained and demonstrated by [Codeically], who takes it to its logical extreme: reducing what could be a GB-sized model down to a mere 63 MB by reducing the bits per parameter.

While you can offload a model, i.e. keep only part of it in VRAM and the rest in system RAM, this massively impacts performance. An alternative is to compress the model by using fewer bits per weight, which typically involves reducing 16-bit floating point to 8-bit, halving memory usage (or cutting it by about 75% relative to the four-byte parameters mentioned above). Going lower than this is generally deemed inadvisable.
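The memory arithmetic is easy to check, since the footprint of the weights scales linearly with bits per weight. The sketch below is only back-of-the-envelope math: the parameter count is a hypothetical round number, the helper names are made up for illustration, and real model files carry extra overhead (scales, metadata, non-quantized layers) that is ignored here.

```python
def model_size_bytes(num_params, bits_per_weight):
    """Raw weight storage only, ignoring any file-format overhead."""
    return num_params * bits_per_weight // 8

def savings(from_bits, to_bits):
    """Fraction of memory saved when going from one bit width to another."""
    return 1 - to_bits / from_bits

PARAMS = 1_000_000_000  # hypothetical 1-billion-parameter model

for bits in (32, 16, 8, 4):
    gb = model_size_bytes(PARAMS, bits) / 1e9
    print(
        f"{bits:>2}-bit: {gb:.1f} GB "
        f"({savings(32, bits):.0%} smaller than 32-bit, "
        f"{savings(16, bits):.0%} vs 16-bit)"
    )
```

Negative percentages in the "vs 16-bit" column simply mean the format is larger than the 16-bit baseline.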

Using GPT-2 as the base, [Codeically] trained the model on a pile of internet quotes, producing parameters with a very anemic 4-bit integer size. A first attempt that manually zeroed weights left the output too garbled, while a second attempt without the zeroing did produce somewhat usable output before flying off the rails. Yet it did this with a 63 MB model at 78 tokens a second on just the CPU, demonstrating that you can create a pocket-sized chatbot to spout nonsense even without splurging on expensive hardware.
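For a feel of what 4-bit integer weights mean in practice, here is a minimal sketch of symmetric 4-bit quantization with two values packed per byte. This is a generic textbook-style scheme chosen for illustration, not necessarily the method [Codeically] used, and the function names and sample weights are hypothetical.

```python
def quantize_4bit(weights):
    """Map floats to signed 4-bit integers in [-8, 7] using one shared scale."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    scale = max_abs / 7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def pack(q):
    """Store two 4-bit values per byte, low nibble first."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    return bytes((q[i] & 0xF) | ((q[i + 1] & 0xF) << 4) for i in range(0, len(q), 2))

def unpack(data, count):
    """Recover signed 4-bit values from packed bytes."""
    out = []
    for byte in data:
        for nibble in (byte & 0xF, byte >> 4):
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out[:count]

def dequantize(q, scale):
    """Approximate the original floats from the quantized integers."""
    return [v * scale for v in q]

weights = [0.70, -0.32, 0.11, 0.0, -0.70]  # hypothetical sample weights
q, scale = quantize_4bit(weights)
packed = pack(q)
restored = dequantize(unpack(packed, len(q)), scale)
print(f"{len(weights)} weights stored in {len(packed)} bytes")
print("max round-trip error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

With only 16 representable levels per weight, the rounding error is large compared to 8-bit or 16-bit storage, which is consistent with output that goes off the rails.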

Continue reading “Making The Smallest And Dumbest LLM With Extreme Quantization”

The Lambda Papers: When LISP Got Turned Into A Microprocessor

The physical layout of the SCHEME-78 LISP-based microprocessor by Steele and Sussman. (Source: ACM, Vol 23, Issue 11, 1980)

During the AI research boom of the 1970s, the LISP language (from LISt Processor) saw a major surge in use and development, including many dialects being developed. One of these dialects was Scheme, developed by [Guy L. Steele] and [Gerald Jay Sussman], who wrote a number of articles that were published by the Massachusetts Institute of Technology (MIT) AI Lab as part of the AI Memos. This subset, called the Lambda Papers, covers the ideas from both men about lambda calculus, its application with LISP and ultimately the 1980 paper on the design of a LISP-based microprocessor.

Scheme is notable here because it influenced the development of what would be standardized in 1994 as Common Lisp, which is what can be called ‘modern Lisp’. The idea of creating dedicated LISP machines was not a new one, driven by the processing requirements of AI systems. The mismatch between the S-expressions of LISP and the typical way that assembly uses the CPUs of the era led to the development of CPUs with dedicated hardware support for LISP.

The design described by [Steele] and [Sussman] in their 1980 paper, as featured in the Communications of the ACM, features an instruction set architecture (ISA) that matches the LISP language more closely. As described, it is effectively a hardware-based LISP interpreter, implemented in a VLSI chip, called the SCHEME-78. Moving as much as possible into hardware improves performance considerably. This is somewhat like how today’s AI boom is based around dedicated vector processors that excel at inference, unlike generic CPUs.

During the 1980s LISP machines began to integrate more and more hardware features, with the Symbolics and LMI systems featuring heavily. Later these systems also began to be marketed towards non-AI uses like 3D modelling and computer graphics. However, as funding for AI research dried up and commodity hardware began to outpace specialized processors, these systems vanished.

Top image: Symbolics 3620 and LMI Lambda Lisp machines (Credit: Jason Riedy)

Nanochat Lets You Build Your Own Hackable LLM

Few people know LLMs (Large Language Models) as thoroughly as [Andrej Karpathy], and luckily for us all he expresses that in useful open-source projects. His latest is nanochat, which he bills as a way to create “the best ChatGPT $100 can buy”.

What is it, exactly? nanochat is a minimal and hackable software project (encapsulated in a single speedrun.sh script) for creating a simple ChatGPT clone from scratch, including web interface. The codebase is about 8,000 lines of clean, readable code with minimal dependencies, making every single part of the process accessible to be tampered with.

An accessible, end-to-end codebase for creating a simple ChatGPT clone makes every part of the process hackable.

The $100 is the cost of doing the computational grunt work of creating the model, which takes about 4 hours on a single NVIDIA 8XH100 GPU node. The result is a 1.9 billion parameter micro-model, trained on some 38 billion tokens from an open dataset. This model is, as [Andrej] describes in his announcement on X, a “little ChatGPT clone you can sort of talk to, and which can write stories/poems, answer simple questions.” A walk-through of what that whole process looks like makes it as easy as possible to get started.

Unsurprisingly, a mere $100 doesn’t create a meaningful competitor to modern commercial offerings. However, significant improvements can be had by scaling up the process. A $1,000 version (detailed here) is far more coherent and capable, able to solve simple math or coding problems and take multiple-choice tests.

[Andrej Karpathy]’s work lends itself well to modification and experimentation, and we’re sure this tool will be no exception. His past work includes a method of training a GPT-2 LLM using only pure C code, and years ago we saw his work on a character-based Recurrent Neural Network (mis)used to generate baroque music by cleverly representing MIDI events as text.

Your LLM Won’t Stop Lying Any Time Soon

Researchers call it “hallucination”; you might more accurately refer to it as confabulation, hornswaggle, hogwash, or just plain BS. Anyone who has used an LLM has encountered it; some people seem to find it behind every prompt, while others dismiss it as an occasional annoyance, but nobody claims it doesn’t happen. A recent paper by researchers at OpenAI (PDF) tries to drill down a bit deeper into just why that happens, and if anything can be done.

Spoiler alert: not really. Not unless we completely re-think the way we’re training these models, anyway. The analogy used in the conclusion is to an undergraduate in an exam room. Every right answer is going to get a point, but wrong answers aren’t penalized, so why the heck not guess? You might not pass an exam that way going in blind, but if you have studied (i.e., sucked up the entire internet without permission for training data) then you might get a few extra points. For an LLM’s training, like a student’s final grade, every point scored on the exam is a good point.

Continue reading “Your LLM Won’t Stop Lying Any Time Soon”

Divining Air Quality With A Cheap Computer Vision Device

There are all kinds of air quality sensors on the market that rely on all kinds of electro-physical effects to detect gases or contaminants and report them back as a value. [lucascreator] has instead been investigating a method of determining air quality that is closer to divination than measurement—using computer vision and a trained AI model.

The system relies on a Unihiker K10, a microcontroller module built around the ESP32-S3. The chip is running a lightweight convolutional neural network (CNN) trained on 12,000 images of the sky. These images were sourced from a public dataset; they were taken in India and Nepal, and tagged with the relevant Air Quality Index at the time of capture. [lucascreator] used this data to train their model to look at an image taken with a camera attached to the ESP32 and estimate the air quality index based on what it has seen in that existing dataset.

It might sound like a spurious concept, but it does have some value. [lucascreator] cites studies where video data was used for low-cost air quality estimation—not as a replacement for proper measurement, but as an additional data point that could be sourced from existing surveillance infrastructure. Performance of such models has, in some cases, been remarkably accurate.

[lucascreator] is pragmatic about the limitations of their implementation of this concept, noting that their very compact model didn’t always perform the best in terms of determining actual air quality. The concept may have some value, but implementing it on an ESP32 isn’t so easy if you’re looking for supreme accuracy. We’ve featured some other great air quality projects before, though, if you’re looking for other ways to capture this information. Video after the break.

Continue reading “Divining Air Quality With A Cheap Computer Vision Device”