Can Skynet Be A Statesman?

There’s been a lot of virtual ink spilled about LLMs and their coding ability. Some people swear by the vibes, while others, like the FreeBSD devs, have sworn them off completely. What we don’t often think about is the bigger picture: What does AI do to our civilization? That’s the thrust of a recent paper from the Boston University School of Law, “How AI Destroys Institutions”. Yes, Betteridge strikes again.

We’ve talked before about LLMs and coding productivity, but [Hartzog] and [Silbey] from the school of law take a different approach. They don’t care how well Claude or Gemini can code; they care what having them around is doing to the sinews of civilization. As you can guess from the title, it’s nothing good.

"A computer must never make a management decision."
Somehow the tl;dr was written decades before the paper was.

The paper is a bit of a slog, but worth reading in full, even if the language is slightly lawyer-y. To summarize in brief, the authors try to identify the key things that make our institutions work, and then show one by one how each of these pillars is subtly corroded by use of LLMs. The argument isn’t that your local government clerk using ChatGPT is going to immediately result in anarchy; rather it will facilitate a slow transformation of the democratic structures we in the West take for granted. There’s also a jeremiad about LLMs ruining higher education buried in there, a problem we’ve talked about before.

If you agree with the paper, you may find yourself wishing we could launch the clankers into orbit… and turn off the downlink. If not, you’ll probably let us know in the comments. Please keep the flaming limited to below gas mark 2.

A photo of the cats and the generated image

The Cutest Weather Forecast On E-Ink And ESP32

There’s a famous book that starts: “It is a truth universally acknowledged that a man in possession of a good e-ink display, must be in want of a weather station.” — or something like that, anyway. We’re not English majors. We are, however, major fans of this feline-based e-ink weather display by [Jesse Ward-Bond]. It’s got everything: e-ink, cats, and AI.

The generated image needs a little massaging to look nice on the Spectra6 e-ink display.

AI? Well, it might seem a bit gratuitous for a simple weather display, but [Jesse] wanted something a little more personalized and dynamic than just icons. With that in the design brief, he turned to Google’s Nano Banana API, feeding it the forecast and a description of his cats to automatically generate a cute scene to match the day’s weather.

That turned out to not be enough variety for the old monkey brain, so the superiority of silicon, specifically Gemini, was called upon to write unique daily prompts for Nano Banana using a random style from a list presumably generated by TinyLlama running on a C64. Okay, no, [Jesse] wrote the prompt for Gemini himself. It can’t be LLMs all the way down, after all. Gemini is also picking the foreground, background, and activity the cats will be doing for maximum neophilia.
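
For the curious, the prompt-writing stage might look something like the sketch below. It only assembles the text that would be handed to a language model; no API call is made. The style list, the function name, and the wording of the prompt are all our own assumptions, not taken from [Jesse]'s code.

```python
import random

# Hypothetical style list; the write-up only says that a list of styles exists.
STYLES = ["watercolour", "pixel art", "linocut print", "ukiyo-e"]


def build_meta_prompt(forecast, cat_description, styles=STYLES, rng=random):
    """Compose the text asking a language model to write an image prompt."""
    # Pick one style at random so each day's picture looks different.
    style = rng.choice(styles)
    return (
        f"Write one image-generation prompt in the style of {style}. "
        f"Today's forecast: {forecast}. "
        f"The cats: {cat_description}. "
        "Choose a foreground, a background, and an activity for the cats "
        "that suit the weather."
    )


if __name__ == "__main__":
    print(build_meta_prompt("light rain, 12 C", "two tabby cats"))
```

Passing in a seeded `random.Random` as `rng` makes the style choice repeatable, which is handy when debugging a pipeline that is random by design.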

Aside from the parts that are obviously on Google servers, this is all integrated in [Jesse]’s Home Assistant server. That server stores the generated image until the ESP32 fetches it. He’s using a reTerminal board from Seeed Studio that includes an ESP32-S3 and a Spectra6 colour e-ink display. That display leaves something to be desired in coloration, so on top of dithering the image to match the palette of the display, he’s also got a bit of color-correction in place to make it really pop.
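
If you haven't met dithering before, here is a minimal pure-Python sketch of the classic Floyd-Steinberg approach to squeezing an image into a tiny palette. We don't know which dithering algorithm [Jesse] actually uses, and the six RGB values below are nominal placeholders rather than measured colours from the panel, so treat this as an illustration of the idea only.

```python
# Nominal six-colour palette. These RGB values are assumptions for the sketch;
# a real panel's colours are duller and would need to be measured.
PALETTE = [
    (0, 0, 0),        # black
    (255, 255, 255),  # white
    (255, 0, 0),      # red
    (255, 255, 0),    # yellow
    (0, 0, 255),      # blue
    (0, 255, 0),      # green
]


def nearest(pixel, palette=PALETTE):
    """Return the palette colour closest to pixel (squared RGB distance)."""
    return min(palette, key=lambda c: sum((a - b) ** 2 for a, b in zip(pixel, c)))


def dither(image, palette=PALETTE):
    """Floyd-Steinberg dither a list of rows of (r, g, b) tuples."""
    height, width = len(image), len(image[0])
    # Work on float copies so the diffused error can accumulate.
    work = [[list(map(float, px)) for px in row] for row in image]
    out = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            old = work[y][x]
            new = nearest(old, palette)
            out[y][x] = new
            err = [o - n for o, n in zip(old, new)]
            # Push the quantisation error onto neighbours not yet visited.
            for dx, dy, weight in ((1, 0, 7 / 16), (-1, 1, 3 / 16),
                                   (0, 1, 5 / 16), (1, 1, 1 / 16)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < width and ny < height:
                    for ch in range(3):
                        work[ny][nx][ch] += err[ch] * weight
    return out


if __name__ == "__main__":
    grey = [[(128, 128, 128)] * 4 for _ in range(4)]
    for row in dither(grey):
        print(row)
```

A flat grey patch comes out as a mix of palette colours that average back towards grey when viewed from a distance, which is the whole trick.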

If you’re interested in replicating this feline forecast, [Jesse] has shared the code on GitHub, but it comes with a warning: cuteness isn’t free. That is to say, the tokens for the API calls to generate these images aren’t free; [Jesse] estimates that when the sign-up bonus is used up, it should cost about fourteen cents a pop at current rates. Worth it? That’s a personal choice. Some might prefer saving their pennies and checking the forecast on something more physical, while others might prefer the retro touch only a CRT can provide. 

A graph showing the poisoning success rate of 7B and 13B parameter models

It Only Takes A Handful Of Samples To Poison Any Size LLM, Anthropic Finds

It stands to reason that if you have access to an LLM’s training data, you can influence what’s coming out the other end of the inscrutable AI’s network. The obvious guess is that you’d need some percentage of the overall input, though exactly how much that was — 2%, 1%, or less — was an active research question. New research by Anthropic, the UK AI Security Institute, and the Alan Turing Institute shows it is actually a lot easier to poison the well than that.

We’re talking parts-per-million of poison for large models, because the researchers found that with just 250 carefully-crafted poison pills, they could compromise the output of any size LLM. Now, when we say poison the model, we’re not talking about a total hijacking, at least in this study. The specific backdoor under investigation was getting the model to produce total gibberish.
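
To get a feel for what a fixed count of 250 means as a fraction, here is a quick back-of-the-envelope sketch. The corpus sizes are illustrative assumptions of ours, not figures from the study; only the 250 comes from the text above.

```python
def poison_ppm(poisoned_docs, corpus_docs):
    """Poisoned share of a corpus, in parts per million."""
    return poisoned_docs / corpus_docs * 1_000_000


if __name__ == "__main__":
    # Hypothetical corpus sizes (in documents), chosen only for illustration.
    for corpus_docs in (10_000_000, 100_000_000, 1_000_000_000):
        ppm = poison_ppm(250, corpus_docs)
        print(f"{corpus_docs:>13,} docs: {ppm:.2f} ppm")
```

Because the number of poisoned samples stays constant, the poisoned fraction shrinks as the training set grows, which is why a percentage-based estimate overstates how much poison is needed.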

Graph showing accuracy vs model

Why You Shouldn’t Trade Walter Cronkite For An LLM

Has anyone noticed that news stories have gotten shorter and pithier over the past few decades, sometimes seeming like summaries of what you used to peruse? In spite of that, huge numbers of people are relying on large language model (LLM) “AI” tools to get their news in the form of summaries. According to a study by the BBC and European Broadcasting Union, 47% of people find news summaries helpful. Over a third of Britons say they trust LLM summaries, and they probably ought not to, according to the beeb and co.

It’s a problem we’ve discussed before: as OpenAI researchers themselves admit, hallucinations are unavoidable. This more recent BBC-led study took a microscope to LLM summaries in particular, to find out how often and how badly they were tainted by hallucination.

Not all of the errors found were considered a big deal, but in 20% of cases (on average) there were “major issues”, and that’s more-or-less independent of which model was being used. If there’s good news here, it’s that those numbers are better than they were when the beeb last performed this exercise earlier in the year. The whole report is worth reading if you’re a toaster-lover interested in the state of the art. (Especially if you want to see if this human-produced summary works better than an LLM-derived one.) If you’re a luddite, by contrast, you can rest easy that your instincts not to trust clanks remain reasonable… for now.

Either way, for the moment, it might be best to restrict the LLM to game dialog, and leave the news to totally-trustworthy humans who never err.

Nanochat Lets You Build Your Own Hackable LLM

Few people know LLMs (Large Language Models) as thoroughly as [Andrej Karpathy], and luckily for us all he expresses that in useful open-source projects. His latest is nanochat, which he bills as a way to create “the best ChatGPT $100 can buy”.

What is it, exactly? nanochat is a minimal and hackable software project, encapsulated in a single speedrun.sh script, for creating a simple ChatGPT clone from scratch, including web interface. The codebase is about 8,000 lines of clean, readable code with minimal dependencies, making every single part of the process accessible to be tampered with.

An accessible, end-to-end codebase for creating a simple ChatGPT clone makes every part of the process hackable.

The $100 is the cost of doing the computational grunt work of creating the model, which takes about 4 hours on a single NVIDIA 8XH100 GPU node. The result is a 1.9 billion parameter micro-model, trained on some 38 billion tokens from an open dataset. This model is, as [Andrej] describes in his announcement on X, a “little ChatGPT clone you can sort of talk to, and which can write stories/poems, answer simple questions.” A walk-through of what that whole process looks like makes it as easy as possible to get started.
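
The figures quoted above imply a few handy ratios, worked out in the sketch below. It assumes the $100 covers nothing but those four hours of node rental, which is our simplification; the function name is ours too.

```python
def cost_breakdown(total_usd, hours, gpus):
    """Return (cost per node-hour, cost per GPU-hour) for a rented node."""
    node_hourly = total_usd / hours
    return node_hourly, node_hourly / gpus


if __name__ == "__main__":
    # $100 total, 4 hours, 8 GPUs in the node: all taken from the text above.
    node_rate, gpu_rate = cost_breakdown(100, 4, 8)
    # 38 billion training tokens over 1.9 billion parameters.
    tokens_per_param = 38e9 / 1.9e9
    print(f"node: ${node_rate:.2f}/h, GPU: ${gpu_rate:.3f}/h")
    print(f"training tokens per parameter: {tokens_per_param:.0f}")
```

Running the same function with a bigger budget and a longer run time shows how quickly the rental bill scales.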

Unsurprisingly, a mere $100 doesn’t create a meaningful competitor to modern commercial offerings. However, significant improvements can be had by scaling up the process. A $1,000 version (detailed here) is far more coherent and capable, able to solve simple math or coding problems and take multiple-choice tests.

[Andrej Karpathy]’s work lends itself well to modification and experimentation, and we’re sure this tool will be no exception. His past work includes a method of training a GPT-2 LLM using only pure C code, and years ago we saw his work on a character-based Recurrent Neural Network (mis)used to generate baroque music by cleverly representing MIDI events as text.

Your LLM Won’t Stop Lying Any Time Soon

Researchers call it “hallucination”; you might more accurately refer to it as confabulation, hornswoggle, hogwash, or just plain BS. Anyone who has used an LLM has encountered it; some people seem to find it behind every prompt, while others dismiss it as an occasional annoyance, but nobody claims it doesn’t happen. A recent paper by researchers at OpenAI (PDF) tries to drill down a bit deeper into just why that happens, and if anything can be done.

Spoiler alert: not really. Not unless we completely re-think the way we’re training these models, anyway. The analogy used in the conclusion is to an undergraduate in an exam room. Every right answer is going to get a point, but wrong answers aren’t penalized, so why the heck not guess? You might not pass an exam that way going in blind, but if you have studied (i.e., sucked up the entire internet without permission for training data) then you might get a few extra points. For an LLM’s training, like a student’s final grade, every point scored on the exam is a good point.
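
The exam analogy is easy to put into numbers. The sketch below is our own toy model, not code or notation from the paper: a right answer earns one point, a wrong answer costs `wrong_penalty` points, and leaving the question blank scores `abstain_score`.

```python
def expected_score(p_correct, wrong_penalty=0.0):
    """Expected points for guessing: +1 if right, -wrong_penalty if wrong."""
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def should_guess(p_correct, wrong_penalty=0.0, abstain_score=0.0):
    """True when guessing beats leaving the answer blank, on average."""
    return expected_score(p_correct, wrong_penalty) > abstain_score


if __name__ == "__main__":
    # With no penalty, even a 10% shot is worth taking.
    print(should_guess(0.10))
    # With a one-point penalty, the same guess is a losing bet.
    print(should_guess(0.10, wrong_penalty=1.0))
```

With no penalty for wrong answers, any nonzero chance of being right makes guessing the better strategy. Once wrong answers cost something, guessing only pays when the model is reasonably sure of itself.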

LLM Dialogue In Animal Crossing Actually Works Very Well

In the original Animal Crossing from 2001, players are able to interact with a huge cast of quirky characters, all with different interests and personalities. But after you’ve played the game for a while, the scripted interactions can become a bit monotonous. Seeing an opportunity to improve the experience, [josh] decided to put a Large Language Model (LLM) in charge of these interactions. Now when the player chats with other characters in the game, the dialogue is a lot more engaging, relevant, and sometimes just plain funny.

How does one go about hooking a modern LLM into a 24-year-old game built for an entirely offline console? [josh]’s clever approach required a lot of poking about, and did a good job of leveraging some of the game’s built-in features for a seamless result.
