Why Model Collapse In LLMs Is Inevitable With Self-Learning

There is a persistent belief in the ‘AI’ community that large language models (LLMs) have the ability to learn and self-improve by tweaking the weights in their vector space. Although there’s scant evidence that tweaking a probability vector space is anything like the learning process in biological brains, we nevertheless get sold the idea that artificial general intelligence (AGI) is just around the corner if we do just enough tweaking.

Instead of emergent superintelligence, the most likely outcome is what is called model collapse, with a recent paper by [Hector Zenil] going over the details on why self-training/learning in LLMs and similar systems is a fool’s errand. For those who just want the brief summary with all the memes, [Metin] wrote a blog post covering the basics.

In the end, an LLM, much like a diffusion model (DM), is a statistical model of its input data, from which a statistically likely output can be generated (inferred) in response to an input query. It follows intuitively that by feeding said output back in to adjust the model, the model will over time converge on a kind of statistical singularity rather than some ‘AI singularity’ event. This is also why these models need to be constantly trained on external, human-generated data in order to prevent such a collapse.
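To see this convergence in a toy setting, we can repeatedly fit a simple statistical model to samples drawn from its own previous generation. The Python sketch below is our own minimal illustration, not the model from the paper: a Gaussian refitted on nothing but its own output loses variance generation after generation, with the distribution’s tails vanishing first.

```python
import numpy as np

rng = np.random.default_rng(42)

# Generation 0: "human" data, i.e. samples from a standard normal.
data = rng.normal(loc=0.0, scale=1.0, size=20)

for generation in range(40):
    # Fit the simplest possible statistical model: a Gaussian.
    mu, sigma = data.mean(), data.std()
    print(f"gen {generation:2d}: mu = {mu:+.3f}, sigma = {sigma:.3f}")
    # Train the next generation purely on the current model's own
    # output; no fresh external data is mixed back in. In expectation
    # the fitted variance shrinks by a factor (n-1)/n every generation,
    # so the model drifts toward a degenerate point distribution.
    data = rng.normal(loc=mu, scale=sigma, size=len(data))
```

Mixing a fraction of the original ‘human’ data back in at every generation is enough to arrest the decay, which is exactly the kind of external anchoring the paper argues is indispensable.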

In the paper, [Hector] constructs a mathematical model to demonstrate that an LLM, DM, or similar statistical model undergoes degenerative dynamics whenever said external input is reduced. Although the paper suggests a mechanism to counter the entropy decay within the model, the ultimate point is that a statistical model cannot improve itself without continuous external anchoring.

The idea of LLMs being at all intelligent in any sense has been a contentious one, with the concept of language models being equated with ‘AI’ dating back to the 20th century, including as fun home computer projects. Much of the problem probably lies in humans projecting intelligent behavior onto these statistical models, turning LLMs into ‘counterfeit humans’, not helped by how closely generated text can resemble something written by a human, even if completely confabulated.

Thanks to [deshipu] for the tip.

Trying Pair Programming With An LLM Chatbot

When it comes to software developers, there are a few distinct types. There is, for example, the extroverted, chatty type, who is always out there sharing the latest and newest libraries and projects with everyone, and who is very much into bouncing ideas off others, regardless of whether those others know what’s being discussed. Then there is the introverted loner, who prefers to tackle programming challenges by bouncing things around inside their own mind and going on long walks to mull things over before committing to anything significant.

This leads to interesting scenarios when it comes to management-enforced ‘optimization’ strategies like pair programming. This approach has two developers sharing the same computer and keyboard, theoretically doubling effective output by some metric, but realistically it often leaves at least one side feeling pretty miserable and disconnected, unless you put two of the chatty types together.

As a certified introverted loner developer, I find that the idea of using an LLM chatbot as a coding assistant naturally triggers unpleasant flashbacks to hours of forced, awkward pair ‘programming’. Then again, maybe using an LLM chatbot would be more pleasant, since you can skip the whole awkward socializing bit. To give it a fair shake, I put together a little experiment to see whether LLM-based coding assistants are something that I could come to appreciate, unlike pair programming.

Continue reading “Trying Pair Programming With An LLM Chatbot”

Reverse-Engineering Human Cognition And Decision Making In A Modern Age

Cognitive processes are not something that we generally pay much attention to until something goes wrong, yet they cover everything from ingesting sensory information, through processing and recalling it, to any decisions made on the basis of such internal deliberation.

Within that context there has also long been a struggle between those who feel that it’s fine for humans to rely on available technologies to make tasks like information recall and calculations easier, and those who insist that a human should be perfectly capable of doing such tasks without any assistance. Plato argued that reading and writing hurt our ability to memorize, and for the longest time it was deemed inappropriate for students to even consider taking one of those newfangled digital calculators into an exam, while now we have many arguing that using an ‘AI’ is the equivalent of using a calculator.

At the root of this conundrum lies the distinction between that which enhances and that which hampers human cognition. When does one merely offload tasks to a device or object, and when does one harm one’s own cognition?

Continue reading “Reverse-Engineering Human Cognition And Decision Making In A Modern Age”

How Vibe Coding Is Killing Open Source

Does vibe coding risk destroying the Open Source ecosystem? According to a pre-print paper by a number of high-profile researchers, this might indeed be the case based on observed patterns and some modelling. Their warnings mostly center around the way that user interaction is pulled away from OSS projects, while also making starting a new OSS project significantly harder.

“Vibe coding” here is defined as software development that is assisted by an LLM-backed chatbot, where the developer asks the chatbot to effectively write the code for them. Arguably this turns the developer into more of a customer or client of the chatbot, with no requirement for the former to understand what the latter’s code does, only that the generated code appears to do the thing the chatbot was asked to create.

This also replaces the typical, more organic selection process for libraries and tooling with whatever was most prevalent in the LLM’s training data. Even popular projects see visits to their websites decrease as downloads and documentation reading are displaced by LLM chatbot interactions, reducing the opportunity to promote commercial plans, sponsorships, and community forums. Much of this is also reflected in the plummeting usage of community forums like Stack Overflow.

Continue reading “How Vibe Coding Is Killing Open Source”

Where Is Mathematics Going? Large Language Models And Lean Proof Assistant

If you’re a hacker you may well have a passing interest in math, and if you have an interest in math you might like to hear about the direction of mathematical research. In a talk on this topic [Kevin Buzzard], professor of pure mathematics at Imperial College London, asks the question: Where is Mathematics Going?

The talk starts with him explaining that in 2017 he had a mid-life crisis of sorts, becoming disillusioned with the way mathematics research was being done, which led him to look to computer science for solutions.

He credits Euclid, as many do, with writing down some axioms and starting mathematics over 2,000 years ago. From axioms came deductions, deductions became mathematical facts, and math has proceeded in this fashion ever since; it continues to be the way research is done in mathematics departments around the world. The consequence is that mathematics is now incomprehensibly large, and the proofs themselves are similarly enormous: he gives an example of one proof that is 10,000 pages long and still hasn’t been completely written down, more than 20 years after it was announced.

The conclusion from this is that mathematics has become so complex that traditional methods of documenting it struggle to cope. He says that a tertiary education in mathematics aims to “get students to the 1940s”, whereas a tertiary education in computer science will expose students to the state of the art.
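The Lean proof assistant in the title is where he looked: in Lean, a proof is a term that the machine checks, so it either compiles or it doesn’t. As a minimal taste (our own toy example; `Nat.add_comm` is a real lemma in Lean’s library):

```lean
-- Lean will refuse to accept this file unless the proof term really
-- does establish the stated theorem; no human referee is involved.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The hope is that the same mechanism scales from toy lemmas like this one to proofs no single human can check.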

Continue reading “Where Is Mathematics Going? Large Language Models And Lean Proof Assistant”

Measuring The Impact Of LLMs On Experienced Developer Productivity

Recently, AI risk and benefit evaluation company METR ran a randomized controlled trial (RCT) on a gaggle of experienced open source developers to gain objective data on how the use of LLMs affects their productivity. The finding was that using LLM-based tools like Cursor Pro with Claude 3.5/3.7 Sonnet reduced productivity by about 19%, with the full study by [Joel Becker] et al. available as a PDF.

The study was also intended to establish a methodology for assessing the impact of introducing LLM-based tools into software development. In the RCT, 16 experienced open source software developers were given 246 tasks, after which their effective performance was evaluated.

A large focus of the methodology was on creating realistic scenarios instead of using canned benchmarks: adding features, fixing bugs, and refactoring, much as the developers would do in their work on their respective open source projects. The observed increase in the time it took to complete tasks with an LLM’s assistance was found to be likely due to a range of factors, including over-optimism about the tools’ capabilities, the LLM interfering with the developers’ existing knowledge of the codebase, poor LLM performance on large codebases, low reliability of the generated code, and the LLM handling tacit knowledge and context very poorly.
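For a sense of how a single multiplicative figure like ‘19% slower’ falls out of such a trial, here is a back-of-the-envelope Python sketch. The task times are invented and this is not METR’s analysis code; comparing mean log-times is simply one standard way to pool tasks of very different sizes into a single ratio.

```python
import numpy as np

# Invented per-task completion times in minutes -- NOT METR's data.
with_llm    = np.array([95, 120, 80, 150, 110], dtype=float)
without_llm = np.array([80, 100, 70, 120, 95], dtype=float)

# Differencing mean log-times yields a multiplicative factor, so short
# and long tasks can contribute to one pooled estimate.
log_ratio = np.log(with_llm).mean() - np.log(without_llm).mean()
print(f"estimated slowdown: {np.exp(log_ratio):.2f}x")  # ~1.19x here
```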

Although METR suggests that this poor showing may improve over time, it seems fair to question whether LLM coding tools make for a useful coding partner at all.

AI Is Only Coming For Fun Jobs

In the past few years, what marketers and venture capital firms term “artificial intelligence” but is more often an advanced predictive text model of some sort has started taking people’s jobs and threatening others. But not tedious jobs that society might like to have automated away in the first place. These AI tools have generally been taking rewarding or enjoyable jobs like artist, author, filmmaker, programmer, and composer. This project from a research team might soon be able to add astronaut to that list.

The team was working within the confines of the Kerbal Space Program Differential Game Challenge, an open-source plugin from MIT that allows developers to test various algorithms and artificial intelligences in simulated spacecraft situations. Generally, purpose-built models are used here, refined over many rounds of testing, but since this process can be time-consuming and costly, the researchers on this team decided to hand over control to ChatGPT with only limited instructions. A translation layer built by the researchers converts the generated text into spacecraft controls.
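We don’t have the researchers’ actual code, but the idea of such a translation layer is simple enough to sketch: prompt the model to answer only in a structured format, then parse and clamp that answer before it ever touches the controls. Everything below, field names and ranges included, is a hypothetical Python illustration, not the real KSPDG interface.

```python
import json

def text_to_controls(llm_reply: str) -> dict:
    """Parse a JSON-formatted LLM reply into clamped control values."""
    cmd = json.loads(llm_reply)

    def clamp(value, lo, hi):
        return max(lo, min(hi, float(value)))

    # Clamp everything to ranges a simulated spacecraft would accept,
    # since nothing stops the model from requesting throttle = 9000.
    return {
        "throttle": clamp(cmd.get("throttle", 0.0), 0.0, 1.0),
        "pitch":    clamp(cmd.get("pitch", 0.0), -1.0, 1.0),
        "yaw":      clamp(cmd.get("yaw", 0.0), -1.0, 1.0),
    }

# The prompt would instruct the model to answer only in JSON:
print(text_to_controls('{"throttle": 0.8, "pitch": -0.2, "yaw": 0.05}'))
```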

We’ll note that, at least as of right now, large language models haven’t taken the job of any actual astronaut. The game challenge is generally aimed at uncrewed spacecraft like orbital satellites, which often need to make their own decisions to maintain orbits and avoid obstacles. This specific model was able to place second in a recent competition, although we’ll keep rooting for the humans in situations like these.