Why Model Collapse In LLMs Is Inevitable With Self-Learning

There is a persistent belief in the ‘AI’ community that large language models (LLMs) have the ability to learn and self-improve by tweaking the weights in their vector space. Although there’s scant evidence that tweaking a probability vector space is anything like the learning process in biological brains, we nevertheless get sold the idea that artificial general intelligence (AGI) is just around the corner if we do just enough tweaking.

Instead of emergent superintelligence, the most likely outcome is what is called model collapse, with a recent paper by [Hector Zenil] going over the details on why self-training/learning in LLMs and similar systems is a fool’s errand. For those who just want the brief summary with all the memes, [Metin] wrote a blog post covering the basics.

In the end, an LLM, much like a diffusion model (DM), is a statistical model of its input data, from which a statistically likely output can be generated (inferred) in response to an input query. It follows intuitively that by feeding said output back in to adjust the model, the model will over time converge on a kind of statistical singularity rather than some ‘AI singularity’ event. This is also why these models need to be constantly trained on external, human-generated data in order to prevent such a collapse.
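To see this convergence in a toy setting, we can repeatedly fit a simple statistical model to samples drawn from its own previous generation. The Python sketch below is our own minimal illustration, not the model from the paper: a Gaussian refitted on nothing but its own output loses variance generation after generation, with the distribution’s tails vanishing first.

```python
import numpy as np

rng = np.random.default_rng(42)

# Generation 0: "human" data, i.e. samples from a standard normal.
data = rng.normal(loc=0.0, scale=1.0, size=20)

for generation in range(40):
    # Fit the simplest possible statistical model: a Gaussian.
    mu, sigma = data.mean(), data.std()
    print(f"gen {generation:2d}: mu = {mu:+.3f}, sigma = {sigma:.3f}")
    # Train the next generation purely on the current model's own
    # output; no fresh external data is mixed back in. In expectation
    # the fitted variance shrinks by a factor (n-1)/n every generation,
    # so the model drifts toward a degenerate point distribution.
    data = rng.normal(loc=mu, scale=sigma, size=len(data))
```

Mixing a fraction of the original ‘human’ data back in at every generation is enough to arrest the decay, which is exactly the kind of external anchoring the paper argues is indispensable.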

In the paper, [Hector] constructs a mathematical model to demonstrate that an LLM, DM, or similar statistical model undergoes degenerative dynamics whenever said external input is reduced. Although the paper suggests a mechanism to counter the entropy decay within the model, the ultimate point is that a statistical model cannot improve itself without continuous external anchoring.

The idea of LLMs being at all intelligent in any sense has been a contentious one, with the concept of language models being equated with ‘AI’ dating back to the 20th century, including as fun home computer projects. Much of the problem probably lies in humans projecting intelligent behavior onto these statistical models, turning LLMs into ‘counterfeit humans’, not helped by how closely generated text can resemble something written by a human, even if completely confabulated.

Thanks to [deshipu] for the tip.

Trying Pair Programming With An LLM Chatbot

When it comes to software developers, there are a few distinct types. There is, for example, the extroverted, chatty type, who is always out there sharing the latest and newest libraries and projects with everyone, and who is very much into bouncing ideas off others, regardless of whether those others know what’s being discussed. Then there is the introverted loner, who prefers to tackle programming challenges by bouncing things around inside their own mind and going on long walks to mull things over before committing to anything significant.

This leads to interesting scenarios when it comes to management-enforced ‘optimization’ strategies like pair programming. This approach has two developers sharing the same computer and keyboard, theoretically doubling effective output by some metric, but realistically it often leaves at least one side feeling pretty miserable and disconnected, unless you put two of the chatty types together.

As a certified introverted loner developer, I find that the idea of using an LLM chatbot as a coding assistant naturally triggers unpleasant flashbacks to hours of forced, awkward pair ‘programming’. Then again, maybe using an LLM chatbot would be more pleasant, since you can skip the whole awkward socializing bit. To give it a fair shake, I put together a little experiment to see whether LLM-based coding assistants are something that I could come to appreciate, unlike pair programming.

Continue reading “Trying Pair Programming With An LLM Chatbot”

Reverse-Engineering Human Cognition And Decision Making In A Modern Age

Cognitive processes are not something that we generally pay much attention to until something goes wrong, yet they cover everything from ingesting sensory information, through processing and recalling it, to any decisions made on the basis of such internal deliberation.

Within that context there has also long been a struggle between those who feel that it’s fine for humans to rely on available technologies to make tasks like information recall and calculations easier, and those who insist that a human should be perfectly capable of doing such tasks without any assistance. Plato argued that reading and writing hurt our ability to memorize, and for the longest time it was deemed inappropriate for students to even consider taking one of those newfangled digital calculators into an exam, while now we have many arguing that using an ‘AI’ is the equivalent of using a calculator.

At the root of this conundrum lies the distinction between that which enhances and that which hampers human cognition. When does one merely offload tasks to a device or object, and when does one harm one’s own cognition?

Continue reading “Reverse-Engineering Human Cognition And Decision Making In A Modern Age”

How Vibe Coding Is Killing Open Source

Does vibe coding risk destroying the Open Source ecosystem? According to a pre-print paper by a number of high-profile researchers, this might indeed be the case based on observed patterns and some modelling. Their warnings mostly center around the way that user interaction is pulled away from OSS projects, while also making starting a new OSS project significantly harder.

“Vibe coding” here is defined as software development that is assisted by an LLM-backed chatbot, where the developer asks the chatbot to effectively write the code for them. Arguably this turns the developer into more of a customer or client of the chatbot, with no requirement for the former to understand what the latter’s code does, only that the generated code appears to do the thing the chatbot was asked to create.

This also replaces the typical, more organic selection process for libraries and tooling with whatever was most prevalent in the LLM’s training data. Even popular projects see visits to their websites decrease as downloads and documentation reading are displaced by LLM chatbot interactions, reducing the opportunity to promote commercial plans, sponsorships, and community forums. Much of this is also reflected in the plummeting usage of community forums like Stack Overflow.

Continue reading “How Vibe Coding Is Killing Open Source”

Where Is Mathematics Going? Large Language Models And Lean Proof Assistant

If you’re a hacker you may well have a passing interest in math, and if you have an interest in math you might like to hear about the direction of mathematical research. In a talk on this topic [Kevin Buzzard], professor of pure mathematics at Imperial College London, asks the question: Where is Mathematics Going?

The talk starts with him explaining that in 2017 he had a mid-life crisis of sorts, becoming disillusioned with the way mathematics research was being done, which led him to look to computer science for solutions.

He credits Euclid, as many do, with writing down some axioms and starting mathematics over 2,000 years ago. From axioms came deductions, deductions became mathematical facts, and math has proceeded in this fashion ever since; it continues to be the way research is done in mathematics departments around the world. The consequence is that mathematics is now incomprehensibly large, and the proofs themselves are similarly enormous: he gives an example of one proof that is 10,000 pages long and still hasn’t been completely written down, more than 20 years after it was announced.

The conclusion from this is that mathematics has become so complex that traditional methods of documenting it struggle to cope. He says that a tertiary education in mathematics aims to “get students to the 1940s”, whereas a tertiary education in computer science will expose students to the state of the art.
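The Lean proof assistant in the title is where he looked: in Lean, a proof is a term that the machine checks, so it either compiles or it doesn’t. As a minimal taste (our own toy example; `Nat.add_comm` is a real lemma in Lean’s library):

```lean
-- Lean will refuse to accept this file unless the proof term really
-- does establish the stated theorem; no human referee is involved.
theorem my_add_comm (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The hope is that the same mechanism scales from toy lemmas like this one to proofs no single human can check.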

Continue reading “Where Is Mathematics Going? Large Language Models And Lean Proof Assistant”

Measuring The Impact Of LLMs On Experienced Developer Productivity

Recently, AI risk and benefit evaluation company METR ran a randomized controlled trial (RCT) on a gaggle of experienced open source developers to gain objective data on how the use of LLMs affects their productivity. The finding was that using LLM-based tools like Cursor Pro with Claude 3.5/3.7 Sonnet reduced productivity by about 19%, with the full study by [Joel Becker] et al. available as a PDF.

The study was also intended to establish a methodology for assessing the impact of introducing LLM-based tools into software development. In the RCT, 16 experienced open source software developers were given 246 tasks, after which their effective performance was evaluated.

A large focus of the methodology was on creating realistic scenarios instead of using canned benchmarks: adding features, fixing bugs, and refactoring, much as the developers would do in their work on their respective open source projects. The observed increase in the time it took to complete tasks with an LLM’s assistance was found to be likely due to a range of factors, including over-optimism about the tools’ capabilities, the LLM interfering with the developers’ existing knowledge of the codebase, poor LLM performance on large codebases, low reliability of the generated code, and the LLM handling tacit knowledge and context very poorly.
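For a sense of how a single multiplicative figure like ‘19% slower’ falls out of such a trial, here is a back-of-the-envelope Python sketch. The task times are invented and this is not METR’s analysis code; comparing mean log-times is simply one standard way to pool tasks of very different sizes into a single ratio.

```python
import numpy as np

# Invented per-task completion times in minutes -- NOT METR's data.
with_llm    = np.array([95, 120, 80, 150, 110], dtype=float)
without_llm = np.array([80, 100, 70, 120, 95], dtype=float)

# Differencing mean log-times yields a multiplicative factor, so short
# and long tasks can contribute to one pooled estimate.
log_ratio = np.log(with_llm).mean() - np.log(without_llm).mean()
print(f"estimated slowdown: {np.exp(log_ratio):.2f}x")  # ~1.19x here
```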

Although METR suggests that this poor showing may improve over time, it seems fair to question whether LLM coding tools make for a useful coding partner at all.

AI Is Only Coming For Fun Jobs

In the past few years, what marketers and venture capital firms term “artificial intelligence” but is more often an advanced predictive text model of some sort has started taking people’s jobs and threatening others. But not tedious jobs that society might like to have automated away in the first place. These AI tools have generally been taking rewarding or enjoyable jobs like artist, author, filmmaker, programmer, and composer. This project from a research team might soon be able to add astronaut to that list.

The team was working within the confines of the Kerbal Space Program Differential Game Challenge, an open-source plugin from MIT that allows developers to test various algorithms and artificial intelligences in simulated spacecraft situations. Generally, purpose-built models are used here, refined over many rounds of testing, but since this process can be time-consuming and costly, the researchers on this team decided to hand over control to ChatGPT with only limited instructions. A translation layer built by the researchers converts the generated text into spacecraft controls.
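We don’t have the researchers’ actual code, but the idea of such a translation layer is simple enough to sketch: prompt the model to answer only in a structured format, then parse and clamp that answer before it ever touches the controls. Everything below, field names and ranges included, is a hypothetical Python illustration, not the real KSPDG interface.

```python
import json

def text_to_controls(llm_reply: str) -> dict:
    """Parse a JSON-formatted LLM reply into clamped control values."""
    cmd = json.loads(llm_reply)

    def clamp(value, lo, hi):
        return max(lo, min(hi, float(value)))

    # Clamp everything to ranges a simulated spacecraft would accept,
    # since nothing stops the model from requesting throttle = 9000.
    return {
        "throttle": clamp(cmd.get("throttle", 0.0), 0.0, 1.0),
        "pitch":    clamp(cmd.get("pitch", 0.0), -1.0, 1.0),
        "yaw":      clamp(cmd.get("yaw", 0.0), -1.0, 1.0),
    }

# The prompt would instruct the model to answer only in JSON:
print(text_to_controls('{"throttle": 0.8, "pitch": -0.2, "yaw": 0.05}'))
```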

We’ll note that, at least as of right now, large language models haven’t taken the job of any actual astronaut. The game challenge is generally aimed at uncrewed spacecraft like orbital satellites, which often need to make their own decisions to maintain orbits and avoid obstacles. This specific model was able to place second in a recent competition, although we’ll keep rooting for the humans in situations like these.