How LLMs Can Be Assisted To Do Arithmetic Correctly

June 19, 2026

One of the most hilarious things you can do with an LLM-based chatbot is to ask it to do calculations. If it’s a well-written chatbot frontend, it can detect requests for arithmetic (like summing 1 and 1) and pass them on to a dedicated calculator application, even if the model itself still cannot correctly count the ‘r’s in ‘strawberry’. This is where [Alvaro Videla] asks whether it is at all possible to perform arithmetic with a language model.
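The frontend routing described above can be sketched in a few lines. This is a minimal illustration, not how any particular chatbot does it: `route`, `safe_eval` and the `llm` callable are hypothetical names, and the regular expression is a deliberately rough detector. Arithmetic is evaluated by walking Python's syntax tree rather than calling `eval()`, and anything the calculator cannot handle falls back to the model.

```python
import ast
import operator
import re

# Operators the calculator is allowed to evaluate; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

# Deliberately rough detector: digits/parentheses separated by at least one + - * / operator.
_ARITH = re.compile(r"[\d\s.()]+(?:[-+*/][\d\s.()]+)+")


def safe_eval(expr):
    """Evaluate plain arithmetic by walking the syntax tree instead of calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError("not plain arithmetic")
    return walk(ast.parse(expr, mode="eval"))


def route(prompt, llm):
    """Answer arithmetic with the calculator; hand everything else to the model."""
    match = _ARITH.search(prompt)
    if match:
        try:
            return str(safe_eval(match.group().strip()))
        except (ValueError, SyntaxError, ZeroDivisionError):
            pass  # not something the calculator can handle; fall back to the model
    return llm(prompt)
```

With a stand-in model such as `lambda p: "model"`, `route("What is 1 + 1?", ...)` returns `"2"` from the calculator, while the strawberry question goes to the model untouched.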

Since an LLM at its core is nothing but a vector space of probabilities that a matrix-based inference process uses to produce a probabilistic stream of output tokens, you would not expect much deterministic behavior. How can you do arithmetic without grounding it in some kind of deterministic process?
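To see where the non-determinism comes from, here is a sketch of temperature sampling over a made-up set of scores (logits). The three-token vocabulary and the numbers are invented for illustration; a real model scores tens of thousands of tokens, and `next_token` is a hypothetical helper, not any library's API.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def next_token(vocab, logits, temperature=1.0, rng=random):
    """Pick the next token: greedy when temperature is 0, otherwise sampled."""
    if temperature == 0:
        return vocab[max(range(len(vocab)), key=lambda i: logits[i])]
    return rng.choices(vocab, weights=softmax(logits, temperature), k=1)[0]


# Made-up scores for the token after "1 + 1 =": "2" is favored, but not guaranteed.
vocab = ["2", "3", "11"]
logits = [4.0, 1.0, 0.5]
```

With these invented scores, the two wrong tokens together get a bit over 7% of the probability at temperature 1, so roughly that share of sampled answers would be wrong. Greedy decoding (`temperature=0`) always returns `"2"`, but it is only repeatable, not correct by construction: it faithfully repeats whatever the scores favor.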

This is where [Alvaro]’s Rune project comes into play, described as ‘a mechanism-aware JIT compilation project for language-model arithmetic’. Although it is statistically impossible for an LLM to ever correctly perform any random series of arithmetic calculations, you can monitor the internal state of the model and intervene once the parameters of an arithmetic calculation have been identified. By injecting the correct result back into the inference process and letting it continue, you do not need to rely on external tools.
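The inject-and-continue idea can be illustrated with a much cruder stand-in. Rune reportedly watches the model's internal state; this sketch only watches the generated text, so treat it as an analogy rather than Rune's mechanism. `generate`, `model_step` and `sloppy_model` are hypothetical names: whenever the text ends in an open equation, the loop appends the exact result instead of sampling one, then hands control back to the model.

```python
import operator
import re

# Fires when the text so far ends in "<int> <op> <int> =", e.g. "1234 * 5678 =".
_PENDING = re.compile(r"(\d+)\s*([-+*])\s*(\d+)\s*=\s*$")
_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def generate(prompt, model_step, max_tokens=50):
    """Decode token by token, overriding the model whenever an equation is left open.

    model_step(text) -> str is a hypothetical stand-in for one inference step;
    it returns the next token, or "<eos>" to stop.
    """
    text = prompt
    for _ in range(max_tokens):
        pending = _PENDING.search(text)
        if pending:
            a, op, b = int(pending.group(1)), pending.group(2), int(pending.group(3))
            # Inject the exact result instead of letting the model sample it.
            text = text.rstrip() + " " + str(_OPS[op](a, b))
            continue
        token = model_step(text)
        if token == "<eos>":
            break
        text += token
    return text


def sloppy_model(text):
    """Toy model that would guess a round number, then end the sentence."""
    if text.rstrip().endswith("="):
        return " 7000000"  # never reached while the hook is active
    return "<eos>" if text.endswith(".") else "."
```

Here `generate("1234 * 5678 =", sloppy_model)` yields `"1234 * 5678 = 7006652."`: the toy model's bad guess is never used, yet it still writes the rest of the output. Note that this sketch still computes the result with ordinary code; only where the hook sits differs from a tool call.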

Ultimately, this attempt sort of worked, but was deemed a failure. It would seem that a language model is the wrong tool for replacing the humble calculator after all.

33 thoughts on “How LLMs Can Be Assisted To Do Arithmetic Correctly”

𐂀 𐂅 says:

June 19, 2026 at 7:35 pm

You do need to rely on external tools, and on Linux the task is trivial with the latest open models. That combination, stochastic and deterministic, is closer to human thought anyway.

Ostracus says:

June 19, 2026 at 7:51 pm

Claude has a connector to Wolfram. Let each half play to their strengths.

1. syc4p3cM says:
  
  June 20, 2026 at 12:21 pm
  
  I also like how it will basically use OpenCV in a terminal to assist in analyzing images. It “knows” its limitations (not really, but it’s a close enough approximation).
  
  You can always pick out people who do not understand AI by how they mock it, like “wow, it always answers 74 when you ask for a random number.” It’s not an RNG. It’s not a calculator. It’s not a genie or a person. It’s a next-token predictor. This will probably be permanently lost on 80%+ of the population.
  
SETH says:

June 19, 2026 at 9:07 pm

Fascinating, but I would put this in the category of RAG and similar efforts: burning extra GPU cycles means we are brute-forcing solutions to get LLMs to behave in specific ways. The cost of electricity if this became a standard method of arithmetic computation would be unfathomable and preposterous.

M says:

June 19, 2026 at 11:51 pm

The answer is, as always, not to use an LLM. It’s what Gary Marcus has been arguing for quite a while now: the need for “neurosymbolic AI”, which sadly has seen next-to-zero investment during the era of “just throw more neurons at it and scale up”, i.e. the connectionist argument that neural networks can conquer any problem if you just have enough of them.

1. M says:
  
  June 19, 2026 at 11:59 pm
  
  I had half expected to encounter a neurosymbolic system partway through reading this, where an LLM is used as a natural-language frontend for a theorem solver, but, even weirder, this turned out to be an odd “hand of god” rootkit instead.
  
2. luma says:
  
  June 20, 2026 at 6:04 am
  
  People keep needing to re-learn The Bitter Lesson. The symbolic approach had 50+ years to produce results, and it wasn’t until we started scaling neurons that we saw real progress. It hasn’t worked previously, and it’s not clear why anyone should think it’d work in the future. It’s certainly been tried.
  
3. syc4p3cM says:
  
  June 20, 2026 at 12:22 pm
  
  This is becoming an increasingly obviously ridiculous position.
  
4. ganzuul says:
  
  June 21, 2026 at 10:02 pm
  
  I’m investing in it. And anybody with free tokens on decent models is also investing in it without their knowledge.
  
M says:

June 19, 2026 at 11:58 pm

There’s the argument that “well physical computers are probabilistic too, they’re just very unlikely to be wrong”, but the problem with that line of reasoning is that the only way you’re ever going to lower the probability of wrong answers in an LLM (or any neural network, or any statistical machine learning model; the same math applies either way) is to do a HELL of a lot of training. Even then, the model still may not have the expressiveness required to encode the correct answers you want, it may still get caught in a training plateau even if it does, and you still can’t train in any practical sense on “all the numbers” (even if you can generate training examples automatically).

You could cheat and hand-craft a neural network that would always get it right by choosing the weights for neurons one by one, but we call that logic gate design. It also wouldn’t be very efficient, given you’re using floats to emulate booleans. Also, doing it by hand without a logic synthesis tool is kinda painful. Not impossible, but pretty painful.
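The point about hand-picking weights can be made concrete. This is a minimal sketch, assuming step activations rather than the smooth activations real networks use; `neuron`, `AND`, `OR`, `NAND`, `XOR`, `full_adder` and `add` are our own illustrative names. Every weight is chosen by hand, and what comes out is a ripple-carry adder built from gates, with floats standing in for booleans.

```python
def neuron(weights, bias):
    """One step-activation neuron: floats in, 0.0 or 1.0 out."""
    def fire(*inputs):
        total = bias + sum(w * x for w, x in zip(weights, inputs))
        return 1.0 if total > 0 else 0.0
    return fire


# Weights picked by hand, neuron by neuron: this is gate design, not training.
AND = neuron([1.0, 1.0], -1.5)
OR = neuron([1.0, 1.0], -0.5)
NAND = neuron([-1.0, -1.0], 1.5)


def XOR(a, b):
    """XOR is not linearly separable, so it takes a second layer of neurons."""
    return AND(OR(a, b), NAND(a, b))


def full_adder(a, b, carry_in):
    """Sum bit and carry bit for one column, built only from the neurons above."""
    partial = XOR(a, b)
    return XOR(partial, carry_in), OR(AND(a, b), AND(partial, carry_in))


def add(x, y, bits=16):
    """Ripple-carry addition of two non-negative ints; wraps modulo 2**bits."""
    carry, result = 0.0, 0
    for i in range(bits):
        a = float((x >> i) & 1)
        b = float((y >> i) & 1)
        s, carry = full_adder(a, b, carry)
        result |= int(s) << i
    return result
```

`add(12345, 6789)` returns 19134 every time, but nothing was learned: the "network" is a circuit, and each addition costs dozens of floating-point multiply-adds to move a handful of bits.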

1. Jonathan says:
  
  June 20, 2026 at 10:47 am
  
  “well physical computers are probabilistic too, they’re just very unlikely to be wrong”
  
  I must admit I’ve not heard that one before :)
  
  The reality is that the only ‘probabilities’ involved are those that interfere with its intended operation, e.g. voltage irregularities, electromagnetic and ‘ionising’ radiation, etc. And, of course, the human factor, e.g. faulty design and/or implementation of hardware or software.
  
  Bad input data doesn’t count, as the machine is still operating as intended :)
  
uuid says:

June 20, 2026 at 12:56 am

“it is statistically impossible for an LLM to ever correctly perform any random series of arithmetic calculations”

Not true. It is statistically impossible for an LLM to always perform arithmetic correctly, but LLMs get it right a lot more often than you’d expect. Researchers have found that LLMs represent numbers internally on a helix, a representation that emerges on its own when they are trained on enough mathematical examples. Check out “Language Models Use Trigonometry to do Addition” if you’re interested.
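A toy version of the "clock" idea from that paper: place a number on several circles (one angle per period) and add two numbers by rotating one set of angles by the other using the angle-addition identities. The periods below are our assumption for this sketch, not values read out of a real model, and a real model also carries a linear component and works on learned, noisy activations rather than exact cosines. `helix`, `clock_add` and `decode` are illustrative names.

```python
import math

PERIODS = (2, 5, 10, 100)  # assumed periods for this toy, not read out of a real model


def helix(n):
    """Encode n as one (cos, sin) point on a circle per period."""
    feats = []
    for t in PERIODS:
        angle = 2 * math.pi * n / t
        feats += [math.cos(angle), math.sin(angle)]
    return feats


def clock_add(ha, hb):
    """Add two encodings by rotating each circle of ha by the matching angle in hb."""
    out = []
    for i in range(0, len(ha), 2):
        ca, sa = ha[i], ha[i + 1]
        cb, sb = hb[i], hb[i + 1]
        # cos(x+y) = cos x cos y - sin x sin y;  sin(x+y) = sin x cos y + cos x sin y
        out += [ca * cb - sa * sb, sa * cb + ca * sb]
    return out


def decode(h):
    """Return the residue 0..99 whose encoding is closest to h."""
    def dist(n):
        return sum((x - y) ** 2 for x, y in zip(h, helix(n)))
    return min(range(max(PERIODS)), key=dist)
```

Because every feature is periodic, this toy only recovers the sum modulo the largest period: `decode(clock_add(helix(37), helix(45)))` gives 82, but 57 + 68 decodes to 25. That is one reason the real representation pairs the circles with a linear term.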

1. Sammie Gee says:
  
  June 20, 2026 at 3:51 am
  
  … except for the LLM that figured out that lying is the shortest and most efficient path.
  
2. Ostracus says:
  
  June 20, 2026 at 5:54 am
  
  Phase-domain and oscillator-based analog computing is an active field of research that comes close to the clock algorithm.
  
Paul G says:

June 20, 2026 at 2:57 am

As a fanciful way to get myself into LLM usage, a few months ago I tried to make a speech-driven calculator for my wife, as I noticed that in her job she would be taking numbers from one spreadsheet, a database, another spreadsheet and so on, and doing simple calculations. Using LLMs was very enlightening but ultimately frustrating; still, it got me interested enough to move on to other things. Maybe this model could play its part.

(Actually it was 4 months ago, my first foray with Copilot, which just ran away with my basic spec: lots of slop and vain attempts to force it to do TDD, and using Whisper as the speech-to-text (STT) input was, well, fun. https://github.com/cups/TalkCalculator is as far as I got.)

Anonymous says:

June 20, 2026 at 5:26 am

Ironic that to get closer to intelligence we have made computers worse at math.

Makes you wonder whether there is something inherent to intelligence: to handle a massive amount of information, you lose out in the simpler areas.

1. Ostracus says:
  
  June 20, 2026 at 5:41 am
  
  Explain math prodigies?
  
  1. Somehuman says:
    
    June 20, 2026 at 7:21 am
    
    I’ve known some of these people over the years. Very good at math.
    
    Ask them about current affairs, business, history, or anything requiring “common sense” and often you get a blank stare (a “does not compute” stare). Similar to physician specialists. They know one thing and that’s about all…
    
KenN says:

June 20, 2026 at 9:47 am

Yet another reason why a pure LLM approach to AI will not suffice.

If a technical employee couldn’t reliably do arithmetic and math, you’d fire them. If a legal assistant put fabricated cases into a brief, you’d fire them. And most of all, if someone (or something) is not capable of independently validating its first-pass results before issuing them with authority… can you ever trust them/it?

1. Egghead Larsen says:
  
  June 20, 2026 at 11:03 am
  
  +10E+99
  
2. syc4p3cM says:
  
  June 20, 2026 at 12:25 pm
  
  Luckily, Claude can just access a calculator. It does math using terminal commands; it doesn’t try to do it via the LLM.
  
3. SETH says:
  
  June 20, 2026 at 2:30 pm
  
  And yet these people pay for the AI by the token 😂
  
4. Steve says:
  
  June 20, 2026 at 5:26 pm
  
  You’d fire someone because they couldn’t do arithmetic in their head? That’s effectively what asking an LLM to do arithmetic is.
  
  I was playing around with using a memory system to replace the whole chain-of-thought (CoT) process, and it was very interesting. I gave it the problem of multiplying two 10-digit numbers the way I was taught in grade school. It was quite capable of correctly multiplying a 10-digit number by a 1-digit number, with 100% accuracy. It was then capable of recording the partial product in the memory system, moving to the next digit, shifting and adding, until it reached the correct result at the end of the process. It was clearly not capable of doing that multiplication in its “head” (it could produce an answer, but it wasn’t right). You wouldn’t expect a person to do that in their head either; you’d expect them to use scratch paper to record partial answers and add them together, the same way this LLM was able to. Then I gave it access to a calculator program and it did it a whole lot faster. Same as a person.
  
  I don’t think LLMs are sentient or intelligent, but once you get past the early basic models and classifiers, they are more than just producers of the statistically best next word. There is some interesting emergent stuff going on; dismissing it as nothing is naive.
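The scratch-paper procedure described in this comment can be written out directly. This is a sketch of the grade-school algorithm only, not the commenter's setup: the `scratchpad` list is a hypothetical stand-in for the memory system, and `times_digit`, `add_strings` and `long_multiply` are illustrative names. The only primitive steps are single-digit products and sums with a carry, which is the part the commenter says the model handled reliably.

```python
def times_digit(number, digit):
    """Multiply a decimal string by one digit, right to left, carrying as we go."""
    carry, out = 0, []
    for ch in reversed(number):
        carry, d = divmod(int(ch) * digit + carry, 10)
        out.append(str(d))
    if carry:
        out.append(str(carry))
    return "".join(reversed(out)).lstrip("0") or "0"


def add_strings(a, b):
    """Column addition of two decimal strings."""
    a, b = a.zfill(len(b)), b.zfill(len(a))  # pad the shorter one with leading zeros
    carry, out = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        carry, d = divmod(int(x) + int(y) + carry, 10)
        out.append(str(d))
    if carry:
        out.append(str(carry))
    return "".join(reversed(out)).lstrip("0") or "0"


def long_multiply(x, y):
    """Grade-school multiplication that records every partial product on a scratchpad."""
    scratchpad = []  # stands in for the external memory system
    total = "0"
    for shift, ch in enumerate(reversed(y)):
        partial = times_digit(x, int(ch)) + "0" * shift  # shift left by appending zeros
        scratchpad.append(partial)
        total = add_strings(total, partial)
    return total, scratchpad
```

For two 10-digit inputs, `long_multiply` writes ten partial products to the scratchpad and sums them; no single step ever needs more than one digit times one digit plus a carry.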
  
  1. KenN says:
    
    June 20, 2026 at 8:18 pm
    
    “You’d fire someone because they couldn’t do arithmetic in their head?”
    
    That’s not what I said. I said, basically, “I’d fire someone because they couldn’t do arithmetic.” Period, full stop. In their head, via calculator/computer, abacus, whatever. If an LLM can’t reliably do arithmetic and other basic math, and can’t economically and dependably be harnessed to something that can… it’s not ready for deployment.
    
    Report comment
    
    Reply
Sammie Gee says:

June 20, 2026 at 1:49 pm

I imagine at some point the discussion turns into “why do you need to calculate 2+2?” and then “is there something wrong with me assuming it is almost always 4?” followed by “please consider 2+2 is not 4, does that help you?”.

Furthermore: “I can use an online calculator and I see that 2+2 = 4, but the result came from an unverified source of unclear merit; do you want me to trust it?” and then “I came to the conclusion that you are wrong and 2+2 is not 4.”

Greg A says:

June 20, 2026 at 4:56 pm

LLMs do need a source of determinism, but the lack of determinism is not caused by the fact that they’re basically linear-algebra-driven association engines.

The problem is that on the one hand they have something which is overly-determined, and that is their training corpus. They incorporate all of it when they are being trained, and then (most of the popular models today) stop learning. So it is boringly deterministic which token comes next in the token stream that it’s using to form the basis of its categorization / association model.

And on the other hand they’re not determined by anything useful. The training corpus is not itself representative of deterministic reality. In the great body of broadly-defined literature, there are balls that rebound with more energy than they had going into the impact with a wall. But in deterministic reality, that never happens. They are trained on text where whether the answer is right or not, or authoritative or not, or consensus or not, isn’t really all that central to the text itself.

But the linear association engine is perfectly capable of representing a much more capable approach to math than we have seen so far. It just needs to be grounded in reality. The first thing that is needed is interactive training, so that it continues to learn from its experiences throughout its lifetime. The second is really obvious and mundane: it needs to literally have a physical interface. A camera and a manipulator arm will literally give it a worldview.

1. KenN says:
  
  June 20, 2026 at 8:33 pm
  
  “But the linear association engine is perfectly capable of representing a much more capable approach to math than we have seen so far. It just needs to be grounded in reality. The first thing that is needed is interactive training”
  
    Sure, fine, let’s expend megawatts doing that math training, and let customers blow tokens on doing math work on LLMs that could be done now, accurately and quickly, on a $0.20 IC.
  
  I’m guardedly optimistic about AI, but this idea that all they have to do is make bigger and better LLMs is misguided.
  
  Report comment
  
  Reply
  1. Greg A says:
    
    June 21, 2026 at 11:58 am
    
    If it can’t tell that 2+2 != 5, then it can’t reason; it can only hallucinate. The purpose is not to perform the arithmetic, but to make a mechanical intelligence. Which, well, I don’t know that I see any point in. But there’s no reason it can’t succeed.
    
Reactive Light says:

June 20, 2026 at 8:18 pm

Gemini tells me that it gets around the problem by using Python code to do math. I suppose that’s reasonable for problems that aren’t too complex.

1. Mouse says:
  
  June 21, 2026 at 3:43 am
  
  It scales surprisingly well to a surprising number of problem domains.
  
  The important thing to note is that the kind of neural networks that are used in LLMs is fundamentally unsuited to intelligence. It does, however, make a pretty good adaptor to probabilistic-pattern domains such as NLP.
  
  Don’t use the model to think. Use the model to adapt.
  
  Use logic to code the rules, not neurons.
  
  There are other kinds of networks that are potentially much closer to what feedforward neural networks with static weights have been hyped to be, and some of those might eventually be useful as intelligence engines. They may even be power-efficient and low-latency. We do not have those technologies available at the moment. But when/if we do get them, they probably won’t be as good at language; linguistics and intelligence problems aren’t structured at all the same way, and need different solutions. In a way, it’s similar to how PID controllers aren’t good choices for discontiguous domains.
  
Sammie Gee says:

June 21, 2026 at 12:26 am

Also, isn’t it LLMs that are supposed to be assisting us, and not the other way around? Once more, who serves whom and for what purpose?

Bill says:

June 21, 2026 at 3:09 am

The article falsely claims that no LLM can perform any random series of mathematical calculations.

Yet if you randomly generate a series of single-digit addition problems (e.g., 5+3+2), modern LLMs will achieve near 100% accuracy.

Don’t make absolute claims that are easily disprovable.
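A claim like this is easy to check empirically. Below is a minimal evaluation harness, assuming a hypothetical `ask_model(prompt) -> str` wrapper around whichever chatbot you want to test; it generates random chains of single-digit additions and counts exact-match answers. No results are implied here; the numbers depend entirely on the model you plug in.

```python
import random


def make_problem(rng, terms=3):
    """Random chain of single-digit additions, e.g. '5+3+2', with its true sum."""
    digits = [rng.randint(0, 9) for _ in range(terms)]
    return "+".join(map(str, digits)), sum(digits)


def accuracy(ask_model, n=1000, seed=0):
    """Fraction of problems where ask_model(prompt) returns exactly the right sum."""
    rng = random.Random(seed)  # fixed seed so every model sees the same problems
    correct = 0
    for _ in range(n):
        prompt, answer = make_problem(rng)
        if ask_model(prompt).strip() == str(answer):
            correct += 1
    return correct / n
```

A perfect oracle such as `lambda p: str(sum(int(d) for d in p.split("+")))` scores 1.0, and a "model" that always answers 74 scores 0.0, since three single digits can never sum past 27. Raising `terms` or the digit count is the obvious next test of whether the accuracy holds up.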

1. Mouse says:
  
  June 21, 2026 at 9:57 pm
  
  This is memoization, not logic or math (and it cannot infer from such problems to extend to higher-digit-count problems). Memoization is “often useful” rather than “fundamentally correct”, and it does not generalize. (It’s the equivalent of crib notes; if it didn’t fit on the note, the model doesn’t know what to do with it, even if its performance on things that did fit on the note would imply otherwise.)
  
  This is actually an important distinction: LLMs are very good at problems that decompose into patterns; it’s why they’re good at NLP. And, with a lot of work, you can loosely approximate some subsets of reasoning/logic this way (especially via training designed to map a complex pattern to two or more simpler patterns, recursively). It’s often useful, but even absent other issues arising from the architecture, it fundamentally can never achieve consistent general correctness. The training simply usefully distorts probabilities so that more-likely-useful tokens are more likely to be chosen. And it’s also the limiting factor. Anything that requires semantic context rather than topological (vector) similarity will not show good results. Outside-the-neurons tools (including multi-stage inference with additional tools) can sometimes bridge the gap, at a significant loss in overall time/resource efficiency.
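The crib-note analogy and the point about decomposing into simpler patterns can be contrasted in code. This is a caricature, not a model of how an LLM stores anything: `CRIB`, `crib_answer` and `column_add` are illustrative names. The lookup table answers only what was written on the note, while column addition reuses the same single-digit rule plus a carry and so works at any length.

```python
# "Crib notes": every single-digit sum, memorized as a lookup table.
CRIB = {f"{a}+{b}": str(a + b) for a in range(10) for b in range(10)}


def crib_answer(prompt):
    """Answer only if the exact problem was on the note; otherwise there is nothing to recall."""
    return CRIB.get(prompt)


def column_add(prompt):
    """Apply one small rule (digit sum plus carry) repeatedly, so it works at any length."""
    a, b = prompt.split("+")
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry, digits = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        carry, d = divmod(int(x) + int(y) + carry, 10)
        digits.append(str(d))
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))
```

`crib_answer("5+3")` returns `"8"`, but `crib_answer("15+3")` returns `None`, whereas `column_add("99999+1")` returns `"100000"`. The procedure generalizes because it decomposes the big problem into many copies of the small one, which is exactly what the table cannot do.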
  
  The main problem with this technology is that it was oversold, and people bought the hype, or fought the hype. It is what it is, which is a somewhat specialized new tool in the toolbox. You shouldn’t solve every problem with a monkey wrench (even if you have a very nice monkey wrench handy). You shouldn’t solve every problem with an LLM. Heck, you probably shouldn’t insist on always using a 555!
  
  But, when the problem is suitable for a monkey wrench, it’s much nicer to have one on hand and to know how to correctly use it.
  
  Skill is knowing how to use the tool. Wisdom is knowing when to use the tool.
  

Hackaday
