Where Is Mathematics Going? Large Language Models And Lean Proof Assistant

If you’re a hacker you may well have a passing interest in math, and if you have an interest in math you might like to hear about the direction of mathematical research. In a talk on this topic [Kevin Buzzard], professor of pure mathematics at Imperial College London, asks the question: Where is Mathematics Going?

The talk starts with him explaining that in 2017 he had a mid-life crisis of sorts: he became disillusioned with the way mathematics research was being done, and started looking to computer science for solutions.

He credits Euclid, as many do, with writing down some axioms and starting mathematics over 2,000 years ago. From axioms came deductions, deductions became mathematical facts, and math has proceeded in this fashion ever since; it continues to be the way research is done in mathematics departments around the world. The consequence is that mathematics is now incomprehensibly large, and the proofs themselves are exceedingly large as well: he gives the example of one proof that runs to 10,000 pages and still hasn’t been completely written down, more than 20 years after it was announced.

The conclusion from this is that mathematics has become so complex that traditional methods of documenting it struggle to cope. He says that a tertiary education in mathematics aims to “get students to the 1940s”, whereas a tertiary education in computer science will expose students to the state of the art.

He examines the effect the “computer as calculator” has had on mathematics since the middle of the 20th century, arguing that it is less than you might think. More recently, though, we have large language models (LLMs) giving us the “computer as generator of mathematics” and interactive theorem provers (ITPs) giving us the “computer as checker of mathematics”, both new ways to use computers in mathematics research. He notes that each of these technologies has flaws and that neither has, so far, told us anything profound that we didn’t already know. As he puts it, mathematics has not yet had its “Deep Blue moment”.

The point is then made that the problem with LLMs is that they hallucinate statements, which introduces errors, while the problem with ITPs is that all the code needs to be hand-written by humans. He floats the “no-brainer idea” of combining the two technologies: the LLM proposes mathematics and the ITP verifies it.
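
As a concrete illustration of what “checker of mathematics” means, here is a minimal Lean 4 sketch (my own example, not from the talk). The proof assistant mechanically accepts a true statement and rejects a hallucinated false one:

```lean
-- Lean accepts this: both sides reduce to the same value, so `rfl` works.
theorem two_add_two : 2 + 2 = 4 := rfl

-- A hallucinated claim fails to type-check. Uncommenting the line below
-- produces a compile error, which is exactly the safety net an ITP provides:
-- theorem two_add_two_wrong : 2 + 2 = 5 := rfl
```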

He concludes with the idea that LLM + ITP is the best future for mathematics, enabling mathematics to go from “mostly right” to “definitely right”.

If you have a passing interest in math you might also like to read Getting The Most Out Of ISM Transceivers Using Math and Design Scanimations In A Snap With The Right Math.

31 thoughts on “Where Is Mathematics Going? Large Language Models And Lean Proof Assistant”

    1. It’s sometimes true, depending on which “2” you are talking about. Not true for integers. Sometimes true if the numbers have been rounded and are at the higher end of what we round to 2:
      2.4+2.4=4.8
      Round to one significant figure.
      2+2=5

        1. The idea is that you don’t know the precision of the data you are given. So adding two data points that are both recorded as ‘2’ could give you a true physical sum of anywhere between ‘3’ and ‘5’.

          1. No, he didn’t: he rounded the end result, and did not do so ‘prior to computation’.

            Incidentally, two units of 1.5 plus two units of 1 equals 5 units, shipping is free, local taxes may apply.

        2. It’s operator overloading.

          You are all assuming scalar Cartesian addition.

          Granting that ‘2’ is a scalar, fancy math uses jargon and conventions that mathematicians make up as they go along.

      1. 1.5 + 1.5 = 3
        So by your logic 2 + 2 can be anywhere in [3,5] if the numbers were rounded for writing down but not for addition, ignoring that there are also different types of rounding.

        You are basically redefining the addition operator by adding a deround functionality.

        What you want can be done by extending the numbers with a tolerance. You can use a square, Gaussian, or any other distribution for that tolerance, or simply leave the distribution out:
        2±0.5 + 2±0.5 = 4±1

        Another way is to note the numbers are rounded:
        2.— + 2.— = [4,6)
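
The tolerance idea in this thread amounts to interval arithmetic. A minimal Python sketch (the names here are mine, not from any library): a number rounded half-up to the integer 2 really stands for the half-open interval [1.5, 2.5), so adding two of them gives [3, 5).

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, other: "Interval") -> "Interval":
        # Interval addition: add the corresponding endpoints.
        return Interval(self.lo + other.lo, self.hi + other.hi)

def rounded(n: int) -> Interval:
    """The set of reals that round (half-up) to the integer n."""
    return Interval(n - 0.5, n + 0.5)

two = rounded(2)
print(two + two)  # Interval(lo=3.0, hi=5.0): "2 + 2" is anywhere in [3, 5)
```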

  1. Theorem provers are great because they work well with what traditional computers do best: combining many different truth values together. Good for verification, but not for innovation.

    LLMs aren’t really good at innovation either, unless you turn their temperature up high, but that has a negative effect on the truth value of what comes out. To get something new, you have to take less probable tokens and just roll with them.

    So the idea of combining with a solver sounds good, but it’s also a little bit like adding a proofreader to the infinite monkeys working on Shakespeare. I’m not 100% sure it’s a good use of either the monkeys’ or the proofreaders’ time, but I guess it’s better than just letting them go wild? If they are all willing to work for bananas?

    1. I don’t really use chat, but I find image generation with AI is most fun when it gets a bit weird.
      Realistic and compliant with reality is a bit boring; we already have that.
      The fun part is when the elements are real-ish but the image is weird.

      I guess Dali would have agreed.

      But does that apply to LLMs? I mean, if you have one write a story or script, then I guess you would want some interesting plot twists and not ‘Bob went to the store and bought some bread’.

    2. To be clear, Kevin’s not advocating for using LLMs to do new math – at least not in the next decade or two. I’m not aware of many mathematicians who want LLMs to make substantial new math. We don’t exactly want innovation here – the beauty and draw of mathematics is not getting answers, but understanding them, so a machine that just spits out answers is no fun. The primary interest right now is in using LLMs to assist with formalizing mathematics in a language like Lean. After all, mathematical proofs are written in natural language, and each field is subject to its own colloquialisms. Authors will omit detailed proofs and indicate the style with which one proves a lemma if it’s a sufficiently standard technique. But these are details that need to be filled in when formalizing a proof.

      LLMs seem to be pretty good at generating code from natural language when given precise instructions. I suppose this should not be a surprise, given the transformer architecture originated in research on machine translation. So I think there are reasons to be hopeful that this application of LLMs could be productive. The big challenge that Kevin’s identified is that even formalizing statements of theorems can be a large task – the same is true for definitions of mathematical objects. These are instances where, if the LLM hallucinates or makes an error in translating, the theorem prover cannot automatically catch this error and any formalized proofs building off of these flawed definitions are useless.

  2. My understanding is that this is similar to how many modern chess engines work: a neural network and an exact analysis step working in tandem. The neural network performs a qualitative assessment much faster (or with more subtlety, take your pick) than the exact analysis could, and the exact analysis then uses that to decide which branches are worth investigating rigorously. A composite heuristic.

  3. This makes a lot of sense.

    Proof assistants like Lean, Coq, or Mizar all take as their input formal proof languages which are far more verbose and tedious than textbook math proofs.
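
    To give a flavor of that verbosity, here is a sketch (my own, in Lean 4 syntax using core `Nat` lemmas, and not checked against any particular Lean version) of a fact a textbook would dismiss as obvious:

```lean
-- Textbook: "addition of naturals is commutative (obvious)".
-- Formally, every step is spelled out and machine-checked:
theorem my_add_comm (a b : Nat) : a + b = b + a := by
  induction b with
  | zero      => rw [Nat.add_zero, Nat.zero_add]
  | succ n ih => rw [Nat.add_succ, ih, Nat.succ_add]
```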

    Using an AI to translate a proof written by a human in a textbook into a program to be read by a formal proof checker is going to take far less time than having humans do the translation.

    Similarly, asking a language model to explain a formal digital proof in Lean in math-ese is going to be faster than reading the program and trying to understand the code yourself.

  4. A few months ago Terence Tao live-streamed himself doing a proof in Lean, using an LLM-based coding assistant to do as much of the work for him as he could get it to do. If I remember correctly, he initially made lots of progress letting the coding assistant do most of the work, then got stuck, had to reread and get a better understanding of what was being done in the proof attempt, and then redirect it in the correct direction. I suspect that is where LLM + ITP mathematics is going: it will help you make progress on the small details, but it won’t get you to a completed proof without sufficient understanding.

  5. My experience is similar to Tao’s. I use Claude Code to assist me with Lean proofs. While it’s true that Claude rarely comes up with genuinely new ideas for how to tackle a proof (though I have seen that, in small doses), it does a great job of turning my paper-and-pencil proof sketches into working Lean code, and if the first version of the code doesn’t work, it’s usually able to correct type errors and flesh out proofs in a second or third round. I’ve even had the experience of Claude repeatedly struggling to get a proof over the finish line, despite much assistance from me, leading me to eventually realize that there was an assumption missing in the theorem I was proving. Claude has also found outright inconsistencies (not just typos) in my formalizations.

  6. As a mathematician myself, I must disagree. The future isn’t LLMs + ITPs, but a combination of philosophy, mathematics, and engineering with a large sprinkle of computer science.
    After all, artificial intelligence is NOT equivalent to machine learning and deep learning.
    My personal motto is “Nulla veritas sine arte” (“no truth without art”), and curiosity, creativity, and the sense of wonder and discovery of artful patterns in nature are not easily replicated in (sentient) machine intelligence.
