Where Is Mathematics Going? Large Language Models And Lean Proof Assistant

If you’re a hacker you may well have a passing interest in math, and if you have an interest in math you might like to hear about the direction of mathematical research. In a talk on this topic [Kevin Buzzard], professor of pure mathematics at Imperial College London, asks the question: Where is Mathematics Going?

It starts by explaining that in 2017 he had a mid-life crisis, of sorts, becoming disillusioned with the way mathematics research was being done, and he started looking to computer science for solutions.

He credits Euclid, as many do, with writing down some axioms and starting mathematics, over 2,000 years ago. From axioms came deductions, and deductions became mathematical facts, and math proceeded in this fashion. This continues to be the way mathematical research is done in mathematical departments around the world. The consequence of this is that mathematics is now incomprehensibly large. Similarly the mathematical proofs themselves are exceedingly large, he gives an example of one proof that is 10,000 pages long and still hasn’t been completely written down after having been announced more than 20 years ago.

The conclusion from this is that mathematics has become so complex that traditional methods of documenting it struggle to cope. He says that a tertiary education in mathematics aims to “get students to the 1940s”, whereas a tertiary education in computer science will expose students to the state of the art.

Continue reading “Where Is Mathematics Going? Large Language Models And Lean Proof Assistant”

Measuring The Impact Of LLMs On Experienced Developer Productivity

Recently AI risk and benefit evaluation company METR ran a randomized control test (RCT) on a gaggle of experienced open source developers to gain objective data on how the use of LLMs affects their productivity. Their findings were that using LLM-based tools like Cursor Pro with Claude 3.5/3.7 Sonnet reduced productivity by about 19%, with the full study by [Joel Becker] et al. available as PDF.

This study was also intended to establish a methodology to assess the impact from introducing LLM-based tools in software development. In the RCT, 16 experienced open source software developers were given 246 tasks, after which their effective performance was evaluated.

A large focus of the methodology was on creating realistic scenarios instead of using canned benchmarks. This included adding features to code, bug fixes and refactoring, much as they would do in the work on their respective open source projects. The observed increase in the time it took to complete tasks with the LLM’s assistance was found to be likely due to a range of factors, including over-optimism about the LLM tool capabilities, LLMs interfering with existing knowledge on the codebase, poor LLM performance on large codebases, low reliability of the generated code and the LLM doing very poorly on using tacit knowledge and context.

Although METR suggests that this poor showing may improve over time, it seems fair to argue whether LLM coding tools are at all a useful coding partner.

AI Is Only Coming For Fun Jobs

In the past few years, what marketers and venture capital firms term “artificial intelligence” but is more often an advanced predictive text model of some sort has started taking people’s jobs and threatening others. But not tedious jobs that society might like to have automated away in the first place. These AI tools have generally been taking rewarding or enjoyable jobs like artist, author, filmmaker, programmer, and composer. This project from a research team might soon be able to add astronaut to that list.

The team was working within the confines of the Kerbal Space Program Differential Game Challenge, an open-source plugin from MIT that allows developers to test various algorithms and artificial intelligences in simulated spacecraft situations. Generally, purpose-built models are used here with many rounds of refinement and testing, but since this process can be time consuming and costly the researchers on this team decided to hand over control to ChatGPT with only limited instructions. A translation layer built by the researchers allows generated text to be converted to spacecraft controls.

We’ll note that, at least as of right now, large language models haven’t taken the jobs of any actual astronauts yet. The game challenge is generally meant for non-manned spacecraft like orbital satellites which often need to make their own decisions to maintain orbits and avoid obstacles. This specific model was able to place second in a recent competition as well, although we’ll keep rooting for humans in certain situations like these.