Parsing Math In Python

September 18, 2020

Programming computers used to be harder. Don’t get us wrong — today, people tend to solve harder problems with computers, but the fundamental act of programming is easier. We have high-level languages, toolkits, and even help from our operating systems. Most people never have to figure out how to directly read from a disk drive, deblock the data into records, and perform multiplication using nothing but shifts and adds. While that’s a good thing, sometimes it is good to study the basics. That was [gnebehay’s] thought when his university studies were too high level, so he decided to write an arithmetic expression parser in Python. It came out in about 100 lines of code.

Interpreting math expressions is one of those things that seems simple until you get into it. The first problem is correctly lexing the input — a term that means splitting into tokens. For a human, it seems simple that 5-3 is three tokens, {5, -, and 3} and that’s easy to figure out. But what about 5+-3? That’s also three tokens: {5,+,-3}. Tricky.

Precedence is the other problem. If you look at 5+3*2, you should remember that the answer is 11 or 5+(3*2). However, other than by convention, it is equally valid to consider the answer is 16, or (5+3)*2. Another convention is left association so that 7-4+2 is 5 instead of 1. As it turns out, [gnebehay’s] parser currently spits out 1 for that expression, but he’s promised to fix it soon.

While it is interesting to read the code, the real value is the readme file which documents the creation of the parser and some of the theory behind it. This sort of thing used to be a staple of computer science classes, but today you are less likely to encounter it as classes focus on higher-level constructs.

You probably won’t use this code for anything. You rarely need to parse just a math expression and even if you did, there are many tools to help do that now. But you might just learn something about how interpreters and compilers digest text and garner meaning.

Today, you would probably build a parser with ANTLR or some similar tool. While 100 lines seems small, we’ve seen tiny languages that are smaller.

10 thoughts on “Parsing Math In Python”

Rob Fielding says:

September 18, 2020 at 7:28 pm

It is so trivial to parse LISP syntax, that up until 2000 or so, every large program embedded an interpreter. You make trivial built-ins for logical and numeric operators, and you have a perfectly useful language that can be both data and code … something we lost when we started using json and yaml

Report comment

Reply
1. Daniel Dunn says:
  
  September 18, 2020 at 9:35 pm
  
  I always wondered why compiled languages targeting LISP aren’t really a thing. It’s the perfect bytecode for a high level language, I just wouldn’t want to actually write in it myself.
  
  I do use LISP like structures as the backend for my visual scripting engine in my lighting control software, but it’s stored in YAML and interpreted by Python.
  
  Report comment
  
  Reply
  1. Kiran Shila says:
    
    September 19, 2020 at 12:18 pm
    
    What do you mean? Clojure and Common Lisp are very popular.
    
    Report comment
    
    Reply
    1. Daniel Dunn says:
      
      October 9, 2020 at 12:45 am
      
      They’re popular as languages to directly write programs in, but they’re not popular as a compiled target for other languages (like how Typescript targets JavaScript)
      
      Report comment
      
      Reply
      1. combinatorylogic says:
        
        June 27, 2023 at 6:11 am
        
        Have you ever seen the GIMPLE and other intermediate representations in gcc? They’re basically Lisp.
        
        Report comment
paulvdh says:

September 19, 2020 at 12:47 am

However, other than by convention, it is equally valid to consider…

If you go down that path, then you may as well use * for addition and = for multiplication and / for logical XOR, maybe also dependent on moon phases and planet alignment.

Those conventions are a good thing to have. It’s because they add meaning to the gibberish.
How would you read this text without the common convention of the 26 squiggles we call the letters of the alphabet?

Report comment

Reply
Tom G says:

September 19, 2020 at 2:10 am

Just so. The “it is only a convention” nonsense was neatly pointed out by a logician over 15p years ago….

“I don’t know what you mean by ‘glory,’ ” Alice said.
Humpty Dumpty smiled contemptuously. “Of course you don’t—till I tell you. I meant ‘there’s a nice knock-down argument for you!'”
“But ‘glory’ doesn’t mean ‘a nice knock-down argument’,” Alice objected.
“When I use a word,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.”
“The question is,” said Alice, “whether you can make words mean so many different things.”
“The question is,” said Humpty Dumpty, “which is to be master—that’s all.”

Report comment

Reply
Brian says:

September 19, 2020 at 3:29 am

Per Python’s precedence rules for operators, which is same for most programming languages, precedence is not a “problem”. Parsing math equations is one of the first things you will do for the mundane undergrad course typically titled Data Structures and Algorithms.

Programmatic parsing of strings, symbols, etc is a solved problem. Many ways to do it – mostly covered in the compiler theory course. I still have my dragon book.

Report comment

Reply
1. Comedicles says:
  
  September 19, 2020 at 2:23 pm
  
  I think there is pretty definitive stuff in “The Art of Computer Programming” from around 1969. And isn’t this the typical example for using a stack?
  
  Report comment
  
  Reply
Alexander Pruss says:

September 19, 2020 at 6:26 am

In my experience there is nothing particularly rare about having to write an infix notation parser. I’ve ended up doing it at a bunch of times in my life. Once or twice for a function graphing app for some PDA. Once for Boolean searching in a PalmOS Bible reading app. Most recently, in OpenSCAD to compensate for the fact that the language doesn’t let you pass functions to functions or modules, so at least I can pass formulas as strings and then parse then myself.
In one case, I used bison to generate the parser, but in a number of other cases I wrote the parser from scratch, either for efficiency or as I was using a language other than C.
It’s not that hard, but always a nuisance. And generally I do it by translating into some alternate notation, like tokenized RPN that can be evaluated faster repeatedly.

Report comment

Reply

Hackaday

Parsing Math In Python

10 thoughts on “Parsing Math In Python”

Leave a ReplyCancel reply

Search

Never miss a hack

If you missed it

One Laptop Manufacturer Had To Stop Janet Jackson Crashing Laptops

The 2025 Iberian Peninsula Blackout: From Solar Wobbles To Cascade Failures

Field Guide To The North American Weigh Station

The Rise And The Fall Of The Mail Chute

Mining And Refining: Drilling And Blasting

Our Columns

C++ Encounters Of The Rusty Zig Kind

Data Visualization And Aggregation: Time Series Databases, Grafana And More

Hackaday Links: June 29, 2025

Limitations, Creativity, And Challenges

Hackaday Podcast Episode 326: A DIY Pockels Cell, Funny Materials To 3D Print With, And Pwning A Nissan Leaf

10 thoughts on “Parsing Math In Python”

Leave a ReplyCancel reply

Search

Never miss a hack

Subscribe

If you missed it

Our Columns