DIY Regular Expressions

In the Star Wars universe, not everyone uses a lightsaber, and those who do wield them had to build them themselves. There’s something to be said about that strategy. Building a car or a radio is a great way to learn how those things work. That’s what [Low Level JavaScript] points out about regular expressions. Sure, a lot of people think they are scary. So why not write your own regular expression parser and engine? Get that under your belt and you’ll probably never fear another regular expression.

Of course, most of us probably won’t do it ourselves, but you can still watch the process in the video below. The code is surprisingly short, but don’t expect all the bells and whistles you might find in Python or even Perl.

Continue reading “DIY Regular Expressions”

Parsing Math In Python

Programming computers used to be harder. Don’t get us wrong — today, people tend to solve harder problems with computers, but the fundamental act of programming is easier. We have high-level languages, toolkits, and even help from our operating systems. Most people never have to figure out how to directly read from a disk drive, deblock the data into records, and perform multiplication using nothing but shifts and adds. While that’s a good thing, sometimes it is good to study the basics. That was [gnebehay’s] thought when his university studies were too high level, so he decided to write an arithmetic expression parser in Python. It came out in about 100 lines of code.

Interpreting math expressions is one of those things that seems simple until you get into it. The first problem is correctly lexing the input — a term that means splitting into tokens. For a human, it seems simple that 5-3 is three tokens, {5, -, and 3} and that’s easy to figure out. But what about 5+-3? That’s also three tokens: {5,+,-3}. Tricky.

Continue reading “Parsing Math In Python”

Tiny Programming Language In 25 Lines Of Code

There are certain kinds of programs that fascinate certain kinds of software hackers. Maybe you are into number crunching, chess programs, operating systems, or artificial intelligence. However, on any significant machine, most of the time those activities will require some sort of language. Sure, we all have some processor we can write hex code for in our head, but you really want at least an assembler if not something sturdier. Writing languages can be addictive, but jumping right into a big system like gcc and trying to make changes is daunting for anyone. If you want a gentle introduction, check out [mgechev’s] language that resides in 25 lines of Javascript.

The GitHub page bills it as a tiny compiler, although that is a bit misleading and even the README says it is a transpiler. Actually, the code reads a simple language, uses recursive descent parsing to build a tree, and then uses a “compiler” to convert the tree to JavaScript (which can then be executed, of course). It can also just interpret the tree and produce a numerical answer.

Continue reading “Tiny Programming Language In 25 Lines Of Code”

Language Parsing With ANTLR

There are many projects that call out for a custom language parser. If you need something standard, you can probably lift the code from someplace on the Internet. If you need something custom, you might consider reading [Federico Tomassetti’s] tutorial on using ANTLR to build a complete parser-based system. [Frederico] also expanded on this material for his book, but there’s still plenty to pick up from the eight blog posts.

His language, Sandy, is complex enough to be a good example, but not too complex to understand. In addition to the posts, you can find the code on GitHub.

Continue reading “Language Parsing With ANTLR”