A Browser Approach To Parsing

There are few rites of programmer passage as iconic as writing your first parser. You might want to interpret or compile a scripting language, or you might want to accept natural-language-like commands. You need a parser. [Varunramesh] wants to show you parser combinators, a technique used to make practical parsers. But the demonstration using interactive code cells in the web page is nearly as interesting as the technique.

Historically, you parse tokens, and this technique can do that too, but it can also operate directly on character streams if you prefer. The idea is related to recursive descent parsing, where you attempt to parse certain things, and if those things fail, you try again.

There are ways to match in a fuzzy way using Levenshtein distance. That way, if the user enters a typo, you can often recover from it. You could probably implement other schemes for this, too, like soundex, if you were parsing names. These types of parsers do have some limitations, but they are much easier to create, maintain, and debug than traditional parsers.

This is the first part in a series on creating parsers with combinators. Future installments promise to cover abstract syntax trees, error reporting, infix operations, and limiting backtracking. We’ll be interested to read those, too.

If you want some different parser tutorials, we got you. The usual tough problem is algebraic expressions, although you can always try RPN.

3 thoughts on “A Browser Approach To Parsing

  1. Fascinating problem and solution! Looking forward to the series and hoping it eases newcomers spelunking into this rabbit hole.

    Also just love to see actual CS theory on here, and that’s not (entirely) a backhanded compliment since you do a damn sight better at peppering the theoretical into the practical than, i.e. the uno reverse card of that in how my Big State School undergraduate alma mater decided their senior level “advanced data structures” course has no need to acknowledge such afterthoughts as “use cases” in between the enlightening time spent rigorously cramming O(n) matrices like some technical interview boot camp… Anyway I’m not high on edibles, you’re high on edibles.

  2. Algebraic expressions are not tough at all if you’re using Pratt parsing. And you can use it in combination with PEG / Packrat for everything else (or with simple recursive descent with parsing combinators), so there’s really no reason at all not to use it. It makes both precedence and associativity trivial.

  3. Forth ENCLOSE parser can be [was] written in transparent portable c.

    Forth us a blank, hex 20, as THE delimiter.

    c=a*b issue can be solved for the forth parser with a simple c
    preprocessor which expands to c = a * b so that ENLCOSE works?

    c ENCLOSE tested on x86 and ARM [Pi 4B] platforms.

    FIND modified for 64-bit Intel MCS BASIC-52 VM next?

    Processing modules no greater than one page of code to comply with
    Boeing hardware engineers’ software standards, of course.

    Software modules [subroutines with headers] longer than one page of code cannot be certified because of testing difficulties?

Leave a Reply

Please be kind and respectful to help make the comments section excellent. (Comment Policy)

This site uses Akismet to reduce spam. Learn how your comment data is processed.