Create A Compiler Step-By-Step

While JavaScript might not be the ideal language to write a production compiler, you might enjoy the “Create Your Own Compiler” tutorial that does an annotated walkthrough of “The Super Tiny Compiler” and teaches you the basics of writing a compiler from scratch.

The super tiny compiler itself is about 200 lines of code. The source code is, well, over 1,000 lines, but that’s because of the literate-programming comments. The fancy title comments alone are about half as large as the actual compiler.

The compiler’s goal is to take Lisp-style function calls and convert them to equivalent C-style function calls. For example, (add 5 (subtract 3 1)) would become add(5, subtract(3, 1)).

Of course, there are several shortcuts you could use to do this pretty easily, but the compiler is structured like most full-blown modern compilers: there is a parsing phase, an abstract representation phase, and a code generation phase.
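The tutorial’s own code is longer and more careful, but a rough sketch of those stages in JavaScript might look something like this (the function names here are ours, not the tutorial’s):

function tokenize(src) {
  // split "(add 5 (subtract 3 1))" into parens, names, and numbers
  return src.match(/[()]|[a-z]+|\d+/g);
}

function parse(tokens) {
  // build a nested { name, args } tree from the token stream
  const token = tokens.shift();
  if (token === '(') {
    const node = { name: tokens.shift(), args: [] };
    while (tokens[0] !== ')') node.args.push(parse(tokens));
    tokens.shift();                          // drop the closing ')'
    return node;
  }
  return { value: Number(token) };           // a bare number literal
}

function generate(node) {
  // walk the tree and emit C-style calls
  if ('value' in node) return String(node.value);
  return node.name + '(' + node.args.map(generate).join(', ') + ')';
}

generate(parse(tokenize('(add 5 (subtract 3 1))')));   // "add(5, subtract(3, 1))"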

Even if you don’t like the slide-show approach, the literate, commented JavaScript is easy to read and very instructive. If you don’t know JavaScript, it should still be fairly easy to work out if you know any common programming language.

At the bottom left of the page are two buttons: Verbose and Internals. You can press these at any time. Text added by the Verbose button has a blue line next to it, and text about internals has a red line. This lets you tune the experience depending on how much detail you’d like to read.

This isn’t the first compiler we’ve seen chopped up for exposition. If you are interested in what existing compilers generate for source input, we were always impressed with Compiler Explorer.

19 thoughts on “Create A Compiler Step-By-Step”

    1. If you compile to a similar level of abstraction, it’s a transpiler, e.g., new JavaScript to old JavaScript.

      All transpilers are compilers. Transpilers are just a more specific type. But it’s still correct to call it a compiler.

      In this case, since we’re just converting Lisp-style functions to JS, “transpiler” is more precise, but compiler is still technically correct.

    2. A transpiler translates code to the same level of abstraction, while a compiler translates code to a lower level of abstraction. A decompiler translates code to a higher level of abstraction. The hard part is deciding how to measure the level of abstraction.

  1. one of the biggest influences on me as a compiler writer is a class i took from Dan Friedman where he taught the “baskin-robbins compiler”, a 33-pass compiler. i’m surprised to find that even though he wrote several books, he doesn’t seem to have written a book based on that lesson.

    the idea is that each pass should do only one simple thing. if you are using a pattern-matching language like ML, then it is often very easy to specify the passes. for the class, we used scheme with a couple macros that provided a pattern matching idiom. each pass produces a new intermediate language with different invariants, and obviously (because it’s Dan Friedman), it makes a new copy with each pass instead of side-effecting.

    the first passes are obvious, character stream -> token stream -> tree. then there were passes to rename variables to disambiguate scope (like if you have ‘x’ in two different scopes, it gives two different names), to convert looping/conditional statements into branches and labels (flattening control flow), to convert nested expressions into sequential assignments to temporary variables (flattening expressions), and so on. the point is that each pass is so simple that problems which ordinarily seem like a lot of intricate book-keeping become quite obvious when that book-keeping is the only thing going on. the variable-renaming pass, for example, keeps track of scope information, but then that scope information is gone, not used for any other passes.
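    just as an illustration (not the actual class code, and in javascript rather than scheme), a single-purpose renaming pass over a toy AST might look like this:

    let counter = 0;
    function rename(node, env = {}) {
      switch (node.type) {
        case 'num':                            // literals pass through untouched
          return node;
        case 'var':                            // look the name up in the current scope
          return { type: 'var', name: env[node.name] };
        case 'let': {                          // bind a fresh, unambiguous name
          const fresh = node.name + '_' + counter++;
          const scope = { ...env, [node.name]: fresh };
          return {
            type: 'let',
            name: fresh,
            value: rename(node.value, env),    // the bound value sees the outer scope
            body: rename(node.body, scope),    // the body sees the new binding
          };
        }
      }
    }

    every pass after this one gets to assume names are globally unique and can forget about scope entirely.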

    after a dozen or so very simple passes, it is basically a register-transfer language (or you could do SSA) and then you allocate registers and then convert to assembler mnemonics.

    in practice, my compilers don’t have that many passes, but it’s a great introduction…and when you’re working on another compiler and dealing with awkward overlapping book-keeping problems, it is really helpful to remember the baskin-robbins compiler had these separated into different passes. the problem you face isn’t hard, it’s just become obfuscated by cramming too much work into one pass.

    1. So now I’m wondering if it would be possible to generically transform two passes into a single pass, and then apply that transformation to the 33-pass compiler repeatedly until the result is a one-pass compiler.

      1. hahaha sick!

        a lot of the passes can be trivially composed in that way (and probably should be), but i think it’s impossible to “generally” do it. especially because some of the passes aren’t just the straightforward depth-first tree-visit that most of them are. for example, register allocation has a pass over the intermediate language generating conflict sets, then multiple passes over the conflict sets to determine the assignment, then a second pass over the intermediate language to substitute in the new assignments. so conceptually i’d describe it as one pass but even in that pass it has two different visits that can’t be composed together.

        but if you intentionally limited yourself to versions of those passes that *can* be composed, then it’s entirely possible! and i think it’s been done…i can’t remember the name of it but i know there’s a very-light-weight C compiler out there that i think goes to some effort to be one-pass. and i know there’s a register allocation algorithm out there that isn’t as good but does work on just a single visit.

        of course, that also depends on your language. i’m not 100% sure but i think C++ is intrinsically multi-pass, for example, because you are allowed to have some level of forward references. if for no other reason than you have to cache the templates and play them back when they’re instantiated. maybe the look-ahead could all be “dumb” like that.

      2. If you write it with that in mind, yes.

        As described, each pass is essentially a function of type ILn -> ILn’ (a function from one intermediate language to the next). That is, you’d have a series of functions like:

        tokenizer :: Text -> [Token]
        parser :: [Token] -> ParseTree
        varRename :: ParseTree -> UniqueVarParseTree
        unlooper :: UniqueVarParseTree -> LoopFreeTree
        expFlattener :: LoopFreeTree -> ExpFlatTree

        objectFile :: AssemblerCode -> ObjectFile

        and then you can do

        compiler :: Text -> ObjectFile
        compiler = objectFile
        . …
        . expFlattener
        . unlooper
        . varRename
        . parser
        . tokenizer

        to glue together all 33 passes.

      3. The minimum number of passes is two, because you cannot determine the offsets of jump targets in a single pass for selection (if, switch, etc.) and loop (for, while) statements. If you use unoptimized functional recursion this is not a problem, but welcome to overflowing your stack.
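        For illustration only (a hypothetical sketch, not anyone’s actual code), here is what those two passes can look like in JavaScript: the first records where each label lands, the second fills in the jump targets.

        // hypothetical two-pass label resolution (illustration only)
        function assemble(instructions) {
          const labels = {};
          let address = 0;
          for (const ins of instructions) {        // pass 1: note label addresses
            if (ins.label !== undefined) labels[ins.label] = address;
            else address++;
          }
          const code = [];
          for (const ins of instructions) {        // pass 2: resolve jump targets
            if (ins.label !== undefined) continue;
            if (ins.op === 'jmp' || ins.op === 'jz')
              code.push({ op: ins.op, target: labels[ins.to] });
            else
              code.push(ins);
          }
          return code;
        }

        // a forward jump ("jz end") only gets its target once pass 1 has seen "end":
        assemble([
          { op: 'jz', to: 'end' },
          { op: 'call', fn: 'body' },
          { label: 'end' },
          { op: 'ret' },
        ]);   // -> [{ op: 'jz', target: 2 }, { op: 'call', fn: 'body' }, { op: 'ret' }]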
