Here’s How To Build A Tiny Compiler From Scratch

Believe it or not, building a tiny compiler from scratch can be as fun as it is accessible. [James Smith] demonstrates by making a tiny compiler for an extremely simple programming language, and showing off a hello world.

Here’s what happens with a compiler: human-written code gets compiled into low-level machine code, creating a natively-executable result for a particular processor. [James]’ compiler — created from scratch — makes native x64 Linux ELF binary executables with no dependencies, an experience [James] found both educational and enjoyable. The GitHub repository linked below has everything one needs, but [James] also wrote a book, From Source Code to Machine Code, which he offers for sale to anyone who wants to step through the nitty-gritty.

The (very tiny) compiler is on GitHub as The Pretty Laughable Programming Language. It’s tiny, the only data types are integers and pointers, and all it can do is make Linux syscalls — but it’s sufficient to make a program with. Here’s what the code for “Hello world!” looks like before being fed into the compiler:

; the write() syscall:
; ssize_t write(int fd, const void *buf, size_t count);
(syscall 1 1 "Hello world!\n" 13)

Working at such a low level can be rewarding, but back in the day the first computers actually relied on humans to be compilers. Operators would work with pencil and paper to convert programs into machine code, and you can get a taste of that with a project that re-creates what it was like to program a computer using just a few buttons as inputs.

21 thoughts on “Here’s How To Build A Tiny Compiler From Scratch

  1. My first system was a Motorola D2 kit. It came as a couple of PCBs with lots of components and had to be assembled. It was programmed directly in machine code via a hex keypad and display. I actually wrote my programs in assembly language and hand-assembled them.

    1. I have a D2 kit in the attic. It has seen much use. You could add a n RS-232 interface and install the MIKBUG PROM and program the kit from a serial terminal

  2. May I offer “the dragon book”, Aho and Ullman’s “Principles of Compiler Design”. Once you’ve looked at that, Linux has (inherited from Unix) all the tools you need to build your own compiler: lex, yacc, etc.

    1. this project clearly comes from the lisp and forth school of compiler design. lex and yacc and a lot of the ‘classic’ stuff in those books is completely irrelevant and unnecessary for this sort of compiler. there’s a lot of neat stuff in those books but almost the entire content of both of those books is almost completely irrelevant to ‘simple’ compilers.

      there’s some classic stuff you can’t hardly get away from if you want to make a ‘real’ compiler (like register allocation), but i’m gonna say even if you’re going in that direction, i wouldn’t recommend starting with a book. write the simple compiler and tackle the classic questions as they come to you. that’s just my opinion.

      1. Unnecessary, maybe, but writing a compiler for a context free grammar with (f)lex and yacc/bison finite state machine is almost trivial. Of course there are more modern tools available for the job as well, e.g. antlr. Or just ask ChatGPT to write you a simple compiler.

      2. Ugh. A masochist! Wouldn’t you rather play along with a maestro, than try to rediscover all the rules of symphonic composition on your own? I disagree with your approach, but to each their own. I think my suggestion (and that of the other poster) get you more value, faster.

    2. In high school I could code in assembly for both the c64 and Apple iie as well as read Hex. At one stage I remember looking at a project to compile BASIC, but it was easier to just use Machine code.

      1. And this was a great way to go at the time, especially since the C64 and Apple iie use the same microprocessor. But today, if you want to do a microcontroller project, it’s nice to not have to care about what the instruction set is, especially if you’re having to upgrade a project from 8-bit to 32-bit, or just adapt to a new architecture for supply chain reasons.

    3. I’ve implemented a few language parsers with lex and yacc, and I’ve implemented a few with no tools and recursive descent. I found it way easier to develop and maintain the recursive descent parsers.

      1. I was wondering about that. Seems like having to learn the “language” needed to write the specification for a code generator might be more challenging than writing the parser, and just raise the overall complexity of the design process.

  3. What I’d suggest is write a translator to convert the language to C, then compile that. It’s a much quicker path to production. Later, if there’s a need, you can write a direct compiler to assembler for the language.

    1. This is what happened when Pascal went out of favor – someone came up with a Pascal-to-C translator, and that was the last we heard of Pascal.
      But I think a better way to go is the other direction.
      C was a failed concept that just won’t go away, and yet I use it almost exclusively because it’s the best existing solution that is available for every architecture. But this has been a roadblock many times, like in the early days of microcontrollers, when C compilers for them were expensive, and because of the complexity of C, difficult to write.
      I’d far rather have a simpler syntax that does little more than remove the need to know the instruction set of a given computer, rather than trying to read like English, and then make a translator that converts C to that lower-level language.
      So a project like this has a great deal of appeal for me.

      1. Pascal and C were so similar you could almost use the others compiler with a set of macros.

        C++ and Object Pascal, the same. Just enough screwy differences to make things complicated.
        Kind of like Java and C#. Right down to one being clearly better.

        C a failed concept? No. Just bonkers wrong. Is sharp tool, don’t cut self. Do not attempt to perform bris with C.

  4. Has anyone actually read the book that is sorta-being-advertised here? Coming from the Lisp/Forth world myself, I’m curious, but I want to make sure this is a solidly-written book before I plunk down $29+. One reviewer on Amazon quipped that the author’s DB book was more “a bunch of notes” than a “book.” (But as we know, Amazon reviews are never biased :-). I do want to believe. But this book is so new and self-published, that a second opinion would be very much welcome.

  5. Example given in the linked article:
    (def foo (n)
    (if (le n 0)
    (then 0)
    (else (+ n (call foo (- n 1))))))

    Instead, PLEASE:
    (def foo (n)
    (if (le n 0)
    (then 0)
    (else (+ n (call foo (- n 1))))
    I shouldn’t have to rely on the parenthesis-matching features of a coding-focused text editor to read your code.

Leave a Reply

Please be kind and respectful to help make the comments section excellent. (Comment Policy)

This site uses Akismet to reduce spam. Learn how your comment data is processed.