All You’ve Ever Wanted To Know About Compilers

They say that in order to understand recursion, you must first understand recursion. Once you master that concept, you might decide that it’s time to write your own compiler that can compile itself as a fun side project. According to [Warren] aka [DoctorWkt], who documented every step of writing this C compiler from scratch, a true compiler will be able to do that.

Some of the goals for the project included self-compiling, focusing on a real hardware platform, practicality, and simplicity. [Warren] outlines a lot of the theory of compilers as well, including all the lexical, grammar, and semantic analysis and then the final translation into assembly language, but really focuses on making this compiler one for practical use rather than just a theoretical implementation. He focuses on Intel x86-64 and 32-bit ARM platforms too, which are widely available.
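To give a flavor of the first of those stages, here is a toy lexical scanner in C (a hypothetical sketch in the spirit of the project, not code from [Warren]'s compiler) that chops an input string into tokens:

```c
#include <ctype.h>
#include <stdio.h>

/* Token types for a toy scanner -- hypothetical, not the project's actual set */
enum { T_EOF, T_INT, T_PLUS, T_STAR };

static const char *src;   /* input cursor */
static int int_value;     /* value of the last T_INT token scanned */

/* Return the next token type, skipping whitespace */
int scan(void) {
    while (isspace((unsigned char)*src)) src++;
    if (*src == '\0') return T_EOF;
    if (isdigit((unsigned char)*src)) {
        int_value = 0;
        while (isdigit((unsigned char)*src))
            int_value = int_value * 10 + (*src++ - '0');
        return T_INT;
    }
    if (*src == '+') { src++; return T_PLUS; }
    if (*src == '*') { src++; return T_STAR; }
    return T_EOF;   /* anything unrecognized: give up, in this toy version */
}
```

A real scanner also tracks line numbers, recognizes identifiers and keywords, and reports errors, but the character-at-a-time loop is the same basic idea.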

This project is a long read and very thoroughly documented at around 100,000 words, so if you’ve ever been interested in compilers this is a great place to start. There are a lot of other great compiler tools floating around too, like the Compiler Explorer which shows you generated code as you write in a higher-level language.

[via Hackaday.io]

19 thoughts on “All You’ve Ever Wanted To Know About Compilers”

    1. The Dragon book is the college standard and it’s a tough slog. Wirth’s book is a lot more readable and has source code for an Oberon-0 compiler.

      Wirth also wrote Project Oberon, which includes the design and construction of a compiler, an operating system, and a microprocessor to run it on (in Verilog). Source for all of this is at:
      http://www.projectoberon.com/

      1. Of course, Dr. Dobb’s magazine had some articles about Small-C, which were eventually collected in a book. Since it was more aimed at hobbyists, that material might be a gentler introduction. C is of course “simple” since there are few keywords, so it may be easier to grasp the concepts of parsing etc.

        I have the dragon book, bought at a used book sale for a few dollars.

        1. IIRC, the Aho and Sethi book is strongly oriented towards the parser-generator paradigm. If I’m wrong about this, my apologies .. I haven’t looked at the book in a long, long .. long, time.

          In any case, there are many, many ways to skin the compiler cat.

          I’m definitely not dissing the parser-generator approach .. I’ve written my fair share of specialized tools using lex, yacc and such. Whether these were “compilers” or not might be a matter of how you define “compilation”. These were mostly tools to automate the translation of code from various industrial control systems to a particular target system. They did parse formal languages to ASTs, and then (ultimately) converted that AST representation into running code on the target system, using various combinations of declarative and imperative transformation and so forth along the way .. so I guess they would qualify as specialized compilers .. maybe ..

          In any case, a good ol’ LALR(1) parser-generator was definitely the “go to” approach when it was the right tool for the job, but there are plenty of cases where they are inordinately hard to use, or flat out can’t do the job.

          If you search for the term “parser combinator” on the following linked page, it should take you to a short, sweet, insightful review that expounds on this a little.
          https://www.amazon.com/Compilers-Principles-Techniques-Alfred-Aho-ebook/dp/B009TGD06W
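The source-to-AST-to-target pipeline described above can be sketched in a few lines of C. This is a hypothetical toy, not any of the tools mentioned:

```c
#include <stdlib.h>

/* A minimal AST for arithmetic -- illustrative only */
typedef struct Node {
    char op;                  /* '+', '*', or 'n' for a number leaf */
    int value;                /* used when op == 'n' */
    struct Node *left, *right;
} Node;

Node *leaf(int v) {
    Node *n = malloc(sizeof *n);
    n->op = 'n'; n->value = v; n->left = n->right = NULL;
    return n;
}

Node *binop(char op, Node *l, Node *r) {
    Node *n = malloc(sizeof *n);
    n->op = op; n->value = 0; n->left = l; n->right = r;
    return n;
}

/* Walk the tree bottom-up. Here we just evaluate, but the same
   recursive structure would emit target instructions instead. */
int eval(Node *n) {
    if (n->op == 'n') return n->value;
    int l = eval(n->left), r = eval(n->right);
    return n->op == '+' ? l + r : l * r;
}
```

Swapping the arithmetic in eval() for calls that append target instructions turns this evaluator into a bare-bones code generator, which is essentially how the tree-walking approach works.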

  1. One may think “Why?”, but one important reason to consider writing your own compiler from scratch is described by Ken Thompson in his excellent (and terrifying) 1984 ACM Turing Award lecture, “Reflections on Trusting Trust”: https://www.archive.ece.cmu.edu/~ganger/712.fall02/papers/p761-thompson.pdf also summarized here: https://wiki.c2.com/?TheKenThompsonHack

    Basically, a maliciously-modified compiler can create executable code with undetectable back doors in it even when used to compile itself. So a bad actor could (or could have) installed this hack in long-ago generations of (say) gcc, and that malicious code could persist to this day.

    The inimitable Bruce Schneier revisits the notion, with a possible countermeasure described by David Wheeler, here https://www.schneier.com/blog/archives/2006/01/countering_trus.html

    1. Or, as Ken has done more recently, you might invent a new language for which a compiler doesn’t yet exist :)

      I remember in college people being dismissive of studying compilers since gcc already existed. This was before Python, Java, Rust, Go, PHP, JavaScript, and Ruby were invented!

  2. “Intel x86-64”? That’s like the one thing that it’s never been called.

    per wikipedia: “x86-64 (also known as x64, x86_64, AMD64 and Intel 64[note 1])”
    The note being that “Intel initially used the names IA-32e and EM64T before finally settling on “Intel 64””

    Tech people call it x86_64 but previously called it AMD64.

  3. Very neat! There’s nothing quite like writing a compiler or interpreter for a general purpose programming language to refresh and deepen your understanding of the bottom couple layers of abstraction that one can otherwise easily be lulled into taking for granted.

    This is one of those great system-level projects (like making a simple CPU on an FPGA with Verilog, or writing a minimal kernel with just enough smarts to schedule and switch tasks, configure page tables, and handle page faults and I/O interrupts) where it is well worth reinventing a wheel or two for the sake of the richer understanding of the technology that comes from tackling the basics in a hands-on way.

    Bravo! And all the much more for taking a practical incremental approach and for debugging in the open thus showing that nobody gets it 100% right the first time. Both the incremental approach and the visible test-debug-fix cycle go a long way to making the process approachable for those who might be inspired to set out on a similar journey themselves.

    1. I was a bit disappointed with my compiler class (in the same timeframe). Lots of stuff about grammars and parsing (that has been very useful over the years), but very, very little about the code generation part. I guess a semester only has so many days…

      1. Time constraints are only a small part of it. To quote Terence Parr from his book ‘Language Implementation Patterns’:

        “Unfortunately, optimizing and generating machine code from an AST (or any other intermediate representation) is really hard.”

        He recommends using LLVM to do the “heavy lifting”.
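To make the difficulty concrete, here is roughly what the easy part looks like: a hypothetical tree-walking generator that emits naive stack-machine x86-64 text for an expression AST. Everything hard (register allocation, instruction selection, optimization) is exactly what this sketch skips and what LLVM would handle:

```c
#include <stdio.h>
#include <string.h>

/* Toy tree-walking code generator. Assumes a tiny AST and emits
   stack-machine style x86-64 text into a buffer. Hypothetical sketch:
   real code generators must deal with registers, types, and calling
   conventions, which is where the difficulty Parr describes comes in. */
typedef struct Node { char op; int value; struct Node *l, *r; } Node;

static char out[512];   /* accumulated assembly text */

static void emit(const char *line) { strcat(out, line); strcat(out, "\n"); }

void gen(Node *n) {
    char buf[64];
    if (n->op == 'n') {                       /* integer literal leaf */
        snprintf(buf, sizeof buf, "\tpushq\t$%d", n->value);
        emit(buf);
        return;
    }
    gen(n->l);                                /* post-order walk: */
    gen(n->r);                                /* operands first, then op */
    emit("\tpopq\t%rbx");
    emit("\tpopq\t%rax");
    emit(n->op == '+' ? "\taddq\t%rbx, %rax" : "\timulq\t%rbx, %rax");
    emit("\tpushq\t%rax");
}
```

Every value takes a round trip through the stack here; turning that into decent register-based code is the “heavy lifting” part.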

  4. “Back in the day…”, says a grey beard… ALL introductory graduate CS curricula included writing assemblers and then compilers. At the time, the “Dragon book” was the best reference, as FOSS wasn’t a thing and Usenet was one resource. That was the heyday of syntax/grammar generators (YACC was popular at that time), if one chose to go that route. I can remember some difficulty coding “null transition states”. I still remember the intro class: the most successful (meaning they completed the assignment) students wrote their assemblers in assembly (IBM 360 assembly). I chose SNOBOL after a brief trial of Fortran. Other options later on included Turbo-C, Turbo-Pascal, Ada (Booch’s book)… all of which I used professionally at the time.

  5. I’m really, really enjoying this. I’m not quite following all the details of the concepts in all the chapters I’ve read so far, but it is surprisingly gripping to see the compiler unfold. Thanks for sharing!
