Skip to content

blynn/compiler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

6c4e739 · Dec 7, 2024
Dec 7, 2024
Feb 9, 2023
Mar 8, 2024
May 9, 2020
Mar 8, 2024
Sep 24, 2024
Sep 1, 2024
Dec 7, 2024
Jul 27, 2021
Sep 25, 2024
Jan 31, 2024
Sep 25, 2024
Aug 27, 2024
Sep 25, 2024
Sep 25, 2024
Sep 12, 2024
May 8, 2023
Jan 31, 2024
Sep 24, 2024
Jan 31, 2024
May 8, 2023
Feb 15, 2020
Feb 9, 2023
Mar 15, 2020
Sep 25, 2024
Dec 7, 2024
Sep 25, 2024
Jun 10, 2021
May 26, 2023
Feb 15, 2020
Jan 3, 2024
May 8, 2023
Sep 25, 2024
Feb 9, 2023
Sep 12, 2024
Jun 17, 2023
Jan 2, 2024
Dec 25, 2019
Sep 25, 2024
Apr 21, 2023
Mar 8, 2024
Mar 8, 2024
Mar 8, 2024
Aug 8, 2024
Sep 1, 2024
Aug 27, 2024
Mar 8, 2024
Apr 21, 2023
May 8, 2023
Oct 8, 2024
Sep 25, 2024
Sep 19, 2024
Sep 12, 2024
Sep 19, 2024
Oct 8, 2024
May 8, 2023
Sep 25, 2024
Jul 17, 2020
Feb 9, 2023
Mar 13, 2020
Sep 18, 2024
Feb 3, 2024
Jan 2, 2023
Aug 8, 2024
Sep 25, 2024
Sep 25, 2024
Sep 25, 2024
Sep 25, 2024
Apr 26, 2023
Sep 24, 2024
Aug 27, 2024
Sep 25, 2024
Jul 11, 2021
Sep 1, 2022
May 8, 2023
Sep 24, 2024
Sep 24, 2024
Apr 19, 2023
Sep 18, 2024
Oct 23, 2024
Feb 28, 2020
Feb 9, 2023
Apr 21, 2023
Feb 9, 2023

Repository files navigation

Compiler Quest

The main goal is to continually level-up a self-hosting Haskell compiler. However, it’s a survival game as well as an RPG: it should always be possible to build this compiler starting with only a C compiler.

I had thought this was a daunting challenge. In the past, constructing a parser alone was a laborious grind for me. Then it got worse: compilers manipulate abstract syntax trees, so the self-hosting requirement demands the source language be complex enough to handle compound data types.

This time around, I found it shockingly easy to bootstrap a compiler for stripped-down Haskell (or perhaps I should say souped-up lambda calculus), thanks to a few tricks:

  • Parsing combinators are a joy to build from scratch and a joy to use.

  • Kiselyov’s bracket abstraction algorithm is simple yet practical.

  • Interpreting basic combinators is child’s play, and almost the only task we need perform in C or assembly.

  • Lambda calculus gives us the Scott encoding for free. Thus we effortlessly gain algebraic data types.

  • Laziness is at odds with native instructions, which are eagerly evaluated. However, we can readily reconcile their differences with shrewdly chosen combinators.

Perhaps the greatest difficulty was mustering the discipline to mark the road taken so that anyone with a C compiler can follow.

How to build

The earliest generations of our compiler reside in vm.c:

$ cc -O2 vm.c -o vm

When run without arguments, this program gets each compiler to compile its successor, which results in a series of numbers that we output to raw:

$ ./vm > raw

These numbers are a compiled form of the barely.hs compiler. The vm run command reads this raw file and interprets it to compile a given Haskell file:

$ echo "prependH s = 'H':s;" > /tmp/example.hs
$ echo "ello, World!" | ./vm run /tmp/example.hs

It only accepts code that type-checks. Moreover, the (⇐) operator is undefined and disallows mixing the (++) operator with (:) unless its fixity has been declared. This conflicts with the examples bundled with my IOCCC entry. The vm ioccc command inserts code to fix these issues:

$ ./vm ioccc fib.hs

The compilers thus far expect pure functions as input. The last function should have type String → String, and we implicitly wrap it in the interact function from the standard Haskell Prelude during compilation.

The effectively.hs compiler bucks the trend. It assumes the last function has type IO (), and treats it like main in a standard Haskell program. It also has support for FFI.

The lonely.hs compiler is the first generation with a main function of type IO (). It also outputs C code that should be appended to rts.c.

In sum, after running the above to produce raw, we can build a standalone compiler with:

$ (cat rts.c; ./vm run effectively.hs < lonely.hs) > lonely.c
$ cc -O2 lonely.c -o lonely

(A few generations later, our virtually.hs compiler bundles the runtime system with the output, so the cat is no longer needed.)