Scheme to x86-64 nanopass Compiler written from scratch in OCaml.
Armeet Singh Jatyani armeet@caltech.edu
Ashug Gurijala agurijal@caltech.edu
Example (left: scheme-like, right: x86-64 assembly)
;; OUTPUT: 42
| main_start:
| leaq add(%rip), %rcx
(define (add (x : Integer) (y : Integer)) : Integer | movq $40, %rdi
(+ x y)) | movq $2, %rsi
| movq %rcx, %rax
(add 40 2) | popq %rbp
| jmp *%rax
|
| .globl main
| .align 8
|
| main:
| pushq %rbp
| movq %rsp, %rbp
| movq $65536, %rdi
| movq $65536, %rsi
| callq initialize
| movq rootstack_begin(%rip), %r15
| jmp main_start
|
| main_conclusion:
| popq %rbp
| retq
|
| add_start:
| movq %rdi, %rcx
| movq %rsi, %rdx
| movq %rcx, %rax
| addq %rdx, %rax
| jmp add_conclusion
|
| .globl add
| .align 8
|
| add:
| pushq %rbp
| movq %rsp, %rbp
| jmp add_start
|
| add_conclusion:
| popq %rbp
| retq
The compiler is split into a backend (heavy reduction logic) and frontend (reduced languages to assembly with light optimizations).
- Backend
uniquify
- ensure all let expressions have unique namesreveal_functions
- replace any function name variables with FunRef formlimit_functions
- in any function with more than 6 args, additional args are packaged into a tupleexpose_allocation
- replace vector form with 3 lower level forms (collect, allocate, global_value) for garbage collectionuncover_get
remove_complex
- transform all arithmetic operands into atomic expressions (Ints or Vars)explicate_control
- flattens nested let statements into sequential assignments followed by a final returnremove_unused
- removes any jump blocks that can never be reached (full name remove unused blocks)
- Frontend
select_instructions
- convert from intermediate language to assembly intermediate languageuncover_live
- liveness analysisbuild_interference
- interference graph coloring algorithm for liveness analysisallocate_registers
- allocate registers as determined by graph coloring algorithmremove_jumps
- optimization: if a block only ever jumps to another, merge thempatch_instructions
- "patch" x86 instructions to follow restrictions (types, parameter types, order of operands)prelude_conclusion
- converts from the x86int language to the x86asm language (Sexp will be used to convert to string expressions)optimize
- remove instructions with no effect (adding/removing 0 from register/stack location, self-moves, etc.), trim reciprocal moves or identical moves
We implement a nanopass compiler, meaning we gradually convert higher-level languages into lower-level ones.
Each pass implements a small reduction. For example, the remove_complex
(remove complex operands) pass, ensures that
operands of arithmetic expressions are atomic expressions such as integers or variables. The uniquify
pass ensures that
variables have unique names. The graph_coloring
and allocate_registers
passes dynamically allocate registers to variables, keeping track of live variables. Specifically, this pass keeps track of callee-saved
and caller-saved variables to perform liveness analysis. When registers are exhausted, variables are
allocated on the stack.
opam install ocamlfind dune utop sexplib ppx_sexp_conv ocaml-lsp-server
cd src/compiler
make
python scripts/run_eval_tests.py tests/var_test_10.src
python scripts/run_eval_tests.py tests/var_test_*.src
View compiled code and reference side by side.
python scripts/compare.py reference/var_test_*.lvar -diff -random 10
python scripts/compare.py reference/var_test_*.lvar -diff
python scripts/compare.py reference/var_test_*.lvar -diff
> file.s