An experimental MLIR compiler that goes from custom dialects to native binaries. Built incrementally, an epsilon at a time.
This project started as a way to deepen my understanding of MLIR. I learn best by building things, so I set out to construct a small compiler incrementally, one layer at a time.
The initial goal was to understand the core abstractions: dialects (tensor, control flow, buffer), operation semantics (constant folding, type constraints), and optimization passes (DCE, CSE, LICM, tiling, fusion). I wanted to see how MLIR's infrastructure works from the inside - not just reading documentation, but writing TableGen definitions, implementing fold methods, and wiring up pass pipelines.
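To give a flavor of what one of those passes does — this is a toy Python sketch over a made-up SSA-style instruction list, not the project's actual C++ implementation — common subexpression elimination replaces repeated pure computations with their first occurrence:

```python
# Toy common-subexpression elimination over a list of SSA-style
# instructions: (result, op, operands). Purely illustrative -- the real
# passes in this project are C++ passes over MLIR IR, not Python.
def cse(instructions):
    seen = {}      # (op, operands) -> result name of first occurrence
    replace = {}   # duplicate result -> canonical result
    out = []
    for result, op, operands in instructions:
        # Rewrite operands that referred to eliminated duplicates.
        operands = tuple(replace.get(o, o) for o in operands)
        key = (op, operands)
        if key in seen:
            replace[result] = seen[key]  # drop the duplicate instruction
        else:
            seen[key] = result
            out.append((result, op, operands))
    return out

prog = [
    ("a", "add", ("x", "y")),
    ("b", "add", ("x", "y")),   # duplicate of %a
    ("c", "mul", ("b", "z")),   # uses the duplicate
]
print(cse(prog))  # → [('a', 'add', ('x', 'y')), ('c', 'mul', ('a', 'z'))]
```

The same value-numbering idea underlies MLIR's real CSE pass, which additionally checks for side effects before merging operations.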
After the optimization passes were working, I got curious: can I actually run this? So I extended the compiler to bufferize tensor operations into explicit memory management, lower through standard MLIR dialects, emit LLVM IR, and use clang to compile to a native binary. It was satisfying to see [[6, 8], [10, 12]] come out of a program that started as epsilon.add on two constant tensors.
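What the folded epsilon.add computes can be sketched in plain Python (the input constants here are illustrative — the actual test program's tensors may differ):

```python
# Elementwise addition over 2-D nested lists, mirroring what constant
# folding of epsilon.add on two constant tensors produces at compile time.
def fold_add(lhs, rhs):
    return [[a + b for a, b in zip(row_l, row_r)]
            for row_l, row_r in zip(lhs, rhs)]

print(fold_add([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # → [[6, 8], [10, 12]]
```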
At that point, I wanted to close the loop on the end-to-end workflow. Writing MLIR by hand is fine for testing, but real compilers have frontends. So I built an ONNX importer — a Python script that reads machine learning models and emits Epsilon IR. Now you can define a model in Python, export it to ONNX, and compile it to x86 machine code through the full pipeline.
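The core idea of such an importer can be sketched in a few lines of Python — note that `emit_op`, the result naming, and the exact mapping below are hypothetical illustrations, not the real tool's API (the epsilon op names themselves come from the dialect):

```python
# Hypothetical sketch of an ONNX importer's core loop: map ONNX node
# types to Epsilon ops and emit textual IR. Function names and output
# format are illustrative only.
ONNX_TO_EPSILON = {
    "Add": "epsilon.add",
    "Mul": "epsilon.mul",
    "ReduceSum": "epsilon.reduce_sum",
    "ReduceMax": "epsilon.reduce_max",
}

def emit_op(node_type, operands):
    # Unsupported node types raise a KeyError, which a real importer
    # would turn into a proper diagnostic.
    op = ONNX_TO_EPSILON[node_type]
    return f'%r = "{op}"({", ".join(operands)})'

print(emit_op("Add", ["%0", "%1"]))  # → %r = "epsilon.add"(%0, %1)
```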
In the beginning, I spent days fighting the build system. Linking LLVM/MLIR libraries with a standalone project is not straightforward — CMake configuration, library dependencies, and linking order all have to be exactly right. I tried debugging it with free AI chats and the LLVM documentation and kept hitting walls.
Then I decided to subscribe to Claude Code. The build file problem was solved in minutes. 🤯
The key discovery was a set of LLVM CMake flags that dramatically improved the development experience:
| Flag | What it does |
|---|---|
| `-DCMAKE_BUILD_TYPE=RelWithDebInfo` | Optimized build with debug symbols — fast execution, debuggable |
| `-DLLVM_BUILD_LLVM_DYLIB=ON` | Builds LLVM as a shared library instead of hundreds of static libs |
| `-DLLVM_LINK_LLVM_SHLIB=ON` | Links tools against the shared lib — cuts link time dramatically |
| `-DLLVM_ENABLE_LLD=ON` | Uses the LLD linker instead of GNU ld — much faster linking |
| `-DLLVM_USE_SPLIT_DWARF=ON` | Splits debug info into `.dwo` files — reduces linker memory pressure |
| `-DLLVM_CCACHE_BUILD=ON` | Enables ccache for compiler caching — speeds up rebuilds |
Without these flags, a full LLVM build takes forever and linking a small tool against the static libraries can eat 16+ GB of RAM. With them, incremental builds are fast and linking is almost instant.
Thanks to Claude Code, building and learning has been much faster than I could have expected. I didn't need to get bogged down in syntax, CMake incantations, or MLIR boilerplate — I could focus on understanding the concepts while Claude handled the mechanical parts.
I also asked Claude to write the blog posts and code documentation for each feature, so I could read back through the explanations and deepen my understanding of what I'd built.
```
┌─────────────┐
│ ONNX Model  │
│   (.onnx)   │
└──────┬──────┘
       │ epsilon-import-onnx (Python)
       ▼
┌─────────────┐
│   Epsilon   │ epsilon.constant, epsilon.add,
│   Tensor    │ epsilon.mul, epsilon.reduce_sum,
│   Dialect   │ epsilon.reduce_max, epsilon.fill
└──────┬──────┘
       │ Canonicalization / Constant Folding
       │ DCE, CSE, LICM, Tiling, Fusion
       │
       │ --convert-epsilon-to-epsilon-buf
       ▼
┌─────────────┐
│ epsilon_buf │ epsilon_buf.alloc, epsilon_buf.add,
│   Buffer    │ epsilon_buf.mul, epsilon_buf.dealloc,
│   Dialect   │ epsilon_buf.constant, epsilon_buf.fill
└──────┬──────┘
       │ Memory Planning, Buffer Reuse
       │
       │ --convert-epsilon-buf-to-std
       ▼
┌─────────────┐
│  Standard   │ memref.alloc, scf.for,
│    MLIR     │ arith.addi, memref.store
└──────┬──────┘
       │ --convert-scf-to-cf
       │ --convert-cf-to-llvm
       │ --convert-arith-to-llvm
       │ --convert-func-to-llvm
       │ --finalize-memref-to-llvm
       ▼
┌─────────────┐
│    LLVM     │ llvm.func, llvm.add,
│   Dialect   │ llvm.load, llvm.store
└──────┬──────┘
       │ mlir-translate --mlir-to-llvmir
       ▼
┌─────────────┐
│   LLVM IR   │ define i32 @main() { ... }
└──────┬──────┘
       │ llc → clang
       ▼
┌─────────────┐
│   Native    │
│   Binary    │
│    (x86)    │
└─────────────┘
```
```
include/Dialect/Tensor/       # Tensor dialect (epsilon.add, epsilon.mul, ...)
include/Dialect/Buf/          # Buffer dialect (epsilon_buf.alloc, epsilon_buf.add, ...)
include/Dialect/CF/           # Control flow dialect (epsilon_cf.br, epsilon_cf.cond_br)
include/Conversion/           # Conversion pass headers
lib/Dialect/                  # Dialect implementations
lib/Conversion/               # Conversion passes (tensor → buffer → standard)
test/                         # LIT tests for every dialect, pass, and conversion
tools/epsilon-opt.cpp         # Compiler driver (C++)
tools/epsilon-import-onnx.py  # ONNX frontend (Python)
documentation/blog/           # Blog posts documenting each feature
```
- CMake >= 3.20, Ninja, Clang, LLD
- ccache (recommended)
- Python 3 with `onnx` and `numpy` (for ONNX import)
```shell
cd externals/llvm-project
cmake -G Ninja -B build llvm \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DLLVM_TARGETS_TO_BUILD="Native" \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_BUILD_LLVM_DYLIB=ON \
  -DLLVM_LINK_LLVM_SHLIB=ON \
  -DLLVM_ENABLE_LLD=ON \
  -DLLVM_USE_SPLIT_DWARF=ON \
  -DLLVM_CCACHE_BUILD=ON \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++
ninja -C build
cd ../..
```

```shell
cmake -G Ninja -B build . \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DMLIR_DIR=$(pwd)/externals/llvm-project/build/lib/cmake/mlir \
  -DLLVM_DIR=$(pwd)/externals/llvm-project/build/lib/cmake/llvm \
  -DLLVM_EXTERNAL_LIT=$(pwd)/externals/llvm-project/build/bin/llvm-lit
ninja -C build
```

```shell
# All tests
ninja -C build check-epsilon

# Single test with verbose output
./externals/llvm-project/build/bin/llvm-lit -v build/test/Dialect/Tensor/ops.mlir
```

```shell
# Parse and print
./build/bin/epsilon-opt input.mlir

# With constant folding
./build/bin/epsilon-opt input.mlir --canonicalize

# Full pipeline to LLVM dialect
./build/bin/epsilon-opt input.mlir \
  --canonicalize \
  --convert-epsilon-to-epsilon-buf \
  --convert-epsilon-buf-to-std \
  --convert-scf-to-cf \
  --convert-cf-to-llvm \
  --convert-arith-to-llvm \
  --convert-func-to-llvm \
  --finalize-memref-to-llvm \
  --reconcile-unrealized-casts
```

```shell
# Set up Python environment (one-time)
python3 -m venv .venv
.venv/bin/pip install onnx numpy

# Generate test models
.venv/bin/python test/ImportONNX/generate_test_models.py

# Import and compile
.venv/bin/python tools/epsilon-import-onnx.py test/ImportONNX/add_constants.onnx \
  | ./build/bin/epsilon-opt --canonicalize
```

```shell
# 1. Lower to LLVM dialect
./build/bin/epsilon-opt input.mlir \
  --convert-epsilon-buf-to-std \
  --convert-scf-to-cf --convert-cf-to-llvm \
  --convert-arith-to-llvm --convert-func-to-llvm \
  --finalize-memref-to-llvm --reconcile-unrealized-casts \
  -o lowered.mlir

# 2. Emit LLVM IR
./externals/llvm-project/build/bin/mlir-translate --mlir-to-llvmir lowered.mlir -o output.ll

# 3. Compile and run
./externals/llvm-project/build/bin/llc output.ll -o output.o -filetype=obj
clang output.o -o program
./program
```

The commit history is clean and incremental — you can follow the commits to see how each feature was built step by step.
Each feature is documented in a blog post that explains the design decisions and implementation details:
- Setting Up the Build System with CMake
- TableGen: Defining Operations Without the Boilerplate
- Building a Tensor Dialect
- Control Flow Dialect
- Dead Code Elimination
- Common Subexpression Elimination
- Loop-Invariant Code Motion
- Loop Tiling
- Loop Fusion
- Buffer Dialect
- Memory Planning
- Buffer Reuse
- Lowering to Standard Dialects
- Emitting LLVM IR
- ONNX Import