iamvickynguyen/Epsilon
Epsilon

An experimental MLIR compiler that goes from custom dialects to native binaries. Built incrementally, an epsilon at a time.

The Story

This project started as a way to deepen my understanding of MLIR. I learn best by building things, so I set out to construct a small compiler incrementally, one layer at a time.

The initial goal was to understand the core abstractions: dialects (tensor, control flow, buffer), operation semantics (constant folding, type constraints), and optimization passes (DCE, CSE, LICM, tiling, fusion). I wanted to see how MLIR's infrastructure works from the inside - not just reading documentation, but writing TableGen definitions, implementing fold methods, and wiring up pass pipelines.

After the optimization passes were working, I got curious: could I actually run this? So I extended the compiler to bufferize tensor operations into explicit memory management, lower through standard MLIR dialects, emit LLVM IR, and use clang to compile to a native binary. It was satisfying to see [[6, 8], [10, 12]] come out of a program that started as epsilon.add on two constant tensors.
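For context, the kind of program that produced that output looks roughly like this. This is a hypothetical sketch of Epsilon IR: the op names come from this README, but the exact syntax in the repo may differ.

```mlir
// Hypothetical Epsilon IR sketch (syntax may differ from the repo).
func.func @main() {
  // Two constant 2x2 tensors.
  %a = epsilon.constant dense<[[1, 2], [3, 4]]> : tensor<2x2xi32>
  %b = epsilon.constant dense<[[5, 6], [7, 8]]> : tensor<2x2xi32>
  // Elementwise add: yields [[6, 8], [10, 12]] once compiled and run.
  %c = epsilon.add %a, %b : tensor<2x2xi32>
  return
}
```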

At that point, I wanted to close the loop on the end-to-end workflow. Writing MLIR by hand is fine for testing, but real compilers have frontends. So I built an ONNX importer — a Python script that reads machine learning models and emits Epsilon IR. Now you can define a model in Python, export it to ONNX, and compile it to x86 machine code through the full pipeline.

The Build System Battle

In the beginning, I spent days fighting the build system. Linking LLVM/MLIR libraries with a standalone project is not straightforward — CMake configuration, library dependencies, and linking order all have to be exactly right. I tried debugging it with free AI chats and the LLVM documentation and kept hitting walls.

Then I decided to subscribe to Claude Code. The build file problem was solved in minutes. 🤯

The key discovery was a set of LLVM CMake flags that dramatically improved the development experience:

| Flag | What it does |
| --- | --- |
| `-DCMAKE_BUILD_TYPE=RelWithDebInfo` | Optimized build with debug symbols — fast execution, debuggable |
| `-DLLVM_BUILD_LLVM_DYLIB=ON` | Builds LLVM as a shared library instead of hundreds of static libs |
| `-DLLVM_LINK_LLVM_SHLIB=ON` | Links tools against the shared lib — cuts link time dramatically |
| `-DLLVM_ENABLE_LLD=ON` | Uses the LLD linker instead of GNU ld — much faster linking |
| `-DLLVM_USE_SPLIT_DWARF=ON` | Splits debug info into `.dwo` files — reduces linker memory pressure |
| `-DLLVM_CCACHE_BUILD=ON` | Enables ccache for compiler caching — speeds up rebuilds |

Without these flags, a full LLVM build takes forever and linking a small tool against the static libraries can eat 16+ GB of RAM. With them, incremental builds are fast and linking is almost instant.

How Claude Code Helped

Thanks to Claude Code, building and learning went much faster than I expected. I didn't get bogged down in syntax, CMake incantations, or MLIR boilerplate; I could focus on understanding the concepts while Claude handled the mechanical parts.

I also asked Claude to write the blog posts and code documentation for each feature, so I could read back through the explanations and deepen my understanding of what I'd built.

Architecture

                    ┌─────────────┐
                    │  ONNX Model │
                    │  (.onnx)    │
                    └──────┬──────┘
                           │  epsilon-import-onnx (Python)
                           ▼
                    ┌─────────────┐
                    │   Epsilon   │  epsilon.constant, epsilon.add,
                    │   Tensor    │  epsilon.mul, epsilon.reduce_sum,
                    │   Dialect   │  epsilon.reduce_max, epsilon.fill
                    └──────┬──────┘
                           │  Canonicalization / Constant Folding
                           │  DCE, CSE, LICM, Tiling, Fusion
                           │
                           │  --convert-epsilon-to-epsilon-buf
                           ▼
                    ┌─────────────┐
                    │ epsilon_buf │  epsilon_buf.alloc, epsilon_buf.add,
                    │   Buffer    │  epsilon_buf.mul, epsilon_buf.dealloc,
                    │   Dialect   │  epsilon_buf.constant, epsilon_buf.fill
                    └──────┬──────┘
                           │  Memory Planning, Buffer Reuse
                           │
                           │  --convert-epsilon-buf-to-std
                           ▼
                    ┌─────────────┐
                    │  Standard   │  memref.alloc, scf.for,
                    │    MLIR     │  arith.addi, memref.store
                    └──────┬──────┘
                           │  --convert-scf-to-cf
                           │  --convert-cf-to-llvm
                           │  --convert-arith-to-llvm
                           │  --convert-func-to-llvm
                           │  --finalize-memref-to-llvm
                           ▼
                    ┌─────────────┐
                    │    LLVM     │  llvm.func, llvm.add,
                    │   Dialect   │  llvm.load, llvm.store
                    └──────┬──────┘
                           │  mlir-translate --mlir-to-llvmir
                           ▼
                    ┌─────────────┐
                    │   LLVM IR   │  define i32 @main() { ... }
                    └──────┬──────┘
                           │  llc → clang
                           ▼
                    ┌─────────────┐
                    │   Native    │
                    │   Binary    │
                    │   (x86)     │
                    └─────────────┘
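To make the first lowering step concrete, here is a hedged sketch of what bufferization might look like: a value-semantics tensor op rewritten into explicit allocation, in-place computation, and deallocation. The syntax and operand conventions are hypothetical; the real ops live in the repo's Buf dialect.

```mlir
// Before: value-semantics tensor op (hypothetical syntax).
%c = epsilon.add %a, %b : tensor<2x2xi32>

// After --convert-epsilon-to-epsilon-buf: explicit memory management
// (hypothetical syntax; %a_buf and %b_buf are the bufferized operands).
%out = epsilon_buf.alloc : memref<2x2xi32>
epsilon_buf.add %a_buf, %b_buf, %out : memref<2x2xi32>
// ... uses of %out ...
epsilon_buf.dealloc %out : memref<2x2xi32>
```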

Project Structure

include/Dialect/Tensor/    # Tensor dialect (epsilon.add, epsilon.mul, ...)
include/Dialect/Buf/       # Buffer dialect (epsilon_buf.alloc, epsilon_buf.add, ...)
include/Dialect/CF/        # Control flow dialect (epsilon_cf.br, epsilon_cf.cond_br)
include/Conversion/        # Conversion pass headers
lib/Dialect/               # Dialect implementations
lib/Conversion/            # Conversion passes (tensor → buffer → standard)
test/                      # LIT tests for every dialect, pass, and conversion
tools/epsilon-opt.cpp      # Compiler driver (C++)
tools/epsilon-import-onnx.py  # ONNX frontend (Python)
documentation/blog/        # Blog posts documenting each feature

Build

Prerequisites

  • CMake >= 3.20, Ninja, Clang, LLD
  • ccache (recommended)
  • Python 3 with onnx and numpy (for ONNX import)

Build LLVM/MLIR

cd externals/llvm-project
cmake -G Ninja -B build llvm \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DLLVM_TARGETS_TO_BUILD="Native" \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DLLVM_ENABLE_ASSERTIONS=ON \
  -DLLVM_BUILD_LLVM_DYLIB=ON \
  -DLLVM_LINK_LLVM_SHLIB=ON \
  -DLLVM_ENABLE_LLD=ON \
  -DLLVM_USE_SPLIT_DWARF=ON \
  -DLLVM_CCACHE_BUILD=ON \
  -DCMAKE_C_COMPILER=clang \
  -DCMAKE_CXX_COMPILER=clang++
ninja -C build
cd ../..

Build Epsilon

cmake -G Ninja -B build . \
  -DCMAKE_BUILD_TYPE=RelWithDebInfo \
  -DMLIR_DIR=$(pwd)/externals/llvm-project/build/lib/cmake/mlir \
  -DLLVM_DIR=$(pwd)/externals/llvm-project/build/lib/cmake/llvm \
  -DLLVM_EXTERNAL_LIT=$(pwd)/externals/llvm-project/build/bin/llvm-lit
ninja -C build

Run Tests

# All tests
ninja -C build check-epsilon

# Single test with verbose output
./externals/llvm-project/build/bin/llvm-lit -v build/test/Dialect/Tensor/ops.mlir

Usage

Compile MLIR directly

# Parse and print
./build/bin/epsilon-opt input.mlir

# With constant folding
./build/bin/epsilon-opt input.mlir --canonicalize
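As an illustration, constant folding under --canonicalize might rewrite an add of two constants into a single constant. This is hypothetical IR, for intuition only; the repo's actual syntax may differ.

```mlir
// Before canonicalization (hypothetical syntax):
%a = epsilon.constant dense<[1, 2]> : tensor<2xi32>
%b = epsilon.constant dense<[3, 4]> : tensor<2xi32>
%c = epsilon.add %a, %b : tensor<2xi32>

// After: the add folds away at compile time.
%c = epsilon.constant dense<[4, 6]> : tensor<2xi32>
```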

# Full pipeline to LLVM dialect
./build/bin/epsilon-opt input.mlir \
  --canonicalize \
  --convert-epsilon-to-epsilon-buf \
  --convert-epsilon-buf-to-std \
  --convert-scf-to-cf \
  --convert-cf-to-llvm \
  --convert-arith-to-llvm \
  --convert-func-to-llvm \
  --finalize-memref-to-llvm \
  --reconcile-unrealized-casts

Import from ONNX

# Set up Python environment (one-time)
python3 -m venv .venv
.venv/bin/pip install onnx numpy

# Generate test models
.venv/bin/python test/ImportONNX/generate_test_models.py

# Import and compile
.venv/bin/python tools/epsilon-import-onnx.py test/ImportONNX/add_constants.onnx \
  | ./build/bin/epsilon-opt --canonicalize

Compile to native binary

# 1. Lower to LLVM dialect
./build/bin/epsilon-opt input.mlir \
  --convert-epsilon-buf-to-std \
  --convert-scf-to-cf --convert-cf-to-llvm \
  --convert-arith-to-llvm --convert-func-to-llvm \
  --finalize-memref-to-llvm --reconcile-unrealized-casts \
  -o lowered.mlir

# 2. Emit LLVM IR
./externals/llvm-project/build/bin/mlir-translate --mlir-to-llvmir lowered.mlir -o output.ll

# 3. Compile and run
./externals/llvm-project/build/bin/llc output.ll -o output.o -filetype=obj
clang output.o -o program
./program

Blog Series

The commit history is clean and incremental — you can follow the commits to see how each feature was built step by step.

Each feature is documented in a blog post that explains the design decisions and implementation details:

  1. Setting Up the Build System with CMake
  2. TableGen: Defining Operations Without the Boilerplate
  3. Building a Tensor Dialect
  4. Control Flow Dialect
  5. Dead Code Elimination
  6. Common Subexpression Elimination
  7. Loop-Invariant Code Motion
  8. Loop Tiling
  9. Loop Fusion
  10. Buffer Dialect
  11. Memory Planning
  12. Buffer Reuse
  13. Lowering to Standard Dialects
  14. Emitting LLVM IR
  15. ONNX Import
