
Ripple

Most stream processing systems recompute too much.

Ripple only recomputes what actually changed.

→ Distributed, type-safe, incremental stream processing in OCaml
→ 250ns stabilization at 10K symbols
→ 2M+ events/sec throughput
→ Deterministic replay and recovery

Built from first principles after studying incremental computation models used in production trading infrastructure.

Ripple live simulation — 50 symbols, only affected nodes recompute


What this looks like

Traditional systems:

price update → recompute entire window → serialize full record → latency spike

Ripple:

price update → recompute 3 affected nodes → serialize delta → done in 250ns
10,000 symbols in the graph.
One trade arrives.
3 nodes recompute. 9,997 don't.

Core Idea

Ripple is built on incremental computation graphs:

  • Nodes represent computations
  • Edges represent dependencies
  • Changes propagate only to affected nodes
  • Stabilization is deterministic and minimal

Instead of recomputing everything, we recompute exactly what changed.

Input Events → [Leaf Nodes] → [Map] → [Incr Fold] → [Output] → Deltas
                (set value)   (pure)   (O(1) update)  (diff)    (bin_prot)

A min-heap tracks dirty nodes in topological order. The incremental fold subtracts the old value, adds the new. O(1) per changed parent — not O(N), not O(log N). O(1). Regardless of graph size.
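The subtract-old/add-new trick can be sketched in a few lines of OCaml. This is an illustrative reduction with hypothetical types, not the actual Ripple API: a full recompute walks every input, while the incremental update touches only the changed parent.

```ocaml
(* Sketch of the O(1) incremental fold (hypothetical types; not the
   actual Ripple API). The baseline recompute is O(n) over all inputs;
   the incremental update subtracts the parent's old value and adds
   the new one, independent of graph size. *)

type fold_state = { mutable total : float }

(* Baseline: O(n) over all inputs. *)
let recompute_all inputs = Array.fold_left ( +. ) 0.0 inputs

(* Incremental: O(1) per changed parent. *)
let update_incremental state ~old_value ~new_value =
  state.total <- state.total -. old_value +. new_value

let () =
  let inputs = Array.make 10_000 1.0 in
  let state = { total = recompute_all inputs } in
  (* One symbol updates: 1.0 -> 42.0. The other 9,999 are untouched. *)
  update_incremental state ~old_value:inputs.(7) ~new_value:42.0;
  inputs.(7) <- 42.0;
  (* The incremental result matches a full recompute. *)
  assert (abs_float (state.total -. recompute_all inputs) < 1e-6)
```

The same pattern generalizes to any group-like aggregate (sum, count, price×volume for VWAP); only aggregates without an inverse, like min/max, need more bookkeeping.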


Architecture

Ripple is composed of:

  • Graph Engine — incremental dependency tracking with heap-based propagation
  • Delta Layer — idempotent state transitions with algebraic guarantees
  • Schema System — type-safe schemas with compile-time compatibility checking
  • Transport — sequence-ordered distributed propagation with CRC-32C integrity
  • Checkpointing — deterministic replay and recovery
  • Coordinator / Workers — horizontal scaling via consistent hashing

Design principles:

  • Deterministic execution (injectable effects for time, randomness)
  • Minimal recomputation (O(1) incremental fold)
  • Type-safe boundaries (schemas derived from types)
  • Replayable state (crash at any point, recover correctly)

lib/
├── kernel/          Effect injection, domain types (Trade, Vwap)
├── graph/           Core engine: heap-based stabilization, incremental fold
├── schema/          Type-safe schemas, delta algebra, compatibility checker
├── wire/            Binary protocol with CRC-32C integrity
├── transport/       Sequence-ordered delta buffer with gap detection
├── checkpoint/      Snapshot/restore with pluggable stores (memory, disk, S3)
├── window/          Tumbling/sliding/session windows, watermark tracking
├── time_series/     Aggregators: count, sum, mean, vwap, stddev
├── topology/        Pipeline composition and validation
├── observability/   Prometheus metrics, W3C tracing, introspection, alerts
├── coordinator/     Consistent hashing, partition assignment, failure detection
├── worker/          Lifecycle state machine, engine loop
├── rpc/             Async RPC delta transport
├── connector/       Source/sink connectors (file, Kafka)
└── ripple/          Top-level facade
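The "injectable effects" principle behind kernel/ can be illustrated with a clock. The signatures below are hypothetical, not Ripple's actual modules: the idea is that the engine only ever sees a clock interface, so a replay can substitute a deterministic one and reproduce identical output.

```ocaml
(* Sketch of effect injection for time (hypothetical signatures, not
   Ripple's actual modules). The engine never reads system time
   directly; it receives a CLOCK, which replay can swap out. *)

module type CLOCK = sig
  val now_ns : unit -> int64
end

(* Stand-in for a real wall clock (CPU time here, to stay portable). *)
module System_clock : CLOCK = struct
  let now_ns () = Int64.of_float (Sys.time () *. 1e9)
end

(* Replay clock: yields recorded timestamps in order, deterministically. *)
module Replay_clock (T : sig val times : int64 list end) : CLOCK = struct
  let remaining = ref T.times
  let now_ns () =
    match !remaining with
    | t :: rest -> remaining := rest; t
    | [] -> failwith "replay log exhausted"
end

(* Engine code sees only the interface. *)
let stamp (module C : CLOCK) payload = (C.now_ns (), payload)

let () =
  let module R = Replay_clock (struct let times = [100L; 200L] end) in
  assert (stamp (module R) "trade" = (100L, "trade"));
  assert (stamp (module R) "trade" = (200L, "trade"))
```

The same injection applies to randomness or any other ambient effect: determinism falls out of making every effect a parameter.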

Performance

| Operation | Measured |
| --- | --- |
| Stabilization (10K symbols) | 250 ns |
| bin_prot serde roundtrip | 82 ns (12M+/sec) |
| Schema compatibility check | 128 ns |
| VWAP pipeline throughput | 2.16M events/sec |
| 6M event replay recovery | 2.1 seconds |
| Heap growth over 1M events | 0.1% |

Guarantees

  • Deterministic replay — same inputs always produce same outputs
  • Idempotent updates — apply(d, apply(d, v)) = apply(d, v)
  • Bounded recomputation — O(depth) per event, not O(graph size)
  • Crash recovery — 100/100 random crash points recover correctly
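The replay guarantee reduces to a simple property: state transitions are pure folds over the event log, so the same log from the same starting state always reproduces the same state. A minimal sketch, with a hypothetical event type rather than Ripple's real one:

```ocaml
(* Deterministic replay in miniature (hypothetical event type).
   Because [step] is pure, replaying a log is a fold, and two replays
   of the same log from the same state are structurally equal. *)

type event = Trade of { price : float; qty : float }

type vwap_state = { pv : float; volume : float }

(* Pure transition: accumulate price*volume and volume. *)
let step state (Trade { price; qty }) =
  { pv = state.pv +. (price *. qty); volume = state.volume +. qty }

let replay init events = List.fold_left step init events

let vwap { pv; volume } = if volume = 0.0 then 0.0 else pv /. volume

let () =
  let log =
    [ Trade { price = 10.0; qty = 2.0 }; Trade { price = 11.0; qty = 1.0 } ]
  in
  let init = { pv = 0.0; volume = 0.0 } in
  (* Same inputs, same outputs. *)
  assert (replay init log = replay init log);
  assert (abs_float (vwap (replay init log) -. (31.0 /. 3.0)) < 1e-9)
```

Crash recovery is then just: restore the latest checkpoint as `init`, replay the suffix of the log.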

Quality Bar

| Category | Count | What |
| --- | --- | --- |
| Inline expect tests | 117 | Every module |
| Property-based tests | 11 | Algebraic laws verified across 6,500+ random inputs |
| Load tests | 4 | Sustained throughput, memory stability, latency, O(1) recomputation |
| Chaos tests | 3 | Crash at 100 random points, all produce correct output |
| Integration tests | 1 | Docker Compose with Redpanda (Kafka) + MinIO (S3) |

Pre-commit hook gates every commit against benchmark regression. Nothing lands if performance degrades.


Why Ripple Exists

The incremental computation model is incredibly powerful — but it's been locked inside single-process libraries.

Ripple is an attempt to:

  • Bring incremental computation to distributed systems
  • Preserve determinism at scale
  • Stop recomputing work that doesn't need recomputing

This is not a wrapper around existing stream processors. This is a different execution model.


Positioning

Ripple is not:

  • A Kafka clone
  • A Spark alternative
  • A batch processing system

Ripple is:

  • An incremental computation engine
  • A deterministic stream processor
  • A system that minimizes work instead of scaling it

Delta Algebra

Deltas are composable, associative, and idempotent:

apply(Set(v), _)              = Ok v                  -- replacement
apply(d, apply(d, v))         = apply(d, v)           -- idempotent
apply(diff(old, new), old)    = Ok new                -- roundtrip
compose(d, Remove)            = Remove                -- annihilation
compose(d, Set(v))            = Set(v)                -- right absorption
apply(compose(d1,d2), v)      = apply(d2, apply(d1, v))  -- compatibility

This gives you effectively-once semantics without distributed transactions.
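A stripped-down algebra that satisfies these laws fits in a few lines. This sketch keeps only Set and Remove over a single float cell and drops the Ok result wrapper; Ripple's schema-derived deltas are richer, but the laws are the same.

```ocaml
(* Minimal delta algebra (illustrative; only Set/Remove over one float
   cell, no result wrapper). [apply] and [compose] are written so the
   laws listed above hold. *)

type delta =
  | Set of float   (* replace the value *)
  | Remove         (* delete the value *)

type value = float option

let apply (d : delta) (_v : value) : value =
  match d with
  | Set x -> Some x
  | Remove -> None

(* In this reduced algebra the later delta always wins:
   Set/Remove are absorbing on the right. *)
let compose (_d1 : delta) (d2 : delta) : delta = d2

let () =
  let v : value = Some 1.0 in
  let d = Set 2.0 in
  (* idempotence: apply(d, apply(d, v)) = apply(d, v) *)
  assert (apply d (apply d v) = apply d v);
  (* annihilation: compose(d, Remove) = Remove *)
  assert (compose d Remove = Remove);
  (* compatibility: apply(compose(d1, d2), v) = apply(d2, apply(d1, v)) *)
  assert (apply (compose d (Set 3.0)) v = apply (Set 3.0) (apply d v))
```

Idempotence is what makes redelivery safe: applying the same delta twice lands in the same state, so exactly-once delivery is not required for effectively-once results.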


Quick Start

# Install dependencies
opam install core ppx_jane ppx_expect ppx_inline_test core_bench async

# Build
make build

# Run tests
make test

# Run the VWAP demo (2M+ events/sec)
make demo

# Start a worker
make worker
curl localhost:9100/health     # OK
curl localhost:9100/metrics    # Prometheus format

# CLI
dune exec ripple-cli -- info
dune exec ripple-cli -- inspect schemas

Makefile

| Target | What |
| --- | --- |
| make build | Compile |
| make test | All tests (117 inline + property + load + chaos) |
| make bench | Benchmarks |
| make check | Build + test + benchmark gate |
| make demo | VWAP pipeline, 100K events |
| make worker | Start worker process |
| make docs | Build documentation |
| make post | Generate project summary with live numbers |
| make install-hooks | Install pre-commit hook |

License

MIT