Skip to content

A high-performance, deterministic C++20 matching engine

Notifications You must be signed in to change notification settings

dhruvan2006/orderbook

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

OrderBook Logo

OrderBook: Ultra-Low Latency Matching Engine

A high-performance, deterministic C++20 engine achieving sub-100ns median latency.

Test Status Codecov C++20


🚀 Performance Overview

Designed for High-Frequency Trading (HFT) environments where every nanosecond counts. This engine implements a subset of NASDAQ OUCH 5.0 and ITCH 5.0 protocols with a focus on mechanical sympathy.

  • Median Latency ($P50$): 45.51 ns
  • Tail Latency ($P99.9$): 837.35 ns
  • Throughput: ~14.37 Million ops/sec
  • Execution Model: Single-threaded deterministic hot-path with asynchronous I/O.

🏗 System Architecture

The system utilizes a Thread-per-Core architecture. By pinning the Matching Engine to a dedicated physical core and isolating I/O, we eliminate OS context switches and maximize L1/L2 cache residency.

Architecture Diagram
System Design: Lock-free SPSC pipelining for fast lock-free execution.

Key Architectural Decisions

  • Zero-Allocation Hot Path: Utilizing a custom OrderPool backed by 2MB Huge Pages (MAP_HUGETLB). This avoids malloc non-determinism and minimizes TLB misses during high volatility.
  • Lock-Free Communication: Components communicate via Single-Producer Single-Consumer (SPSC) ring buffers. This removes the overhead of mutex contention and cache coherency traffic.
  • Mechanical Sympathy: Data structures are cache-line aligned (64 bytes) to prevent False Sharing and optimized for spatial locality.

🛠 Technical Deep Dive

1. The "Tombstone-Free" OrderMap

Standard std::unordered_map uses linked-list chaining, causing expensive pointer chasing.

  • Our Solution: An open-addressed hash map with Backward Shift Deletion.
  • Impact: By shifting elements back upon deletion, we maintain short probe sequences without the performance degradation of "tombstones," resulting in a 3.47x speedup over the STL.

2. SIMD-Accelerated Logging (AVX2)

Persistence is usually the bottleneck.

  • The Optimization: We use _mm256_stream_si256 (Non-temporal stores) to stream Write-Ahead Logs (WAL) directly to memory-mapped files.
  • The Why: This bypasses the L3 cache entirely. Since log data is rarely read back immediately, we prevent it from "polluting" the cache, leaving more room for hot Order Book data.

3. Branchless Logic

We replaced std::lower_bound with a custom branchless binary search using assembly conditional moves (cmov). This stabilizes performance during price discovery by preventing CPU pipeline stalls caused by branch mispredictions.


📊 Benchmarking Data

Latency Distribution (Real-world NVDA Data)

Environment: Ryzen 9 6900HS (Isolated Cores), Ubuntu 24.04, Isolated Cores

Metric Cycles (@ 4.4 GHz) Latency
Minimum 42 9.46 ns
Median (P50) 202 45.51 ns
99th Percentile 1,566 352.78 ns
99.9th Percentile 3,717 837.35 ns
Measured with hardware performance counter (RDPMC)
Latency Distribution

Map Implementation Comparison

Implementation Throughput vs. STL
std::unordered_map 26.13 M/s Baseline
absl::flat_hash_map 37.16 M/s 1.42x
OrderMap (Fibonacci) 46.42 M/s 1.78x
OrderMap (Identity) 90.68 M/s 3.47x

✅ Reliability & Verification

In financial systems, speed is worthless without correctness.

  • Oracle Validation: OrderMap is continuously cross-referenced against a shadow STL implementation.
  • Fuzz Testing: Integrated clang's libFuzzer to ensure engine stability against malformed OUCH/ITCH packets.
  • Static Analysis: Built with -Wall -Wextra -Wpedantic and analyzed via Clang-Tidy.

🚀 Quick Start

Note: A kernel with Huge Pages support is suggested for OrderPool and OrderMap.

# Enable Huge Pages
sudo sysctl -w vm.nr_hugepages=1024

# Build with Release optimizations
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

# 3. Run
./OrderBook

👨‍💻 Author

This project was created by Dhruvan Gnanadhandayuthapani <dhruvan2006@gmail.com>.

About

A high-performance, deterministic C++20 matching engine

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published