OrderBook: Ultra-Low Latency Matching Engine

A high-performance, deterministic C++20 engine achieving sub-100ns median latency.

🚀 Performance Overview

Designed for High-Frequency Trading (HFT) environments where every nanosecond counts. This engine implements a subset of NASDAQ OUCH 5.0 and ITCH 5.0 protocols with a focus on mechanical sympathy.

Median Latency ($P50$): 45.51 ns
Tail Latency ($P99.9$): 837.35 ns
Throughput: ~14.37 Million ops/sec
Execution Model: Single-threaded deterministic hot-path with asynchronous I/O.

🏗 System Architecture

The system utilizes a Thread-per-Core architecture. By pinning the Matching Engine to a dedicated physical core and isolating I/O, we eliminate OS context switches and maximize L1/L2 cache residency.

System Design: Lock-free SPSC pipelining for fast lock-free execution.

Key Architectural Decisions

Zero-Allocation Hot Path: Utilizing a custom OrderPool backed by 2MB Huge Pages (MAP_HUGETLB). This avoids malloc non-determinism and minimizes TLB misses during high volatility.
Lock-Free Communication: Components communicate via Single-Producer Single-Consumer (SPSC) ring buffers. This removes the overhead of mutex contention and cache coherency traffic.
Mechanical Sympathy: Data structures are cache-line aligned (64 bytes) to prevent False Sharing and optimized for spatial locality.

🛠 Technical Deep Dive

1. The "Tombstone-Free" OrderMap

Standard std::unordered_map uses linked-list chaining, causing expensive pointer chasing.

Our Solution: An open-addressed hash map with Backward Shift Deletion.
Impact: By shifting elements back upon deletion, we maintain short probe sequences without the performance degradation of "tombstones," resulting in a 3.47x speedup over the STL.

2. SIMD-Accelerated Logging (AVX2)

Persistence is usually the bottleneck.

The Optimization: We use _mm256_stream_si256 (Non-temporal stores) to stream Write-Ahead Logs (WAL) directly to memory-mapped files.
The Why: This bypasses the L3 cache entirely. Since log data is rarely read back immediately, we prevent it from "polluting" the cache, leaving more room for hot Order Book data.

3. Branchless Logic

We replaced std::lower_bound with a custom branchless binary search using assembly conditional moves (cmov). This stabilizes performance during price discovery by preventing CPU pipeline stalls caused by branch mispredictions.

📊 Benchmarking Data

Latency Distribution (Real-world NVDA Data)

Environment: Ryzen 9 6900HS (Isolated Cores), Ubuntu 24.04, Isolated Cores

Metric	Cycles (@ 4.4 GHz)	Latency
Minimum	42	9.46 ns
Median (P50)	202	45.51 ns
99th Percentile	1,566	352.78 ns
99.9th Percentile	3,717	837.35 ns
Measured with hardware performance counter (`RDPMC`)

Map Implementation Comparison

Implementation	Throughput	vs. STL
`std::unordered_map`	26.13 M/s	Baseline
`absl::flat_hash_map`	37.16 M/s	1.42x
OrderMap (Fibonacci)	46.42 M/s	1.78x
OrderMap (Identity)	90.68 M/s	3.47x

✅ Reliability & Verification

In financial systems, speed is worthless without correctness.

Oracle Validation: OrderMap is continuously cross-referenced against a shadow STL implementation.
Fuzz Testing: Integrated clang's libFuzzer to ensure engine stability against malformed OUCH/ITCH packets.
Static Analysis: Built with -Wall -Wextra -Wpedantic and analyzed via Clang-Tidy.

🚀 Quick Start

Note: A kernel with Huge Pages support is suggested for OrderPool and OrderMap.

# Enable Huge Pages
sudo sysctl -w vm.nr_hugepages=1024

# Build with Release optimizations
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j$(nproc)

# 3. Run
./OrderBook

👨‍💻 Author

This project was created by Dhruvan Gnanadhandayuthapani <dhruvan2006@gmail.com>.

Name		Name	Last commit message	Last commit date
Latest commit History 42 Commits
.github/workflows		.github/workflows
docs		docs
include/orderbook		include/orderbook
src		src
test		test
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
README.md		README.md
plot.py		plot.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OrderBook: Ultra-Low Latency Matching Engine

🚀 Performance Overview

🏗 System Architecture

Key Architectural Decisions

🛠 Technical Deep Dive

1. The "Tombstone-Free" OrderMap

2. SIMD-Accelerated Logging (AVX2)

3. Branchless Logic

📊 Benchmarking Data

Latency Distribution (Real-world NVDA Data)

Map Implementation Comparison

✅ Reliability & Verification

🚀 Quick Start

👨‍💻 Author

About

Uh oh!

Releases

Packages

Languages

dhruvan2006/orderbook

Folders and files

Latest commit

History

Repository files navigation

OrderBook: Ultra-Low Latency Matching Engine

🚀 Performance Overview

🏗 System Architecture

Key Architectural Decisions

🛠 Technical Deep Dive

1. The "Tombstone-Free" OrderMap

2. SIMD-Accelerated Logging (AVX2)

3. Branchless Logic

📊 Benchmarking Data

Latency Distribution (Real-world NVDA Data)

Map Implementation Comparison

✅ Reliability & Verification

🚀 Quick Start

👨‍💻 Author

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages