A lightweight, config-driven harness for systematic llama.cpp performance testing on multi-core NUMA systems.
For more details, check out Sergiu's blog article: Testing on AMD Threadripper 1950x
- YAML-driven: All test scenarios live in declarative configs
- Two modes: Exploratory (fast, broad) → Deep (confirmatory, narrow)
- Provenance: Every run captures binary fingerprints, env, NUMA state
- Reproducible: Promotes winners from exploratory to deep testing
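Here is a minimal Python sketch of what one run's provenance record might contain. It assumes that a "binary fingerprint" is a SHA-256 hash of the llama-bench executable and that NUMA state is the text printed by `numactl --hardware`. `capture_provenance` and the environment variables it records are illustrative choices, not the harness's actual code or output format.

```python
import hashlib
import os
import platform
import shutil
import subprocess


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def capture_provenance(binary_path, env_keys=("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS")):
    """Hypothetical record of the state a benchmark run depended on."""
    record = {
        "binary": os.path.abspath(binary_path),
        "binary_sha256": sha256_of(binary_path),
        "binary_size": os.path.getsize(binary_path),
        "kernel": platform.release(),
        # Example env vars only; record whatever your builds actually read.
        "env": {k: os.environ.get(k) for k in env_keys},
        "numa_hardware": None,
    }
    # numactl may be missing (e.g. on a dev laptop); keep None in that case.
    if shutil.which("numactl"):
        out = subprocess.run(["numactl", "--hardware"], capture_output=True, text=True)
        record["numa_hardware"] = out.stdout
    return record
```

For example, `json.dumps(capture_provenance("builds/llama-bench"), indent=2)` would produce a record you could store next to each run's results (the path is a placeholder).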
Optimized for multi-core NUMA systems, including:
- AMD Threadripper (all models)
- AMD EPYC (dual-socket servers)
- Intel Xeon (multi-socket NUMA configurations)
Key features:
- Handles SMT (hyperthreading) correctly
- Configurable NUMA pinning strategies
- Detects physical core IDs automatically
```
# 1. Install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# 2. Configure your test
# Option A: Start from 1950X example (if you have one)
cp configs/example-1950x-exploratory.yaml configs/mytest.yaml
# Option B: Start from generic template
cp configs/example-exploratory.yaml configs/mytest.yaml
# Then: Edit model path, llama-bench path, CPU topology

# 3. Run exploratory sweep
./run_bench.sh configs/mytest.yaml

# 4. Check results
cat reports/latest/summary.md

# 5. Promote winners to deep testing
./run_bench.sh reports/latest/promote.yaml
```

```
hft-cpu-test/
├── configs/                            # Test definitions (YAML)
│   ├── minimal.yaml                    # Quick-start template
│   ├── example-exploratory.yaml        # Generic exploratory template
│   ├── example-deep.yaml               # Generic deep template
│   ├── example-1950x-exploratory.yaml  # Real 1950X exploratory config
│   └── example-1950x-deep.yaml         # Real 1950X deep config
├── scripts/
│   ├── bench_harness.py                # Main orchestrator
│   ├── setup_builds.sh                 # Optional: build multiple BLAS variants
│   └── check_system.sh                 # System readiness checker
├── docs/
│   ├── QUICK_START.md                  # Getting started guide
│   ├── CONFIG_SCHEMA.md                # Complete YAML reference
│   ├── PROVENANCE.md                   # What's captured per run
│   ├── EXAMPLES.md                     # Usage patterns
│   └── INDEX.md                        # Documentation index
├── reports/                            # Auto-generated (timestamped)
├── builds/                             # Optional: built binaries
├── run_bench.sh                        # Main entry point
└── requirements.txt                    # Python dependencies
```
Exploratory mode:
- Test many builds (4-6) with simple configurations
- Default parameters, basic NUMA strategies
- 2-3 repetitions per configuration
- Goal: Identify 2-3 winning builds
- Generates a ranking in summary.md
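The exploratory ranking step (average throughput per build across repetitions, keep the top 2-3) might look roughly like this. The `(build_name, tokens_per_second)` input shape and the function names are hypothetical; the harness's real result files and summary.md layout may differ.

```python
from collections import defaultdict
from statistics import mean, stdev


def rank_builds(results, top_n=3):
    """Rank builds by mean throughput across repetitions.

    `results` is a list of (build_name, tokens_per_second) tuples, one per
    repetition (a hypothetical shape). Returns (ranking, winners), where
    ranking rows are (build, mean, stdev, reps), sorted best-first.
    """
    by_build = defaultdict(list)
    for build, tps in results:
        by_build[build].append(tps)
    ranking = []
    for build, samples in by_build.items():
        # stdev needs at least two samples; report 0.0 for a single repetition.
        spread = stdev(samples) if len(samples) > 1 else 0.0
        ranking.append((build, mean(samples), spread, len(samples)))
    ranking.sort(key=lambda row: row[1], reverse=True)
    winners = [row[0] for row in ranking[:top_n]]
    return ranking, winners


def to_markdown(ranking):
    """Render the ranking as a Markdown table, e.g. for summary.md."""
    lines = ["| Rank | Build | Mean t/s | Std dev | Reps |", "|---|---|---|---|---|"]
    for i, (build, avg, spread, n) in enumerate(ranking, start=1):
        lines.append(f"| {i} | {build} | {avg:.2f} | {spread:.2f} | {n} |")
    return "\n".join(lines)
```

Reporting the standard deviation next to the mean helps tell a real winner from run-to-run noise when only 2-3 repetitions are available.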
Deep mode:
- Test the top 2-3 builds from exploratory mode with parameter variations
- KV cache types (f16/f16, f8/f16, f16/f8, f8/f8)
- MLA variants (mla 2/3, flash attention, fused MoE)
- Batch/ubatch size combinations
- 3 repetitions per configuration (favoring parameter coverage over repetition count)
- Goal: Find optimal parameter settings for production
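The deep-mode sweep is essentially a Cartesian product of the promoted builds and the parameter axes listed above. This sketch assumes a hypothetical spec of builds, KV cache pairs, and batch/ubatch sizes; the cache-type labels are copied from the list above and would need to map to whatever names your llama-bench build accepts. It skips pairs where ubatch exceeds batch, on the assumption that llama.cpp caps the micro-batch at the batch size, which would make those runs duplicates.

```python
from itertools import product


def deep_grid(builds, kv_types, batch_sizes, ubatch_sizes, reps=3):
    """Expand a hypothetical deep-mode spec into one dict per run."""
    runs = []
    for build, (k_type, v_type), batch, ubatch in product(
        builds, kv_types, batch_sizes, ubatch_sizes
    ):
        if ubatch > batch:
            # Skip: assumed to be capped to `batch`, i.e. a duplicate run.
            continue
        for rep in range(1, reps + 1):
            runs.append({
                "build": build, "cache_k": k_type, "cache_v": v_type,
                "batch": batch, "ubatch": ubatch, "rep": rep,
            })
    return runs


kv_pairs = [("f16", "f16"), ("f8", "f16"), ("f16", "f8"), ("f8", "f8")]
runs = deep_grid(["blis", "openblas"], kv_pairs, [512, 2048], [256, 512, 1024])
```

With two builds, the four KV pairs, batch sizes 512/2048 and ubatch sizes 256/512/1024 (all example values), this yields 120 runs, which shows how quickly the deep grid grows.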
Example NUMA pinning presets (adjust for your CPU topology):
- vanilla: No pinning (baseline)
- all-cores: All physical cores across NUMA nodes
- single-node: Single NUMA node only
- balanced: Cores spread evenly across nodes
Important: Use lscpu --parse=CPU,Core,Node to identify your physical core IDs. On many AMD systems they are sequential (e.g. 0-15); on some Intel systems they are even-numbered (0,2,4...).
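The physical-core detection and the presets above can be sketched in Python from the `lscpu --parse=CPU,Core,Node` output. This assumes lscpu lists CPUs in ascending order, so the first logical CPU seen for each core is kept and later ones are treated as SMT siblings. The "balanced" rule here (an equal number of cores from each node for a given thread count) is one plausible reading of that preset, and all function names are hypothetical.

```python
from collections import defaultdict


def physical_cpus_by_node(lscpu_text):
    """Map NUMA node -> sorted physical CPU IDs (one logical CPU per core).

    Expects `lscpu --parse=CPU,Core,Node` output: comment lines start
    with '#', data lines look like "CPU,Core,Node".
    """
    seen = set()
    by_node = defaultdict(list)
    for line in lscpu_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cpu, core, node = line.split(",")[:3]
        node_id = int(node) if node else 0  # empty Node column: no NUMA info
        key = (node_id, int(core))
        if key in seen:
            continue  # SMT sibling of a core already recorded
        seen.add(key)
        by_node[node_id].append(int(cpu))
    return {n: sorted(cpus) for n, cpus in sorted(by_node.items())}


def presets(by_node, threads):
    """Build the four example presets; None means no pinning."""
    nodes = sorted(by_node)
    per_node = threads // len(nodes)
    return {
        "vanilla": None,
        "all-cores": sorted(c for n in nodes for c in by_node[n]),
        "single-node": by_node[nodes[0]],
        "balanced": sorted(c for n in nodes for c in by_node[n][:per_node]),
    }


def cpu_list(cpus):
    """Compress [0, 1, 2, 3, 8, 9] into "0-3,8-9" (numactl CPU-list syntax)."""
    parts, start, prev = [], None, None
    for c in sorted(cpus):
        if start is None:
            start = prev = c
        elif c == prev + 1:
            prev = c
        else:
            parts.append(f"{start}-{prev}" if start != prev else str(start))
            start = prev = c
    if start is not None:
        parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)
```

On a two-node machine with SMT, `cpu_list(presets(physical_cpus_by_node(text), 2)["balanced"])` picks one physical core per node; note that a thread count smaller than the node count leaves "balanced" empty.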
Generic templates (adapt to your CPU):
- configs/minimal.yaml - Quick-start template
- configs/example-exploratory.yaml - Full exploratory template
- configs/example-deep.yaml - Deep validation template

Real working examples (AMD Threadripper 1950X):
- configs/example-1950x-exploratory.yaml - Complete exploratory config
- configs/example-1950x-deep.yaml - Deep validation config
- Quick Start - Get running in 5 minutes
- Configuration Schema - Complete YAML reference
- Provenance - System state capture details
- Examples - Real-world usage patterns
- Documentation Index - Complete documentation guide
Requirements:
- Python 3.7+
- numactl (for NUMA pinning)
- At least one llama-bench binary from llama.cpp
- Linux with NUMA support
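A few of the requirements above can be verified programmatically. This is a hypothetical Python preflight check, not the contents of scripts/check_system.sh; it assumes the kernel exposes NUMA nodes under /sys/devices/system/node, which Linux does when NUMA support is enabled.

```python
import os
import shutil
import sys


def check_requirements(bench_path):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < (3, 7):
        problems.append("Python 3.7+ required")
    if shutil.which("numactl") is None:
        problems.append("numactl not found on PATH")
    if not (os.path.isfile(bench_path) and os.access(bench_path, os.X_OK)):
        problems.append("llama-bench not executable: " + bench_path)
    # Present when the kernel is built with NUMA support.
    if not os.path.isdir("/sys/devices/system/node"):
        problems.append("no NUMA sysfs tree (/sys/devices/system/node)")
    return problems
```

Returning a list instead of raising on the first failure lets you report every missing prerequisite in one pass.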
When setting up on a new system:
- Check your topology: lscpu --parse=CPU,Core,Node and numactl --hardware
- Identify physical core IDs (vs. SMT/HT siblings)
- Update --physcpubind in the config to use physical cores only
- Test with --dry-run before executing
- Run ./scripts/check_system.sh to validate the configuration
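The pinning and dry-run steps can be illustrated by composing the benchmark command as an argument list before deciding whether to execute it. `build_command` and `run` are hypothetical helpers, the memory policy (`--interleave=all`) is an assumption you may want to swap for `--membind` on single-node presets, and the harness's own `--dry-run` output may look different. The llama-bench options used (`-m`, `-t`, `-r`, `-o json`) are standard llama-bench flags.

```python
import shlex
import subprocess


def build_command(bench, model, threads, cpus=None, reps=3, interleave=True):
    """Compose a (optionally numactl-wrapped) llama-bench argv list.

    cpus: physical CPU IDs to pin to, or None for an unpinned (vanilla) run.
    """
    cmd = []
    if cpus:
        cmd += ["numactl", "--physcpubind=" + ",".join(map(str, cpus))]
        if interleave:
            cmd.append("--interleave=all")  # spread memory pages across nodes
    cmd += [bench, "-m", model, "-t", str(threads), "-r", str(reps), "-o", "json"]
    return cmd


def run(cmd, dry_run=True):
    """Print the command in dry-run mode; otherwise execute and return stdout."""
    if dry_run:
        # shlex.join is 3.8+, so quote manually to stay 3.7-compatible.
        print("DRY RUN:", " ".join(shlex.quote(a) for a in cmd))
        return None
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Keeping the command as a list avoids shell-quoting bugs, and printing it first makes it easy to confirm the CPU list contains only physical cores before a long sweep.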
- Generated by Claude Sonnet 4.5 🤖
- Check out the blog with experiments: https://nikro.me/
- Get in touch with us: https://humanfacetech.com/