Namespace: vernier::bench
Platform: Linux (full), macOS (core harness)
C++ Standard: C++23
Performance benchmarking framework with profiler integrations, GPU support, and statistical analysis.
- Quick Start
- Key Features
- Common Workflows
- CLI Tools
- API Reference
- Requirements
- Platform Support
- Testing
- Project Structure
- License
- See Also
```cpp
#include "Perf.hpp"

// work() stands in for the code under test.
PERF_TEST(MyLib, Throughput) {
  UB_PERF_GUARD(perf);
  perf.warmup([&]{ work(); });
  auto result = perf.throughputLoop([&]{ work(); }, "label");
  EXPECT_GT(result.callsPerSecond, 10000.0);
}

PERF_MAIN()
```

Build and run in Docker:

```bash
make compose-debug
make compose-testp
docker compose run --rm -T dev-cuda bash -c '
  ./build/native-linux-debug/bin/ptests/BenchmarkCPU_PTEST --csv results.csv
'
```

Or build natively:

```bash
cmake --preset native-linux-debug
cmake --build --preset native-linux-debug
./build/native-linux-debug/bin/ptests/BenchmarkCPU_PTEST --csv results.csv
```

- GoogleTest integration with CSV export and end-of-run summary tables
- 6 profiler backends: perf, gperftools, bpftrace, RAPL, callgrind, Nsight
- CUDA GPU benchmarking with multi-GPU and Unified Memory support
- Statistical analysis: median, percentiles, CV%, adaptive stability detection
- Memory bandwidth analysis with efficiency calculations
- Multi-threaded contention benchmarking with synchronized start gates
- Semantic test macros (PERF_THROUGHPUT, PERF_LATENCY, PERF_MEMORY, etc.)
- CLI tools for analysis, comparison, regression detection, and visualization
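The statistics listed above are standard summary measures over repeated samples. As a rough illustration of what they compute, here is a minimal, self-contained sketch. It is not vernier's implementation: the names `SampleStats`, `summarize`, `percentileSorted`, and `isStable`, and the 5% CV cutoff, are hypothetical. It assumes nearest-rank percentiles and the sample (n - 1) standard deviation, and vernier may use different conventions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Summary of repeated timing samples (e.g. ns per call, one sample per repeat).
struct SampleStats {
  double median;
  double p95;
  double cvPercent;  // coefficient of variation: sample stddev / mean * 100
};

// Nearest-rank percentile of an ascending-sorted, non-empty vector; p in (0, 100].
double percentileSorted(const std::vector<double>& sorted, double p) {
  assert(!sorted.empty() && p > 0.0 && p <= 100.0);
  const double exactRank = p * static_cast<double>(sorted.size()) / 100.0;
  const auto rank = static_cast<std::size_t>(std::ceil(exactRank));  // 1-based
  return sorted[rank - 1];
}

SampleStats summarize(std::vector<double> samples) {
  assert(!samples.empty());
  std::sort(samples.begin(), samples.end());
  const std::size_t n = samples.size();

  // Median: middle value, or mean of the two middle values for even n.
  const double median = (n % 2 == 1)
      ? samples[n / 2]
      : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;

  const double mean =
      std::accumulate(samples.begin(), samples.end(), 0.0) / static_cast<double>(n);
  double sumSq = 0.0;
  for (double x : samples) sumSq += (x - mean) * (x - mean);
  // Sample standard deviation (n - 1 denominator); zero for a single sample.
  const double stddev = n > 1 ? std::sqrt(sumSq / static_cast<double>(n - 1)) : 0.0;
  const double cv = mean != 0.0 ? stddev / mean * 100.0 : 0.0;

  return {median, percentileSorted(samples, 95.0), cv};
}

// Hypothetical stability rule: accept the run once CV% is at or below a cutoff.
bool isStable(const SampleStats& s, double maxCvPercent = 5.0) {
  return s.cvPercent <= maxCvPercent;
}
```

For example, `summarize({10.0, 20.0, 30.0, 40.0, 50.0})` gives a median of 30, a nearest-rank p95 of 50, and a CV of about 52.7%, which the 5% rule above would flag as unstable. The median is used as the headline value because a few slow outlier repeats shift it far less than they shift the mean.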
```bash
# 1. Baseline measurement
./bin/ptests/MyComponent_PTEST --repeats 30 --csv baseline.csv

# 2. Profile to find hotspots
./bin/ptests/MyComponent_PTEST --profile perf

# 3. Make changes, rebuild, measure again
./bin/ptests/MyComponent_PTEST --repeats 30 --csv optimized.csv

# 4. Statistical comparison
bench compare baseline.csv optimized.csv --threshold 5
```

Quick run restricted to throughput tests:

```bash
./bin/ptests/BenchmarkCPU_PTEST --quick --gtest_filter="*Throughput*"
```

Release build and install:

```bash
make compose-release
make install
```

Consumers use `find_package(vernier)`:

```cmake
find_package(vernier REQUIRED)
target_link_libraries(my_benchmark PRIVATE vernier::bench)
```

The install tree contains headers, shared libraries, CMake config, and documentation
under `build/native-linux-release/install/`. If that directory is not on CMake's default search path, point `CMAKE_PREFIX_PATH` at it when configuring the consumer project.
Two CLI tools handle post-measurement analysis and visualization. Build with
make tools-rust and make tools-py, then source .env from the build directory.
| Tool | Language | Purpose |
|---|---|---|
| `bench` | Rust | Analysis, comparison, validation, execution, flamegraphs |
| `bench-plot` | Python | Visualization (plots, dashboards, charts) |
```bash
bench summary results.csv
bench compare baseline.csv candidate.csv --fail-on-regression
bench-plot plot results.csv --output charts/
```

See tools/README.md for full CLI documentation.
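`bench compare` takes a percentage `--threshold` (as in the workflow above) and can fail on regressions. The sketch below shows one plausible way such a check can work: compare the medians of two sample sets and classify the relative change against the threshold. It is a hypothetical C++ illustration, not the logic of the Rust `bench` tool; `Verdict`, `medianOf`, and `compareLatency` are invented names, and it assumes lower values are better (latency-style metrics).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class Verdict { Improved, Unchanged, Regressed };

// Median of a non-empty sample set (taken by value so the caller's data is not reordered).
double medianOf(std::vector<double> v) {
  assert(!v.empty());
  std::sort(v.begin(), v.end());
  const std::size_t n = v.size();
  return (n % 2 == 1) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;
}

// Compares the candidate's median latency (lower is better) against the baseline's
// and classifies the relative change using a symmetric percentage threshold.
Verdict compareLatency(const std::vector<double>& baseline,
                       const std::vector<double>& candidate,
                       double thresholdPercent) {
  const double base = medianOf(baseline);
  const double cand = medianOf(candidate);
  assert(base > 0.0);
  const double changePercent = (cand - base) / base * 100.0;
  if (changePercent > thresholdPercent) return Verdict::Regressed;
  if (changePercent < -thresholdPercent) return Verdict::Improved;
  return Verdict::Unchanged;
}
```

With a 5% threshold, a baseline median of 100 and a candidate median of 110 is a 10% slowdown and is classified as `Regressed`. For throughput-style metrics (higher is better) the sign of the comparison flips. A production tool would likely also weigh run-to-run noise, for example via CV%, before failing a build.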
| Document | Purpose |
|---|---|
| CPU Guide | CPU benchmarking patterns and profiler usage |
| GPU Guide | GPU/CUDA benchmarking patterns |
| API Reference | Complete API documentation |
| Advanced Guide | Memory profiling, parameterized tests |
| CI/CD Integration | Automated regression detection |
| Docker Setup | Container build and profiling setup |
| Troubleshooting | Common issues and solutions |
| Demo Walkthroughs | 12 step-by-step tutorials |
Required:
- C++23 compiler (clang-21 recommended, GCC 13+ also works)
- CMake 3.24+
- GoogleTest (auto-fetched via CMake FetchContent)
- POSIX system (Linux or macOS)
Optional:
- CUDA toolkit 12+ (GPU benchmarking)
- gperftools (gperf profiler backend)
- valgrind (callgrind profiler backend)
- bpftrace (syscall tracing)
- Rust toolchain (for the `bench` CLI tool)
- Python 3.10+ with Poetry (for the `bench-plot` CLI tool)
| Platform | Library | Profilers | CUDA | Pre-built Artifact |
|---|---|---|---|---|
| x86_64 Linux | Full | All 6 | Yes | vernier-*-x86_64-linux[-cuda] |
| Jetson (aarch64) | Full | 5 (no RAPL) | Yes | vernier-*-aarch64-jetson |
| Raspberry Pi (aarch64) | Full | 5 (no RAPL) | No | vernier-*-aarch64-rpi |
| RISC-V 64 | Full | 5 (no RAPL) | No | vernier-*-riscv64-linux |
| macOS (Apple Silicon/x86) | Full | No-ops | No | Build from source |
RAPL (energy measurement) is Intel-only. All profilers degrade gracefully when hardware or tools are unavailable; the core timing harness always works.
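On Linux, RAPL energy counters are commonly exposed through the powercap sysfs interface. The sketch below illustrates the kind of graceful fallback described above: if the counter file is missing or unreadable, it returns `std::nullopt` and the caller simply skips energy reporting while timing continues. It assumes the `/sys/class/powercap/intel-rapl:0/energy_uj` path (first package domain); `readEnergyMicrojoules` and `energyDelta` are hypothetical helpers, not vernier's RAPL backend.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <optional>

// Assumed location of the first RAPL package domain under Linux powercap.
constexpr const char* kRaplEnergyPath = "/sys/class/powercap/intel-rapl:0/energy_uj";

// Reads a cumulative energy counter (microjoules). Returns std::nullopt when the
// file does not exist, cannot be opened (e.g. missing permissions), or does not
// start with an unsigned integer, so callers can fall back to timing-only results.
std::optional<std::uint64_t> readEnergyMicrojoules(const std::filesystem::path& path) {
  std::ifstream in(path);
  if (!in) return std::nullopt;
  std::uint64_t value = 0;
  if (!(in >> value)) return std::nullopt;
  return value;
}

// Energy consumed between two readings. Handles at most one wraparound of a
// counter that resets to zero after maxRangeUj; when a wrap occurred the result
// is approximate (it may be off by one microjoule).
std::uint64_t energyDelta(std::uint64_t before, std::uint64_t after,
                          std::uint64_t maxRangeUj) {
  assert(before <= maxRangeUj && after <= maxRangeUj);
  return after >= before ? after - before : (maxRangeUj - before) + after;
}
```

A caller would read `kRaplEnergyPath` before and after the measured region and report `energyDelta(before, after, maxRange)` only when both reads succeed; the wrap range can be read from the same domain's `max_energy_range_uj` file. On many recent kernels `energy_uj` is readable only by root, which is another case where the fallback path applies.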
```bash
# Build and run all tests (Docker)
make compose-debug
make compose-testp

# Run specific library tests
docker compose run --rm -T dev-cuda ctest --test-dir build/native-linux-debug -L bench

# CLI tool tests
make test-rust
make test-py
```

```
vernier/
  CMakeLists.txt        Root project (version, presets, CUDA detection)
  Makefile              Build entry point (make help for full list)
  docker-compose.yml    Dev containers (CPU, CUDA, cross-compile)
  cmake/vernier/        CMake infrastructure (targets, testing, coverage)
  docker/               Dockerfiles (base, dev, builder, toolchain)
  mk/                   Make modules (build, test, docker, coverage)
  src/
    bench/              Benchmarking library
      inc/              Public headers (Perf.hpp, PerfGpu.hpp, profilers)
      src/              Profiler implementations
      bpf/              BPF tracing scripts
      utst/             Unit tests (66 tests)
      ptst/             Performance tests (CPU + GPU)
      demo/             Educational demos with step-by-step docs
      docs/             Technical documentation
  tools/
    rust/               bench CLI (Rust)
    py/                 bench-plot CLI (Python)
```
MIT License. See LICENSE for details.
- tools/README.md - CLI tools documentation (bench, bench-plot)
- src/bench/docs/CPU_GUIDE.md - CPU benchmarking guide
- src/bench/docs/GPU_GUIDE.md - GPU benchmarking guide
- src/bench/docs/ - Technical documentation