Add comparative benchmark harness and stable-recalc perf groundwork#28
Merged
Add dense INDEX/MATCH and cross-sheet dimension/fact benchmark scenarios with deterministic corpus generation and verification. The dense lookup case is right-sized to 50k rows for practical comparative runs while still exposing lookup scaling behavior.
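Deterministic corpus generation means every run emits identical benchmark data from a fixed seed, so comparative runs across engines see the same workbook. A minimal sketch of the idea (the `Lcg` PRNG and `gen_lookup_rows` layout are illustrative, not the harness's actual generator):

```rust
/// Minimal linear congruential generator (constants from Knuth's MMIX),
/// used here only to make the sketch self-contained and reproducible.
struct Lcg(u64);

impl Lcg {
    fn next(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

/// Generate `n` (key, value) rows for a dense lookup table, deterministically.
fn gen_lookup_rows(seed: u64, n: usize) -> Vec<(String, f64)> {
    let mut rng = Lcg(seed);
    (0..n)
        .map(|i| {
            let v = (rng.next() % 1_000_000) as f64 / 100.0;
            (format!("KEY{i:06}"), v)
        })
        .collect()
}

fn main() {
    // Same seed => byte-identical corpus on every run.
    let a = gen_lookup_rows(42, 50_000);
    let b = gen_lookup_rows(42, 50_000);
    assert_eq!(a, b);
}
```

Seeded generation also makes verification cheap: expected aggregate values can be recomputed from the seed instead of being stored alongside the corpus.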
Add row-insert churn and sheet rename/rebind scenarios, along with structural metadata for recovery coverage.
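The row-insert churn scenario stresses reference rebinding: inserting a row must shift every reference at or below the insertion point. A hypothetical illustration of that invariant (the `CellRef` type and `rebind_on_row_insert` helper are made up for this sketch):

```rust
// Sketch of the rebinding rule a struct_row_insert_middle_50k_refs-style
// scenario exercises; not the engine's actual reference model.
#[derive(Debug, PartialEq, Clone, Copy)]
struct CellRef {
    row: u32,
    col: u32,
}

/// References at or below the inserted row shift down by one; references
/// above it are untouched.
fn rebind_on_row_insert(r: CellRef, inserted_at: u32) -> CellRef {
    if r.row >= inserted_at {
        CellRef { row: r.row + 1, ..r }
    } else {
        r
    }
}

fn main() {
    let above = CellRef { row: 10, col: 1 };
    let below = CellRef { row: 25_000, col: 1 };
    assert_eq!(rebind_on_row_insert(above, 25_000), above); // unchanged
    assert_eq!(rebind_on_row_insert(below, 25_000).row, 25_001); // shifted
}
```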
Summary
This PR lands the comparative benchmark harness, expands the benchmark suite substantially, adds CI/nightly benchmark execution plans, and includes two narrowly-scoped stable-topology performance improvements for Formualizer.
At a high level it adds: the harness and its contracts, new benchmark scenarios, CI/nightly execution plans, comparator fairness fixes, and two stable-topology perf changes.
What landed
Benchmark harness + contracts
- `benchmarks/scenarios.yaml`
- `benchmarks/function_matrix.yaml`
- `benchmarks/reporting.md`
- `benchmarks/README.md`
- `benchmarks/harness/...`
- `crates/formualizer-bench-core`
- `crates/formualizer-testkit`

Highlights:
- `formualizer_rust_native`
- `ironcalc_rust_native`
- `hyperformula_node`

New benchmark scenarios
Added scenarios across the benchmark meta tranche:
- `inc_sparse_dirty_region_1m`
- `inc_cross_sheet_mesh_3x25k`
- `lookup_index_match_dense_50k`
- `lookup_cross_sheet_dim_fact`
- `agg_countifs_multi_criteria_100k`
- `agg_mixed_rollup_grid_2k_reports`
- `struct_row_insert_middle_50k_refs`
- `struct_sheet_rename_rebind`
- `real_finance_model_v1`
- `real_ops_model_v1`

Plus the earlier core scenarios, now governed under the same suite metadata:

- `headline_100k_single_edit`
- `chain_100k`
- `fanout_100k`
- `cross_sheet_mesh`
- `sparse_whole_column_refs`
- `sumifs_fact_table_100k`
- `structural_sheet_recovery`

CI / nightly execution plans
Added YAML-defined execution plans in:
- `benchmarks/harness/plans.yaml`

Plans:

- `ci_formualizer_gate`: `core_smoke` plus `structural_sheet_recovery`
- `nightly_native_compares`

The harness runner now supports:

- `list-plans`
- `validate-plans`
- `run-plan --plan <name>`

Fairness / parity fixes
The generated comparison corpus was hardened so comparator engines can ingest the same workbook fairly:
- formulas are serialized without a leading `=` inside OOXML `<f>` nodes

This addressed a concrete comparator fairness issue with IronCalc workbook ingest.
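For context, SpreadsheetML stores a cell formula in the `<f>` element without the leading `=`, so a corpus writer that emits `=SUM(...)` verbatim produces workbooks some engines ingest differently. A minimal sketch of the normalization (the `to_ooxml_f_node` helper is hypothetical; the PR's actual fix may differ in detail):

```rust
/// Serialize a formula string into an OOXML <f> node: strip any leading
/// `=` and apply the minimal XML escaping the format requires.
fn to_ooxml_f_node(formula: &str) -> String {
    let body = formula.strip_prefix('=').unwrap_or(formula);
    // Escape `&` first so already-escaped entities are not double-mangled.
    let escaped = body
        .replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;");
    format!("<f>{escaped}</f>")
}

fn main() {
    assert_eq!(to_ooxml_f_node("=SUM(A1:A9)"), "<f>SUM(A1:A9)</f>");
    assert_eq!(to_ooxml_f_node("=IF(A1<5,1,0)"), "<f>IF(A1&lt;5,1,0)</f>");
}
```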
Stable-topology perf groundwork
1. Benchmark-runner recalc plan reuse mode
Commit: `perf(bench): add recalc plan reuse mode`

Adds a controlled `native_best_cached_plan` / `--reuse-recalc-plan` mode to the Formualizer native benchmark runner.

Properties:
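To make the intent concrete, here is an illustrative sketch (not the actual runner code) of what a `--reuse-recalc-plan`-style mode buys: when the dependency topology is stable across benchmark iterations, the evaluation plan is computed once and replayed instead of being rebuilt per edit.

```rust
// Hypothetical runner-side plan reuse; names are illustrative.
#[derive(Clone)]
struct RecalcPlan {
    order: Vec<usize>, // cell ids in evaluation order
}

struct Runner {
    cached: Option<RecalcPlan>,
    reuse: bool, // corresponds to a --reuse-recalc-plan style flag
    plan_builds: usize,
}

impl Runner {
    fn plan_for(&mut self, topo_order: &[usize]) -> RecalcPlan {
        if self.reuse {
            if let Some(p) = &self.cached {
                return p.clone(); // stable topology: replay the cached plan
            }
        }
        self.plan_builds += 1;
        let plan = RecalcPlan { order: topo_order.to_vec() };
        if self.reuse {
            self.cached = Some(plan.clone());
        }
        plan
    }
}

fn main() {
    let topo = vec![0, 1, 2, 3];
    let mut r = Runner { cached: None, reuse: true, plan_builds: 0 };
    for _ in 0..1000 {
        let _ = r.plan_for(&topo); // simulated benchmark iterations
    }
    assert_eq!(r.plan_builds, 1); // built once, replayed 999 times
}
```

The mode is opt-in precisely so the default path still measures full plan construction cost.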
2. Engine static schedule cache for stable recalcs
Commit: `perf(engine): cache static schedules for stable recalcs`

Adds a conservative internal schedule cache for stable-topology, non-dynamic, non-range-dependency recalcs.
Properties:
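A sketch of the caching shape (hypothetical; the engine's real cache has more gates, e.g. the dynamic and range-dependency checks mentioned above): the cache is keyed on a topology version that is bumped by any structural edit, so value-only recalcs hit the cache and structural edits invalidate it.

```rust
use std::cell::Cell;

// Illustrative schedule cache, not the engine's actual implementation.
struct ScheduleCache {
    topo_version: u64,
    cached: Option<(u64, Vec<usize>)>, // (version when built, schedule)
}

impl ScheduleCache {
    fn structural_edit(&mut self) {
        self.topo_version += 1; // any insert/delete/rename invalidates
    }

    fn schedule<F: Fn() -> Vec<usize>>(&mut self, build: F) -> Vec<usize> {
        if let Some((v, s)) = &self.cached {
            if *v == self.topo_version {
                return s.clone(); // stable topology: reuse
            }
        }
        let s = build();
        self.cached = Some((self.topo_version, s.clone()));
        s
    }
}

fn main() {
    let mut c = ScheduleCache { topo_version: 0, cached: None };
    let builds = Cell::new(0u32);
    let build = || {
        builds.set(builds.get() + 1);
        vec![0, 1, 2]
    };
    let _ = c.schedule(build); // miss: builds the schedule
    let _ = c.schedule(build); // hit: reused
    assert_eq!(builds.get(), 1);
    c.structural_edit(); // e.g. a row insert
    let _ = c.schedule(build); // miss again: rebuilt
    assert_eq!(builds.get(), 2);
}
```

Invalidating on any structural change is the conservative choice: it forgoes some reuse but can never serve a stale schedule.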
Performance notes
We did one extra A/B measurement round on the same machine comparing:
- `0ea742a` vs `320d484`

Measured as 3-run medians for `formualizer_rust_native` `native_best`:

| Scenario | `0ea742a` (us) | `320d484` (us) | Delta |
|---|---:|---:|---:|
| `headline_100k_single_edit` | 19.467 | 24.887 | +27.8% |
| `chain_100k` | 105,092.923 | 70,290.640 | -33.1% |
| `fanout_100k` | 63,018.455 | 45,666.231 | -27.5% |
| `sumifs_fact_table_100k` | 19,060.541 | 21,267.939 | +11.6% |
| `lookup_cross_sheet_dim_fact` | 16,538.927 | 14,976.657 | -9.5% |

Interpretation: the regressed scenarios are small in absolute terms (`headline` is a few microseconds; `sumifs` is a couple milliseconds incremental), while the improvements on the chain and fanout scenarios are large.

Validation performed
Representative validation run on this branch included:
- `cargo fmt --all`
- `cargo clippy -p formualizer-eval --lib --tests -- -D warnings`
- `cargo clippy -p formualizer-bench-core --all-targets --features xlsx,formualizer_runner,ironcalc_runner -- -D warnings`
- `cargo clippy -p formualizer-testkit --all-targets -- -D warnings`
- `cargo test -p formualizer-eval recalc_plan -- --nocapture`
- `cargo test -p formualizer-eval schedule_cache -- --nocapture`
- `cargo test -p formualizer-bench-core --features formualizer_runner --bin run-formualizer-native -- --nocapture`
- `uv run --project benchmarks/harness python benchmarks/harness/runner/main.py validate-suite`
- `uv run --project benchmarks/harness python benchmarks/harness/runner/main.py validate-plans`
- `formualizer_rust_native` native-best suite sweep

Fresh suite sweep result:

- `formualizer_rust_native`

Merge-readiness / artifact hygiene
Checked before opening this PR:
- `results/` is ignored
- `benchmarks/harness/ref-libs/` is ignored

Notable ignored paths include:

- `benchmarks/corpus/synthetic/`
- `benchmarks/corpus/real/*.xlsx`
- `benchmarks/harness/results/`
- `benchmarks/harness/ref-libs/`
- `target/`

So the PR contains harness/tooling/contracts/docs/tests/code, but not local benchmark artifacts or vendored comparator checkouts.
Follow-up after merge
I recommend follow-up perf work on top of this merged benchmark baseline rather than extending this PR further.
Most likely next tranche: