feat(gnovm): parameterize and calibrate gas model (from Xeon 8168 benchmarks) #5291

Open
jaekwon wants to merge 32 commits into master from dev/jae/gas-model-improvements

@jaekwon
Contributor

@jaekwon jaekwon commented Mar 15, 2026

Summary

  • Calibrate all 180+ CPU gas constants from benchmark data on Intel Xeon 8168 (cpuBaseNs = 5.2 ns/gas)
  • Add per-N gas charging for parameterized ops (array/slice/map/struct literals, func calls, type switches, type
    asserts, conversions, equality, BigInt/BigDec arithmetic, NameExpr depth, compound assigns)
  • Replace flat-per-byte allocation gas (1 gas/byte) with calibrated size-class lookup table (allocGasTable)
  • Replace guessed GC visit cost (VisitCpuFactor = 8) with cache-aware lookup table (gcVisitGasTable)
  • Add comprehensive benchmarking and calibration tooling

Depends on #5091 and #5289

Gas impact

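| Test                 |      Before |      After | Change |
|----------------------|------------:|-----------:|-------:|
| gas/const.gno        |       2,966 |        479 |   -84% |
| gas/nested_alloc.gno |  13,273,861 |  3,810,912 |   -71% |
| gas/slice_alloc.gno  | 500,003,015 | 10,042,122 |   -98% |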

The old model massively overcharged allocation gas: a 208-byte struct cost 208 gas (about 624 ns equivalent),
while the actual malloc takes about 30 ns. The new model charges ~8 gas for the same allocation. CPU base costs
decreased because the old values were overestimated, but per-N slopes were added to prevent DoS via large inputs
(big arrays, deep scopes, wide BigInts). GC visit gas increased for large heaps to model cache effects
(6 gas/visit when the heap is L2-resident vs. 135 gas/visit when visits hit DRAM).

Per-N charging formulas

  • Parameterized ops: base + slope * N (e.g., ArrayLit: 37 + 9/element)
  • BigInt linear: bits * slope / 1024 (e.g., Add: 9/kb)
  • BigInt quadratic: (bits/32)^2 * slope / 32 (e.g., Mul, Rem: slope=1)
  • BigDec linear: digits * slope / 100 (e.g., Add: 72/100digits)
  • BigDec quadratic: (digits/10)^2 * slope / 10 (e.g., Mul, Quo: slope=1)
  • NameExpr: 1 * block_depth
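
The formulas above can be written out as a minimal Go sketch. The function names are hypothetical and not taken from the PR, and the use of truncating integer division is an assumption; only the constants in `main` (37 + 9/element, 9/kb, 72/100 digits, slope = 1) come from the list above.

```go
package main

import "fmt"

// linearGas charges a flat base plus a per-element slope: base + slope*N.
func linearGas(base, slope, n int64) int64 { return base + slope*n }

// bigIntLinearGas implements bits * slope / 1024 (slope is gas per kilobit).
func bigIntLinearGas(bits, slope int64) int64 { return bits * slope / 1024 }

// bigIntQuadraticGas implements (bits/32)^2 * slope / 32.
func bigIntQuadraticGas(bits, slope int64) int64 {
	words := bits / 32
	return words * words * slope / 32
}

// bigDecLinearGas implements digits * slope / 100 (slope is gas per 100 digits).
func bigDecLinearGas(digits, slope int64) int64 { return digits * slope / 100 }

// bigDecQuadraticGas implements (digits/10)^2 * slope / 10.
func bigDecQuadraticGas(digits, slope int64) int64 {
	d := digits / 10
	return d * d * slope / 10
}

// nameExprGas implements 1 * block_depth.
func nameExprGas(blockDepth int64) int64 { return 1 * blockDepth }

func main() {
	fmt.Println(linearGas(37, 9, 100))       // ArrayLit, 100 elements: 937
	fmt.Println(bigIntLinearGas(4096, 9))    // BigInt Add, 4096 bits: 36
	fmt.Println(bigIntQuadraticGas(4096, 1)) // BigInt Mul, 4096 bits: 512
	fmt.Println(bigDecLinearGas(1000, 72))   // BigDec Add, 1000 digits: 720
	fmt.Println(bigDecQuadraticGas(1000, 1)) // BigDec Mul, 1000 digits: 1000
	fmt.Println(nameExprGas(5))              // NameExpr at block depth 5: 5
}
```

Whether these per-N amounts are added on top of a flat per-op base for the BigInt/BigDec cases is not stated here, so the sketch returns only the size-dependent term.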

Benchmarks (5,500+ lines)

  • bench_ops_test.go: 100+ parameterized op handler benchmarks covering all arithmetic, comparison, assignment,
    literal construction, type operations, conversions, BigInt/BigDec at multiple bit widths, and NameExpr depth
    scaling
  • bench_gc_test.go: GC visit cost benchmarks with realistic object graphs at varying heap sizes (1K to 16M
    objects)
  • cmd/calibrate/alloc_bench_test.go: Go heap allocation cost benchmarks across 32 size classes (1B to 1GB)

Calibration tooling

  • cmd/calibrate/gen_analysis.py: Parses op benchmark output, fits linear/quadratic models, computes calibrated
    OpCPU constants with R^2 validation and worst-case verification
  • cmd/calibrate/gen_alloc_table.py: Fits power-law model to allocation benchmarks, generates allocGasTable
    lookup table
  • cmd/calibrate/plot_fits.py: Generates multi-panel matplotlib visualizations of parameterized op cost fits
  • cmd/calibrate/op_gas_formulas.md: Documents gas formula for every parameterized op
  • Reference benchmark data from 3 hardware profiles (DO Dedicated Xeon 8168, DO Regular, M2 ARM64)

Makefile targets

  • make calibrate.gas — full end-to-end: op benchmarks + alloc benchmarks + analysis + plots
  • make calibrate.alloc — allocation benchmarks + table generation
  • make calibrate.analysis — run analysis on existing benchmark data
  • make calibrate.plot — generate fit visualizations

Test plan

  • go test ./gnovm/pkg/gnolang/ -run TestFiles -test.short — all gas golden tests pass
  • go test ./gno.land/pkg/integration/ -run TestTestdata — all integration tests pass
  • go test ./gno.land/pkg/sdk/vm/ -run TestAddPkg — gas_test.go passes
  • Full CI

Note: These gas changes are consensus-breaking and must be gated behind a chain upgrade version.

🤖 Generated with Claude Code

ltzmaxwell and others added 28 commits March 13, 2026 21:44
Replace the start/stop/start/stop benchmarking pattern with a gap-free
SwitchOpCode model that atomically finalizes one op and starts the next
using a single time.Now() call, ensuring all CPU time is accounted for.

Key changes:
- New 4-primitive API: BeginOpCode/SwitchOpCode/ResumeOpCode/StopOpCode
- SwitchOpCode returns old op code; Go call stack replaces explicit stack
- Unified StartStore/StopStore suspend VM op timer and track store duration
- Add dedicated RealmDidUpdate/RealmFinalizeTx store operation codes
- Add Enabled const (OpsEnabled || StorageEnabled || NativeEnabled) to gate
  all collection uniformly; specific flags gate only export
- Fix FinishStore exporting with TypeNative instead of TypeStore
- Remove OpStaticTypeOf exclusion from benchmarking

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
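
As an illustration of the gap-free switch described in the commit message above, here is a minimal sketch. The `opTimer` type, its fields, and the injectable clock are assumptions made for this example and are not the benchops API; only the idea of finalizing one op and starting the next from a single clock reading comes from the commit message. `ResumeOpCode` and the store timers are omitted.

```go
package main

import (
	"fmt"
	"time"
)

// opTimer accumulates wall time per op code (hypothetical type).
type opTimer struct {
	now   func() time.Time // clock source; injectable so the sketch is testable
	cur   byte
	start time.Time
	accum [256]time.Duration
	count [256]int64
}

// BeginOpCode starts timing op.
func (t *opTimer) BeginOpCode(op byte) {
	t.cur, t.start = op, t.now()
}

// SwitchOpCode finalizes the current op and starts next using ONE clock
// reading, so no time falls between the two measurements.
func (t *opTimer) SwitchOpCode(next byte) (old byte) {
	now := t.now()
	old = t.cur
	t.accum[old] += now.Sub(t.start)
	t.count[old]++
	t.cur, t.start = next, now
	return old
}

// StopOpCode finalizes the current op without starting another.
func (t *opTimer) StopOpCode() {
	now := t.now()
	t.accum[t.cur] += now.Sub(t.start)
	t.count[t.cur]++
}

// fakeClock returns a clock that advances stepNs nanoseconds per reading.
func fakeClock(stepNs int64) func() time.Time {
	cur := time.Unix(0, 0)
	return func() time.Time {
		cur = cur.Add(time.Duration(stepNs))
		return cur
	}
}

func main() {
	t := &opTimer{now: fakeClock(10)}
	t.BeginOpCode(1)
	old := t.SwitchOpCode(2)
	t.StopOpCode()
	fmt.Println(old, t.accum[1], t.accum[2]) // 1 10ns 10ns
}
```

With three clock readings spaced 10 ns apart, the two ops account for the full 20 ns between begin and stop, which is the property the start/stop/start/stop pattern could not guarantee.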
- StartNative: finalize previous op BEFORE runtime.GC() so GC time
  is not attributed to any opcode
- Export format expanded from 10 to 14 bytes: now exports
  (totalDuration, totalSize, count) instead of pre-averaged values
- stats.go computes proper weighted averages and per-run stddev
- Add GAS_TODO.md with full gas metering audit findings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
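
To show why exporting totals rather than pre-averaged values matters, the following sketch combines per-run totals into a count-weighted mean and a per-run standard deviation. The `runTotals` type and function names are hypothetical, `totalSize` is left out, and the choice of population (not sample) standard deviation is an assumption; this is not the code in stats.go.

```go
package main

import (
	"fmt"
	"math"
)

// runTotals holds the raw totals exported for one op in one run.
type runTotals struct {
	TotalDurationNs int64
	Count           int64
}

// weightedMeanNs divides summed duration by summed count, so runs with
// more samples weigh more than runs with few.
func weightedMeanNs(runs []runTotals) float64 {
	var dur, n int64
	for _, r := range runs {
		dur += r.TotalDurationNs
		n += r.Count
	}
	if n == 0 {
		return 0
	}
	return float64(dur) / float64(n)
}

// perRunStddevNs is the population standard deviation of the per-run means.
func perRunStddevNs(runs []runTotals) float64 {
	var means []float64
	for _, r := range runs {
		if r.Count > 0 {
			means = append(means, float64(r.TotalDurationNs)/float64(r.Count))
		}
	}
	if len(means) == 0 {
		return 0
	}
	var sum float64
	for _, m := range means {
		sum += m
	}
	avg := sum / float64(len(means))
	var sq float64
	for _, m := range means {
		sq += (m - avg) * (m - avg)
	}
	return math.Sqrt(sq / float64(len(means)))
}

func main() {
	runs := []runTotals{{100, 10}, {900, 30}} // per-run means: 10 ns and 30 ns
	fmt.Println(weightedMeanNs(runs))         // 25
	fmt.Println(perRunStddevNs(runs))         // 10
}
```

Averaging the two pre-averaged values would give 20 ns; the weighted mean of 25 ns reflects that the second run contributed three times as many samples.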
Pre-load all benchmark packages and stdlibs before enabling recording,
so init-phase store operations don't contaminate benchmark data.

- Add explicit Recording flag to gate export (not implicit nil check)
- Extract loadBenchPackages() to separate loading from benchmarking
- Benchmark functions now take pre-loaded *PackageValue directly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Clarify that Fork() intentionally omits gasMeter and GC callback.
The caller is responsible for setting these when needed (e.g. the
Machine constructor does this for transactions). Query contexts
intentionally omit the gasMeter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…izes

The _alloc* constants represent unsafe.Sizeof for each GnoVM value
type, but many were stale — structs grew as fields were added
(ObjectInfo, FuncValue.Crossing, etc.). This caused systematic
under-charging of memory gas.

Changes:
- Update all _alloc* constants to match current unsafe.Sizeof values
- Replace _allocBase+_allocPointer with unified _allocHeap constant
- Fix allocPointer to include PointerValue size (was just _allocBase)
- Fix allocHeapItem to use actual HeapItemValue size (was _allocTypedValue)
- Fix allocMapItem from *3 to *2 (key + value TypedValues)
- Add detailed comments explaining the allocation model
- Update test golden values for new allocation sizes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Panics at startup if any _alloc* constant doesn't match
unsafe.Sizeof, so stale values are caught immediately when
struct fields change.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
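
A startup check of this kind might look like the sketch below. `exampleValue`, `_allocExampleValue`, and `checkAllocSize` are hypothetical stand-ins for illustration; the PR's actual types, constants, and check are not shown on this page.

```go
package main

import (
	"fmt"
	"unsafe"
)

// exampleValue stands in for a GnoVM value type (hypothetical).
// It uses only uint64 fields so its size is 32 bytes on all platforms.
type exampleValue struct {
	ID   uint64
	Data [3]uint64
}

// _allocExampleValue is the hand-maintained size used for gas charging.
const _allocExampleValue = 32

// actualExampleSize is what the compiler reports for the struct.
var actualExampleSize = unsafe.Sizeof(exampleValue{})

// checkAllocSize panics when a declared size has drifted from the real one.
func checkAllocSize(name string, declared int64, actual uintptr) {
	if declared != int64(actual) {
		panic(fmt.Sprintf("%s = %d, but unsafe.Sizeof reports %d", name, declared, actual))
	}
}

// init runs at program startup, so a stale constant fails fast.
func init() {
	checkAllocSize("_allocExampleValue", _allocExampleValue, actualExampleSize)
}

func main() {
	fmt.Println("declared size matches:", _allocExampleValue == int64(actualExampleSize))
}
```

Adding a field to `exampleValue` without updating the constant would make the program panic in `init`, before any transaction is metered with the stale value.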
Replace the linear per-byte gas model (1 gas/byte) with a calibrated
power-law model based on actual Go malloc benchmarks on a dedicated
DigitalOcean instance (Intel Xeon 8168, 2-core).

The new allocGasTable uses 6 exact benchmark points (1B-32B) plus a
power-law fit (ns = 0.47 × size^0.925) for larger sizes. At runtime,
allocGas() does O(1) lookup via bits.Len64 + linear interpolation.

Key changes:
- Small allocs (e.g. 208B struct) drop from 208 gas to ~8 gas
- Large allocs use sublinear scaling instead of linear
- Allocation-heavy tests see 50-98% gas reduction
- CPU/store-dominated tests change <2%

Adds gnovm/cmd/calibrate/ with benchmarks, data, and tooling for
recalibration on different hardware.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
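
The lookup described above (index by `bits.Len64`, then interpolate linearly) can be sketched as follows. This is an illustration under several assumptions: the table here is filled from the power law for every size class because the six measured points are not listed on this page, nanoseconds are converted to gas by dividing by cpuBaseNs, the result is a float, and sizes past the last entry are clamped. It will therefore not reproduce the PR's actual table values.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

const cpuBaseNs = 5.2 // ns per gas unit, as stated in the PR
const tableLen = 31   // one entry per power of two, 2^0 (1B) to 2^30 (1GB)

// buildTable fills entry i with the modeled gas for an allocation of 2^i
// bytes, using ns = 0.47 * size^0.925 (assumed here for all sizes).
func buildTable() [tableLen]float64 {
	var t [tableLen]float64
	for i := range t {
		size := math.Ldexp(1, i) // 2^i
		t[i] = 0.47 * math.Pow(size, 0.925) / cpuBaseNs
	}
	return t
}

var allocGasTable = buildTable()

// allocGas finds the enclosing power-of-two bucket in O(1) and interpolates
// linearly between its two endpoints.
func allocGas(size uint64) float64 {
	if size == 0 {
		return 0
	}
	i := bits.Len64(size) - 1 // floor(log2(size))
	if i >= tableLen-1 {
		return allocGasTable[tableLen-1] // clamp beyond the table (assumption)
	}
	lo := uint64(1) << uint(i)
	frac := float64(size-lo) / float64(lo) // position within [2^i, 2^(i+1))
	return allocGasTable[i] + frac*(allocGasTable[i+1]-allocGasTable[i])
}

func main() {
	for _, size := range []uint64{32, 208, 4096, 1 << 20} {
		fmt.Printf("%8d bytes -> %.2f gas\n", size, allocGas(size))
	}
}
```

Because the exponent is below 1, the modeled cost per byte falls as allocations grow, which is the sublinear scaling the commit message refers to.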
shlAssign/shrAssign allocated closures on every call for overflow
checking that only runs during preprocessing (StagePre). This caused
unnecessary GC pressure at runtime.

- Remove checkOverflow closure pattern; inline StagePre guard
- Add shlCheckOverflow helper that short-circuits for shift > 64
  (any non-zero value overflows any fixed-width type)
- Cap UntypedBigintType shifts at 10000 bits to prevent DoS via
  huge big.Int allocations (e.g., 1 << 1_000_000_000)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
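
The overflow short-circuit and the shift cap can be illustrated with the sketch below. It covers unsigned values only, and `shlOverflows`, `checkUntypedShift`, and `maxUntypedShiftBits` are hypothetical names; the PR's `shlCheckOverflow` helper may differ in signature and in how it handles signed types.

```go
package main

import (
	"fmt"
	"math/bits"
)

// maxUntypedShiftBits mirrors the 10000-bit cap from the commit message.
const maxUntypedShiftBits = 10000

// shlOverflows reports whether v << shift loses bits in an unsigned type
// that is width bits wide (width <= 64).
func shlOverflows(v uint64, shift, width uint) bool {
	if v == 0 {
		return false // zero never overflows
	}
	if shift > 64 {
		return true // short-circuit: no fixed-width type can hold the result
	}
	// The result needs Len64(v) + shift bits.
	return uint(bits.Len64(v))+shift > width
}

// checkUntypedShift rejects shifts that would build a huge big.Int.
func checkUntypedShift(shift uint) error {
	if shift > maxUntypedShiftBits {
		return fmt.Errorf("shift of %d bits exceeds cap of %d", shift, maxUntypedShiftBits)
	}
	return nil
}

func main() {
	fmt.Println(shlOverflows(1, 7, 8))         // false: 128 fits in 8 bits
	fmt.Println(shlOverflows(1, 8, 8))         // true: 256 needs 9 bits
	fmt.Println(shlOverflows(1, 1000, 64))     // true: short-circuit path
	fmt.Println(checkUntypedShift(1000000000)) // error: cap exceeded
}
```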
…audit

Add comprehensive microbenchmarks for gas calibration:
- bench_ops_test.go: 234 benchmarks covering all 90 GnoVM op handlers,
  parameterized at powers of 10 (1, 10, 100, 1000, 10000)
- bench_gc_test.go: GC visitor traversal benchmarks with realistic object
  graphs (100 to 10M objects), reports ns/visit for VisitCpuFactor calibration
- op_handler_gas_audit.md: audit of every doOpXxx handler documenting
  cost-varying parameters, pessimistic inputs, and benchmark coverage gaps
- benchops: add OpAccumDur/OpCount accessor functions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace flat VisitCpuFactor=8 with gcVisitGasTable, a 25-entry lookup
indexed by log2(visitCount). Per-visit cost scales with heap size due
to CPU cache effects: 6 gas/visit for small heaps (L2-resident) up to
135 gas/visit for huge heaps (DRAM+TLB bound).

Calibrated from BenchmarkGCVisit on DigitalOcean Dedicated 2-core,
Intel Xeon Platinum 8168 @ 2.70GHz, cpuBaseNs=5.2.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
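
A lookup indexed by log2(visitCount) might be structured as in this sketch. The table contents are placeholders: only the endpoints (6 and 135 gas/visit) and the 25-entry length come from the commit message, and the linear ramp between them is invented purely so the example runs. Charging per-visit cost times visit count is also an assumption.

```go
package main

import (
	"fmt"
	"math/bits"
)

const gcTableLen = 25 // entries for log2(visitCount) = 0..24

// placeholderTable ramps linearly from 6 to 135 gas/visit.
// These are NOT the calibrated values from the PR.
func placeholderTable() [gcTableLen]int64 {
	var t [gcTableLen]int64
	for i := range t {
		t[i] = 6 + int64(i)*(135-6)/(gcTableLen-1)
	}
	return t
}

// gcVisitIndex maps a visit count to floor(log2(count)), clamped to the table.
func gcVisitIndex(visitCount int64) int {
	if visitCount <= 1 {
		return 0
	}
	i := bits.Len64(uint64(visitCount)) - 1
	if i >= gcTableLen {
		i = gcTableLen - 1
	}
	return i
}

// gcVisitGas charges the per-visit cost for this heap size times the count.
func gcVisitGas(table *[gcTableLen]int64, visitCount int64) int64 {
	return table[gcVisitIndex(visitCount)] * visitCount
}

func main() {
	t := placeholderTable()
	for _, n := range []int64{1, 1 << 10, 1 << 20, 1 << 24} {
		fmt.Printf("visits=%d index=%d gas=%d\n", n, gcVisitIndex(n), gcVisitGas(&t, n))
	}
}
```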
Add parameterized benchmarks for accurate gas calibration:
- String ops (Add, Eql, Lss, Convert, Index1_MapStringKey, Slice) × 4 lengths
- BigInt ops (14 ops) × 4 bit-lengths (64, 256, 1024, 4096)
- BigDec ops (7 ops) × 4 precisions (10, 100, 1000, 10000 digits)
- Byte-array variants (Eql, ArrayLit, Index1, Slice) × 4 sizes
- Asymmetric BigInt benchmarks (11 cross-size combinations)

Add bytes.Equal fast path for byte array comparison in isEql,
replacing element-by-element DataByteValue wrapping (~850x speedup).

Total benchmark count: 337.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
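
The byte-array fast path can be illustrated by contrasting a per-element comparison with a single `bytes.Equal` call. `boxedByte` is a hypothetical stand-in for the per-element wrapping the commit message mentions; the actual `isEql` code and the `DataByteValue` type are not reproduced here, and this sketch makes no claim about the measured speedup.

```go
package main

import (
	"bytes"
	"fmt"
)

// boxedByte stands in for wrapping each byte in a value object (hypothetical).
type boxedByte struct{ v byte }

// eqlElementwise wraps and compares one element at a time (the slow shape).
func eqlElementwise(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		x, y := boxedByte{a[i]}, boxedByte{b[i]}
		if x != y {
			return false
		}
	}
	return true
}

// eqlFast compares the backing bytes directly in one library call.
func eqlFast(a, b []byte) bool {
	return bytes.Equal(a, b)
}

func main() {
	a := []byte("gas model")
	b := []byte("gas model")
	c := []byte("gas modem")
	fmt.Println(eqlElementwise(a, b), eqlFast(a, b)) // true true
	fmt.Println(eqlElementwise(a, c), eqlFast(a, c)) // false false
}
```

Both functions return the same result for every input; the fast path only changes how much work is done per comparison.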
Add benchmarks for call dispatch variants:
- BenchmarkOpPrecall_BoundMethod (method precall via BoundMethodValue)
- BenchmarkOpCall_Method (doOpCall with receiver)
- BenchmarkOpIfCond_FalseBranch (else branch dispatch)
- BenchmarkOpSelector_VPInterface_{1,10,100} (interface method dispatch)
- BenchmarkOpTypeAssert2_Interface_{Hit,Miss} (interface type assertion)
- BenchmarkOpTypeSwitch_Interface_{1,10,100} (interface type switch)

Extract shared helpers to reduce duplication:
- benchInterfaceAndImpl: builds interface + implementing type
- benchMethodSetup: builds method type/value/receiver

Fix pre-existing issues:
- Hoist FieldType construction out of hot loop in benchOpStructType
  and benchOpInterfaceType (was rebuilding strings + structs per iter)
- Standardize method naming in benchOpTypeAssert1_Interface (M%d)
- Remove redundant BenchmarkOpCall_Closure (duplicate of benchOpCall)

Total benchmark count: 350.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add benchmark findings section to op_handler_gas_audit.md with data on
byte array equality fix (850x), BigInt asymmetric costs, BigDec precision
scaling, interface method dispatch scaling, GC visit cache effects, and
ArrayValue dual representation. Mark completed items in GAS_TODO.md and
MEM_TODO.md. Add BENCH.md noting benchops superseded by Go-level benchmarks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…on data

Add allocation gas metering to op handler benchmarks: benchMachine() now
creates a real Allocator with a GasMeter so reportBenchops() can report
exact alloc-gas/op alongside ns/op(pure). This enables precise separation
of CPU gas from allocation gas in calibration analysis.

Save DO Xeon 8168 benchmark results and analysis report with per-op
alloc gas breakdown, proposed gas formulas, and mismatch identification.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ation

Add higher-N benchmarks for parameterized ops to improve least-squares
fit accuracy: Define, Assign, StructType, InterfaceType, TypeSwitch,
TypeSwitch_Interface, ReturnCallDefers, Call (params and captures) at
N=1000. Add StructLitNamed benchmarks for the named-field code path.
Update analysis report with worst-case input verification section.
Update DO Xeon benchmark data with alloc-gas/op metrics.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace fixed BenchmarkOpFuncType with benchOpFuncType(b, nParams, nResults)
measuring params and results independently at 0/1/10/100/1000 points each.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…arameterized ops

Calibrate all 68 OpCPU* constants against Xeon 8168 benchmarks (cpuBaseNs=5.2).
Add per-N CPU gas charging in 20 parameterized op handlers so gas scales with
input size (e.g., map entries, struct fields, array elements, function params).

Key changes:
- machine.go: update flat constants from benchmarks, add 22 slope constants
- op_assign.go: Define (15/LHS), Assign (17/LHS)
- op_expressions.go: ArrayLit (9/elt), SliceLit (4/elt), SliceLit2 (5/size),
  MapLit (60/entry), StructLit (9/field), FuncLit (7/capture),
  TypeAssert1/2 (67/method), Convert str↔runes (3/2 per char)
- op_call.go: Call (9/param + 5/capture)
- op_exec.go: TypeSwitch (49/clause), ForLoop heap (8/var), RangeIter (2/elt)
- op_binary.go: Eql array (27/elt), Eql struct (26/field)
- op_types.go: FuncType (4/param+result), StructType (6/field), InterfaceType (5/method)
- op_decl.go: ValueDecl (6/name)
- Add calibrate.analysis and calibrate.plot Makefile targets
- Add gen_analysis.py, plot_fits.py, op_gas_formulas.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r depth

Add calibrated per-bit-width gas charging for BigInt and BigDec
arithmetic operations, and per-depth charging for NameExpr block
traversal. BigInt uses per-kilobit slopes, BigDec uses per-100-digit
slopes, and Mul/Quo use quadratic models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…as target

Add per-N gas charging for remaining BigInt ops:
- Rem: quadratic model (same as Mul/Quo)
- Shl: per-kilobit of shift amount (output growth)
- Shr: per-kilobit of input bit width

Add unified `make calibrate.gas` target that runs op benchmarks,
allocation benchmarks, analysis, and plot generation end-to-end.
Also add `make calibrate.alloc` for allocation-only calibration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The compound assignment handlers (+=, -=, *=, etc.) in op_assign.go
were missing per-N BigInt/BigDec gas charging, allowing users to
bypass per-N gas by using `x += bigval` instead of `x = x + bigval`.

Also: use GetBigInt()/GetBigDec() consistently in unary helpers,
use max() builtin, remove stale Makefile dependency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The 3B gas-wanted was a leftover from an intermediate state where the
loop reduction (100→2 iterations) landed before alloc gas recalibration.
Actual gas used is ~22M, so 100M provides adequate headroom.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update expected gas values in txtar tests after per-N gas charging
for BigInt/BigDec ops and compound assign handlers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jaekwon jaekwon requested review from aeddi and ltzmaxwell and removed request for ltzmaxwell March 15, 2026 08:28
@github-actions github-actions bot added the 📦 🤖 gnovm (Issues or PRs gnovm related) and 📦 ⛰️ gno.land (Issues or PRs gno.land package related) labels Mar 15, 2026
@Gno2D2
Collaborator

Gno2D2 commented Mar 15, 2026

🛠 PR Checks Summary

All Automated Checks passed. ✅

Manual Checks (for Reviewers):
  • IGNORE the bot requirements for this PR (force green CI check)

🤖 This bot helps streamline PR reviews by verifying automated checks and providing guidance for contributors and reviewers.

✅ Automated Checks (for Contributors):

No automated checks match this pull request.

☑️ Contributor Actions:
  1. Fix any issues flagged by automated checks.
  2. Follow the Contributor Checklist to ensure your PR is ready for review.
    • Add new tests, or document why they are unnecessary.
    • Provide clear examples/screenshots, if necessary.
    • Update documentation, if required.
    • Ensure no breaking changes, or include BREAKING CHANGE notes.
    • Link related issues/PRs, where applicable.
☑️ Reviewer Actions:
  1. Complete manual checks for the PR, including the guidelines and additional checks if applicable.
📚 Resources:
Debug
Manual Checks
**IGNORE** the bot requirements for this PR (force green CI check)

If

🟢 Condition met
└── 🟢 On every pull request

Can be checked by

  • Any user with comment edit permission

@Kouteki Kouteki moved this from Triage to In Review in 🧙‍♂️Gno.land development Mar 15, 2026
jaekwon and others added 3 commits March 15, 2026 01:41
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fixes thelper lint errors in bench_ops_test.go (71 functions),
bench_gc_test.go (1 function), and alloc_bench_test.go (1 function).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Labels

  • 📦 ⛰️ gno.land (Issues or PRs gno.land package related)
  • 📦 🤖 gnovm (Issues or PRs gnovm related)
  • 📄 top-level-md

3 participants