Description
Hi TensorRT team — I’m posting this as a routing request to the right CUDA/performance or inference-performance integration owner.
I have a public-safe H100 results pack (I will attach a zip to this issue). It contains:
- raw benchmark JSON outputs + measured energy receipts (NVML / nvidia-smi sampling)
- explicit PASS/FAIL gates
- a compact one-pager summary (summary_public.json) + schema + a tiny validator script
- the Python harness used to produce the key measurement (measurement side is not a black box)
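For readers unfamiliar with how sampled power becomes an energy figure, here is a minimal sketch of one common approach: integrate timestamped power samples over the measurement window and divide by the number of queries served. This is an illustration under stated assumptions, not the harness shipped in the pack; the function names and the trapezoidal rule are hypothetical choices.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples (watts) over time (seconds) with the trapezoidal rule.

    Assumption: samples were collected elsewhere (for example by polling
    nvidia-smi or NVML) and are already converted to seconds and watts.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    if len(timestamps_s) < 2:
        raise ValueError("need at least two samples to integrate")

    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def energy_per_query(
    timestamps_s: Sequence[float], power_w: Sequence[float], num_queries: int
) -> float:
    """Average energy per query over the sampled window."""
    if num_queries <= 0:
        raise ValueError("num_queries must be positive")
    return energy_joules(timestamps_s, power_w) / num_queries
```

With a short steady window, the sampling interval matters: too few samples inside the window makes the integral sensitive to a single reading.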
Headline result (from a single H100, short steady window; exact top‑1 check):
N=20,000,000 candidates, query_len=256
- full_scan_top1: p95 ≈ 37.523 ms, energy/query ≈ 4.46297 J
- range_scan_top1 (M≪N work‑shrink/routing): p95 ≈ 0.11414 ms, energy/query ≈ 0.0144809 J
=> ~308× lower J/query and ~329× lower p95 latency, with the top‑1 result still matching exactly.
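As a quick sanity check, the snippet below recomputes both ratios directly from the four figures quoted above. The variable names are illustrative only.

```python
# Figures quoted in this issue (single H100, N=20,000,000, query_len=256).
full_p95_ms = 37.523
full_j_per_query = 4.46297
range_p95_ms = 0.11414
range_j_per_query = 0.0144809

energy_ratio = full_j_per_query / range_j_per_query
latency_ratio = full_p95_ms / range_p95_ms

print(f"energy/query ratio (full / range): {energy_ratio:.1f}x")
print(f"p95 latency ratio (full / range):  {latency_ratio:.1f}x")
```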
Why I think this matters:
This is a scaling regime shift. The baseline cost scales with N, while the routed path scales with M≪N. At some N the baseline becomes infeasible (OOM wall for explicit NxN fp16 materialization), while the routed path still runs.
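To make the OOM wall concrete, here is a back-of-the-envelope sketch of how much memory a dense N×N fp16 matrix needs. The 80 GB device memory figure is an assumption (a common H100 configuration), not a value taken from the results pack.

```python
import math

BYTES_PER_FP16 = 2
# Assumption: 80 GB of device memory, ignoring all other allocations.
ASSUMED_GPU_MEMORY_BYTES = 80 * 10**9


def dense_matrix_bytes(n: int) -> int:
    """Bytes needed to materialize an n x n fp16 matrix."""
    return n * n * BYTES_PER_FP16


def max_n_that_fits(memory_bytes: int) -> int:
    """Largest n whose n x n fp16 matrix fits in memory_bytes."""
    return math.isqrt(memory_bytes // BYTES_PER_FP16)


n = 20_000_000
print(f"N={n:,}: {dense_matrix_bytes(n) / 1e12:.0f} TB for a dense N x N fp16 matrix")
print(f"largest N that fits in the assumed memory: {max_n_that_fits(ASSUMED_GPU_MEMORY_BYTES):,}")
```

Under that assumption, explicit materialization stops fitting around N = 200,000, two orders of magnitude below the N used in the headline result.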
How to verify quickly after attaching the zip:
- Open README.txt in the zip (it points to the exact JSON field paths).
- Check the main record:
prototypes/prototype_ctdr_landauer_lab/benchmarks/results/gpu_2025-12-19/joules_query_prefix_range.json
Look for:
  - delta.energy_per_query_ratio_full_over_range
  - passfail.range_scan.pass == true
  - correctness.range.ok == true
- Optional: validate the pack summary (no external deps):
python partner_packet_nvidia/public_teaser/pack_tools/validate_summary_public.py partner_packet_nvidia/public_teaser/assets/summary_public.json
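The manual check above can also be scripted. The sketch below is not the validator shipped in the pack; it assumes the dotted field paths map onto nested JSON objects, and the helper names `get_path` and `check_record` are hypothetical.

```python
import json
import sys
from typing import Any


def get_path(record: dict, dotted: str) -> Any:
    """Walk a nested dict using a dotted path such as 'a.b.c'."""
    node: Any = record
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            raise KeyError(f"missing field: {dotted}")
        node = node[key]
    return node


def check_record(record: dict) -> dict:
    """Extract the three fields named in this issue from the main record."""
    return {
        "energy_ratio": get_path(record, "delta.energy_per_query_ratio_full_over_range"),
        "range_scan_pass": get_path(record, "passfail.range_scan.pass") is True,
        "range_correct": get_path(record, "correctness.range.ok") is True,
    }


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_record.py path/to/joules_query_prefix_range.json
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(check_record(json.load(fh)))
```

A missing field raises `KeyError` rather than being reported as a failed gate, so a schema mismatch is not mistaken for a benchmark failure.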
What I’m asking:
Who is the right person or team to evaluate this (CUDA perf / inference perf integration)? If you can route me, I can share a short runbook and provide the full reproducible harness plus implementation details.
(For clarity: the pack contains no kernel source/PTX/SASS.)