
H100 receipts pack: M≪N work-shrink exact retrieval shows 308× lower J/query at N=20M (public-safe) #4672

@StanByriukov02

Description


Hi TensorRT team, I'm posting this as a routing request to the right CUDA-performance or inference-performance integration owner.

I have a public-safe H100 results pack (I will attach a zip to this issue). It contains:

  • raw benchmark JSON outputs + measured energy receipts (NVML / nvidia-smi sampling)
  • explicit PASS/FAIL gates
  • a compact one-pager summary (summary_public.json) + schema + a tiny validator script
  • the Python harness used to produce the key measurement, so the measurement side is not a black box

Headline result (from a single H100, short steady window; exact top‑1 check):
N=20,000,000 candidates, query_len=256

  • full_scan_top1: p95 ≈ 37.523 ms, energy/query ≈ 4.46297 J
  • range_scan_top1 (M≪N work‑shrink/routing): p95 ≈ 0.11414 ms, energy/query ≈ 0.0144809 J
    => ~308× lower J/query and ~329× lower p95 latency, with top‑1 exactness preserved.
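
As a quick sanity check, the headline ratios follow directly from the figures quoted above (a standalone recomputation, not part of the pack's tooling):

```python
# Recompute the headline ratios from the quoted measurements.
full_p95_ms, range_p95_ms = 37.523, 0.11414   # p95 latency, ms
full_j, range_j = 4.46297, 0.0144809          # energy per query, J

energy_ratio = full_j / range_j               # ≈ 308.2
latency_ratio = full_p95_ms / range_p95_ms    # ≈ 328.7
print(f"energy: {energy_ratio:.1f}x, latency: {latency_ratio:.1f}x")
```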

Why I think this matters:
This is a scaling-regime shift: the baseline cost scales with N, while the routed path scales with M≪N. Beyond some N the baseline becomes infeasible (an OOM wall for explicit N×N fp16 materialization), while the routed path still runs.
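
To illustrate the OOM wall, a back-of-the-envelope estimate, assuming "explicit N×N fp16 materialization" means a full pairwise fp16 score matrix (my reading, not a detail taken from the pack):

```python
# Rough memory footprint of an explicit N x N fp16 matrix at N = 20M.
N = 20_000_000
BYTES_FP16 = 2

matrix_bytes = N * N * BYTES_FP16      # 8e14 bytes
print(matrix_bytes / 1e12, "TB")       # 800.0 TB, vs ~80 GB of H100 HBM
```

At N = 20M that matrix is roughly four orders of magnitude larger than a single H100's HBM, which is why the routed M≪N path keeps running where the materialized baseline cannot.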

How to verify quickly after attaching the zip:

  1. Open README.txt in the zip (it points to the exact JSON field paths).
  2. Check the main record:
    prototypes/prototype_ctdr_landauer_lab/benchmarks/results/gpu_2025-12-19/joules_query_prefix_range.json
    Look for:
    • delta.energy_per_query_ratio_full_over_range
    • passfail.range_scan.pass == true
    • correctness.range.ok == true
  3. Optional: validate the pack summary (no external deps):
    python partner_packet_nvidia/public_teaser/pack_tools/validate_summary_public.py partner_packet_nvidia/public_teaser/assets/summary_public.json
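
For context, the three gates in step 2 can also be checked programmatically. A minimal sketch, assuming the dotted field paths above map one-to-one to JSON nesting (`check_record` and the stand-in values are mine, not part of the pack):

```python
def check_record(rec: dict) -> float:
    """Assert the pass/fail and correctness gates, then return the energy ratio."""
    assert rec["passfail"]["range_scan"]["pass"] is True
    assert rec["correctness"]["range"]["ok"] is True
    return rec["delta"]["energy_per_query_ratio_full_over_range"]

# Stand-in record shaped like the fields named in step 2; in practice you
# would json.load() the joules_query_prefix_range.json file instead.
sample = {
    "delta": {"energy_per_query_ratio_full_over_range": 308.2},
    "passfail": {"range_scan": {"pass": True}},
    "correctness": {"range": {"ok": True}},
}
print(check_record(sample))
```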

What I’m asking:
Who is the right person or team to evaluate this (CUDA perf / inference-perf integration)? If you can route me, I can share a short runbook and provide the full reproducible harness plus implementation details.

(For clarity: the pack contains no kernel source/PTX/SASS.)

CTDR_public_pack_20251219.zip
