[None][infra] Cherry-pick Perf Test Scripts #12057

chenfeiz0326 wants to merge 1 commit into NVIDIA:release/1.2 from
Conversation
📝 Walkthrough

Introduces a comprehensive performance sanity testing infrastructure refactor: local and disaggregated SLURM-based test launching, OpenSearch integration for historical baseline retrieval, HTML/SVG report generation for pre- and post-merge analysis, enhanced test configuration management with split-group support, and new error-detection utilities across Jenkins, Python scripts, Bash workflows, and YAML configurations.

Changes
Sequence Diagram

sequenceDiagram
participant Jenkins as Jenkins Pipeline
participant LocalSubmit as Local Submit.py
participant SLURM as SLURM Cluster
participant InstallScript as Install Script
participant TestRunner as Test Runner
participant OpenSearch as OpenSearch DB
participant Reporter as Report Generator
Jenkins->>LocalSubmit: Parse test config & hardware
LocalSubmit->>LocalSubmit: Generate SLURM launch script
LocalSubmit->>SLURM: sbatch slurm_launch.sh
SLURM->>InstallScript: Execute on compute nodes
InstallScript->>InstallScript: Build wheel (coordinated)
InstallScript->>InstallScript: Install dependencies
SLURM->>TestRunner: Execute pytest commands
TestRunner->>TestRunner: Run performance benchmarks
TestRunner->>OpenSearch: Post new perf data
Reporter->>OpenSearch: Query history & baseline data
OpenSearch-->>Reporter: Return perf data
Reporter->>Reporter: Generate HTML/SVG reports
Reporter-->>Jenkins: Return report artifacts
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Rationale: 70+ affected files spanning heterogeneous change types (Jenkins config, Python modules with dense logic, Bash scripts, YAML configurations, test infrastructure). Key files like perf_utils.py (1620 lines), submit.py (724 lines), and test_perf_sanity.py (395 lines) contain substantial new logic for OpenSearch integration, report generation, and test orchestration. Structural changes to test infrastructure and new multi-component workflows require careful reasoning. Configuration files (YAML) reduce effort per file but add breadth. Not purely refactoring: introduces new functionality across multiple subsystems.

Possibly related PRs
Suggested reviewers
Actionable comments posted: 8
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
tests/scripts/perf-sanity/aggregated/config_database_b200_nvl.yaml (2)
623-648: ⚠️ Potential issue | 🟠 Major

Use `CUTLASS` for this FP4 v2 8k1k ADP/MTP1 path.

This block is otherwise the DEP/MTP1 profile (`enable_attention_dp: true` and `num_nextn_predict_layers: 1`), but `moe_config.backend` is still `TRTLLM`. The sibling FP4 v2 DEP entries use `CUTLASS`, so this one will measure a different backend.

Suggested fix:

      moe_config:
    -   backend: TRTLLM
    +   backend: CUTLASS
      attention_dp_config:
        batching_wait_iters: 0
        enable_balance: true

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/aggregated/config_database_b200_nvl.yaml` around lines 623 - 648, The moe backend for the FP4 v2 ADP/MTP1 entry (name: nvidia_DeepSeek_R1_0528_FP4_v2_8192_1024_conc512_gpu8, model_name: deepseek_r1_0528_fp4_v2) is set to TRTLLM but should use CUTLASS to match sibling FP4 v2 DEP profiles; update the moe_config.backend value from TRTLLM to CUTLASS in this YAML block so the test measures the correct backend.
383-407: ⚠️ Potential issue | 🟠 Major

Add the missing MTP block to this FP4 v2 ADP profile.

This is the only touched FP4 v2 CUTLASS/attention-DP scenario here without `speculative_config`. The sibling DEP-style variants all pin `decoding_type: MTP` with one predictor layer, so this entry will benchmark a different decode path than the rest.

Suggested fix:

      attention_dp_config:
        batching_wait_iters: 0
        enable_balance: true
        timeout_iters: 60
    + speculative_config:
    +   decoding_type: MTP
    +   num_nextn_predict_layers: 1
      tensor_parallel_size: 8
      moe_expert_parallel_size: 8

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/aggregated/config_database_b200_nvl.yaml` around lines 383 - 407, The FP4 v2 ADP profile block named "nvidia_DeepSeek_R1_0528_FP4_v2_1024_1024_conc2048_gpu8" is missing the MTP speculative config; add a speculative_config section to that profile with decoding_type set to "MTP" and predictor_layers set to 1 (matching sibling DEP-style variants) so this CUTLASS/attention-DP scenario uses the same MTP decode path as the others; place the speculative_config under the profile’s top-level keys (alongside kv_cache_config, moe_config, attention_dp_config, etc.) using the key names speculative_config, decoding_type, and predictor_layers to locate and update the block.

tests/scripts/perf-sanity/aggregated/config_database_h200_sxm.yaml (1)
820-831: ⚠️ Potential issue | 🟠 Major

Add the missing `attention_dp_config` to this GPT-OSS ADP case.

This is the only touched GPT-OSS entry here with `enable_attention_dp: true` but no accompanying `attention_dp_config`. The other ADP profiles explicitly set `enable_balance: true`, so this scenario will run with different scheduling behavior.

Suggested fix:

      enable_attention_dp: true
    + attention_dp_config:
    +   enable_balance: true
      print_iter_log: true

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/aggregated/config_database_h200_sxm.yaml` around lines 820 - 831, The GPT-OSS ADP profile named openai_gpt_oss_120b_8192_1024_conc16_gpu2 (model_name: gpt_oss_120b_fp4) sets enable_attention_dp: true but is missing an attention_dp_config; add an attention_dp_config block mirroring the other ADP entries (including enable_balance: true) so scheduling behavior matches the other profiles—locate the YAML entry by the name or model_name and insert attention_dp_config with enable_balance: true and the same subfields used by the other GPT-OSS ADP cases.

jenkins/scripts/perf/disaggregated/submit.py (1)
331-357: ⚠️ Potential issue | 🟠 Major

This change removes the advertised local-mode path.

`main()` now always resolves `config_yaml` from `--test-list` and then unconditionally opens `args.script_prefix` / `args.srun_args`. That makes `--config-yaml` and `--run-ci=false` unusable and breaks local invocations with the current CLI.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jenkins/scripts/perf/disaggregated/submit.py` around lines 331 - 357, main() now unconditionally resolves config_yaml via parse_test_case_name(args.test_list, ...) and always opens args.script_prefix and args.srun_args, which breaks the --config-yaml and --run-ci local mode; restore conditional handling so local invocations work. Fix by reverting to the prior branching: if args.run_ci (or when --test-list is provided) call parse_test_case_name and open script_prefix/srun_args, otherwise use args.config_yaml and skip reading args.script_prefix/args.srun_args for local mode; update the code paths around parse_test_case_name, config_yaml assignment, and the blocks that open args.script_prefix and args.srun_args to only run when CI/test-list mode is active.
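A minimal sketch of the branching this comment asks to restore. The flag names mirror those mentioned above (`--run-ci`, `--test-list`, `--config-yaml`), but the parser and the `resolve_config` helper are hypothetical illustrations, not the repo's actual CLI:

```python
import argparse

def build_parser():
    # Hypothetical flags mirroring the ones named in the comment above.
    p = argparse.ArgumentParser()
    p.add_argument("--run-ci", default="true", choices=["true", "false"])
    p.add_argument("--test-list", default=None)
    p.add_argument("--config-yaml", default=None)
    return p

def resolve_config(args):
    # CI mode: derive everything from --test-list; local mode: take
    # --config-yaml verbatim and never touch the CI-only inputs.
    if args.run_ci == "true":
        if args.test_list is None:
            raise SystemExit("--test-list is required when --run-ci=true")
        return ("ci", args.test_list)
    if args.config_yaml is None:
        raise SystemExit("--config-yaml is required when --run-ci=false")
    return ("local", args.config_yaml)
```

The key design point is that each mode validates only its own inputs, so a local invocation never needs the CI-only files to exist.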
🟠 Major comments (15)
tests/scripts/perf-sanity/aggregated/deepseek_r1_fp8_blackwell.yaml-140-166 (1)
140-166: ⚠️ Potential issue | 🟠 Major

Add the missing MTP1 config to the 6k1k case.

This entry is labeled as MTP1, but unlike the 1k1k and 8k1k variants it never sets `speculative_config`. As written, the 6k1k run will exercise a different decode path than the scenario name/comment implies, which makes the perf result hard to compare against the other MTP cases.

Proposed fix:

    - # 6k1k configs - TP8 with TRTLLM, MTP1
    - - name: "r1_fp8_tp8_6k1k"
    + # 6k1k configs - TP8 with TRTLLM, MTP1
    + - name: "r1_fp8_tp8_mtp1_6k1k"
        model_name: "deepseek_r1_0528_fp8"
        tensor_parallel_size: 8
        moe_expert_parallel_size: 1
        pipeline_parallel_size: 1
        max_batch_size: 512
        max_num_tokens: 8192
        attn_backend: "TRTLLM"
        enable_attention_dp: false
        moe_config:
          backend: 'TRTLLM'
        cuda_graph_config:
          enable_padding: true
          max_batch_size: 64
        kv_cache_config:
          dtype: 'fp8'
          enable_block_reuse: false
          free_gpu_memory_fraction: 0.8
    +   speculative_config:
    +     decoding_type: 'MTP'
    +     num_nextn_predict_layers: 1
        client_configs:
          - name: "con64_iter10_6k1k"
            concurrency: 64
            iterations: 10
            isl: 6144
            osl: 1024
            backend: "openai"
            random_range_ratio: 0.2

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/aggregated/deepseek_r1_fp8_blackwell.yaml` around lines 140 - 166, The 6k1k config "r1_fp8_tp8_6k1k" is missing the MTP1 speculative_config, causing it to use a different decode path; add a speculative_config block to the "r1_fp8_tp8_6k1k" entry (same location as the other MTP1 variants) with the MTP1 settings (e.g., mode: "MTP1" and the same speculative parameters used by the 1k1k/8k1k MTP1 cases) so the client_configs (e.g., "con64_iter10_6k1k") exercises the intended MTP1 decode path.

tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml-71-97 (1)
71-97: ⚠️ Potential issue | 🟠 Major

Remove the YAML alias to avoid shared mutable state between generator and context worker configs.

Lines 71–97 currently share a single `speculative_config` mapping via a YAML anchor/alias. When loaded, `yaml.safe_load()` materializes this as one shared Python dict object assigned to both `gen` and `ctx`. The `validate_speculative_config()` method in `TorchLlmArgs` (line 3057) mutates this object in place for `MTPDecodingConfig`:

    self.speculative_config.max_draft_len = self.speculative_config.num_nextn_predict_layers

In disaggregated systems, the two worker roles are independent processes initialized separately. Cross-role mutation of a shared config object is a correctness hazard. Duplicate the small mapping instead.

Proposed fix:

    - speculative_config: &id001
    + speculative_config:
        decoding_type: MTP
        num_nextn_predict_layers: 3
      @@
    - speculative_config: *id001
    + speculative_config:
    +   decoding_type: MTP
    +   num_nextn_predict_layers: 3

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml` around lines 71 - 97, The YAML uses a shared alias (&id001 / *id001) for speculative_config causing a single Python dict to be shared between the generator and context worker; change the ctx.speculative_config alias into an explicit duplicate mapping (copy the keys: decoding_type: MTP and num_nextn_predict_layers: 3) instead of referencing *id001 so yaml.safe_load produces two independent dicts and TorchLlmArgs.validate_speculative_config (which mutates self.speculative_config) won't mutate the other role's config.

tests/scripts/perf-sanity/disaggregated/gb300_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml-11-12 (1)
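The shared-alias hazard flagged for the gen/ctx `speculative_config` above can be reproduced in isolation. This sketch assumes PyYAML (`yaml.safe_load`); the document keys are abbreviated for illustration:

```python
import yaml  # PyYAML

doc = """
gen:
  speculative_config: &spec
    decoding_type: MTP
    num_nextn_predict_layers: 3
ctx:
  speculative_config: *spec
"""

cfg = yaml.safe_load(doc)
# The alias materializes as ONE dict object referenced from both roles.
assert cfg["gen"]["speculative_config"] is cfg["ctx"]["speculative_config"]

# An in-place mutation (as validate_speculative_config does) leaks across roles.
cfg["gen"]["speculative_config"]["max_draft_len"] = 3
assert cfg["ctx"]["speculative_config"]["max_draft_len"] == 3
```

Duplicating the mapping in the YAML (or deep-copying after load) breaks the shared reference.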
11-12: ⚠️ Potential issue | 🟠 Major

Confirm these template placeholders are resolved before this config is launched.

If this YAML is consumed as-is, `<partition>`, `<account>`, `<container_mount>`, `<container_image>`, `<model_path>`, and `<full_path_to_work_dir>` will break job submission or container startup immediately. The disaggregated submission pipeline (disaggregated/submit.py) loads this config with `yaml.safe_load()` and directly exports these values to the environment without substitution. Please verify how these placeholders are replaced in your CI or local submission workflow before this config is selected by tests.

Also applies to: 32-37
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/disaggregated/gb300_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml` around lines 11 - 12, The YAML contains unresolved template placeholders (<partition>, <account>, <container_mount>, <container_image>, <model_path>, <full_path_to_work_dir>) that will break job submission because disaggregated/submit.py loads the file with yaml.safe_load() and exports values to the environment; update this config (gb300_deepseek-..._ccb-UCX.yaml) to use concrete values or CI/template variables that your submission workflow replaces, or implement a runtime check in disaggregated/submit.py to validate and fail fast if any placeholder pattern like "<...>" remains (so you can surface missing substitutions before container startup).

tests/scripts/perf-sanity/disaggregated/gb200_deepseek-v32-fp4_32k4k_con256_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml-86-103 (1)
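The fail-fast placeholder check suggested for the gb300 config above could look like the following sketch; the `find_unresolved` helper and the `<lowercase_name>` pattern are assumptions for illustration, not existing code:

```python
import re

PLACEHOLDER = re.compile(r"<[a-z_]+>")

def find_unresolved(obj, path="root"):
    # Walk a loaded YAML tree and report any string still containing a
    # template placeholder like <partition> or <model_path>.
    hits = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            hits += find_unresolved(v, f"{path}.{k}")
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            hits += find_unresolved(v, f"{path}[{i}]")
    elif isinstance(obj, str) and PLACEHOLDER.search(obj):
        hits.append(path)
    return hits

cfg = {"slurm": {"partition": "<partition>", "account": "batch"},
       "mounts": ["<container_mount>", "/data:/data"]}
assert find_unresolved(cfg) == ["root.slurm.partition", "root.mounts[0]"]
```

Running such a check right after `yaml.safe_load()` surfaces missing substitutions before any sbatch or container command is issued.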
86-103: ⚠️ Potential issue | 🟠 Major

Align the ctx transceiver buffer with the configured max token budget.

Line 86 allows 32,784 ctx tokens, but Line 102 caps the UCX buffer at 32,768. A max-sized prompt can overflow this limit on the boundary case.

Suggested fix:

      ctx:
        print_iter_log: true
        max_batch_size: 1
        max_num_tokens: 32784
      @@
        cache_transceiver_config:
    -     max_tokens_in_buffer: 32768
    +     max_tokens_in_buffer: 32784
          backend: UCX

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/disaggregated/gb200_deepseek-v32-fp4_32k4k_con256_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml` around lines 86 - 103, The configured max_num_tokens (32784) can exceed cache_transceiver_config.max_tokens_in_buffer (32768) causing boundary overflows; update the cache_transceiver_config.max_tokens_in_buffer to be at least equal to max_num_tokens or reduce max_num_tokens to match—modify the YAML so the fields max_num_tokens and cache_transceiver_config.max_tokens_in_buffer are aligned (e.g., set cache_transceiver_config.max_tokens_in_buffer: 32784 or lower max_num_tokens) to prevent overflow.

jenkins/scripts/perf/local/slurm_run.sh-5-24 (1)
5-24: ⚠️ Potential issue | 🟠 Major

`ERR` trap still aborts before the pytest exit code is captured.

`set +e` on line 19 disables `errexit`, but the ERR trap from line 5 remains active. When `eval $pytestCommand` fails on line 23, it triggers the trap, which explicitly calls `exit $rc`, aborting the script before line 24 can capture the exit code. This defeats the intended behavior stated in the comment.

Use a conditional instead to avoid triggering the trap:

    -# Turn off "exit on error" so the following lines always run
    -set +e
    -
     pytest_exit_code=0
    -
    -eval $pytestCommand
    -pytest_exit_code=$?
    +if eval "$pytestCommand"; then
    +  pytest_exit_code=0
    +else
    +  pytest_exit_code=$?
    +fi
     echo "Rank${SLURM_PROCID} Pytest finished execution with exit code $pytest_exit_code"

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jenkins/scripts/perf/local/slurm_run.sh` around lines 5 - 24, The ERR trap installed by the trap command still forces an exit when pytest fails, so capture pytest_exit_code by temporarily disabling or restoring the ERR trap around the eval $pytestCommand call: save the current ERR trap (from the trap '...' ERR), disable it (trap - ERR), run eval $pytestCommand and assign pytest_exit_code=$?, then restore the saved ERR trap; reference the existing trap, set +e, eval $pytestCommand and pytest_exit_code to locate where to add the save/disable/restore logic.

jenkins/scripts/perf/local/slurm_install.sh-23-35 (1)
23-35: ⚠️ Potential issue | 🟠 Major

Propagate leader failures to the waiting ranks.

Both follower branches wait forever for a lock file that is only written on success. When `retry_command` exhausts its retries on the leader, the other ranks spin until the SLURM job times out.
retry_commandexhausts on the leader, the other ranks spin until the SLURM job times out.Also applies to: 47-77
tests/test_common/http_utils.py-26-36 (1)
26-36: ⚠️ Potential issue | 🟠 Major

`check_files` won't run for most timeouts.

Line 27 only checks files every 300 loop iterations. Because the loop sleeps one second, callers with `timeout < 300` never execute this branch, and the default `timeout=300` hits it only once near expiry. That means log-detected startup failures still wait out the full timeout unless `server_proc` is also provided.

Suggested fix:

    - if check_files and iteration % 300 == 0:
    + if check_files and (iteration == 1 or iteration % 10 == 0):

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_common/http_utils.py` around lines 26 - 36, The check_files block only runs every 300 iterations so callers with timeout < 300 never get log-based failures; change the check frequency to use a dynamic interval based on timeout (e.g. compute check_interval = min(300, timeout) or similar) and replace the hard-coded 300 in the condition so check_error(check_file) runs at appropriate intervals (and still periodically for long timeouts); update the logic around iteration and the condition that currently reads "if check_files and iteration % 300 == 0" to use the new check_interval variable to ensure early detection for small timeouts while keeping periodic checks for long runs.

tests/test_common/error_utils.py-52-58 (1)
52-58: ⚠️ Potential issue | 🟠 Major

`report_error()` never scans the file contents for keywords.

Line 53 reads the whole file into `all_lines`, then Line 54 iterates the already-exhausted handle `f`. `error_lines` stays empty, so keyword matches are never reported and callers only get the generic tail output.

Suggested fix:

     try:
         with open(log_file, "r", errors="replace") as f:
             all_lines = f.readlines()
    -        for line_idx, line in enumerate(f, start=1):
    -            for keyword in ERROR_KEYWORDS:
    -                if keyword in line:
    -                    error_lines.append((line_idx, line.strip()))
    -                    break
    +        for line_idx, line in enumerate(all_lines, start=1):
    +            for keyword in ERROR_KEYWORDS:
    +                if keyword in line:
    +                    error_lines.append((line_idx, line.strip()))
    +                    break

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_common/error_utils.py` around lines 52 - 58, The loop in report_error reads the file into all_lines then iterates the exhausted file handle f, so error_lines never gets any matches; change the iteration to enumerate(all_lines, start=1) (or seek the file and iterate f again) and keep the keyword check against ERROR_KEYWORDS so report_error properly appends matching (line_idx, line.strip()) entries to error_lines for the given log_file.

jenkins/scripts/perf/local/run_agg_local.sh-7-14 (1)
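The exhausted-handle bug called out for `report_error()` above is easy to demonstrate with an in-memory stream: once `readlines()` consumes it, iterating the handle again yields nothing, while iterating the captured list finds the matches:

```python
import io

f = io.StringIO("ERROR: boom\nok line\n")
all_lines = f.readlines()   # consumes the stream...
leftover = list(f)          # ...so iterating the handle again yields nothing
assert all_lines == ["ERROR: boom\n", "ok line\n"]
assert leftover == []

# Iterating the captured list (or calling f.seek(0) first) restores the matches:
hits = [(i, line.strip()) for i, line in enumerate(all_lines, start=1)
        if "ERROR" in line]
assert hits == [(1, "ERROR: boom")]
```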
7-14: ⚠️ Potential issue | 🟠 Major

Don't commit developer-specific cluster settings.
These absolute repo/model paths, mounts, image tag, and account are tied to one user and one cluster environment, so the script is not reusable for other developers or automation. Please read them from arguments or environment variables instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jenkins/scripts/perf/local/run_agg_local.sh` around lines 7 - 14, Replace hard-coded, developer-specific constants (REPO_ROOT, LOCAL_DIR, IMAGE, MOUNTS, LLM_MODELS_ROOT, PARTITION, ACCOUNT, JOB_NAME) with values read from environment variables or command-line arguments and provide sensible defaults; update the script entrypoint to parse args (e.g., via getopts) and fall back to ${ENV_VAR:-default} for each symbol so CI and other users can override paths, mounts, image tag, partition/account, and job name without committing local settings.

jenkins/scripts/perf/local/run_e2e_local.sh-7-14 (1)
7-14: ⚠️ Potential issue | 🟠 Major

Don't commit developer-specific cluster settings.
These absolute repo/model paths, mounts, image tag, and account are tied to one user and one cluster environment, so the script is not reusable for other developers or automation. Please read them from arguments or environment variables instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jenkins/scripts/perf/local/run_e2e_local.sh` around lines 7 - 14, The script hardcodes developer- and cluster-specific settings (REPO_ROOT, LOCAL_DIR, IMAGE, MOUNTS, LLM_MODELS_ROOT, PARTITION, ACCOUNT, JOB_NAME); change these to be read from environment variables or script arguments with sensible defaults so others can override them. Replace the literal assignments for REPO_ROOT, LOCAL_DIR, IMAGE, MOUNTS, LLM_MODELS_ROOT, PARTITION, ACCOUNT and JOB_NAME with parameter parsing (e.g., using getopts or positional args) and fallbacks to ENV vars (e.g., ${REPO_ROOT:-/default/path}) and validate required values early, emitting a clear error and usage help if missing. Ensure any user-specific defaults are removed and documented in the script header so CI and other developers can supply their own values.

jenkins/scripts/perf/local/run_gen_only_local.sh-7-14 (1)
7-14: ⚠️ Potential issue | 🟠 Major

Don't commit developer-specific cluster settings.
These absolute repo/model paths, mounts, image tag, and account are tied to one user and one cluster environment, so the script is not reusable for other developers or automation. Please read them from arguments or environment variables instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jenkins/scripts/perf/local/run_gen_only_local.sh` around lines 7 - 14, The script hardcodes developer/cluster-specific variables (REPO_ROOT, LOCAL_DIR, IMAGE, MOUNTS, LLM_MODELS_ROOT, PARTITION, ACCOUNT, JOB_NAME); change these to be read from environment variables or positional/flagged script arguments with sensible defaults and a clear usage message, e.g., respect $REPO_ROOT, $IMAGE, $MOUNTS, $LLM_MODELS_ROOT, $PARTITION, $ACCOUNT, $JOB_NAME or accept --repo-root/--image/--mounts etc., validate required values early and exit with an error if missing, and set LOCAL_DIR relative to REPO_ROOT when not explicitly provided so the script is reusable across users and CI.

jenkins/scripts/perf/local/submit.py-1-7 (1)
1-7: ⚠️ Potential issue | 🟠 Major

Add the required NVIDIA Apache 2.0 file header.
This new Python source file is missing the standard TensorRT-LLM copyright/license block.
As per coding guidelines: "All TensorRT-LLM source files should contain an NVIDIA copyright header with the year of the latest meaningful modification. The header should be an Apache 2.0 license block as specified".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jenkins/scripts/perf/local/submit.py` around lines 1 - 7, Add the required NVIDIA Apache 2.0 file header to the top of jenkins/scripts/perf/local/submit.py: insert the standard TensorRT-LLM copyright/license block (Apache License, Version 2.0) including the correct year of latest meaningful modification and "NVIDIA CORPORATION" as the copyright holder, placed above the existing shebang (#!/usr/bin/env python3) so the file begins with the full header comment followed by the shebang and imports.

jenkins/scripts/perf/local/submit.py-326-340 (1)
326-340: ⚠️ Potential issue | 🟠 Major

Don't introduce a second parametrized test-ID format.

The local runner scripts in this PR pass `aggr_upload-*` / `disagg_upload-*` names, but this helper rewrites them to `aggr-*` / `disagg-*`. Since `pytest --test-list` filters by parametrized id, that drift can select the wrong case or no case at all.

Suggested fix:

     if runtime_mode == "disaggregated":
         # disagg_upload-{e2e|gen_only}-{config_base}
         test_list_content = (
    -        f"perf/test_perf_sanity.py::test_e2e[disagg-{benchmark_mode}-{config_file_base_name}]"
    +        f"perf/test_perf_sanity.py::test_e2e[disagg_upload-{benchmark_mode}-{config_file_base_name}]"
         )
     elif benchmark_mode == "ctx_only":
         # aggr_upload-ctx_only-{config_base}
         test_list_content = (
    -        f"perf/test_perf_sanity.py::test_e2e[aggr-ctx_only-{config_file_base_name}]"
    +        f"perf/test_perf_sanity.py::test_e2e[aggr_upload-ctx_only-{config_file_base_name}]"
         )
     else:
         # Normal aggr: aggr-{config}-{select_pattern}
         test_list_content = (
    -        f"perf/test_perf_sanity.py::test_e2e[aggr-{config_file_base_name}-{select_pattern}]"
    +        f"perf/test_perf_sanity.py::test_e2e[aggr_upload-{config_file_base_name}-{select_pattern}]"
         )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jenkins/scripts/perf/local/submit.py` around lines 326 - 340, The code rewrites incoming parametrized test IDs (e.g., converts aggr_upload-/disagg_upload- to aggr-/disagg-), which breaks pytest --test-list filtering; instead use the original benchmark_mode (and any upload_* prefixes) when building test_list_content. Update the branches that set test_list_content to include benchmark_mode verbatim (alongside runtime_mode/config_file_base_name/select_pattern as appropriate) rather than substituting fixed strings like "disagg-" or "aggr-", so the variable test_list_content matches the exact parametrized IDs pytest expects.

jenkins/scripts/perf/local/submit.py-261-272 (1)
261-272: ⚠️ Potential issue | 🟠 Major

Keep the YAML benchmark mode in `bm_config`.

`benchmark_mode` only carries the CLI/test-list mode (`e2e`, `gen_only`, `ctx_only`). Returning it here drops `gen_only_no_context`, so the later env setup can never export `TRTLLM_DISAGG_BENCHMARK_GEN_ONLY=1` for no-context runs.

Suggested fix:

     def get_benchmark_config(config, benchmark_mode):
         """Get benchmark config based on mode."""
         if benchmark_mode is None:
             return {}
         benchmark = config.get("benchmark", {})
    +    benchmark_runtime_mode = benchmark.get("mode", benchmark_mode)
         concurrency_str = benchmark.get("concurrency_list", "1")
         concurrency = int(concurrency_str) if isinstance(concurrency_str, str) else concurrency_str
         return {
    -        "mode": benchmark_mode,
    +        "mode": benchmark_runtime_mode,
             "concurrency": concurrency,
         }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jenkins/scripts/perf/local/submit.py` around lines 261 - 272, The function get_benchmark_config currently returns the CLI/test-list mode (benchmark_mode) and drops the YAML-specified benchmark mode; update get_benchmark_config to preserve the YAML benchmark mode by using the value from config["benchmark"].get("mode") (falling back to benchmark_mode if missing) so bm_config retains the YAML mode (e.g., "gen_only_no_context"); keep the existing concurrency handling and return the resulting dict with "mode" set to that YAML-aware value.

jenkins/scripts/perf/disaggregated/submit.py-15-17 (1)
15-17: ⚠️ Potential issue | 🟠 Major

Use the YAML benchmark mode as the source of truth for no-context runs.

`parse_test_case_name()` only returns `e2e` or `gen_only`, so this condition can never see `gen_only_no_context`. Those configs will still keep their ctx-server count. Please derive the runtime mode from `config["benchmark"]["mode"]` once and reuse it here and in the later gen-only env branch.

Suggested fix:

    -num_ctx_servers = (
    -    0 if "gen_only_no_context" in benchmark_mode else hardware.get("num_ctx_servers")
    -)
    +benchmark_runtime_mode = config.get("benchmark", {}).get("mode", benchmark_mode)
    +num_ctx_servers = (
    +    0 if benchmark_runtime_mode == "gen_only_no_context"
    +    else hardware.get("num_ctx_servers")
    +)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jenkins/scripts/perf/disaggregated/submit.py` around lines 15 - 17, The check for "gen_only_no_context" is wrong because parse_test_case_name() only returns "e2e" or "gen_only"; derive the runtime mode once from config["benchmark"]["mode"] (e.g., runtime_mode = config["benchmark"]["mode"]) and use that variable instead of benchmark_mode when computing num_ctx_servers (num_ctx_servers = 0 if runtime_mode == "gen_only_no_context" else hardware.get("num_ctx_servers")), and also reuse runtime_mode in the later gen-only env branch so both places use the YAML benchmark mode as the single source of truth.
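The "derive the runtime mode once" suggestion from the two comments above can be sketched as a small helper; `resolve_runtime_mode` is a hypothetical name, and the config shape follows the `config["benchmark"]["mode"]` description in the comment:

```python
def resolve_runtime_mode(config, cli_mode):
    # Prefer the YAML benchmark.mode (which may carry the richer
    # "gen_only_no_context"); fall back to the CLI-provided mode.
    return config.get("benchmark", {}).get("mode", cli_mode)

cfg = {"benchmark": {"mode": "gen_only_no_context"}}
mode = resolve_runtime_mode(cfg, "gen_only")
assert mode == "gen_only_no_context"
assert resolve_runtime_mode({}, "e2e") == "e2e"

# Derived once, reused everywhere the mode matters:
num_ctx_servers = 0 if mode == "gen_only_no_context" else 1
assert num_ctx_servers == 0
```

Centralizing the lookup means the ctx-server count and the later gen-only env branch can never disagree about which mode is active.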
🟡 Minor comments (5)
jenkins/scripts/open_search_db.py-328-356 (1)
328-356: ⚠️ Potential issue | 🟡 Minor

Narrow this exception handling so real parsing/schema bugs do not look like "no perf data."

`except Exception` here will also hide unexpected payload-shape regressions and local code bugs by returning `None`. Catch only the request/JSON/payload errors you actually expect: `requests.RequestException`, `ValueError`, and `AttributeError`.

Proposed fix:

    -        except Exception as e:
    +        except (requests.RequestException, ValueError, AttributeError) as e:
                 OpenSearchDB.logger.warning(
                     f"Failed to query from {project_name}, returned error: {e}")
                 return None

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jenkins/scripts/open_search_db.py` around lines 328 - 356, The broad except Exception in the OpenSearchDB.queryFromOpenSearchDB call hides real parsing/schema bugs; narrow it to only the expected request/JSON/payload errors by catching requests.RequestException, ValueError, and AttributeError (and import requests if not present) around the block that calls OpenSearchDB.queryFromOpenSearchDB and processes res.json(), keep the existing logging behavior for those specific exceptions, and remove or avoid a blanket Exception catch so unexpected errors bubble up instead of returning None.

tests/scripts/perf-sanity/disaggregated/gb200_kimi-k2-thinking-fp4_8k1k_con1024_ctx1_dep4_gen1_dep32_eplb416_mtp3_ccb-UCX.yaml-74-78 (1)
74-78: ⚠️ Potential issue | 🟡 Minor

Filename says `mtp3` but config uses Eagle decoding.

The filename contains `mtp3`, which typically indicates Multi-Token Prediction with 3 layers, but the speculative config uses Eagle decoding with `max_draft_len: 3`. This naming inconsistency may cause confusion when filtering or selecting configs by filename patterns.

Consider updating the filename to reflect Eagle decoding (e.g., `eagle3`) or verify this is intentional for your naming convention.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/disaggregated/gb200_kimi-k2-thinking-fp4_8k1k_con1024_ctx1_dep4_gen1_dep32_eplb416_mtp3_ccb-UCX.yaml` around lines 74 - 78, The filename indicates "mtp3" but the speculative_config block uses Eagle decoding (decoding_type: Eagle) with max_draft_len: 3 and eagle3_one_model: true (speculative_model: Kimi-K2-Thinking-NVFP4-Eagle3); rename the file to use "eagle3" (or another Eagle-based token) to match the config, or if "mtp3" is intentional, change decoding_type/max_draft_len/eagle3_one_model to reflect MTP3 settings so the filename and the speculative_config (specifically the decoding_type, max_draft_len, eagle3_one_model, and speculative_model) are consistent.

tests/scripts/perf-sanity/README.md-154-158 (1)
154-158: ⚠️ Potential issue | 🟡 Minor

Fix the SBSA Jenkins config descriptions.

These labels do not match `jenkins/L0_Test.groovy`: `SBSASlurmTestConfigs` is the SBSA single-node SLURM bucket, and `multiNodesSBSAConfigs` contains both aggregated and disaggregated multi-node stages. The current table will send people to the wrong map when they add or debug coverage.

Suggested wording:

    -| `SBSASlurmTestConfigs` | Multi-node aggregated tests |
    -| `multiNodesSBSAConfigs` | Multi-node disaggregated tests |
    +| `SBSASlurmTestConfigs` | SBSA single-node SLURM tests |
    +| `multiNodesSBSAConfigs` | SBSA multi-node SLURM tests (aggregated + disaggregated) |

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/scripts/perf-sanity/README.md` around lines 154 - 158, Update the README table entries so they match jenkins/L0_Test.groovy: change the description for SBSASlurmTestConfigs to "SBSA single-node SLURM bucket" (it is not multi-node), and change the description for multiNodesSBSAConfigs to "Multi-node SBSA stages (contains both aggregated and disaggregated multi-node stages)"; locate and edit the table rows referencing the symbols SBSASlurmTestConfigs and multiNodesSBSAConfigs to use these corrected descriptions.

tests/integration/defs/perf/test_perf_sanity.py-940-952 (1)
940-952: ⚠️ Potential issue | 🟡 Minor

Chain the exception with `from` for proper traceback.

When re-raising a different exception inside an except block, use `from` to preserve the exception chain.

Proposed fix
 except (subprocess.CalledProcessError, FileNotFoundError, IndexError):
-    raise RuntimeError("Failed to get GPU type")
+    raise RuntimeError("Failed to get GPU type") from None

Or to preserve the original exception:

 except (subprocess.CalledProcessError, FileNotFoundError, IndexError) as e:
-    raise RuntimeError("Failed to get GPU type")
+    raise RuntimeError("Failed to get GPU type") from e

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/test_perf_sanity.py` around lines 940 - 952, The except block in get_gpu_type catches (subprocess.CalledProcessError, FileNotFoundError, IndexError) but re-raises a RuntimeError without chaining the original exception; modify the except to capture the original exception (e.g., except (... ) as e:) and re-raise using "raise RuntimeError('Failed to get GPU type') from e" so the original traceback from subprocess.check_output (and related errors) is preserved.

jenkins/scripts/perf/perf_utils.py-536-547 (1)
536-547: ⚠️ Potential issue | 🟡 Minor

Add `strict=True` to `zip()` to catch length mismatches.

The `zip(x_vals, decline_vals)` call on line 540 could silently truncate data if the lists have different lengths, which would produce incorrect regression calculations.

Proposed fix

- ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_vals, decline_vals))
+ ss_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(x_vals, decline_vals, strict=True))

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@jenkins/scripts/perf/perf_utils.py` around lines 536 - 547, In the regression block using x_vals and decline_vals, change the zip call to use strict=True (zip(x_vals, decline_vals, strict=True)) so length mismatches raise immediately; wrap the pairing and subsequent calculations (ss_xy, ss_xx, ss_yy, slope, r_squared) in a try/except ValueError to handle/report mismatched lengths (e.g., log and skip regression) so you don't silently truncate data when computing slope and r_squared for the gradual_decline check.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 28b1230b-8e55-4410-854d-3e592e8080b7
📒 Files selected for processing (76)
jenkins/L0_Test.groovy
jenkins/scripts/open_search_db.py
jenkins/scripts/perf/README.md
jenkins/scripts/perf/aggregated/slurm_launch_draft.sh
jenkins/scripts/perf/disaggregated/slurm_launch_draft.sh
jenkins/scripts/perf/disaggregated/submit.py
jenkins/scripts/perf/get_pre_merge_html.py
jenkins/scripts/perf/local/README.md
jenkins/scripts/perf/local/run_agg_local.sh
jenkins/scripts/perf/local/run_e2e_local.sh
jenkins/scripts/perf/local/run_gen_only_local.sh
jenkins/scripts/perf/local/slurm_install.sh
jenkins/scripts/perf/local/slurm_run.sh
jenkins/scripts/perf/local/submit.py
jenkins/scripts/perf/perf_sanity_triage.py
jenkins/scripts/perf/perf_utils.py
tests/integration/defs/perf/open_search_db_utils.py
tests/integration/defs/perf/test_perf_sanity.py
tests/scripts/perf-sanity/README.md
tests/scripts/perf-sanity/aggregated/config_database_b200_nvl.yaml
tests/scripts/perf-sanity/aggregated/config_database_h200_sxm.yaml
tests/scripts/perf-sanity/aggregated/deepseek_r1_fp4_v2_2_nodes_grace_blackwell.yaml
tests/scripts/perf-sanity/aggregated/deepseek_r1_fp4_v2_blackwell.yaml
tests/scripts/perf-sanity/aggregated/deepseek_r1_fp4_v2_grace_blackwell.yaml
tests/scripts/perf-sanity/aggregated/deepseek_r1_fp8_blackwell.yaml
tests/scripts/perf-sanity/aggregated/deepseek_v32_fp4_blackwell.yaml
tests/scripts/perf-sanity/aggregated/deepseek_v32_fp4_grace_blackwell.yaml
tests/scripts/perf-sanity/aggregated/gb300_deepseek_r1_fp4_v2_2_nodes_grace_blackwell.yaml
tests/scripts/perf-sanity/aggregated/gpt_oss_120b_fp4_blackwell.yaml
tests/scripts/perf-sanity/aggregated/gpt_oss_120b_fp4_grace_blackwell.yaml
tests/scripts/perf-sanity/aggregated/k2_thinking_fp4_2_nodes_grace_blackwell.yaml
tests/scripts/perf-sanity/aggregated/k2_thinking_fp4_blackwell.yaml
tests/scripts/perf-sanity/aggregated/k2_thinking_fp4_grace_blackwell.yaml
tests/scripts/perf-sanity/deepseek_r1_fp4_v2_2_nodes_grace_blackwell.yaml
tests/scripts/perf-sanity/disaggregated/b200_deepseek-r1-fp4_1k1k_con1_ctx1_dep4_gen1_tep8_eplb0_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/b200_deepseek-r1-fp4_1k1k_con2048_ctx1_dep4_gen1_dep8_eplb0_mtp1_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/b200_deepseek-r1-fp4_1k1k_con256_ctx1_dep4_gen1_dep8_eplb0_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/b200_deepseek-r1-fp4_8k1k_con1536_ctx1_dep4_gen1_dep8_eplb0_mtp1_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/b200_deepseek-r1-fp4_8k1k_con1_ctx1_dep4_gen1_tep8_eplb0_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/b200_deepseek-r1-fp4_8k1k_con256_ctx1_dep4_gen1_dep8_eplb0_mtp1_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_128k8k_con128_ctx1_pp8_gen1_dep16_eplb0_mtp2_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_128k8k_con1_ctx1_pp8_gen1_tep8_eplb0_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_128k8k_con64_ctx1_pp8_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep8_eplb0_mtp0_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con1_ctx1_dep4_gen1_tep8_eplb0_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_1k1k_con3072_ctx1_dep4_gen1_dep4_eplb0_mtp1_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_8k1k_con1024_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_8k1k_con1_ctx1_dep4_gen1_tep8_eplb0_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-r1-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb0_mtp1_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-v32-fp4_1k1k_con1024_ctx1_dep4_gen1_dep32_eplb256_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-v32-fp4_1k1k_con1_ctx1_dep4_gen1_tep8_eplb0_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-v32-fp4_1k1k_con2048_ctx1_dep4_gen1_dep4_eplb0_mtp1_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-v32-fp4_32k4k_con1_ctx1_dep4_gen1_tep8_eplb0_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-v32-fp4_32k4k_con2048_ctx1_dep4_gen1_dep32_eplb288_mtp1_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-v32-fp4_32k4k_con256_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-v32-fp4_8k1k_con1024_ctx1_dep4_gen1_dep32_eplb256_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-v32-fp4_8k1k_con1_ctx1_dep4_gen1_tep8_eplb0_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_deepseek-v32-fp4_8k1k_con4096_ctx1_dep4_gen1_dep32_eplb256_mtp0_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_gpt-oss-120b-fp4_1k1k_con2048_ctx1_tp1_gen1_dep2_eplb0_mtp0_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_gpt-oss-120b-fp4_1k1k_con512_ctx1_tp1_gen1_dep2_eplb0_mtp0_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_gpt-oss-120b-fp4_1k1k_con64_ctx1_tp1_gen1_tp4_eplb0_mtp0_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_gpt-oss-120b-fp4_8k1k_con128_ctx1_tp1_gen1_tp4_eplb0_mtp0_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_gpt-oss-120b-fp4_8k1k_con4_ctx1_tp1_gen1_tp4_eplb0_mtp0_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_gpt-oss-120b-fp4_8k1k_con512_ctx1_tp1_gen1_dep2_eplb0_mtp0_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_kimi-k2-thinking-fp4_1k1k_con2048_ctx1_dep4_gen1_dep32_eplb384_mtp0_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_kimi-k2-thinking-fp4_1k1k_con4096_ctx1_dep4_gen1_dep8_eplb0_mtp0_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_kimi-k2-thinking-fp4_1k1k_con4_ctx1_dep4_gen1_tep4_eplb0_mtp0_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_kimi-k2-thinking-fp4_8k1k_con1024_ctx1_dep4_gen1_dep32_eplb416_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_kimi-k2-thinking-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb384_mtp0_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_kimi-k2-thinking-fp4_8k1k_con4_ctx1_dep4_gen1_tep8_eplb0_mtp3_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_qwen3-235b-fp4_8k1k_con1024_ctx1_tp1_gen1_dep8_eplb0_mtp0_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb200_qwen3-235b-fp4_8k1k_con64_ctx1_tp1_gen1_tep4_eplb0_mtp0_ccb-UCX.yaml
tests/scripts/perf-sanity/disaggregated/gb300_deepseek-r1-fp4_1k1k_con1024_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml
tests/test_common/error_utils.py
tests/test_common/http_utils.py
💤 Files with no reviewable changes (1)
- tests/scripts/perf-sanity/deepseek_r1_fp4_v2_2_nodes_grace_blackwell.yaml
def disaggMode = stageName.contains("Disagg-PerfSanity")
The disaggregated path never activates.
Line 904 looks for "Disagg-PerfSanity", but the stage keys in multiNodesSBSAConfigs are named "...PerfSanity-Disagg...". That leaves disaggMode false for the GB200 disaggregated stages, so they fall back to the aggregated sbatch launcher instead of the split-group disaggregated flow.
Suggested fix
- def disaggMode = stageName.contains("Disagg-PerfSanity")
+ def disaggMode = stageName.contains("PerfSanity-Disagg")

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- def disaggMode = stageName.contains("Disagg-PerfSanity")
+ def disaggMode = stageName.contains("PerfSanity-Disagg")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@jenkins/L0_Test.groovy` around lines 904 - 905, The disaggregated path never
activates because disaggMode is set by checking
stageName.contains("Disagg-PerfSanity"), but the keys use the pattern
"PerfSanity-Disagg"; update the check that sets disaggMode (the line defining
disaggMode using stageName) to match the actual naming (e.g.,
stageName.contains("PerfSanity-Disagg") or more robustly
stageName.contains("Disagg") || stageName.contains("PerfSanity-Disagg")) so
GB200 disaggregated stages trigger the split-group disaggregated flow instead of
the aggregated sbatch path.
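The substring pitfall is easy to see in isolation. A quick Python sketch (the stage key below is hypothetical, shaped after the "...PerfSanity-Disagg..." naming pattern the finding describes, not taken from the repo):

```python
# Hypothetical stage key following the "...PerfSanity-Disagg..." pattern.
stage_name = "GB200-PerfSanity-Disagg-Post-Merge-1"

# The original check never matches keys of this shape,
# so disaggMode stays false:
assert "Disagg-PerfSanity" not in stage_name

# The corrected substring does match:
assert "PerfSanity-Disagg" in stage_name
print("corrected check activates the disaggregated flow")
```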
} catch (InterruptedException e) {
    stageIsInterrupted = true
    throw e
} finally {
-   uploadResults(pipeline, cluster, jobUID, stageName)
+   uploadResults(pipeline, cluster, jobUID, stageName, stageIsInterrupted)
    stage("Clean Up Slurm Resource") {
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the helper definition and all call sites.
rg -n 'def uploadResults\s*\(' jenkins/L0_Test.groovy
rg -n 'uploadResults\s*\(' jenkins/L0_Test.groovy
Repository: NVIDIA/TensorRT-LLM
Length of output: 335
Fix uploadResults call with incorrect argument count.
Line 1384 passes five arguments (pipeline, cluster, jobUID, stageName, stageIsInterrupted), but uploadResults at line 111 is defined with four parameters (pipeline, cluster, nodeName, stageName). This will raise MissingMethodException from the finally block at runtime, masking the original error and skipping result upload.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@jenkins/L0_Test.groovy` around lines 1380 - 1385, The call to uploadResults
currently passes five args (pipeline, cluster, jobUID, stageName,
stageIsInterrupted) but the defined signature is uploadResults(pipeline,
cluster, nodeName, stageName); fix the call to match the signature by passing
pipeline, cluster, nodeName, stageName (replace jobUID with the nodeName
variable and remove stageIsInterrupted). If the intent was to propagate the
interrupted flag, instead update the uploadResults method signature
(uploadResults(pipeline, cluster, nodeName, stageName, stageIsInterrupted)) and
update its callers/implementation accordingly.
max_batch_size: 4
max_num_tokens: 12288
🧩 Analysis chain
🌐 Web query:
In TensorRT-LLM serving/perf-sanity configs, is `max_num_tokens` enforced per request or as a batch-wide scheduler cap, and what value is required to actually admit 8k input / 1k output requests when `max_batch_size` is 4 or 32?
💡 Result:
max_num_tokens is not per-request; it’s a batch-/iteration-wide scheduler budget. In each scheduled iteration it counts (a) the sum of all prompt (context) tokens of the context-phase requests plus (b) ~1 token per request that is in generation (not the full remaining output length). [1], [2]
So, to actually admit requests with ~8k input (≈8192 tokens) while allowing up to max_batch_size = N such prefills to be scheduled together, you need roughly:
- max_batch_size = 4: max_num_tokens >= 4 * 8192 = 32768
- max_batch_size = 32: max_num_tokens >= 32 * 8192 = 262144 [2]
(Your 1k output doesn’t require adding 1k to max_num_tokens up front; decode contributes ~1 token/request per iteration to this cap. [2] Also ensure your per-request limits like max_input_len/max_seq_len allow 8k+1k.) [2]
Fix max_num_tokens to allow actual batching of 8k input requests.
max_num_tokens is a batch-wide scheduler cap that sums all prompt tokens in a scheduled iteration. With isl: 8192 inputs, the required minimums are:
- Lines 14-15 (`max_batch_size: 4`): `max_num_tokens` must be ≥ 32768 (currently 12288)
- Lines 45-46 (`max_batch_size: 32`): `max_num_tokens` must be ≥ 262144 (currently 12288)
Current values prevent the batch sizes from being exercised and invalidate the perf baselines.
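The budget arithmetic above can be captured in a tiny helper (an illustrative sketch only; per the query result, the real scheduler additionally budgets roughly one token per in-flight generation request):

```python
def min_max_num_tokens(max_batch_size: int, isl: int) -> int:
    """Smallest batch-wide token budget that admits `max_batch_size`
    concurrent prefills of `isl` prompt tokens each."""
    return max_batch_size * isl


assert min_max_num_tokens(4, 8192) == 32768    # needs >= 32768, not 12288
assert min_max_num_tokens(32, 8192) == 262144  # needs >= 262144, not 12288
```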
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/scripts/perf-sanity/aggregated/deepseek_v32_fp4_blackwell.yaml` around
lines 14 - 15, Update the batch-wide scheduler cap values so the scheduler can
actually batch 8k-token inputs: for the block where max_batch_size: 4, change
max_num_tokens from 12288 to at least 32768; for the block where max_batch_size:
32, change max_num_tokens from 12288 to at least 262144. Ensure you update the
respective max_num_tokens keys in the same YAML sections as the existing
max_batch_size entries so the perf baselines exercise the intended batch sizes.
client_configs:
  - name: "con2_iter10_8k1k"
    concurrency: 2
    iterations: 10
    isl: 8192
    osl: 1024
    backend: "openai"
    trust_remote_code: true
    dataset_file: datasets/perf-ci/k2_thinking-8k1k-20480-ratio-1_for_serve.json
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
target='datasets/perf-ci/k2_thinking-8k1k-20480-ratio-1_for_serve.json'
echo "Checking for ${target}..."
if [ -f "${target}" ]; then
echo "FOUND: ${target}"
else
echo "MISSING: ${target}"
echo
echo "Nearby perf-ci datasets:"
fd . datasets/perf-ci 2>/dev/null || true
exit 1
fi
Repository: NVIDIA/TensorRT-LLM
Length of output: 239
The referenced perf dataset is missing and must be cherry-picked with this config.
The file datasets/perf-ci/k2_thinking-8k1k-20480-ratio-1_for_serve.json does not exist in the repository. Both scenarios at lines 28-36 and 62-70 depend on it and will fail immediately at runtime without it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/scripts/perf-sanity/aggregated/k2_thinking_fp4_grace_blackwell.yaml`
around lines 28 - 36, The referenced perf dataset file
datasets/perf-ci/k2_thinking-8k1k-20480-ratio-1_for_serve.json is missing and
both client_configs entries (e.g., name "con2_iter10_8k1k") reference it; fix by
adding/ cherry-picking the commit that introduced that JSON into
datasets/perf-ci so the path exists, or replace the dataset_file value in the
client_configs (the occurrences with dataset_file:
datasets/perf-ci/k2_thinking-8k1k-20480-ratio-1_for_serve.json) with an existing
dataset path; ensure both scenarios that reference that filename are updated
consistently.
partition: <partition>
account: <account>
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Find all placeholder values in YAML configuration files
rg -n '<[^>]+>' tests/scripts/perf-sanity/disaggregated/b200_deepseek-r1-fp4_8k1k_con1_ctx1_dep4_gen1_tep8_eplb0_mtp3_ccb-UCX.yaml
Repository: NVIDIA/TensorRT-LLM
Length of output: 263
All placeholder values must be replaced before execution.
This SLURM configuration file contains multiple placeholder values that must be replaced with actual values:
- Lines 11-12: `<partition>` and `<account>`
- Lines 32-34: `<container_mount>`, `<container_image>`, and `<model_path>`
- Line 37: `<full_path_to_work_dir>`
These placeholders will cause the job submission to fail at runtime.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@tests/scripts/perf-sanity/disaggregated/b200_deepseek-r1-fp4_8k1k_con1_ctx1_dep4_gen1_tep8_eplb0_mtp3_ccb-UCX.yaml`
around lines 11 - 12, Replace all SLURM/YAML placeholder tokens with real
values: set the partition and account fields (replace <partition> and
<account>), provide a real container mount path, container image and model path
(replace <container_mount>, <container_image>, <model_path>), and set the
working directory to the full absolute path (replace <full_path_to_work_dir>);
verify these concrete values are valid on the cluster and in the container
runtime so the job submission using the keys partition, account,
container_mount, container_image, model_path and work_dir will not fail at
runtime.
account: <account>
job_time: 02:00:00
job_name: unified-benchmark
extra_args: --gres=gpu:8
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Inspect the new config =="
sed -n '1,220p' tests/scripts/perf-sanity/disaggregated/b200_deepseek-r1-fp4_8k1k_con256_ctx1_dep4_gen1_dep8_eplb0_mtp1_ccb-UCX.yaml
echo
echo "== Find code that computes SLURM/node/GPU allocation for disaggregated runs =="
rg -n -C4 '\b(num_ctx_servers|num_gen_servers|gpus_per_node|tensor_parallel_size|pipeline_parallel_size|context_parallel_size|extra_args|gres)\b'
echo
echo "== Inspect the referenced SLURM script =="
fd 'disaggr_torch\.slurm$' -x sed -n '1,260p' {}
Repository: NVIDIA/TensorRT-LLM
Length of output: 50377
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Find the disaggr_torch.slurm script ==="
find . -name "disaggr_torch.slurm" -type f
echo
echo "=== Find launcher scripts in perf-sanity ==="
fd -e py . tests/scripts/perf-sanity/ | head -20
echo
echo "=== Look at run_benchmark_serve.py to see how it processes hardware config ==="
wc -l tests/scripts/perf-sanity/run_benchmark_serve.py
Repository: NVIDIA/TensorRT-LLM
Length of output: 448
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Search for how num_ctx_servers and num_gen_servers are used ==="
rg -n 'num_ctx_servers|num_gen_servers' tests/scripts/perf-sanity/ -A 3 -B 1 | head -100
echo
echo "=== Search for SLURM node/GPU allocation computation ==="
rg -n 'gres|--nodes|num_nodes|total.*gpu' tests/scripts/perf-sanity/ -B 2 -A 2 | head -80
Repository: NVIDIA/TensorRT-LLM
Length of output: 22235
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Examine the disaggr_torch.slurm script ==="
head -80 ./examples/disaggregated/slurm/benchmark/disaggr_torch.slurm
echo
echo "=== Look for resource/node allocation logic in run_benchmark_serve.py ==="
sed -n '1,150p' tests/scripts/perf-sanity/run_benchmark_serve.py | head -100
Repository: NVIDIA/TensorRT-LLM
Length of output: 7199
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Search for where YAML config is loaded and parsed ==="
rg -n 'hardware.*num_ctx_servers|num_ctx_servers.*hardware|load.*yaml|yaml.*load' tests/scripts/perf-sanity/ -B 2 -A 5 | head -150
echo
echo "=== Look for node allocation computation logic ==="
rg -n 'num_ctx_servers|num_gen_servers' examples/disaggregated/ -B 3 -A 3 | head -100
Repository: NVIDIA/TensorRT-LLM
Length of output: 8901
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Look at submit.py to see how total_nodes is calculated ==="
sed -n '180,260p' examples/disaggregated/slurm/benchmark/submit.py
Repository: NVIDIA/TensorRT-LLM
Length of output: 3893
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Find the calculate_nodes function ==="
rg -n 'def calculate_nodes' examples/disaggregated/slurm/benchmark/ -A 10
echo
echo "=== Verify for b200 config what nodes would be calculated ==="
python3 << 'EOF'
# Simulate the calculation for b200_deepseek-r1-fp4_8k1k
gpus_per_node = 8
num_ctx_servers = 1
num_gen_servers = 1
ctx_tp = 4
gen_tp = 8
# Assuming calculate_nodes is: ceil(world_size / gpus_per_node) * num_servers
import math
def calculate_nodes(world_size, num_servers, gpus_per_node):
# Typical pattern: each server needs ceil(world_size / gpus_per_node) nodes
gpus_per_server = gpus_per_node
return math.ceil(world_size / gpus_per_server) * num_servers
ctx_nodes = calculate_nodes(ctx_tp, num_ctx_servers, gpus_per_node)
gen_nodes = calculate_nodes(gen_tp, num_gen_servers, gpus_per_node)
total_nodes = ctx_nodes + gen_nodes
total_gpus = total_nodes * gpus_per_node
print(f"ctx_world_size={ctx_tp}, num_ctx_servers={num_ctx_servers}, gpus_per_node={gpus_per_node}")
print(f" ctx_nodes = ceil({ctx_tp}/{gpus_per_node}) * {num_ctx_servers} = {ctx_nodes}")
print(f"gen_world_size={gen_tp}, num_gen_servers={num_gen_servers}, gpus_per_node={gpus_per_node}")
print(f" gen_nodes = ceil({gen_tp}/{gpus_per_node}) * {num_gen_servers} = {gen_nodes}")
print(f"total_nodes = {ctx_nodes} + {gen_nodes} = {total_nodes}")
print(f"total_gpus = {total_nodes} * {gpus_per_node} = {total_gpus}")
print(f"Config specifies: --gres=gpu:8")
print(f"MISMATCH: {total_gpus} GPUs needed but only 8 requested!" if total_gpus != 8 else "MATCH!")
EOF
Repository: NVIDIA/TensorRT-LLM
Length of output: 1376
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== Verify the calculation for the other config lines mentioned ==="
python3 << 'EOF'
import math
def calculate_nodes(world_size, num_servers, gpus_per_node):
return math.ceil(world_size * num_servers / gpus_per_node)
configs = [
# Lines 15, 28-30, 52-55, 80-83 (primary file)
("b200_deepseek-r1-fp4_8k1k", 8, 1, 1, 4, 8),
# Lines 28-30, 52-55, 80-83 patterns from gb200 files
("gb200_deepseek-r1-fp4_1k1k", 4, 1, 1, 8, 8),
]
for name, gpus_per_node, num_ctx, num_gen, ctx_tp, gen_tp in configs:
ctx_nodes = calculate_nodes(ctx_tp, num_ctx, gpus_per_node)
gen_nodes = calculate_nodes(gen_tp, num_gen, gpus_per_node)
total_nodes = ctx_nodes + gen_nodes
total_gpus = total_nodes * gpus_per_node
print(f"\n{name}:")
print(f" gpus_per_node={gpus_per_node}, num_ctx_servers={num_ctx}, num_gen_servers={num_gen}")
print(f" ctx_tp={ctx_tp}, gen_tp={gen_tp}")
print(f" → ctx_nodes={ctx_nodes}, gen_nodes={gen_nodes}, total_nodes={total_nodes}")
print(f" → requires {total_gpus} GPUs total")
print(f" → but config specifies --gres=gpu:{gpus_per_node}")
if total_gpus > gpus_per_node:
print(f" ⚠️ UNDER-PROVISIONED: needs {total_gpus} but requests {gpus_per_node}!")
EOF
Repository: NVIDIA/TensorRT-LLM
Length of output: 659
Fix SLURM GPU allocation to match disaggregated topology requirements.
The launcher's calculate_nodes() function correctly determines that each disaggregated server needs multiple full nodes based on tensor parallelism, but the YAML configs hardcode insufficient --gres=gpu:* values.
For example, the b200_deepseek-r1-fp4_8k1k config with gen TP=8 and ctx TP=4 requires:
- ctx_nodes = ceil(4 × 1 / 8) = 1 node
- gen_nodes = ceil(8 × 1 / 8) = 1 node
- Total: 2 nodes × 8 GPUs/node = 16 GPUs
But extra_args: --gres=gpu:8 allocates only 8 GPUs on a single node, causing 2x under-provisioning.
Change extra_args to reflect total GPU requirements: --gres=gpu:16 (or scale it based on the computed total_nodes = ctx_nodes + gen_nodes multiplied by gpus_per_node).
Also applies to lines 28-30, 52-55, 80-83 with the same resource-mismatch pattern across all disaggregated benchmark configurations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@tests/scripts/perf-sanity/disaggregated/b200_deepseek-r1-fp4_8k1k_con256_ctx1_dep4_gen1_dep8_eplb0_mtp1_ccb-UCX.yaml`
at line 15, Update the SLURM GPU allocation in the YAML configs by replacing the
hardcoded extra_args: --gres=gpu:* with a value computed from the launcher's
node calculation: compute ctx_nodes and gen_nodes the same way as
calculate_nodes(), set total_nodes = ctx_nodes + gen_nodes, and set extra_args
to --gres=gpu:{total_nodes * gpus_per_node} (e.g., for ctx TP=4 and gen TP=8
with 8 GPUs/node use --gres=gpu:16); apply this change to the occurrences at the
shown blocks (lines around the current extra_args and the repeated blocks at
lines ~28-30, ~52-55, ~80-83) so the YAML reflects total_nodes × gpus_per_node
rather than a single-node GPU count.
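The node math described above can be sketched as a small helper (illustrative only; the launcher's real `calculate_nodes()` may differ in detail):

```python
import math


def required_gres(ctx_world_size: int, num_ctx_servers: int,
                  gen_world_size: int, num_gen_servers: int,
                  gpus_per_node: int) -> str:
    """Derive a --gres value from total topology needs rather than
    a single node's GPU count."""
    ctx_nodes = math.ceil(ctx_world_size * num_ctx_servers / gpus_per_node)
    gen_nodes = math.ceil(gen_world_size * num_gen_servers / gpus_per_node)
    total_nodes = ctx_nodes + gen_nodes
    return f"--gres=gpu:{total_nodes * gpus_per_node}"


# b200_deepseek-r1-fp4_8k1k: ctx TP=4 (1 server), gen TP=8 (1 server), 8 GPUs/node
assert required_gres(4, 1, 8, 1, 8) == "--gres=gpu:16"
```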
| partition: <partition> | ||
| account: <account> |
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== Placeholder occurrences in perf-sanity configs =="
rg -n --glob '*.yaml' '<(partition|account|container_mount|container_image|model_path|full_path_to_work_dir)>' tests/scripts/perf-sanity
echo
echo "== Candidate substitution/materialization logic =="
rg -n 'envsubst|jinja|template|render|substitute|<partition>|<container_image>|<model_path>|<full_path_to_work_dir>' .
Repository: NVIDIA/TensorRT-LLM
Length of output: 50377
🏁 Script executed:
# Search for where these disaggregated configs are referenced/loaded
rg -l "disaggregated.*\.yaml|gb200.*fp4.*con" tests/ --type py --type sh
# Check for any loader/materialization scripts
find . -type f \( -name "*disaggregated*" -not -path "./\.git/*" \) | head -20
# Look for scripts that might handle template rendering for these configs
rg -n "partition|account|container_image|model_path" tests/scripts/perf-sanity/ --type py -A 3 -B 3 | head -100
Repository: NVIDIA/TensorRT-LLM
Length of output: 6204
🏁 Script executed:
# Search for YAML loading of disaggregated configs more directly
rg -n "load.*yaml|yaml.*load" tests/ --type py -B 2 -A 2 | grep -i disaggregated -A 5 -B 5
# Look for explicit loading of the config files in the disaggregated directory
rg -n "disaggregated/.*\.yaml" . --type py
# Search for references to the specific file under review
rg -n "gb200_deepseek-v32-fp4_32k4k_con256" .
# Check if there's a template or config rendering mechanism in tests
find tests/scripts/perf-sanity -type f \( -name "*.py" -o -name "*.sh" \) -exec basename {} \; | sort | uniq
Repository: NVIDIA/TensorRT-LLM
Length of output: 4051
🏁 Script executed:
# Search for references to tests/scripts/perf-sanity/disaggregated specifically
rg -n "tests/scripts/perf-sanity/disaggregated" .
# Look for any script in perf-sanity that might load these YAMLs
find tests/scripts/perf-sanity -type f \( -name "*.py" -o -name "*.sh" \) | xargs ls -la
# Check if there's a benchmark script that reads from disaggregated dir
rg -n "disaggregated" tests/scripts/perf-sanity/ --type py
# Look for config loading in perf-sanity scripts
rg -n "yaml.load|yaml.safe_load" tests/scripts/perf-sanity/ --type py
Repository: NVIDIA/TensorRT-LLM
Length of output: 1289
🏁 Script executed:
# Check how run_benchmark_serve.py loads and processes the YAML configs
sed -n '450,500p' tests/scripts/perf-sanity/run_benchmark_serve.py
# Also check the jenkins submission script to see if it handles template rendering
cat jenkins/scripts/perf/disaggregated/submit.py
# Check if there's any environment variable substitution in the loading process
rg -n "environ|getenv|substitute|envsubst|template" tests/scripts/perf-sanity/run_benchmark_serve.py -B 2 -A 2
Repository: NVIDIA/TensorRT-LLM
Length of output: 20470
🏁 Script executed:
# Read the README to understand the intent of these configs
cat tests/scripts/perf-sanity/README.md | head -250
# Also check if there's any documentation about template requirements
rg -n "template|placeholder|<partition>|<account>" tests/scripts/perf-sanity/README.md
# Check one of the actual YAML files to see the full context
head -50 tests/scripts/perf-sanity/disaggregated/gb200_deepseek-v32-fp4_32k4k_con256_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml
Repository: NVIDIA/TensorRT-LLM
Length of output: 11182
🏁 Script executed:
# Check the SLURM launch draft template to see if substitution happens there
head -100 jenkins/scripts/perf/disaggregated/slurm_launch_draft.sh
# Also check if there's any shell variable expansion of these placeholders
rg -n "container_image|<container_image>|envsubst" jenkins/scripts/perf/disaggregated/ --type sh
# Look for where env_config values are actually used in submit.py
sed -n '80,250p' jenkins/scripts/perf/disaggregated/submit.py
Repository: NVIDIA/TensorRT-LLM
Length of output: 8817
🏁 Script executed:
# Check if there's any documentation comment in the YAML files themselves
head -10 tests/scripts/perf-sanity/disaggregated/*.yaml | grep -A 5 -B 5 "partition\|container_image" | head -50
# Check the Jenkinsfile to see if it handles template substitution
grep -A 20 -B 5 "disaggregated/submit.py\|DISAGG_CONFIG" jenkins/L0_Test.groovy | head -100
# Look for any shell script that might do envsubst or sed replacement
find . -name "*.sh" -type f | xargs grep -l "envsubst\|sed.*<\|placeholder" 2>/dev/null | head -10
Repository: NVIDIA/TensorRT-LLM
Length of output: 1772
Verify that a template rendering step exists before these configs are submitted to SLURM.
These YAML files contain unrendered placeholders (<partition>, <account>, <container_image>, <model_path>, <full_path_to_work_dir>) across lines 11–12 and 32–37. The config is loaded directly via yaml.safe_load() in submit.py without any substitution step, and will pass literal <...> strings to SLURM and container commands, causing immediate failures.
Confirm either:
- A preprocessing step exists outside this repo that renders these templates before `submit.py` is invoked, or
- These files require explicit concrete values and should not contain angle-bracket placeholders.
This pattern is consistent across all disaggregated config files and applies to lines 32–37 as well.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@tests/scripts/perf-sanity/disaggregated/gb200_deepseek-v32-fp4_32k4k_con256_ctx1_dep4_gen1_dep32_eplb0_mtp3_ccb-UCX.yaml`
around lines 11 - 12, The YAML configs contain unrendered placeholders like
<partition>, <account>, <container_image>, <model_path>, and
<full_path_to_work_dir> and are loaded directly via yaml.safe_load() in
submit.py; update the workflow so these placeholders are rendered before
submission by either (A) adding a preprocessing/render step that substitutes
template variables into the files (e.g., call the existing templating utility or
a simple Jinja/str.replace pass) before submit.py calls yaml.safe_load(), or (B)
ensure the config files contain concrete values and remove angle-bracket
placeholders; additionally, add validation inside submit.py (after
yaml.safe_load()) to detect any remaining angle-bracket tokens and fail fast
with a clear error referencing the offending fields to prevent passing literal
"<...>" values to SLURM/container commands.
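The fail-fast validation suggested above could look roughly like this (a stdlib-only sketch; the regex and function name are assumptions, not code from submit.py):

```python
import re

# Matches unrendered angle-bracket tokens such as <partition> or <account>.
ANGLE_TOKEN = re.compile(r"<[A-Za-z_]+>")


def find_unrendered_placeholders(config_text: str) -> list[str]:
    """Return any angle-bracket tokens still present in a config,
    so submission can abort with a clear error instead of passing
    literal "<...>" strings to SLURM."""
    return sorted(set(ANGLE_TOKEN.findall(config_text)))


sample = "partition: <partition>\naccount: <account>\njob_time: 02:00:00\n"
assert find_unrendered_placeholders(sample) == ["<account>", "<partition>"]
assert find_unrendered_placeholders("partition: batch\n") == []
```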
script_file: disaggr_torch.slurm
benchmark_type: 8k1k
slurm:
  script_file: disaggr_torch.slurm
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check that the referenced launcher and dataset exist on this branch and are already used by perf-sanity.
fd 'disaggr_torch\.slurm$'
fd 'k2_thinking-8k1k-20480-ratio-1_for_serve\.json$'
rg -n -C2 'disaggr_torch\.slurm|k2_thinking-8k1k-20480-ratio-1_for_serve\.json'
Length of output: 50376
Missing dataset file will cause job failure on release/1.2.
The referenced launcher disaggr_torch.slurm exists at examples/disaggregated/slurm/benchmark/disaggr_torch.slurm, but the dataset file k2_thinking-8k1k-20480-ratio-1_for_serve.json is not present in the repository despite being referenced in the config at line 26 and in multiple other perf-sanity configs. The job will fail at runtime without this artifact.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/scripts/perf-sanity/disaggregated/gb200_kimi-k2-thinking-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb384_mtp0_ccb-UCX.yaml` around lines 7–10: the config references a missing dataset file, k2_thinking-8k1k-20480-ratio-1_for_serve.json (alongside the disaggr_torch.slurm launcher), which will cause a runtime failure. Either add the dataset JSON to the repo (e.g., alongside examples/disaggregated/slurm/benchmark/disaggr_torch.slurm or a centralized datasets dir) or update gb200_kimi-k2-thinking-fp4_8k1k_con4096_ctx1_dep4_gen1_dep16_eplb384_mtp0_ccb-UCX.yaml and the other perf-sanity configs to point to an existing dataset filename/path, then confirm the launcher name disaggr_torch.slurm and dataset path are consistently referenced across all perf-sanity configs on the release/1.2 branch.
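A cheap guard against this class of failure is a pre-flight existence check before `sbatch` is invoked. The sketch below is an assumption, not part of the PR: `missing_datasets` is a hypothetical helper, and the temp file merely simulates a dataset that is present.

```python
from pathlib import Path
import os
import tempfile

# Hypothetical pre-flight check (not part of this PR): verify that every
# dataset file referenced by a perf-sanity config exists before the job is
# submitted, so a missing artifact like the k2_thinking JSON fails at
# submit time rather than mid-job on the SLURM cluster.
def missing_datasets(dataset_paths):
    """Return the subset of paths that do not exist as regular files."""
    return [p for p in dataset_paths if not Path(p).is_file()]

# Usage: one real file plus one deliberately missing path.
with tempfile.NamedTemporaryFile(suffix=".json", delete=False) as f:
    present = f.name
missing = missing_datasets([present, "/nonexistent/dataset.json"])
os.unlink(present)
print(missing)  # ['/nonexistent/dataset.json']
```

Running this check in `submit.py` right after config loading would surface the missing-dataset problem at submission, which is far cheaper than burning a SLURM allocation to discover it.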
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment
/bot help.