
[https://nvbugs/5919026][fix] Pass sparse_attn_config from effective_draft_config for one-model draft KV cache#12032

Open
chenfeiz0326 wants to merge 2 commits into NVIDIA:main from chenfeiz0326:chenfeiz/fix-bug5919026

Conversation

Collaborator

@chenfeiz0326 chenfeiz0326 commented Mar 9, 2026

Summary by CodeRabbit

  • Bug Fixes

    • Fixed sparse attention configuration handling in multi-token prediction and one-model draft scenarios.
  • Tests

    • Re-enabled performance tests for DeepSeek V32 FP4 configurations that were previously skipped.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@chenfeiz0326 chenfeiz0326 requested a review from QiJune March 9, 2026 10:41
@chenfeiz0326 chenfeiz0326 requested review from a team as code owners March 9, 2026 10:41
@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast --stage-list "DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-1,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-2,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-3,DGX_B200-8_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7"

@coderabbitai
Contributor

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough

Walkthrough

Removes a defensive guard in sparse attention indexer preparation that previously skipped setup when kv_cache_manager lacked index_head_dim. Draft KV-cache creation now derives sparse attention config to enable proper handling in multi-token prediction scenarios. Two previously skipped performance tests are re-enabled.

Changes

  • Sparse Attention Config Handling — tensorrt_llm/_torch/attention_backend/sparse/dsa.py, tensorrt_llm/_torch/pyexecutor/_util.py: Removed the early-return guard in Indexer.prepare that skipped indexer preparation when kv_cache_manager was absent or lacked index_head_dim. Draft KV-cache creation now derives sparse_attn_config from the draft model config instead of passing None, enabling sparse attention support in MTP/one-model scenarios.
  • Test Skip Removal — tests/integration/test_lists/waives.txt: Removed two SKIP annotations for DeepSeek V32 FP4 Grace Blackwell performance tests, re-enabling their execution.
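The removed guard in Indexer.prepare can be pictured with a minimal sketch. All class and attribute names here besides index_head_dim are illustrative stand-ins, not the actual TensorRT-LLM API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FakeKvCacheManager:
    # Hypothetical stand-in for the real KV-cache manager; the actual
    # DSACacheManager carries much more state than this.
    index_head_dim: Optional[int] = None

class Indexer:
    def prepare(self, kv_cache_manager: Optional[FakeKvCacheManager]) -> bool:
        # The old defensive guard skipped setup entirely:
        #
        #   if kv_cache_manager is None or \
        #           getattr(kv_cache_manager, "index_head_dim", None) is None:
        #       return False
        #
        # After the fix, the draft KV cache is built from a config that
        # carries the sparse attention parameters, so index_head_dim is
        # expected to be present and preparation always proceeds.
        assert kv_cache_manager is not None
        assert kv_cache_manager.index_head_dim is not None
        return True
```

With the guard gone, a manager that genuinely lacks index_head_dim now fails loudly during preparation instead of silently skipping indexer setup.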

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Description check — ⚠️ Warning: the PR description provided by the author is entirely blank; only the template is present, with no actual content filled in. Resolution: add a description explaining what the PR changes, why the changes are needed, and what test coverage exists. The PR objectives indicate this fixes sparse attention config handling for one-model draft KV caches in MTP scenarios; include this context in the description.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: the PR title clearly and specifically describes the main change (passing sparse_attn_config from effective_draft_config for one-model draft KV cache) and references the NVBugs ID.
  • Docstring Coverage — ✅ Passed: docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


@tensorrt-cicd
Collaborator

PR_Github #38252 [ run ] triggered by Bot. Commit: 0dc5178 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #38252 [ run ] completed with state SUCCESS. Commit: 0dc5178
/LLM/main/L0_MergeRequest_PR pipeline #29636 (Partly Tested) completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Collaborator

@QiJune QiJune left a comment


LGTM

@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast

@chenfeiz0326 chenfeiz0326 enabled auto-merge (squash) March 10, 2026 01:46
@tensorrt-cicd
Collaborator

PR_Github #38345 [ run ] triggered by Bot. Commit: 0dc5178 Link to invocation

ziyixiong-nv and others added 2 commits March 10, 2026 01:00
…draft_config for one-model draft KV cache

In _create_one_model_draft_kv_cache_manager, the sparse_attn_config was
hardcoded to None. However, for MTP with models using sparse attention
(e.g., DeepSeek V3 with DSA), the draft layers share the same architecture
as the target model and need the sparse_attention_config.

The fix gets sparse_attn_config from effective_draft_config, which falls
back to the target model's config for MTP mode. This ensures DSACacheManager
is properly initialized with the required index_head_dim and other parameters.

Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
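
The fix described in the commit message above can be sketched as follows. The config classes and function names here are hypothetical stand-ins for the real ones in _util.py, shown only to illustrate the fallback behavior:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SparseAttentionConfig:
    index_head_dim: int  # required to initialize the DSA cache manager

@dataclass
class ModelConfig:
    sparse_attention_config: Optional[SparseAttentionConfig] = None

def effective_draft_config(draft_config: Optional[ModelConfig],
                           target_config: ModelConfig) -> ModelConfig:
    # In MTP mode the draft layers share the target model's architecture,
    # so the effective draft config falls back to the target config.
    return draft_config if draft_config is not None else target_config

def draft_sparse_attn_config(
        draft_config: Optional[ModelConfig],
        target_config: ModelConfig) -> Optional[SparseAttentionConfig]:
    # Before the fix this value was hardcoded to None when creating the
    # one-model draft KV-cache manager; now it is derived from the
    # effective draft config so sparse attention parameters survive.
    return effective_draft_config(draft_config, target_config).sparse_attention_config
```

In the MTP case there is no separate draft config, so the target model's sparse attention parameters (including index_head_dim) flow through to the draft KV-cache manager instead of None.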
@chenfeiz0326 chenfeiz0326 force-pushed the chenfeiz/fix-bug5919026 branch from 0dc5178 to a2ec3b9 Compare March 10, 2026 08:02
@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #38345 [ run ] completed with state SUCCESS. Commit: 0dc5178
/LLM/main/L0_MergeRequest_PR pipeline #29718 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #38410 [ run ] triggered by Bot. Commit: a2ec3b9 Link to invocation



4 participants