
[None][fix] Fix ValueError and missing decoding statistics for MTP#12063

Open
cascade812 wants to merge 1 commit into NVIDIA:main from cascade812:guiju/at2

Conversation


@cascade812 cascade812 commented Mar 10, 2026

Description

This PR addresses the following two issues for MTP:

  1. A ValueError: No free slots is raised when use_relaxed_acceptance_for_thinking or use_sa_spec is enabled for MTP with max_batch_size > 1.
  2. The speculative decoding statistics are missing from the log for MTP.

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Summary by CodeRabbit

  • Bug Fixes

    • Fixed draft length calculation logic in benchmarking reports to use fallback values when primary metrics are unavailable.
  • Refactor

    • Optimized resource allocation for speculative decoding operations to improve efficiency.

Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
@cascade812 cascade812 requested review from a team as code owners March 10, 2026 04:38
@cascade812 cascade812 requested review from nv-yilinf and zheyuf March 10, 2026 04:38

coderabbitai bot commented Mar 10, 2026

📝 Walkthrough


Two localized adjustments: (1) increased MTP slot pool size by one to accommodate a CUDA graph dummy request; (2) enhanced draft length calculation with fallback logic from num_nextn_predict_layers when max_draft_len is absent.

Changes

  • MTP Slot Pool Expansion (tensorrt_llm/_torch/speculative/mtp.py):
    Added an extra slot in MTPHiddenStatesManager for the CUDA graph padding dummy request, expanding slot_pool_size from max_num_requests to max_num_requests + 1 and adjusting related tensor allocations.

  • Draft Length Fallback Logic (tensorrt_llm/bench/dataclasses/reporting.py):
    Modified get_max_draft_len to apply a fallback when speculative_config is a dict: prefer max_draft_len, falling back to num_nextn_predict_layers if absent.
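Taken together, the two changes can be condensed into a minimal sketch. This is hypothetical and heavily simplified (the real classes in tensorrt_llm manage tensors and far more state; only the names come from the walkthrough above):

```python
class MTPHiddenStatesManager:
    """Simplified, hypothetical sketch of the MTP slot pool change."""

    def __init__(self, max_num_requests: int):
        # Reserve one extra slot for the CUDA graph padding dummy request,
        # so a full batch (max_batch_size > 1) no longer raises
        # "ValueError: No free slots".
        self.slot_pool_size = max_num_requests + 1
        self.free_slots = list(range(self.slot_pool_size))


def get_max_draft_len(spec_config) -> int:
    """Draft length with the MTP fallback, as described in the walkthrough."""
    if isinstance(spec_config, dict):
        # Prefer max_draft_len; fall back to num_nextn_predict_layers (MTP).
        draft_len = (spec_config.get("max_draft_len")
                     or spec_config.get("num_nextn_predict_layers"))
        return draft_len or 0
    return spec_config.max_draft_len or 0
```

Note that the review comment further down flags the `or` chain here for treating an explicit 0 as missing.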

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

  • Description check (❓ Inconclusive): The description explains the two issues fixed but lacks critical details: no test coverage information is provided, no specific tests are listed to safeguard the changes, and the PR checklist is incomplete. Resolution: add specific test cases that verify the ValueError fix and that decoding statistics now appear in logs, and clearly list which tests were run or added.

✅ Passed checks (1 passed)

  • Title check (✅ Passed): The title clearly describes the main changes (fixing the ValueError for MTP and restoring missing decoding statistics), which aligns with the PR objectives.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tensorrt_llm/bench/dataclasses/reporting.py`:
- Around line 781-784: The code treats falsy 0 as "missing" and only applies the
new dict-backed fallback, causing object-based configs to skip decoding stats;
update the logic around spec_config to (1) detect if spec_config is a dict and,
if so, use "if 'max_draft_len' in spec_config: return
spec_config['max_draft_len']" (allow 0), else if "num_nextn_predict_layers" in
spec_config return that value (again allowing 0), and (2) for non-dict (object)
configs, use attribute checks like "hasattr(spec_config, 'max_draft_len') and
spec_config.max_draft_len is not None" to return the explicit attribute
(including 0) or fall back similarly to num_nextn_predict_layers; in short, stop
using boolean "or" chains and use explicit key/attribute existence and "is not
None" checks for spec_config.max_draft_len and
spec_config.num_nextn_predict_layers.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c3e187bc-6eb6-4266-854c-3fc44fdc9119

📥 Commits

Reviewing files that changed from the base of the PR and between 3139ffa and 137a2ba.

📒 Files selected for processing (2)
  • tensorrt_llm/_torch/speculative/mtp.py
  • tensorrt_llm/bench/dataclasses/reporting.py

Comment on lines +781 to 784

                draft_len = (spec_config.get("max_draft_len")
                             or spec_config.get("num_nextn_predict_layers"))
                return draft_len or 0
            return spec_config.max_draft_len or 0

⚠️ Potential issue | 🟠 Major

Handle object configs and explicit zero values separately.

The new fallback only applies to dict-backed configs. Line 784 still reads spec_config.max_draft_len only, so object-based MTP configs can keep returning 0 and skip decoding stats. The or chain also treats an explicit max_draft_len=0 as “missing”, which changes the meaning of a configured value.

Proposed fix
         if ("speculative_config" in self.kwargs
                 and self.kwargs["speculative_config"] is not None):
             spec_config = self.kwargs["speculative_config"]
             # Handle both dict (from YAML) and object types
             if isinstance(spec_config, dict):
-                draft_len = (spec_config.get("max_draft_len")
-                             or spec_config.get("num_nextn_predict_layers"))
-                return draft_len or 0
-            return spec_config.max_draft_len or 0
+                if "max_draft_len" in spec_config:
+                    return spec_config["max_draft_len"] or 0
+                return spec_config.get("num_nextn_predict_layers") or 0
+            draft_len = getattr(spec_config, "max_draft_len", None)
+            if draft_len is not None:
+                return draft_len
+            return getattr(spec_config, "num_nextn_predict_layers", 0) or 0
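Why the `or` chain mishandles an explicit zero can be shown with a minimal, hypothetical example (the dict shape mirrors a YAML-backed config; the key names match the diff above):

```python
# A YAML-backed speculative config where max_draft_len is explicitly 0.
spec_config = {"max_draft_len": 0, "num_nextn_predict_layers": 3}

# Boolean-or chain: 0 is falsy, so the explicit value is skipped.
draft_len = (spec_config.get("max_draft_len")
             or spec_config.get("num_nextn_predict_layers"))
assert draft_len == 3  # the configured 0 was silently overridden

# An explicit key-existence check preserves the configured zero.
if "max_draft_len" in spec_config:
    draft_len = spec_config["max_draft_len"]
assert draft_len == 0
```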

@zheyuf zheyuf left a comment
Thanks. Looks good to me.

@cascade812 cascade812 changed the title [None][Fix] fix ValueError and missing decoding statistics for MTP [None][fix] Fix ValueError and missing decoding statistics for MTP Mar 10, 2026
@cascade812

/bot run

@tensorrt-cicd

PR_Github #38389 [ run ] triggered by Bot. Commit: 137a2ba
