
Mtp draft fix #254

Merged
valarLip merged 8 commits into main from mtp_draft_fix
Mar 3, 2026

Conversation

@valarLip
Collaborator

@valarLip valarLip commented Mar 2, 2026

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

@valarLip valarLip marked this pull request as ready for review March 2, 2026 16:41
Copilot AI review requested due to automatic review settings March 2, 2026 16:41
Contributor

Copilot AI left a comment


Pull request overview

This PR refines speculative decoding (MTP/EAGLE) execution and attention metadata handling, while also restructuring scheduler/model-runner output plumbing and MTP statistics reporting.

Changes:

  • Updates attention metadata preparation (slot mapping initialization, kv_indices generation/buffer sizing) to better support speculative decoding paths.
  • Refactors scheduler/model-runner output formats to use ordered req_ids + token_ids lists with O(1) req-id indexing.
  • Revises MTP stats logging behavior and routes stats printing through Scheduler.spec_stats.
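The ordered-output refactor in the second bullet can be sketched as follows. The class name `ScheduledBatchOutput` comes from the file summary below, but the field and method names here are assumptions for illustration, not the actual definition in `atom/model_engine/scheduler.py`:

```python
from dataclasses import dataclass, field

@dataclass
class ScheduledBatchOutput:
    """Hypothetical sketch: req_ids and token_ids are parallel ordered
    lists, with a lazily built dict giving O(1) req-id -> index lookup
    instead of an O(n) list.index() scan per request."""
    req_ids: list       # i-th entry: request id of the i-th request
    token_ids: list     # i-th entry: token ids produced for that request
    _index: dict = field(default_factory=dict, init=False)

    def get_idx(self, req_id) -> int:
        # Build the index once on first use; later lookups are O(1).
        if not self._index:
            self._index = {r: i for i, r in enumerate(self.req_ids)}
        return self._index[req_id]

out = ScheduledBatchOutput(req_ids=["a", "b"], token_ids=[[1, 2], [3]])
print(out.get_idx("b"))       # → 1
print(out.token_ids[out.get_idx("b")])  # → [3]
```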

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.

Summary per file:

  • atom/utils/forward_context.py: Removes unused fake_block_tables from AttentionMetaData.
  • atom/spec_decode/eagle.py: Adjusts speculative proposer position/index handling and updates attention metadata for MTP decode.
  • atom/models/deepseek_mtp.py: Disables masked embedding behavior and comments out support_torch_compile usage.
  • atom/model_ops/sampler.py: Introduces cached exponential tensor helper for sampling path.
  • atom/model_ops/embed_head.py: Minor import reordering.
  • atom/model_ops/attentions/backends.py: Initializes slot mapping with -1 for scheduled tokens; copies full scheduled range to GPU.
  • atom/model_ops/attentions/aiter_mla.py: Increases kv_indices buffer sizing and generates kv_indices via Triton; various decode/prefill path adjustments.
  • atom/model_ops/attentions/aiter_attention.py: Similar kv_indices buffer sizing and generation changes for persistent attention.
  • atom/model_ops/attention_mha.py: Removes prefill-time fake block table handling.
  • atom/model_engine/scheduler.py: Changes MTP stats logging cadence; refactors ScheduledBatchOutput structure and adds O(1) req-id lookup.
  • atom/model_engine/model_runner.py: Batch token-id postprocessing; adapts to new ScheduledBatchOutput API; removes old MTP stats APIs.
  • atom/model_engine/engine_core.py: Prints MTP stats via scheduler instead of runner RPC.
  • atom/model_engine/async_proc.py: Avoids enqueueing outputs when no output address is configured.


Copilot AI review requested due to automatic review settings March 3, 2026 09:07
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (2)

atom/model_engine/scheduler.py:416

  • num_rejected is only assigned inside if is_deferred_out or self.use_spec:, but it's used later unconditionally when computing num_tokens = seq.num_tokens - self.mtp_k - num_rejected. In the non-speculative, non-deferred path this will raise UnboundLocalError (or reuse a stale value from a previous loop iteration). Initialize num_rejected = 0 per-sequence (or compute num_tokens differently) so the non-spec path is safe.
            if self.mtp_k > 0:
                # idx already resolved above via get_idx
                seq.spec_token_ids = draft_token_ids[idx]

            if seq.num_completion_tokens == 1 and seq.first_token_time == 0.0:
                seq.first_token_time = time.time()

            num_tokens = seq.num_tokens - self.mtp_k - num_rejected
            leave_reason = None

atom/model_engine/engine_core.py:301

  • print_mtp_statistics() now calls the private SpecStats._log() unconditionally when spec_stats exists. _log() divides by iv_steps, which will be 0 if no decode steps have been recorded yet, causing a ZeroDivisionError. Please add a guard (e.g., if spec_stats.total_draft_tokens > 0 / total_steps > 0) or expose a safe public logging method on SpecStats that handles the empty case.
    def print_mtp_statistics(self):
        if self.scheduler.spec_stats is not None:
            self.scheduler.spec_stats._log()
        else:
            logger.info(
                "\n[MTP Stats] No MTP statistics available (MTP not enabled or no tokens processed)\n"
            )
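One way to address this, following the comment's suggestion, is a public logging method on `SpecStats` that guards the division. The field names (`iv_steps`, `total_draft_tokens`) follow the comment above; the rest of the class body is an assumed sketch, not the real implementation:

```python
import logging

logger = logging.getLogger(__name__)

class SpecStats:
    """Sketch of the guarded-logging suggestion; fields are assumptions
    based on the review comment, not the actual SpecStats definition."""
    def __init__(self):
        self.iv_steps = 0
        self.total_draft_tokens = 0
        self.total_accepted_tokens = 0

    def _log(self):
        # Divides by iv_steps: unsafe if no decode steps were recorded.
        mean_accepted = self.total_accepted_tokens / self.iv_steps
        logger.info("[MTP Stats] mean accepted per step: %.2f", mean_accepted)

    def log_safe(self):
        # Public wrapper that handles the empty case before delegating.
        if self.iv_steps == 0:
            logger.info("[MTP Stats] no decode steps recorded yet")
            return
        self._log()
```

With this, `print_mtp_statistics` would call `spec_stats.log_safe()` instead of reaching into the private `_log()`.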


@valarLip valarLip merged commit 33e0aac into main Mar 3, 2026
18 checks passed
@valarLip valarLip deleted the mtp_draft_fix branch March 3, 2026 11:29
