fix: resolve prefix caching crashes with MTP speculative decoding by valarLip · Pull Request #234 · ROCm/ATOM

valarLip · 2026-02-24T15:03:17Z

Fix a GPU memory access fault caused by double conversion of block_tables in the cached prefill path. kv_indices_generate_triton applies block_ratio internally, but it was receiving already-converted block_tables (via block_tables_converted), so indices were multiplied by block_ratio twice (e.g. block_id*256 instead of block_id*16) and exceeded KV cache bounds.
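
The double conversion described above can be reproduced in a few lines of plain Python. This is only a sketch: `convert_block_table` and `kv_indices_generate` are hypothetical stand-ins for the real conversion helper and Triton kernel, and the assumption that conversion expands each block id into `block_ratio` consecutive page ids is mine, not taken from the ATOM source. The ratio of 16 matches the `block_id*16` figure in the description.

```python
# Illustrative only: these helpers are hypothetical stand-ins, not ATOM code.
BLOCK_RATIO = 16  # assumed: one scheduler block spans 16 KV-cache pages


def convert_block_table(block_table, block_ratio):
    """Expand each block id into block_ratio consecutive page ids."""
    return [bid * block_ratio + i for bid in block_table for i in range(block_ratio)]


def kv_indices_generate(block_table, block_ratio):
    """Stand-in for a kernel that applies block_ratio internally."""
    return convert_block_table(block_table, block_ratio)


num_blocks = 8
num_pages = num_blocks * BLOCK_RATIO  # valid page ids are 0..127

raw = [7]  # last valid block id

# Correct: the kernel receives the raw table and converts once.
correct = kv_indices_generate(raw, BLOCK_RATIO)

# Buggy: the table was already converted before reaching the kernel.
double = kv_indices_generate(convert_block_table(raw, BLOCK_RATIO), BLOCK_RATIO)

print(max(correct) < num_pages)  # in bounds
print(double[0], max(double) >= num_pages)  # starts at block_id*256, out of bounds
```

With the raw table every index stays inside the cache; with the pre-converted table the first index is already `7 * 256`, far past the last page.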

Key changes:

- Use raw block_tables for kv_indices generation in aiter_mla prefill
- Route cached prefill through paged MLA attention (supports Q≠K) instead of flash_attn_varlen_func (requires Q==K)
- Track has_cached flag through AttentionMetaData for path selection
- Fix block_manager: hash table leak, can_allocate cache-hit accounting, can_append for multi-token decode, O(1) free block tracking
- Add CacheStats to scheduler for prefix cache hit rate monitoring
- Add comprehensive block_manager tests (119 passing)
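
The path-selection change in the list above comes down to whether a prefill has a cached prefix: with a cache hit, queries cover only the new tokens while keys and values cover the whole context, so Q and K lengths differ. A minimal sketch of that decision follows, assuming a per-sequence summary; `PrefillShape`, its fields, and the returned labels are hypothetical and do not mirror ATOM's `AttentionMetaData`.

```python
from dataclasses import dataclass


@dataclass
class PrefillShape:
    """Hypothetical per-sequence summary, not ATOM's AttentionMetaData."""

    num_new_tokens: int     # tokens that still need a query (Q length)
    num_cached_tokens: int  # prefix tokens already present in the KV cache

    @property
    def has_cached(self) -> bool:
        return self.num_cached_tokens > 0

    @property
    def num_kv_tokens(self) -> int:
        # Keys/values span the cached prefix plus the new tokens (K length).
        return self.num_cached_tokens + self.num_new_tokens


def select_prefill_path(shape: PrefillShape) -> str:
    """Return an illustrative label for the attention path to use."""
    if shape.has_cached:
        # Q length != K length, so a path that supports Q != K is needed.
        return "paged_mla"
    # No cached prefix: Q length == K length.
    return "flash_attn_varlen"


print(select_prefill_path(PrefillShape(num_new_tokens=32, num_cached_tokens=0)))
print(select_prefill_path(PrefillShape(num_new_tokens=8, num_cached_tokens=24)))
```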

Verified: gsm8k 1319 samples, 95.83% accuracy, 0 GPU faults.
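
For the hit-rate monitoring item in the key changes, a counter of roughly the following shape would be enough. This is a sketch under stated assumptions: the class name `CacheStatsSketch`, its fields, and the block-level granularity are hypothetical and are not taken from the `CacheStats` class this PR adds.

```python
from dataclasses import dataclass


@dataclass
class CacheStatsSketch:
    """Hypothetical hit-rate counter; field names are assumptions."""

    queried_blocks: int = 0
    hit_blocks: int = 0

    def record(self, queried: int, hit: int) -> None:
        """Accumulate one allocation's lookup counts."""
        self.queried_blocks += queried
        self.hit_blocks += hit

    @property
    def hit_rate(self) -> float:
        # Avoid dividing by zero before any lookup has been recorded.
        if self.queried_blocks == 0:
            return 0.0
        return self.hit_blocks / self.queried_blocks


stats = CacheStatsSketch()
stats.record(queried=4, hit=0)  # first request: cold cache
stats.record(queried=4, hit=3)  # second request shares a 3-block prefix
print(f"prefix cache hit rate: {stats.hit_rate:.1%}")
```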

Motivation

Technical Details

Test Plan

Test Result

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.


github-actions · 2026-02-24T15:04:19Z

atom/model_ops/attentions/backends.py

 from atom.model_engine.scheduler import ScheduledBatch
+
+logger = logging.getLogger("atom")
 from atom.model_ops.attention_mla import MLAModules


⚠️ [ruff] <E402> _{reported by reviewdog 🐶}
Module level import not at top of file

github-actions · 2026-02-24T15:04:19Z

atom/model_ops/attentions/backends.py

+
+logger = logging.getLogger("atom")
 from atom.model_ops.attention_mla import MLAModules
 from atom.utils import CpuGpuBuffer


⚠️ [ruff] <E402> _{reported by reviewdog 🐶}
Module level import not at top of file

github-actions · 2026-02-24T15:04:19Z

atom/model_ops/attentions/backends.py

+logger = logging.getLogger("atom")
 from atom.model_ops.attention_mla import MLAModules
 from atom.utils import CpuGpuBuffer
 from atom.utils.block_convert import block_table_convert_triton


⚠️ [ruff] <E402> _{reported by reviewdog 🐶}
Module level import not at top of file
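
The three E402 findings above share one cause: the `logger = ...` assignment sits between import statements, so every import after it is no longer at the top of the module. A minimal sketch of the usual layout is shown below; `sys` is only a placeholder for the project imports in `backends.py`.

```python
import logging
import sys  # placeholder for the remaining project imports

# Module-level statements go after the last import, which satisfies E402.
logger = logging.getLogger("atom")

logger.debug("running on Python %d", sys.version_info.major)
```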

github-actions · 2026-02-24T15:04:19Z

tests/test_prefix_cache_accuracy.py

+import json
+import re


⚠️ [ruff] <F401> _{reported by reviewdog 🐶}
json imported but unused

Suggested change

-import json
-import re
+import re

github-actions · 2026-02-24T15:04:19Z

tests/test_prefix_cache_accuracy.py

+        sys.exit(1)
+
+    model = get_model_name(base_url)
+    print(f"=== Prefix Cache Accuracy Test ===")


⚠️ [ruff] <F541> _{reported by reviewdog 🐶}
f-string without any placeholders

Suggested change

-    print(f"=== Prefix Cache Accuracy Test ===")
+    print("=== Prefix Cache Accuracy Test ===")

valarLip wants to merge 1 commit into main from ds_prefix_cache
