Motivation
PR #253 introduces Prefill/Decode disaggregation — a critical production feature that splits inference across separate GPU instances via MORI-IO RDMA. Current test coverage is minimal:
- 1 test file (`test_kv_aggregator.py`, 96 lines) covering only `KVOutputAggregator`
- 0 tests for the core transfer engine (1,624 lines), proxy (372 lines), scheduler integration (344 lines), and async worker plumbing (212 lines)
This issue tracks the plan to add layered test coverage following the same strategy as the plugin-mode CI (#255).
Approach: Layered Testing by Module
L1: CPU Unit Tests (P0 — gate for merge)
Pure logic tests with mocked GPU/RDMA/ZMQ dependencies. Run on ubuntu-latest in < 5 seconds.
```
tests/disaggregation/
├── test_kv_aggregator.py            # Enhance existing
├── test_connector_metadata.py       # New
├── test_kv_connector_scheduler.py   # New
├── test_proxy.py                    # New
├── test_transfer_utils.py           # New
└── test_scheduler_kv_integration.py # New
```
Mock strategy: mock `aiter`, `mori.io`, `torch.distributed`, and `zmq` at the `sys.modules` level.
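A minimal `conftest.py` sketch of this `sys.modules`-level mock strategy. The module list comes from the line above; the helper name and exact set of stubbed modules are illustrative, not taken from the repo:

```python
# conftest.py — stub out GPU/RDMA/ZMQ deps before test modules import them.
# Sketch only: the real conftest may stub more modules or use patch.dict.
import sys
from unittest.mock import MagicMock

# Parent packages must be stubbed before their submodules ("mori" before "mori.io").
_FAKE_MODULES = ["aiter", "mori", "mori.io", "torch.distributed", "zmq"]

def install_fake_modules():
    """Insert MagicMock stand-ins so e.g. `import mori.io` succeeds on CPU CI.

    setdefault leaves real installations (e.g. a CPU torch wheel) untouched.
    """
    for name in _FAKE_MODULES:
        sys.modules.setdefault(name, MagicMock())

install_fake_modules()
```

Because the stubs are installed at collection time, every test module can import the production code unchanged; individual tests then override specific attributes on the mocks as needed.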
`test_kv_aggregator.py` (enhance)
- All workers report same ID → emitted
- Partial workers → not emitted until all report
- Multiple rounds accumulate correctly
- Empty inputs don't crash
- Interleaved send/recv tracked independently
- `reset()` clears pending state
- Counter entries deleted after emission (no leak)
- `world_size <= 0` raises ValueError
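The emit-when-all-workers-report semantics the bullets above assert can be illustrated with a toy stand-in. This is NOT the real `KVOutputAggregator` API, just the counting logic the tests pin down:

```python
# Toy model of "all workers report same ID → emitted" and leak-free cleanup.
# Illustrative only; the production aggregator tracks richer per-worker state.
from collections import Counter

class ToyAggregator:
    def __init__(self, world_size: int):
        if world_size <= 0:
            raise ValueError("world_size must be positive")
        self.world_size = world_size
        self._pending = Counter()

    def report(self, req_id: str) -> bool:
        """One worker reports req_id; return True once all workers have."""
        self._pending[req_id] += 1
        if self._pending[req_id] == self.world_size:
            del self._pending[req_id]  # entry deleted after emission → no leak
            return True
        return False

    def reset(self):
        """Drop all partially-reported requests."""
        self._pending.clear()
```

A test for the "partial workers → not emitted" case then reads naturally: report once with `world_size=2`, assert `False`; report again, assert `True` and that the counter is empty.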
`test_connector_metadata.py`
- `add_new_req_to_recv` builds correct ReqMeta from kv_transfer_params
- `add_new_req_to_save` builds correct ReqMeta
- Missing required params raises KeyError
- Multiple reqs don't clobber each other
- `request_id_to_transfer_id` mapping passthrough
`test_kv_connector_scheduler.py`
- `get_num_new_matched_tokens` returns `(prompt_len, True)` for `do_remote_prefill`
- Second call returns `(0, False)` — `kv_async_tagged` idempotent
- No kv_transfer_params → `(0, False)`
- `update_state_after_alloc` consumer: queues req, sets transfer_id mapping
- `update_state_after_alloc` producer: does NOT queue
- `do_remote_prefill` flag cleared after processing
- `build_connector_meta` drains pending queue into metadata
- `build_connector_meta` on empty queue → no crash
- `request_finished` producer output contains block_table, engine_id, host, port
- `request_finished` consumer cleans up transfer_id mapping
- transfer_id ↔ request_id always bidirectionally consistent
`test_proxy.py`
- `_append_whole_dict_unique` deduplicates
- Dedup ignores `index` field
- Transfer mode mismatch raises ValueError
- No instances → 503 response
- Round-robin cycles through instances evenly
- `_extract_ip_port` on valid URL
- `_extract_ip_port` on invalid URL raises ValueError
- Prefill request sets `max_tokens=1` and `stream=False`
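A sketch of the proxy-side behaviors listed above (URL parsing, round-robin selection, and the empty-pool case behind the 503). The helper and class names here are illustrative stand-ins, not the actual proxy API:

```python
# Illustrative proxy helpers: not the real implementations, just the
# behaviors the tests above exercise.
from itertools import cycle
from urllib.parse import urlparse

def extract_ip_port(url: str) -> tuple[str, int]:
    """Parse 'http://10.0.0.1:8000' -> ('10.0.0.1', 8000); raise on junk."""
    parsed = urlparse(url)
    if not parsed.hostname or not parsed.port:
        raise ValueError(f"invalid instance URL: {url!r}")
    return parsed.hostname, parsed.port

class RoundRobinPool:
    """Cycles through registered instances evenly."""
    def __init__(self, instances: list[str]):
        self._iter = cycle(instances) if instances else None

    def pick(self) -> str:
        if self._iter is None:
            # The proxy surfaces this condition as an HTTP 503.
            raise LookupError("no instances registered")
        return next(self._iter)
```

Testing "cycles evenly" then reduces to picking `2 * len(instances)` times and asserting each instance appears exactly twice.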
`test_transfer_utils.py`
- `convert_virtual_to_physical_pages` default 16→1 expansion
- Same size → no expansion
- Custom block_size ratios
- `merge_contiguous_blocks` — all contiguous → 1 merged transfer
- None contiguous → N transfers
- Partial merge
- Empty/single input
- Unsorted input → auto-sorts
- `_compute_block_transfer_offsets` MHA (5D) vs MLA (3D)
- `make_zmq_path` IPv4, IPv6, no-port
- `RoleManager` singleton + thread safety
- `set_role`/`get_role` round-trip
- `get_port_offset` formula: `dp_rank * tp_size + tp_rank`
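Two of the utilities above are pure functions whose contracts the bullets fully specify, so a reference sketch is easy to write down. These are illustrative reimplementations for the test plan, not the code under test:

```python
# Reference sketch: collapse sorted runs of consecutive block ids into
# (start, length) transfer descriptors. Illustrative, not the real utility.
def merge_contiguous_blocks(blocks: list[int]) -> list[tuple[int, int]]:
    """[4, 5, 6, 9, 10, 2] -> [(2, 1), (4, 3), (9, 2)]  (unsorted input auto-sorts)."""
    if not blocks:
        return []
    ordered = sorted(blocks)
    runs = []
    start = prev = ordered[0]
    for b in ordered[1:]:
        if b == prev + 1:
            prev = b                              # extend the current run
        else:
            runs.append((start, prev - start + 1))  # close the run
            start = prev = b
    runs.append((start, prev - start + 1))
    return runs

# The port-offset formula quoted above, verbatim.
def get_port_offset(dp_rank: int, tp_size: int, tp_rank: int) -> int:
    return dp_rank * tp_size + tp_rank
```

The listed edge cases map directly onto this sketch: all-contiguous input yields one run, fully non-contiguous input yields N runs, and empty/single inputs fall out of the early return and the final `append`.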
`test_scheduler_kv_integration.py`
- Seq enters WAITING_FOR_REMOTE_KV state
- Finished recv moves seq to RUNNING
- Finished send triggers block cleanup
- `None` kv_connector_output → no crash
- Seqs waiting for KV excluded from scheduled batch
- `connector_meta_output` attached to ScheduledBatch
L2: CPU Integration Tests (P0)
| Test | Description |
|---|---|
| ZMQ handshake roundtrip | Listener + client threads in-process, verify metadata exchange |
| Service discovery registration | Simulate proxy ZMQ ROUTER, verify msgpack format and dedup |
| AsyncIOProcManager KV aggregation | Mock multiple worker KV outputs, verify call_func_with_aggregation |
| `_pop_done_transfers` all-status check | Bug: current code only checks `status_list[-1]`; test with `[FAIL, SUCCESS]` → should NOT mark done |
| OpenAI server kv_params roundtrip | Request with kv_transfer_params → response contains output |
| Proxy prefill→decode read-mode flow | Simulate: prefill response → extract block metadata → decode request |
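The `_pop_done_transfers` row above can be pinned down with a self-contained test. The two predicate functions here are illustrative stand-ins modeling the bug class, not the real method:

```python
# Models the bug: checking only status_list[-1] wrongly marks a transfer
# done when an earlier chunk failed. Illustrative stand-ins only.
FAIL, SUCCESS = "FAIL", "SUCCESS"

def is_done_buggy(status_list: list[str]) -> bool:
    """Current behavior: only the last status is inspected."""
    return status_list[-1] == SUCCESS

def is_done_fixed(status_list: list[str]) -> bool:
    """Required behavior: every chunk must have succeeded."""
    return all(s == SUCCESS for s in status_list)

statuses = [FAIL, SUCCESS]
assert is_done_buggy(statuses) is True    # bug: marked done despite a failure
assert is_done_fixed(statuses) is False   # correct: not done
```

Writing the regression test against the fixed predicate first means it fails on the current code and passes once all statuses are checked.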
L3: GPU Tests (P1 — design only)
| Test | Env | Description |
|---|---|---|
| `register_kv_caches` RDMA metadata | 1 GPU | Real KV tensors → verify RDMA metadata non-null |
| MoRIIO wrapper tensor registration | 1 GPU | CUDA tensor → packed metadata valid |
| Single-node loopback transfer | 2+ GPU | Producer → consumer RDMA read, verify data match |
| E2E proxy+prefill+decode | 8 GPU | Full 3-process inference |
| Multi-request concurrent | 8 GPU | Concurrent P/D pipeline |
Known Bugs to Cover
- `_pop_done_transfers` only checks `status_list[-1]` — should check ALL statuses
- `start_load_kv` busy-wait — `while need_handshake: continue` burns CPU
- Proxy timeout — `aiohttp.ClientTimeout(total=6*6000*6000)` (216,000,000 s, roughly 60,000 hours) should be configurable
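For the busy-wait bug, one conventional fix is to replace the spin loop with a blocking wait. A minimal sketch, assuming a `threading.Event` can stand in for the handshake flag (the names `handshake_done` and `wait_for_handshake` are illustrative, not from the actual code):

```python
# Sketch: replace `while need_handshake: continue` (100% CPU) with a
# blocking, timeout-bounded wait. Illustrative names, not the real code.
import threading

handshake_done = threading.Event()

def wait_for_handshake(timeout: float = 30.0) -> bool:
    """Block without burning CPU; return False if the peer never arrives."""
    return handshake_done.wait(timeout)

# The peer's listener thread calls handshake_done.set() once the ZMQ
# metadata exchange completes, waking the waiter immediately.
```

The same shape (an awaitable event plus an explicit timeout) also suggests the fix for the proxy timeout bug: make the bound a constructor or CLI parameter instead of a hard-coded constant.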
Estimated Effort
| Layer | Files | Test Cases | Lines (est.) |
|---|---|---|---|
| L1 | 6 | ~55 | ~800 |
| L2 | 1 | ~6 | ~300 |
| L3 | design only | ~5 | ~200 |
| Total | 7 | ~66 | ~1,300 |
CI Integration
Add to the existing workflow or a new `atom-pd-test.yaml`:

```yaml
pd-unit-tests:
  name: PD Disaggregation Unit Tests (CPU)
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-python@v4
      with:
        python-version: "3.12"
    - run: pip install pytest msgpack msgspec numpy aiohttp quart
    - run: pip install torch --index-url https://download.pytorch.org/whl/cpu
    - run: pytest tests/disaggregation/ -v --tb=short
```

Reference
- Design doc: `docs/plans/2026-03-04-pd-disaggregation-test-coverage-design.md`
- Related: PR #253, Issue #255