Problem
The current pull pipeline downloads the entire package (content + metadata) before discovering its dependencies. Dependencies are then pulled sequentially within each package. For packages with large content (e.g., 200MB+ toolchains) and multiple dependencies, this creates an unnecessary waterfall where dependency downloads are blocked on the parent's full content transfer.
Current flow (sequential depth-first):
- Download full package A (content + metadata) → object store
- Read metadata.json → discover dependencies B, C
- Download full package B → discover its dependencies
- Download full package C → discover its dependencies
Proposed Solution
Fetch only the metadata first (small JSON, fast), then start pulling the content and dependencies in parallel:
- Fetch metadata for A (fast, small layer/annotation)
- In parallel: download A's content + fetch metadata for B, C
- In parallel: download B's content, C's content + resolve their transitive deps
This turns the dependency resolution from sequential depth-first into parallel breadth-first. The PullTracker already handles dedup and cycle detection, so the coordination infrastructure exists.
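To make the waterfall concrete, here is a rough cost model comparing the two flows. It is only an illustration: the `Pkg` struct, both functions, and any timings fed into them are hypothetical, the dependency graph is assumed to be a tree, and bandwidth is assumed to be unconstrained, which real transfers are not.

```rust
/// Hypothetical per-package cost; `deps` holds indices into the same slice.
struct Pkg {
    metadata_ms: u64,
    content_ms: u64,
    deps: Vec<usize>,
}

/// Current flow: each package is fully downloaded before its dependencies
/// are pulled one after another, so all costs add up.
fn sequential_ms(pkgs: &[Pkg], root: usize) -> u64 {
    let p = &pkgs[root];
    let own = p.metadata_ms + p.content_ms;
    own + p.deps.iter().map(|&d| sequential_ms(pkgs, d)).sum::<u64>()
}

/// Proposed flow: after the metadata fetch, the package's own content and
/// all dependency subtrees proceed in parallel, so only the slowest branch counts.
fn parallel_ms(pkgs: &[Pkg], root: usize) -> u64 {
    let p = &pkgs[root];
    let slowest_dep = p
        .deps
        .iter()
        .map(|&d| parallel_ms(pkgs, d))
        .max()
        .unwrap_or(0);
    p.metadata_ms + p.content_ms.max(slowest_dep)
}
```

With a large parent and two smaller dependencies, the sequential total is the sum of every transfer, while the parallel total is bounded by the parent's metadata fetch plus its own content download (or the slowest dependency subtree, whichever is longer).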
Implementation sketch
- Add a fetch_metadata_only path in the OCI client (pull just the metadata layer or annotation, skip content layers)
- Split pull_inner into metadata resolution and content download phases
- Use the existing JoinSet / PullTracker pattern to fan out metadata fetches, then fan out content downloads as metadata arrives
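The split described in the steps above could look roughly like the following. This is a sketch under several assumptions, not the actual code: the `Client` trait, the `Metadata` struct, `InMemoryClient`, and all signatures are hypothetical stand-ins. It uses `std::thread::scope` instead of the JoinSet to stay dependency-free, and it omits the dedup and cycle detection that PullTracker provides, so it would recurse forever on a cyclic graph.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Hypothetical, simplified metadata shape.
#[derive(Debug, Clone, PartialEq)]
struct Metadata {
    dependencies: Vec<String>,
}

/// Hypothetical client surface; names follow the issue, signatures are assumed.
trait Client: Sync {
    fn fetch_metadata_only(&self, package: &str) -> Metadata;
    fn download_content(&self, package: &str);
}

/// Phase 1 fetches metadata; phase 2 downloads this package's content while
/// the dependencies are pulled concurrently. No dedup or cycle detection here.
fn pull_inner(client: &dyn Client, package: &str) -> Metadata {
    let metadata = client.fetch_metadata_only(package);
    thread::scope(|s| {
        s.spawn(move || client.download_content(package));
        for dep in &metadata.dependencies {
            s.spawn(move || {
                pull_inner(client, dep);
            });
        }
    });
    metadata
}

/// In-memory client for experimentation; records the order of calls.
struct InMemoryClient {
    deps: HashMap<String, Vec<String>>,
    log: Mutex<Vec<String>>,
}

impl Client for InMemoryClient {
    fn fetch_metadata_only(&self, package: &str) -> Metadata {
        self.log.lock().unwrap().push(format!("metadata:{package}"));
        Metadata {
            dependencies: self.deps.get(package).cloned().unwrap_or_default(),
        }
    }

    fn download_content(&self, package: &str) {
        self.log.lock().unwrap().push(format!("content:{package}"));
    }
}
```

The property worth noting is that a dependency's metadata fetch no longer waits for the parent's content: both are spawned right after the parent's metadata returns.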
Affected code
crates/ocx_lib/src/package_manager/tasks/pull.rs — pull_inner() pipeline
crates/ocx_lib/src/oci/client.rs — new metadata-only fetch path
crates/ocx_lib/src/package_manager/tasks/pull.rs — PullTracker may need content-vs-metadata distinction
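One possible shape for the content-vs-metadata distinction is to track the two phases separately, so that a package whose metadata is already being fetched can still have its content claimed exactly once. The `PhaseTracker` type below is hypothetical and says nothing about how PullTracker is actually structured; it only illustrates the idea.

```rust
use std::collections::HashSet;

/// Hypothetical tracker that dedups the two phases independently.
#[derive(Default)]
struct PhaseTracker {
    metadata_claimed: HashSet<String>,
    content_claimed: HashSet<String>,
}

impl PhaseTracker {
    /// Returns true only for the first caller that claims the metadata fetch.
    fn claim_metadata(&mut self, package: &str) -> bool {
        self.metadata_claimed.insert(package.to_string())
    }

    /// Returns true only for the first caller that claims the content download.
    fn claim_content(&mut self, package: &str) -> bool {
        self.content_claimed.insert(package.to_string())
    }
}
```

A task would call `claim_metadata` before fetching and skip the fetch on `false`; the same applies to `claim_content`. In concurrent code the tracker would need to sit behind a lock or use a concurrent set.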
Alternatives Considered
- Prefetch all metadata in a BFS pass, then download content: simpler, but adds a full round-trip phase before any content transfer starts. Still better than the current sequential approach.
- Embed dependency list in OCI annotations: avoids the extra layer pull entirely, but requires publisher cooperation and registry support for annotation queries.
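For comparison, the first alternative (a BFS metadata pass before any content download) can be sketched as below. The `Registry` alias and `fetch_metadata` are hypothetical stand-ins for the registry and the metadata-only fetch; the `seen` set plays the dedup and cycle-detection role that PullTracker has in the real pipeline.

```rust
use std::collections::{HashMap, HashSet};
use std::thread;

/// Hypothetical stand-in for a registry: package name -> dependency names.
type Registry = HashMap<&'static str, Vec<&'static str>>;

/// Stand-in for a metadata-only fetch; a real one would hit the network.
fn fetch_metadata(registry: &Registry, name: &str) -> Vec<&'static str> {
    registry.get(name).cloned().unwrap_or_default()
}

/// Resolves the dependency closure level by level, fetching the metadata of
/// each level concurrently. Returns the packages grouped by BFS depth.
fn resolve_breadth_first(registry: &Registry, root: &'static str) -> Vec<Vec<&'static str>> {
    let mut seen: HashSet<&'static str> = HashSet::from([root]);
    let mut frontier = vec![root];
    let mut levels = Vec::new();

    while !frontier.is_empty() {
        // One thread per package in the current level.
        let fetched: Vec<Vec<&'static str>> = thread::scope(|s| {
            let handles: Vec<_> = frontier
                .iter()
                .map(|&name| s.spawn(move || fetch_metadata(registry, name)))
                .collect();
            handles
                .into_iter()
                .map(|h| h.join().expect("metadata fetch panicked"))
                .collect()
        });
        levels.push(frontier);

        // Keep only packages not seen before; this also breaks cycles.
        let mut next = Vec::new();
        for dep in fetched.into_iter().flatten() {
            if seen.insert(dep) {
                next.push(dep);
            }
        }
        frontier = next;
    }
    levels
}
```

Once the pass finishes, every package in the returned levels is known and its content can be downloaded in parallel. The cost relative to the proposed solution is that no content transfer begins until the deepest level has been resolved.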