perf: parallel dependency resolution via metadata-first pull #28

@michael-herwig

Problem

The current pull pipeline downloads the entire package (content + metadata) before discovering its dependencies. Dependencies are then pulled sequentially within each package. For packages with large content (e.g., 200MB+ toolchains) and multiple dependencies, this creates an unnecessary waterfall where dependency downloads are blocked on the parent's full content transfer.

Current flow (sequential depth-first):

  1. Download full package A (content + metadata) → object store
  2. Read metadata.json → discover dependencies B, C
  3. Download full package B → discover its dependencies
  4. Download full package C → discover its dependencies

Proposed Solution

Fetch only the metadata first (small JSON, fast), then start pulling the content and dependencies in parallel:

  1. Fetch metadata for A (fast, small layer/annotation)
  2. In parallel: download A's content + fetch metadata for B, C
  3. In parallel: download B's content, C's content + resolve their transitive deps

This turns the dependency resolution from sequential depth-first into parallel breadth-first. The PullTracker already handles dedup and cycle detection, so the coordination infrastructure exists.
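To make the expected gain concrete, here is a minimal timing model of both flows. The package sizes, the arbitrary time units, and the names `Pkg`, `sequential_time`, `metadata_first_time`, and `example_registry` are all illustrative assumptions, not measurements or code from the repository. The model also assumes unlimited bandwidth, so parallel transfers do not slow each other down, and it ignores dedup of shared dependencies.

```rust
use std::collections::HashMap;

/// Hypothetical per-package costs in arbitrary time units.
struct Pkg {
    meta: u64,
    content: u64,
    deps: Vec<&'static str>,
}

/// Current flow: download the full package, then pull each dependency in turn.
fn sequential_time(reg: &HashMap<&str, Pkg>, id: &str) -> u64 {
    let p = &reg[id];
    p.meta + p.content + p.deps.iter().map(|d| sequential_time(reg, d)).sum::<u64>()
}

/// Proposed flow: once metadata arrives, the content download and all
/// dependency pulls overlap, so only the slowest branch counts.
fn metadata_first_time(reg: &HashMap<&str, Pkg>, id: &str) -> u64 {
    let p = &reg[id];
    let deps_done = p
        .deps
        .iter()
        .map(|d| metadata_first_time(reg, d))
        .max()
        .unwrap_or(0);
    p.meta + p.content.max(deps_done)
}

/// Illustrative graph: A depends on B and C (made-up numbers).
fn example_registry() -> HashMap<&'static str, Pkg> {
    HashMap::from([
        ("A", Pkg { meta: 1, content: 200, deps: vec!["B", "C"] }),
        ("B", Pkg { meta: 1, content: 50, deps: vec![] }),
        ("C", Pkg { meta: 1, content: 80, deps: vec![] }),
    ])
}
```

With these made-up numbers the sequential flow costs 201 + 51 + 81 = 333 units, while the metadata-first flow costs 1 + max(200, 81) = 201 units. Real gains depend on how bandwidth is shared between concurrent transfers.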

Implementation sketch

  • Add a fetch_metadata_only path in the OCI client (pull just the metadata layer or annotation, skip content layers)
  • Split pull_inner into metadata resolution and content download phases
  • Use the existing JoinSet / PullTracker pattern to fan out metadata fetches, then fan out content downloads as metadata arrives
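The bullets above could be sketched roughly as follows. This is not the actual `pull_inner` code: it uses `std::thread` in place of the JoinSet so that it has no external dependencies, an in-memory map in place of the OCI registry, and a plain `HashSet` in place of PullTracker. `Registry`, `Shared`, `download_content`, `pull`, and `pull_all` are hypothetical names; only `fetch_metadata_only` is taken from the sketch above, and its signature here is assumed.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Mutex};
use std::thread;

/// In-memory stand-in for a registry: package id -> dependency ids.
type Registry = HashMap<String, Vec<String>>;

#[derive(Default)]
struct Shared {
    claimed: Mutex<HashSet<String>>, // stand-in for PullTracker dedup
    downloaded: Mutex<Vec<String>>,  // records finished content downloads
}

/// Phase 1: metadata only (a map lookup here instead of a network call).
fn fetch_metadata_only(reg: &Registry, id: &str) -> Vec<String> {
    reg.get(id).cloned().unwrap_or_default()
}

/// Phase 2: content download (only recorded here, no I/O).
fn download_content(shared: &Shared, id: &str) {
    shared.downloaded.lock().unwrap().push(id.to_string());
}

fn pull(reg: Arc<Registry>, shared: Arc<Shared>, id: String) {
    // Claim the package; a second claim (diamond or cycle) returns early.
    if !shared.claimed.lock().unwrap().insert(id.clone()) {
        return;
    }
    let deps = fetch_metadata_only(&reg, &id);

    // Fan out: one worker per dependency, started before our own content download.
    let workers: Vec<_> = deps
        .into_iter()
        .map(|dep| {
            let (reg, shared) = (Arc::clone(&reg), Arc::clone(&shared));
            thread::spawn(move || pull(reg, shared, dep))
        })
        .collect();

    download_content(&shared, &id);
    for w in workers {
        w.join().expect("pull worker panicked");
    }
}

/// Pulls `root` and its transitive dependencies; returns the sorted ids downloaded.
fn pull_all(reg: Registry, root: &str) -> Vec<String> {
    let (reg, shared) = (Arc::new(reg), Arc::new(Shared::default()));
    pull(reg, Arc::clone(&shared), root.to_string());
    let mut done = shared.downloaded.lock().unwrap().clone();
    done.sort();
    done
}
```

Each package is claimed exactly once before any work starts, so diamonds and cycles in the graph do not cause duplicate downloads or infinite recursion. An async version would presumably replace `thread::spawn` and `join` with tasks spawned on the existing JoinSet.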

Affected code

  • crates/ocx_lib/src/package_manager/tasks/pull.rs: pull_inner() pipeline
  • crates/ocx_lib/src/oci/client.rs: new metadata-only fetch path
  • crates/ocx_lib/src/package_manager/tasks/pull.rs: PullTracker may need content-vs-metadata distinction
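One possible shape for the content-vs-metadata distinction is a per-package stage that can only move forward. The real PullTracker's fields and methods are not shown in this issue, so `Stage`, `StageTracker`, and `claim` below are hypothetical and only illustrate the idea.

```rust
use std::collections::HashMap;

/// How far a package has progressed through the split pipeline.
/// Declaration order defines the ordering: metadata comes before content.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Stage {
    MetadataRequested,
    ContentRequested,
}

/// Hypothetical tracker, not the existing PullTracker API.
#[derive(Default)]
struct StageTracker {
    stages: HashMap<String, Stage>,
}

impl StageTracker {
    /// Returns true if the caller should perform `stage` for `id`,
    /// false if this stage or a later one was already claimed.
    fn claim(&mut self, id: &str, stage: Stage) -> bool {
        let current = self.stages.get(id).copied();
        if current.map_or(false, |c| c >= stage) {
            return false;
        }
        self.stages.insert(id.to_string(), stage);
        true
    }
}
```

With this shape, a package whose metadata was fetched while resolving one parent can still have its content claimed exactly once later, and a repeated metadata request after the content claim is rejected.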

Alternatives Considered

  • Prefetch all metadata in a BFS pass, then download content: simpler, but adds a full round-trip phase before any content transfer starts. Still better than the current sequential approach.
  • Embed dependency list in OCI annotations: avoids the extra layer pull entirely, but requires publisher cooperation and registry support for annotation queries.
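For comparison, the first alternative (a BFS metadata pass followed by content downloads) could look roughly like this. The `metadata` map stands in for a metadata-only fetch, and `resolve_closure` is a hypothetical name. The returned list is what the second phase would then download.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first walk over metadata only. Returns every package that needs
/// downloading, in discovery order. Cycles and diamonds are visited once.
fn resolve_closure(metadata: &HashMap<&str, Vec<&str>>, root: &str) -> Vec<String> {
    let mut seen: HashSet<String> = HashSet::from([root.to_string()]);
    let mut queue: VecDeque<String> = VecDeque::from([root.to_string()]);
    let mut order = Vec::new();

    while let Some(id) = queue.pop_front() {
        // Unknown ids are treated as having no dependencies.
        let deps = metadata.get(id.as_str()).cloned().unwrap_or_default();
        for dep in deps {
            // `insert` returns false for ids that were already discovered.
            if seen.insert(dep.to_string()) {
                queue.push_back(dep.to_string());
            }
        }
        order.push(id);
    }
    order
}
```

The simplicity comes from the strict phase boundary: no content transfer starts until the whole closure is known. That boundary is also the extra round-trip cost mentioned above.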


Labels: enhancement (New feature or request)
