Mcp zelph partial load #25

Merged
zipproth merged 1 commit into acrion:develop from
chboishabba:mcp-zelph-partial-load
Mar 27, 2026

@chboishabba chboishabba commented Mar 26, 2026

This PR adds manifest-driven partial loading and Hugging Face shard fetch support for Zelph.

What changed

  • Adds .load-partial route selectors: route-node, route-name, and route-lang.
  • Adds .stat-file and .index-file for serialized .bin inspection and byte-offset indexing.
  • Extends hf:// resolution to use Hugging Face .../resolve/main/... URLs for datasets and spaces.
  • Adds manifest-side route selection support for chunked shard layouts.
  • Blocks destructive commands while partial-load mode is active.
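
The hf:// resolution item above can be sketched as a small string mapping. This is only an illustration, assuming the public Hugging Face URL layout `https://huggingface.co/<kind>/<owner>/<repo>/resolve/main/<path>`; the function name `resolve_hf_url` is hypothetical and is not taken from Zelph's source, and only the `datasets` and `spaces` kinds named in this PR are accepted.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper, not Zelph's actual resolver. Maps
//   hf://datasets/<owner>/<repo>/<path>  or  hf://spaces/<owner>/<repo>/<path>
// to a download URL on the `main` revision. Returns std::nullopt otherwise.
std::optional<std::string> resolve_hf_url(std::string_view uri)
{
    constexpr std::string_view scheme = "hf://";
    if (uri.substr(0, scheme.size()) != scheme) return std::nullopt;
    uri.remove_prefix(scheme.size());

    // Pops the next non-empty "/"-terminated component off the front of uri.
    auto take = [&uri]() -> std::optional<std::string_view> {
        const auto slash = uri.find('/');
        if (slash == std::string_view::npos || slash == 0) return std::nullopt;
        const std::string_view part = uri.substr(0, slash);
        uri.remove_prefix(slash + 1);
        return part;
    };
    const auto kind = take();
    const auto owner = take();
    const auto repo = take();
    if (!kind || !owner || !repo || uri.empty()) return std::nullopt;
    if (*kind != "datasets" && *kind != "spaces") return std::nullopt;

    // Whatever remains in uri is the path inside the repository.
    std::string url = "https://huggingface.co/";
    url.append(*kind).append("/").append(*owner).append("/").append(*repo);
    url.append("/resolve/main/").append(uri);
    return url;
}
```

Pinning the revision to `main` mirrors the description above; a real resolver might also need to handle other revisions, authentication, and caching, none of which is shown here.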

Validated locally

  • Explicit chunk partial load from the v3 shard layout.
  • Routed partial load from the v3 shard layout.
  • Remote hf:// manifest + shard fetch for explicit chunks.
  • Remote hf:// manifest + route-sidecar fetch for routed selection.

Proof artifact used

  • Minimal v3 shard proof under chbwa/zelph-sharded.
  • Remote smoke on the hosted proof passed after clearing the stale local HF cache.

Notes

  • route-name requires route-lang.
  • Route selectors require manifest mode.
  • This was exercised against the 20260309 v3 proof artifact.
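
The two selector rules in these notes amount to a small validation step. The sketch below shows one way to express them; the `PartialLoadRequest` struct, its field names, and the error strings are assumptions made for illustration and do not come from Zelph's code.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical request shape. Field names mirror the selector names in this
// PR but are not taken from Zelph's source.
struct PartialLoadRequest {
    bool manifest_mode = false;
    std::vector<std::uint64_t> route_nodes;   // route-node
    std::optional<std::string> route_name;    // route-name
    std::optional<std::string> route_lang;    // route-lang
};

// Returns an error message, or std::nullopt when the combination is allowed.
std::optional<std::string> validate(const PartialLoadRequest& req)
{
    const bool has_route =
        !req.route_nodes.empty() || req.route_name || req.route_lang;
    if (has_route && !req.manifest_mode)
        return "route selectors require manifest mode";
    if (req.route_name && !req.route_lang)
        return "route-name requires route-lang";
    return std::nullopt;
}
```

Checking the manifest-mode rule first means a request that breaks both rules reports the more fundamental problem.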

Performance

  • Local explicit partial load from the v3 shard layout: 0.160s
  • Remote hf:// explicit partial load: 7.948s
  • Remote hf:// routed partial load: 5.476s
  • Prior sequential fallback on the same workflow: about 21.0s to 21.6s

Takeaway

  • Explicit shard reads are materially faster than the sequential fallback: 0.160s locally and 7.948s over hf://, against about 21.0s to 21.6s.
  • Remote HF fetch is now working end-to-end for both explicit and routed partial loads.

@chboishabba
Author

Maintainer test notes and artifact pointers.

Scope of this PR

  • Runtime/read-path changes only: manifest-driven partial loading, hf:// fetch support, route selectors, and read-only guardrails while partial mode is active.
  • The separate shared-contract / IPFS bridge work is not required to review this PR, but I am linking the bounded proof artifacts below because they make the storage direction easier to test.
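
The read-only guardrail mentioned in the scope can be pictured as a check in front of command dispatch. This is a minimal sketch under stated assumptions: the `CommandGuard` class is hypothetical, and the set of destructive command names is supplied by the caller because Zelph's actual list is not given in this PR.

```cpp
#include <string>
#include <unordered_set>
#include <utility>

// Sketch of a read-only guard. The destructive command names are placeholders
// supplied by the caller; Zelph's real list of blocked commands is not shown.
class CommandGuard {
public:
    explicit CommandGuard(std::unordered_set<std::string> destructive)
        : destructive_(std::move(destructive)) {}

    void set_partial_load_active(bool active) { partial_load_active_ = active; }

    // A command is rejected only when it is destructive AND partial-load
    // mode is currently active.
    bool is_allowed(const std::string& command) const
    {
        return !(partial_load_active_ && destructive_.count(command) > 0);
    }

private:
    std::unordered_set<std::string> destructive_;
    bool partial_load_active_ = false;
};
```

The point of the guard is that a partially loaded graph is incomplete, so commands that rewrite or delete state could act on missing data.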

Core local validation used here

  • Explicit local chunk smoke from the v3 layout succeeded.
  • Routed local smoke succeeded with route-node=7009581169707405312.
  • Remote hf:// explicit partial load succeeded.
  • Remote hf:// routed partial load succeeded.
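
The route-node value used in the routed smoke is a 64-bit numeric ID, and route-node accepts a comma-separated list of such IDs. A minimal parsing sketch follows; `parse_route_nodes` is a hypothetical helper, not Zelph's parser, and it rejects empty or non-numeric fields rather than guessing.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>
#include <vector>

// Parses "id1,id2,..." into unsigned 64-bit IDs. Returns std::nullopt if any
// field is empty, non-numeric, or out of range. Hypothetical helper.
std::optional<std::vector<std::uint64_t>> parse_route_nodes(std::string_view text)
{
    std::vector<std::uint64_t> ids;
    while (true) {
        const auto comma = text.find(',');
        const std::string_view field = text.substr(0, comma);

        std::uint64_t value = 0;
        const char* first = field.data();
        const char* last = first + field.size();
        const auto [ptr, ec] = std::from_chars(first, last, value);
        // Require the whole field to be consumed as digits.
        if (field.empty() || ec != std::errc{} || ptr != last) return std::nullopt;
        ids.push_back(value);

        if (comma == std::string_view::npos) break;
        text.remove_prefix(comma + 1);
    }
    return ids;
}
```

`std::from_chars` is used because it does not skip whitespace or accept a sign for unsigned types, so malformed input fails instead of being silently truncated.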

Performance observed on the proof artifact

  • Local explicit partial load from the v3 shard layout: 0.160s
  • Remote hf:// explicit partial load: 7.948s
  • Remote hf:// routed partial load: 5.476s
  • Prior sequential fallback on the same workflow: about 21.0s to 21.6s

Hosted HF proof used for the remote smokes

  • Dataset repo: chbwa/zelph-sharded
  • Explicit-chunk proof: hf://datasets/chbwa/zelph-sharded/20260309-v3/manifest.json
  • Routed proof: hf://datasets/chbwa/zelph-sharded/20260309-v3-route/manifest.json

Notes from testing

  • route-name requires route-lang.
  • Route selectors require manifest mode.
  • A stale local HF cache can mask updated manifest contents; clearing the local Zelph HF cache fixed that during testing.
  • The remote routed proof uses a deliberately minimal shard set for smoke coverage, not the full 2026 artifact.

Minimal IPFS proof artifacts, if you want them for local inspection

  • Minimal routed test pack root CID:
    • bafybeicaiitoic2lvjoqrd5eh72xvbnw2ghutc5lwrq73hwvx7g2djvjyu
  • Shared-contract companion pack root CID:
    • bafybeidux6doftlrarmmdx5jptr4vfhksydllqldz452eopwellhgjdbc4

Useful per-file IPFS CIDs from the minimal routed pack

  • manifest.json: bafybeiaisydxgisa2wty5jhfnxjsxeigocrc77oisqkndd6fuxgsefpi4e
  • artifact.route.json: bafkreib5pblpujnzcp2fnbq2lwkwapqyqvwp56iczqd2d7xel4dgtjhn5a

Simple retrieval commands

ipfs get bafybeicaiitoic2lvjoqrd5eh72xvbnw2ghutc5lwrq73hwvx7g2djvjyu
ipfs get bafybeidux6doftlrarmmdx5jptr4vfhksydllqldz452eopwellhgjdbc4

The second pack is only a bounded architecture proof: same logical shard ids projected to both HF and IPFS refs. The PR itself should be judged on the Zelph runtime path and the HF-backed partial-load behavior above.

@zipproth zipproth self-assigned this Mar 27, 2026
@zipproth zipproth changed the base branch from main to develop March 27, 2026 06:56
@zipproth zipproth merged commit 037bf6a into acrion:develop Mar 27, 2026
6 checks passed
@chboishabba
Author

Small follow-up: the bounded proof artifacts are now mirrored across both sinks.

HF

  • Dataset proofs remain under chbwa/zelph-sharded:
    • 20260309-v3
    • 20260309-v3-route
    • minimal-proof
    • minimal-proof-v3
  • The shared-contract companion pack is now on HF bucket storage at:
    • hf://buckets/chbwa/zelph-shared-contract/20260309-shared-contract
    • files:
      • shared.contract.dual.json
      • shared.contract.dual.cbor
      • ipfs-map.json

IPFS

  • 20260309-v3: bafybeidqhcome55zqkeehi4istxmr6ts34iyes2djvwbtoi26jyh3xtsfu
  • 20260309-v3-route: bafybeicaiitoic2lvjoqrd5eh72xvbnw2ghutc5lwrq73hwvx7g2djvjyu
  • 20260309-shared-contract: bafybeidux6doftlrarmmdx5jptr4vfhksydllqldz452eopwellhgjdbc4
  • minimal-proof: bafybeiddoa52fgz4u2vwccaqz3o37vkvc2lm4rp2lwu4obaxdtlw6dpq74
  • minimal-proof-v3: bafybeihx3f257toykcbqap44qx2apw2tnbud644ysvpjtdnmqapxprwu7e

So the bounded test artifacts are now available from both HF and IPFS; the only split is that the shared-contract companion lives on HF bucket storage rather than the dataset repo because the CBOR file is rejected by normal dataset commits.

@chboishabba
Author

Hi Stefan,

Thanks — and thanks for flagging the old PR #24 comment as well. I checked your four points against current develop before replying.

  1. You were right about the manifest "load all" path. On current develop, when no explicit selectors are given in loadFromManifest, the inner per-chunk filter still checks count(...) and skips every chunk. I rebased onto current develop, patched that guard so it only applies when the corresponding selection pointer is non-null, rebuilt locally, and pushed the follow-up on my branch here:
    • 1cf4728 Fix manifest load-all chunk selection
    • branch: chboishabba:mcp-zelph-partial-load

  2. I pulled develop and reviewed the updated .help / binaries.md. They look accurate to the intended behavior. The only mismatch I found was the load-all bug above; with that fixed, the docs read consistently.

  3. route-name being a single exact string is intentional in the current implementation. route-node accepts a comma-separated list of numeric IDs, but route-name currently parses one exact name plus route-lang.

  4. is_v3 is handled for left/right too, not only nameOfNode / nodeOfName. The left and right manifest chunk branches also use (manifest_description.is_v2 || manifest_description.is_v3) && !ref.object_path.empty().
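
The load-all fix described in the first point can be illustrated in isolation. The sketch below is not the code from loadFromManifest; the function name, the use of chunk indices, and the set type are assumptions. It only shows the shape of the guard: the count(...) filter runs when a selection pointer exists, and a null pointer means "load every chunk".

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// `selection` is null when the caller gave no explicit selectors ("load all").
// Names and types are hypothetical, chosen only to illustrate the guard.
std::vector<std::size_t> chunks_to_load(
    std::size_t chunk_count,
    const std::unordered_set<std::size_t>* selection)
{
    std::vector<std::size_t> result;
    for (std::size_t chunk = 0; chunk < chunk_count; ++chunk) {
        // Filter only when a selection exists. Without the null check, an
        // absent or empty selection would cause every chunk to be skipped.
        if (selection != nullptr && selection->count(chunk) == 0) continue;
        result.push_back(chunk);
    }
    return result;
}
```

Passing `nullptr` yields all chunk indices, while passing a set restricts the result to the chunks it contains.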

I also rebuilt locally after rebasing and applying the fix:

  • cmake --build build-local -j2
  • result: build succeeded

Thanks again — the help/docs updates on develop made the current behavior much easier to sanity-check.
