Add sharded storage for checkpoint scalability #762

@james-in-a-box


Context

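The Entireio CLI uses 256-bucket sharded storage for checkpoint metadata: <id[:2]>/<id[2:]>/metadata.json. The first two characters of the ID select the bucket directory (256 buckets if IDs are hexadecimal, since 16 × 16 = 256), so files are spread across many directories. This avoids the performance degradation that large flat directory listings cause in Git.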
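
As a rough illustration of the path layout described above, here is a minimal Python sketch. The function name `shard_path` and the length check are hypothetical and not taken from the Entireio CLI; the only thing it mirrors is the `<id[:2]>/<id[2:]>/metadata.json` pattern.

```python
from pathlib import PurePosixPath


def shard_path(checkpoint_id: str, filename: str = "metadata.json") -> PurePosixPath:
    """Map a checkpoint ID to <id[:2]>/<id[2:]>/<filename>.

    Hypothetical helper: the ID is used as-is (no case normalization),
    and IDs shorter than 3 characters are rejected so that both path
    components are non-empty.
    """
    if len(checkpoint_id) < 3:
        raise ValueError("checkpoint ID must have at least 3 characters")
    return PurePosixPath(checkpoint_id[:2], checkpoint_id[2:], filename)


print(shard_path("a1b2c3d4"))  # a1/b2c3d4/metadata.json
```

Using `PurePosixPath` keeps forward slashes on every platform, which matches how Git stores tree paths.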

Current State

Our checkpoint system stores data on the egg/checkpoints/v2 branch with a multi-dimensional index structure. As the number of checkpoints grows (especially with multi-agent pipelines producing many sessions per issue), the branch could accumulate significant data.

Proposal

Evaluate and potentially adopt sharded storage for checkpoint metadata:

  • Shard by first 2 characters of checkpoint ID (256 buckets)
  • Keeps directory sizes manageable for Git operations
  • Improves git ls-tree and git checkout -- path performance
  • Consider also adding checkpoint pruning/archival for old data
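
To get a feel for how the first bullet keeps directories small, the sketch below counts how many checkpoints would land in each bucket. It assumes checkpoint IDs are lowercase hexadecimal and roughly uniformly distributed, which this issue does not confirm; the IDs are synthetic, and `bucket_sizes` is a hypothetical helper.

```python
import random
from collections import Counter


def bucket_sizes(ids):
    """Count checkpoints per shard, keyed by the first two ID characters."""
    return Counter(checkpoint_id[:2] for checkpoint_id in ids)


# Synthetic 12-character hex IDs; real checkpoint IDs may look different.
rng = random.Random(0)
ids = [f"{rng.getrandbits(48):012x}" for _ in range(10_000)]

sizes = bucket_sizes(ids)
print("buckets used:", len(sizes))
print("largest bucket:", max(sizes.values()))
print("flat directory would hold:", len(ids))
```

With uniform hex IDs, 10,000 checkpoints average about 39 entries per bucket (10,000 / 256), compared with 10,000 entries in a single flat directory.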

This is a scalability concern: it is not urgent, but it is worth addressing before the checkpoint branch becomes unwieldy.

Reference

See entireio/cli — sharded path structure in checkpoint storage.

Authored-by: egg
