Add transcript chunking for large session storage #766

@james-in-a-box

Context

The Entireio CLI implements agent-specific transcript chunking to handle arbitrarily large transcripts:

  • Claude Code: JSONL-aware splitting that preserves line boundaries
  • Gemini CLI: JSON splitting that maintains structure across chunks
  • Configurable chunk sizes for storage constraints
  • TranscriptChunker interface allows each agent to define its own splitting logic
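To make the JSONL-aware approach concrete, here is a minimal sketch of splitting on line boundaries. It is not the entireio/cli implementation: the `TranscriptChunker` protocol shown here, its `chunk` and `reassemble` method names, and the `max_size` parameter are all hypothetical, chosen only for illustration.

```python
from typing import List, Protocol


class TranscriptChunker(Protocol):
    """Hypothetical interface; NOT the actual entireio/cli signature."""

    def chunk(self, transcript: bytes, max_size: int) -> List[bytes]: ...

    def reassemble(self, chunks: List[bytes]) -> bytes: ...


class JSONLChunker:
    """Splits only at line boundaries, so every chunk holds whole JSONL records."""

    def chunk(self, transcript: bytes, max_size: int) -> List[bytes]:
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        chunks: List[bytes] = []
        current = bytearray()
        # keepends=True keeps the line terminators, so joining the chunks
        # reproduces the original bytes exactly.
        for line in transcript.splitlines(keepends=True):
            # Flush the current chunk if adding this line would exceed max_size.
            if current and len(current) + len(line) > max_size:
                chunks.append(bytes(current))
                current = bytearray()
            # A single line longer than max_size is kept whole (assumption:
            # an oversized chunk is preferable to a broken record).
            current += line
        if current:
            chunks.append(bytes(current))
        return chunks

    def reassemble(self, chunks: List[bytes]) -> bytes:
        return b"".join(chunks)
```

A line that exceeds `max_size` on its own becomes a chunk larger than the limit. Whether that is acceptable depends on the storage constraint; the sketch favors record integrity over a hard size cap.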

Current State

Our checkpoint system stores full transcripts. For long-running agent sessions (especially multi-hour implementation phases), transcripts can grow very large. Git is not optimized for storing large blobs, so large transcripts could degrade checkpoint branch performance over time.

Proposal

Add transcript chunking to the checkpoint system:

  • Split large transcripts into manageable chunks before storage
  • Preserve structural integrity (JSONL line boundaries)
  • Support reassembly for explain and show commands
  • Consider compression as an alternative or complement
  • Set configurable size thresholds

This is a scalability concern that becomes more relevant as agents run longer sessions and produce larger transcripts.
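As a rough illustration of how the proposal's pieces could fit together, the sketch below splits a JSONL transcript at a configurable threshold, compresses each chunk with gzip, and reassembles the original bytes for commands such as explain and show. The function names, the `CHUNK_THRESHOLD` default, and the choice of gzip are assumptions for illustration, not decisions made in this issue.

```python
import gzip
from typing import List

# Hypothetical default; the real threshold would be configurable.
CHUNK_THRESHOLD = 1024 * 1024  # 1 MiB of uncompressed transcript per chunk


def store_transcript(transcript: bytes, threshold: int = CHUNK_THRESHOLD) -> List[bytes]:
    """Split at JSONL line boundaries, then gzip each chunk independently."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    parts: List[bytes] = []
    current = bytearray()
    for line in transcript.splitlines(keepends=True):
        # Start a new chunk when this line would push the current one
        # past the threshold; a transcript under the threshold stays whole.
        if current and len(current) + len(line) > threshold:
            parts.append(bytes(current))
            current = bytearray()
        current += line
    if current:
        parts.append(bytes(current))
    # Compressing after splitting keeps each blob decodable on its own.
    return [gzip.compress(part) for part in parts]


def load_transcript(blobs: List[bytes]) -> bytes:
    """Reassemble the original transcript from ordered, compressed chunks."""
    return b"".join(gzip.decompress(blob) for blob in blobs)
```

Reassembly here relies on the blobs being stored and read back in order, so a real implementation would need some ordering scheme such as numbered chunk names or a manifest.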

Reference

See the TranscriptChunker interface in the agent package of entireio/cli.

Authored-by: egg
