[Feature]: branch-aware index hierarchy (fork + personal head collections) #19

@sebas5384

Problem or motivation

The goal is to use SocratiCode as a centralized hub of context, so that every team member works from the same base source of truth for the project's context.

In team settings where a centralized SocratiCode deployment shares one Qdrant instance, two workflows break:

  1. Long-lived feature branches — the shared main index diverges from the branch's actual state. Re-indexing from scratch is the only option today, which is expensive (minutes for large repos).
  2. Multiple developers on the same branch — if two devs have the file watcher running against the same collection, they overwrite each other's local uncommitted changes. The index reflects whoever wrote last, not either developer's actual working state.

Proposed solution

Proposed model

A three-tier index hierarchy, where each tier is a Qdrant collection:

main                          shared, updated on merge
  └── branch/{name}           shared per branch, updated on push
        └── head/{user}       personal, diff-only, ephemeral

Branch collection — a snapshot-fork of main at branch creation time. No re-embedding, just a Qdrant collection copy:

POST /collections/{source}/snapshots            # returns the snapshot name
PUT  /collections/{target}/snapshots/recover    # body: {"location": "<snapshot URL>"}; target auto-created

codebase_update then only processes files that actually changed on the branch.
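The two calls above could be wired together roughly as follows. This is a minimal sketch, not SocratiCode code: `fork_collection`, `http_transport`, and the `Transport` type are hypothetical names, and it assumes the snapshot can be fetched by the Qdrant server itself through the snapshot download URL on the same base URL.

```python
import json
from typing import Callable, Optional
from urllib import request as urlrequest

# (method, path, json_body) -> decoded JSON response
Transport = Callable[[str, str, Optional[dict]], dict]


def http_transport(base_url: str) -> Transport:
    """Build a transport that talks to a Qdrant REST endpoint at base_url."""
    def send(method: str, path: str, body: Optional[dict] = None) -> dict:
        data = json.dumps(body).encode() if body is not None else None
        req = urlrequest.Request(
            base_url + path, data=data, method=method,
            headers={"Content-Type": "application/json"},
        )
        with urlrequest.urlopen(req) as resp:
            return json.load(resp)
    return send


def fork_collection(send: Transport, base_url: str, source: str, target: str) -> str:
    """Snapshot `source`, then recover that snapshot into `target`."""
    # Step 1: create a snapshot of the source collection.
    created = send("POST", f"/collections/{source}/snapshots", None)
    snapshot = created["result"]["name"]
    # Step 2: recover it into the target; Qdrant creates the target if missing.
    # Assumption: the server can reach its own snapshot download URL.
    location = f"{base_url}/collections/{source}/snapshots/{snapshot}"
    send("PUT", f"/collections/{target}/snapshots/recover", {"location": location})
    return snapshot
```

Usage would look like `fork_collection(http_transport(url), url, "proj", "proj__branch__feat")`. The sketch leaves the source snapshot on disk afterwards; cleaning it up would be a separate step.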

Head collection — a snapshot-fork of the branch collection, personal per developer. The file watcher writes only here, scoped to git status --porcelain (locally modified files only). Discarded and re-forked from the branch collection on each push.
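To illustrate the scoping, here is a rough sketch of how the watcher might turn `git status --porcelain` output into the set of files to write to the head collection. `parse_porcelain` and `local_changes` are hypothetical helpers; the parser assumes the v1 porcelain format and paths that git does not need to quote.

```python
import subprocess
from typing import Dict, List


def parse_porcelain(output: str) -> Dict[str, List[str]]:
    """Split `git status --porcelain` (v1) output into changed and deleted paths."""
    changed: List[str] = []
    deleted: List[str] = []
    for line in output.splitlines():
        if len(line) < 4:
            continue
        # Each entry is two status characters, one space, then the path.
        status, path = line[:2], line[3:]
        if status == "!!":  # ignored files are never indexed
            continue
        if " -> " in path and ("R" in status or "C" in status):
            old, path = path.split(" -> ", 1)
            if "R" in status:
                deleted.append(old)  # the old name no longer exists
        if "D" in status:
            deleted.append(path)
        else:
            changed.append(path)
    return {"changed": changed, "deleted": deleted}


def local_changes(repo_dir: str) -> Dict[str, List[str]]:
    """Run git in repo_dir and parse its status output."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_porcelain(result.stdout)
```

Paths under "changed" would be re-embedded into the head collection, while paths under "deleted" would have their points removed from it.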

Naming convention

{project_id}__branch__{branch_name}
{project_id}__branch__{branch_name}__head__{user_id}

user_id could default to the sanitized value of git config user.email, overridable via a SOCRATICODE_USER_ID env var.
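A small sketch of how these names might be derived. The helper names and the sanitization rule (lowercase, with every run of other characters collapsed to a single hyphen) are assumptions, not an existing SocratiCode convention; the rule also covers branch names such as feature/login, since a slash would be awkward in a collection name used in URL paths.

```python
import os
import re
from typing import Mapping, Optional


def sanitize(value: str) -> str:
    """Lowercase and collapse anything outside [a-z0-9] into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", value.lower()).strip("-")


def resolve_user_id(git_email: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Prefer SOCRATICODE_USER_ID, else fall back to the git email."""
    env = os.environ if env is None else env
    return sanitize(env.get("SOCRATICODE_USER_ID") or git_email)


def branch_collection(project_id: str, branch_name: str) -> str:
    return f"{project_id}__branch__{sanitize(branch_name)}"


def head_collection(project_id: str, branch_name: str, user_id: str) -> str:
    return f"{branch_collection(project_id, branch_name)}__head__{user_id}"
```

Because the sanitizer also replaces underscores, a branch or user segment can never contain the `__` separator, so the name stays unambiguous to split back apart.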

Search behavior

When a head collection exists, codebase_search does a union search:

  1. Query __head__ and __branch__ in parallel
  2. Deduplicate by file path — __head__ wins on conflict

This gives the developer accurate context for files they've touched locally, and shared branch context for everything else. As a simpler v1 alternative, explicit PROJECT_ID switching (no union search) would already solve the collision problem.
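The union search described above might be sketched like this. It assumes a hypothetical `search(collection, query, limit)` callable that returns hits as dicts with `path` and `score` keys, and that both collections use the same embedding model so their scores are comparable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Hit = Dict[str, object]
Search = Callable[[str, str, int], List[Hit]]


def merge_hits(head_hits: List[Hit], branch_hits: List[Hit], limit: int) -> List[Hit]:
    """Deduplicate by file path; hits from the head collection win."""
    head_paths = {hit["path"] for hit in head_hits}
    merged = list(head_hits) + [
        hit for hit in branch_hits if hit["path"] not in head_paths
    ]
    merged.sort(key=lambda hit: hit["score"], reverse=True)
    return merged[:limit]


def union_search(search: Search, head: str, branch: str,
                 query: str, limit: int = 10) -> List[Hit]:
    """Query both collections in parallel, then merge the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        head_future = pool.submit(search, head, query, limit)
        branch_future = pool.submit(search, branch, query, limit)
        return merge_hits(head_future.result(), branch_future.result(), limit)
```

One limitation of deduplicating on returned hits alone: a stale branch chunk can still surface for a locally modified file if none of that file's head chunks match the query. Filtering branch hits against the full list of paths in the head collection would close that gap.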

Lifecycle

branch created    → fork main       → branch collection
worktree added    → fork branch     → head collection (per developer)
git push          → CI codebase_update on branch collection
                  → head collections discarded + re-forked
PR merged         → branch collection removed
                  → main collection updated
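The lifecycle above can also be read as a mapping from git events to collection operations. The sketch below only plans the operations and does not execute them; the event names and the `fork` / `update` / `delete` action tuples are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Action = Tuple[str, ...]


def plan_actions(event: str, main: str, branch_coll: str,
                 head_colls: Sequence[str] = ()) -> List[Action]:
    """Translate a lifecycle event into an ordered list of collection operations."""
    if event == "branch_created":
        return [("fork", main, branch_coll)]
    if event == "worktree_added":
        # One head collection per developer, forked from the branch collection.
        return [("fork", branch_coll, head) for head in head_colls]
    if event == "push":
        # CI updates the branch collection, then every head is rebuilt from it.
        actions: List[Action] = [("update", branch_coll)]
        for head in head_colls:
            actions += [("delete", head), ("fork", branch_coll, head)]
        return actions
    if event == "pr_merged":
        return [("delete", branch_coll), ("update", main)]
    raise ValueError(f"unknown lifecycle event: {event}")
```

Keeping planning separate from execution makes the ordering easy to check: on push, the branch update has to finish before any head is re-forked, otherwise the new heads would start from stale data.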

Questions

  • Is the Qdrant snapshot API accessible in the managed Docker setup?
  • Is there a preference for surfacing this as MCP tools, CLI subcommands, or both?
  • Does the union search approach feel right for v1, or should v1 start with explicit PROJECT_ID switching?
  • Are there any concerns about snapshot performance at the 40M+ line scale?

Alternatives considered

No response

Area

Search

Additional context

No response

Checklist

  • I have searched existing issues and this hasn't been requested before

Labels: enhancement (New feature or request)
