ADR: LFS-Only Local Cache for OID↔Path→S3 URL Hints with Authoritative Resolution at Pre-Push
Status
Proposed
Context and Problem Statement
We need to associate Git LFS–tracked files with external storage locations (S3 URLs) while preserving correct behavior under:
- content changes
- file renames / moves
- undo / restaging workflows
- offline or low-latency local development
Key constraints and preferences:
- Git LFS defines the boundary of responsibility for external objects.
- Non-LFS files are fully managed by Git and must not participate in this mechanism.
- A server-side index (Indexd / DRS) is the authoritative source of truth for mapping content identity to storage locations.
- We do not want to consult the authoritative index on every commit.
- pre-commit must be fast, deterministic, offline-friendly, and index-based.
- Correctness and enforcement should occur at pre-push, where refs, remotes, and commit ranges are known.
"How git works" 🚧
| Hook | stdin contents | Structured? | Stable format? | Purpose |
|---|---|---|---|---|
| pre-commit | Empty | ❌ No | N/A | Validate what is staged |
| pre-push | Ref update list | ✅ Yes | ✅ Yes | Validate what is about to be pushed |
The difference is intentional:
- pre-commit is index-centric
- pre-push is ref-centric
pre-commit reconciliation (practical)
In a pre-commit hook (or your custom git add wrapper), detect renames and move metadata accordingly.
How to detect renames reliably:
Use Git’s rename detection between HEAD and index:
- git diff --cached -M --name-status
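This emits tab-separated lines like:
- R100 old/path.txt new/path.txt
where the number after R is the similarity score. Adding -z makes Git emit each field NUL-terminated and verbatim, so paths with unusual characters are not quoted.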
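Here is a minimal Go sketch for consuming that output. It assumes the hook runs `git diff --cached -M --name-status -z` (for example via os/exec) and passes its stdout in. With -z, each status and each path is a separate NUL-terminated field, and rename/copy records carry two paths. `Rename` and `parseRenamesZ` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Rename is one R<score> record from `git diff --cached -M --name-status -z`.
type Rename struct {
	Score   string // similarity percentage, e.g. "100"
	OldPath string
	NewPath string
}

// parseRenamesZ extracts rename records from NUL-terminated --name-status
// output. Rename (R) and copy (C) records carry two paths; every other status
// carries one. Paths are verbatim because -z disables quoting.
func parseRenamesZ(out string) ([]Rename, error) {
	if out == "" {
		return nil, nil
	}
	fields := strings.Split(strings.TrimSuffix(out, "\x00"), "\x00")
	var renames []Rename
	for i := 0; i < len(fields); {
		status := fields[i]
		if status == "" {
			return nil, fmt.Errorf("empty status at field %d", i)
		}
		switch status[0] {
		case 'R', 'C':
			if i+2 >= len(fields) {
				return nil, fmt.Errorf("truncated %s record", status)
			}
			if status[0] == 'R' {
				renames = append(renames, Rename{status[1:], fields[i+1], fields[i+2]})
			}
			i += 3
		default:
			if i+1 >= len(fields) {
				return nil, fmt.Errorf("truncated %s record", status)
			}
			i += 2
		}
	}
	return renames, nil
}
```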
For each R* old new:
- move /old/... → /new/...
- or update the metadata file’s internal path field
- add the moved metadata file to the index
Pros:
- deterministic
- works before commit
- doesn’t require history scanning
Cons:
- you must enforce the hook/wrapper usage
Decision
1. Scope: Git LFS–Only
This design applies exclusively to Git LFS–tracked files.
- A file is in scope if and only if its staged content is a valid Git LFS pointer:
version https://git-lfs.github.com/spec/v1
oid sha256:<hex>
size <bytes>
- All non-LFS files are explicitly out of scope:
- no cache entries
- no validation
- no warnings or errors
- File size, extension, or .gitattributes patterns alone MUST NOT be used to infer scope.
2. Identity Model
- The canonical content identity is the Git LFS OID (sha256:<hex>), extracted from the staged pointer file.
- The system MUST NOT:
- hash file contents
- compute Git blob IDs
- infer identities for non-LFS files
3. Metadata Model
The local system models three non-authoritative relationships, all maintained purely for developer workflow:
- Path → OID
  - Which LFS object is currently staged at a working-tree path.
- OID → Path(s)
  - Which paths have recently referenced a given OID.
  - Supports rename, undo, and multi-path reuse.
  - Paths are advisory and may be stale.
- OID → S3 URL (hint)
  - A locally cached hint for where the object may live.
  - Must be validated against the authoritative server index at pre-push.
No locally stored relationship is authoritative.
4. Crisp Rule (Normative)
Path is never authoritative; OID (sha256) is.
Paths are client-side, repo-local workflow context.
The server indexes content identity and provides access methods.
5. Local Cache Location (Non-Versioned)
All local metadata is stored under:
.git/drs/pre-commit/
This directory:
- is never committed
- is local to the working copy
- may be freely deleted and reconstructed
Recommended layout
.git/drs/pre-commit/
v1/
paths/
<encoded-path>.json
oids/
<oid>.json
tombstones/
<encoded-path>.json
state.json
Cache File Schemas
What is the encoded path
Algorithm (step-by-step):
1. Start with the repo-relative path of the file (as a UTF‑8 string).
2. Encode that path using Base64 URL‑safe encoding without padding (RawURLEncoding), producing the token.
3. Append .json to the encoded token.
4. Place the file under .git/drs/pre-commit/v1/paths/.

This is implemented in pathEntryFile → encodePath, which uses base64.RawURLEncoding.EncodeToString([]byte(path)), then appends .json to form the final filename.
Resulting pattern:
.git/drs/pre-commit/v1/paths/<base64url_no_padding(repo_relative_path)>.json
paths/<encoded-path>.json (Path → OID)
{
"path": "data/foo.bam",
"lfs_oid": "sha256:<hex>",
"updated_at": "2026-02-01T12:34:56Z"
}
oids/<oid>.json (OID → Path(s), S3 URL hint)
{
"lfs_oid": "sha256:<hex>",
"paths": [
"data/foo.bam",
"data/archive/foo-copy.bam"
],
"s3_url": "s3://bucket/key",
"updated_at": "2026-02-01T12:34:56Z",
"content_changed": false
}
Pre-Commit Responsibilities (LFS-Only, Local-Only)
The pre-commit hook operates only on the staged index and only on LFS-tracked files.
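The core pre-commit update can be sketched as follows. It assumes the hook has already listed staged paths, parsed each staged blob into an OID, and dropped non-LFS files. `localIndex`, `recordStaged` and `reconcile` are hypothetical. The OID → Path(s) list only grows here, which fits paths being advisory. The returned list flags paths whose content changed. No network access is involved.

```go
package main

import "sort"

// localIndex is a hypothetical in-memory view of the Path → OID and
// OID → Path(s) relationships; URL hints are not touched at pre-commit.
type localIndex struct {
	pathToOID  map[string]string
	oidToPaths map[string][]string
}

func newLocalIndex() *localIndex {
	return &localIndex{pathToOID: map[string]string{}, oidToPaths: map[string][]string{}}
}

func containsString(xs []string, s string) bool {
	for _, x := range xs {
		if x == s {
			return true
		}
	}
	return false
}

// recordStaged updates both relationships for one staged LFS pointer and
// reports whether the path previously pointed at a different OID.
func (ix *localIndex) recordStaged(path, oid string) bool {
	prev, seen := ix.pathToOID[path]
	ix.pathToOID[path] = oid
	if !containsString(ix.oidToPaths[oid], path) {
		ix.oidToPaths[oid] = append(ix.oidToPaths[oid], path)
	}
	return seen && prev != oid
}

// reconcile applies every staged LFS pointer (path → OID) and returns the
// paths whose content changed, sorted for deterministic output.
func (ix *localIndex) reconcile(stagedLFS map[string]string) []string {
	var changed []string
	for path, oid := range stagedLFS {
		if ix.recordStaged(path, oid) {
			changed = append(changed, path)
		}
	}
	sort.Strings(changed)
	return changed
}
```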
Pre-Push Responsibilities (Authoritative, Networked)
The pre-push hook is the sole enforcement point.
Mapping .git/drs/pre-commit to Server Semantics
| Local cache concept | Server-side analogue | Notes |
|---|---|---|
| path → lfs_oid | none | purely client-side workflow context |
| lfs_oid → paths[] | none | advisory, repo-local |
| lfs_oid (sha256) | Indexd hashes.sha256, DRS checksums | canonical content identity |
| lfs_oid → s3_url (hint) | Indexd urls[], DRS access_methods[] | server is authoritative |
| logical ID | Indexd object_id, DRS object_id | resolved at pre-push |
Summary
This ADR establishes a strict LFS-only contract:
pre-commit maintains a local, non-authoritative cache of path↔OID↔URL hints, while pre-push resolves and enforces truth using Indexd / DRS.