From 26cd4167d9f75deed3132304c06332d6c45cf0d4 Mon Sep 17 00:00:00 2001 From: James Ross Date: Sun, 8 Feb 2026 13:30:43 -0800 Subject: [PATCH 01/12] =?UTF-8?q?feat:=20vault=20=E2=80=94=20GC-safe=20ref?= =?UTF-8?q?-based=20asset=20index=20+=20ROADMAP=20cleanup?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add vault subsystem backed by refs/cas/vault with init, add, list, remove, resolve, and metadata APIs. Purge completed milestones M1–M7 from ROADMAP (3,153 β†’ 1,675 lines). --- AUDITS.md | 136 -- CHANGELOG.md | 23 + GUIDE.md | 176 +- README.md | 23 +- ROADMAP.md | 1744 +++++++++-------- bin/git-cas.js | 194 +- docs/API.md | 242 ++- index.d.ts | 77 + index.js | 400 ++++ .../ContentAddressableStore.vault.test.js | 549 ++++++ 10 files changed, 2520 insertions(+), 1044 deletions(-) delete mode 100644 AUDITS.md create mode 100644 test/unit/vault/ContentAddressableStore.vault.test.js diff --git a/AUDITS.md b/AUDITS.md deleted file mode 100644 index 7570e15..0000000 --- a/AUDITS.md +++ /dev/null @@ -1,136 +0,0 @@ -# Codebase Audit: @git-stunts/cas - -**Auditor:** Senior Principal Software Auditor -**Date:** January 7, 2026 -**Target:** `@git-stunts/cas` - ---- - -## 1. QUALITY & MAINTAINABILITY ASSESSMENT (EXHAUSTIVE) - -### 1.1. Technical Debt Score (1/10) -**Justification:** -1. **Hexagonal Architecture**: Excellent use of Ports (`GitPersistencePort`, `CodecPort`) and Adapters. -2. **Plugin Architecture**: The `CasService` is highly extensible via the Codec strategy. -3. **Value Objects**: `Manifest` and `Chunk` are immutable and validated. - -### 1.2. Readability & Consistency - -* **Issue 1:** **Codec Injection Transparency** - * The `CasService` constructor takes a `codec`, but the `ContentAddressableStore` facade takes a `format` string ('json' | 'cbor') and instantiates the codec internally. This hides the ability to pass a custom `CodecPort` implementation from the facade user. -* **Mitigation Prompt 1:** - ```text - In `index.js`, update the `ContentAddressableStore` constructor to accept either a `format` string OR a `codec` instance. If a `codec` instance is provided, use it directly; otherwise, switch on `format`. - ``` - -* **Issue 2:** **Manifest Schema vs. Implementation** - * The `ManifestSchema` in `src/domain/schemas/ManifestSchema.js` defines `encryption` as optional, but `CasService` logic implies it always encrypts if a key is provided. The relationship between providing a key and the resulting manifest structure should be explicit. -* **Mitigation Prompt 2:** - ```text - In `src/domain/services/CasService.js`, add JSDoc to `storeFile` clarifying that providing `encryptionKey` will result in an encrypted manifest and chunks, and the `encryption` field in the manifest will be populated. - ``` - -* **Issue 3:** **Chunk Size Configuration** - * The chunk size is configured in the constructor, but it's not validated against a minimum/maximum reasonable size. A chunk size of `0` or `1` byte would be inefficient but technically valid by the current code. -* **Mitigation Prompt 3:** - ```text - In `src/domain/services/CasService.js`, add validation in the constructor to ensure `chunkSize` is at least 1KB (1024) to prevent performance degradation from excessive micro-chunking. - ``` - -### 1.3. Code Quality Violation - -* **Violation 1:** **Duplicated Chunking Logic** - * `CasService.storeFile` duplicates the chunking loop logic for both the encrypted and unencrypted paths. 
-* **Mitigation Prompt 4:** - ```text - Refactor `src/domain/services/CasService.js`. Extract the chunking and persistence loop into a private method `_chunkAndStore(streamOrBuffer)`. Use a generator or stream transformer to handle both Buffer (encrypted) and Stream (unencrypted) inputs uniformly. - ``` - ---- - -## 2. PRODUCTION READINESS & RISK ASSESSMENT (EXHAUSTIVE) - -### 2.1. Top 3 Immediate Ship-Stopping Risks - -* **Risk 1:** **Memory Consumption on Encryption** - * **Severity:** **High** - * **Location:** `src/domain/services/CasService.js` - * **Description:** The encryption path uses `readFileSync`, loading the *entire file* into memory before encrypting. For a 1GB file, this will crash the process. It does not stream encryption. -* **Mitigation Prompt 7:** - ```text - In `src/domain/services/CasService.js`, refactor `storeFile` to use `createReadStream` and a streaming cipher (`createCipheriv`) for the encryption path, rather than `readFileSync` + buffer concatenation. This is critical for large file support. - ``` - -* **Risk 2:** **Manifest Size Explosion** - * **Severity:** **Medium** - * **Location:** `src/domain/services/CasService.js` - * **Description:** For very large files, the `manifest.chunks` array grows linearly. A 10GB file with 256KB chunks results in ~40,000 chunk objects in memory. While manageable, it sets a hard limit on scalability. -* **Mitigation Prompt 8:** - ```text - (Architectural Note) No immediate code change, but document the limitation: "Current implementation stores all chunk metadata in a single manifest. Files >100GB may require a tree-based manifest structure (Merkle Tree)." Add this to `ARCHITECTURE.md` under "Scalability Limits". - ``` - -* **Risk 3:** **Weak Randomness in Tests** - * **Severity:** **Low** - * **Location:** `test/unit/domain/services/CasService.test.js` - * **Description:** Using `a.repeat(64)` for digests is weak. -* **Mitigation Prompt 9:** - ```text - In `test/unit/domain/services/CasService.test.js`, use `crypto.randomBytes(32).toString('hex')` to generate realistic SHA-256 hashes for the test chunks. - ``` - -### 2.2. Security Posture - -* **Vulnerability 1:** **Nonce Reuse (Probability)** - * **Description:** `randomBytes(12)` is standard for GCM, but ensuring it is never reused for the same key is critical. The current implementation generates a new random nonce for every file, which is safe. - * *Status: Mitigated by design.* - -* **Vulnerability 2:** **No Integrity Check on Decrypt** - * **Description:** The `decrypt` method relies on `decipher.final()` throwing if the Auth Tag is invalid. This is correct behavior for GCM, but we should ensure we catch and wrap that error into a domain `IntegrityError`. -* **Mitigation Prompt 11:** - ```text - In `src/domain/services/CasService.js`, wrap the `decipher.final()` call in a try-catch block. If it throws, re-throw a new `CasError('Decryption failed: Integrity check error', 'INTEGRITY_ERROR')`. - ``` - -### 2.3. Operational Gaps - -* **Gap 1:** **Garbage Collection**: No mechanism to identify or prune orphaned chunks (chunks not referenced by any manifest). -* **Gap 2:** **Verification**: No utility to verify the integrity of a stored file (re-hashing chunks and comparing to manifest). - ---- - -## 3. FINAL RECOMMENDATIONS & NEXT STEP - -### 3.1. Final Ship Recommendation: **NO** -**DO NOT SHIP** until **Risk 1 (Memory Consumption on Encryption)** is resolved. Loading entire files into memory for encryption is a fatal flaw for a CAS system intended for binary assets. 
- -### 3.2. Prioritized Action Plan - -1. **Action 1 (Critical Urgency):** **Mitigation Prompt 7** (Streaming Encryption). This is non-negotiable. -2. **Action 2 (High Urgency):** **Mitigation Prompt 11** (Integrity Error Wrapping). -3. **Action 3 (Medium Urgency):** **Mitigation Prompt 1** (Codec Injection in Facade). - ---- - -## PART II: Two-Phase Assessment - -## 0. πŸ† EXECUTIVE REPORT CARD - -| Metric | Score (1-10) | Recommendation | -| ----------------------------- | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | -| **Developer Experience (DX)** | 9 | **Best of:** The Codec Plugin architecture allows seamlessly switching between JSON and CBOR. | -| **Internal Quality (IQ)** | 5 | **Watch Out For:** The use of `readFileSync` in the encryption path cripples the system's ability to handle large files, which is its primary purpose. | -| **Overall Recommendation** | **THUMBS DOWN** | **Justification:** A Content Addressable Store that cannot handle files larger than available RAM is not production-ready. | - -## 5. STRATEGIC SYNTHESIS & ACTION PLAN - -- **5.1. Combined Health Score:** **6/10** -- **5.2. Strategic Fix:** **Implement Streaming Encryption**. This transforms the library from a "toy" to a production-grade tool. -- **5.3. Mitigation Prompt:** - ```text - Refactor `src/domain/services/CasService.js` to implement streaming encryption. - 1. Replace `readFileSync` with `createReadStream`. - 2. Use `crypto.createCipheriv` as a transform stream or pump the read stream through it. - 3. Chunk the *encrypted* output stream, not the input buffer. - 4. Ensure `storeFile` returns a Promise that resolves only when the entire stream is processed and persisted. - ``` diff --git a/CHANGELOG.md b/CHANGELOG.md index 177bc53..612a452 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added +- **Vault** β€” GC-safe ref-based storage via `refs/cas/vault`. A single Git ref pointing to a commit chain indexes all stored assets by slug. `git gc` can no longer silently discard stored data. + - `initVault()` β€” initialize the vault, optionally with passphrase-based encryption (vault-level KDF policy). + - `addToVault()` β€” add or update an entry by slug + tree OID, with `force` flag for overwrites. + - `listVault()` β€” list all entries sorted by slug. + - `removeFromVault()` β€” remove an entry by slug. + - `resolveVaultEntry()` β€” resolve a slug to its tree OID. + - `getVaultMetadata()` β€” inspect vault metadata (encryption config, version). + - Vault metadata (`.vault.json`) supports versioning and optional encryption configuration. + - CAS-safe writes with automatic retry (up to 3 attempts with exponential backoff) on concurrent update conflicts. + - Strict slug validation: rejects empty strings, `..` traversal, control characters, oversized segments. +- New CLI subcommands: `vault init`, `vault list`, `vault remove`. +- CLI `store --tree` now auto-vaults the entry (adds to vault after creating tree). +- CLI `restore` now supports `--slug` (resolve via vault) and `--oid` (direct tree OID) flags. +- CLI `--vault-passphrase` flag for vault-level encryption on `store`, `restore`, and `vault init`. +- New error codes: `INVALID_SLUG`, `VAULT_ENTRY_NOT_FOUND`, `VAULT_ENTRY_EXISTS`, `VAULT_CONFLICT`, `VAULT_METADATA_INVALID`, `VAULT_ENCRYPTION_ALREADY_CONFIGURED`. 
+- TypeScript declarations for `VaultEntry`, `VaultMetadata`, `VaultState` interfaces. +- 42 new unit tests for vault functionality. + +### Changed +- CLI `restore` command no longer takes a positional `` argument. Use `--oid ` or `--slug ` instead. +- Purged completed milestones (M1–M7) and their task cards from ROADMAP.md, reducing it from 3,153 to 1,675 lines. + ## [2.0.0] β€” M7 Horizon (2026-02-08) ### Added diff --git a/GUIDE.md b/GUIDE.md index 1428845..b65a9e3 100644 --- a/GUIDE.md +++ b/GUIDE.md @@ -21,10 +21,11 @@ along from first principles to full mastery. 10. [Compression](#10-compression) 11. [Passphrase Encryption (KDF)](#11-passphrase-encryption-kdf) 12. [Merkle Manifests](#12-merkle-manifests) -13. [Architecture](#13-architecture) -14. [Codec System](#14-codec-system) -15. [Error Handling](#15-error-handling) -16. [FAQ / Troubleshooting](#16-faq--troubleshooting) +13. [Vault](#13-vault) +14. [Architecture](#14-architecture) +15. [Codec System](#15-codec-system) +16. [Error Handling](#16-error-handling) +17. [FAQ / Troubleshooting](#17-faq--troubleshooting) --- @@ -960,7 +961,164 @@ console.log(manifest.chunks.length); // Full chunk list, regardless of structur --- -## 13. Architecture +## 13. Vault + +When you call `createTree({ manifest })`, the resulting tree is a loose Git +object. If nothing references it -- no commit, no tag, no ref -- `git gc` +will garbage-collect it. This can silently lose stored data. + +The vault solves this by maintaining a single Git ref (`refs/cas/vault`) +pointing to a commit chain. The commit's tree indexes all stored assets by +slug. One ref protects everything from GC, and `git log refs/cas/vault` +gives you free history of every vault operation. + +### Vault Tree Structure + +``` +refs/cas/vault β†’ commit β†’ tree + β”œβ”€β”€ 100644 blob .vault.json + β”œβ”€β”€ 040000 tree photos/vacation + β”œβ”€β”€ 040000 tree models/v3-weights +``` + +The `.vault.json` blob contains versioned metadata. Without encryption: +`{ "version": 1 }`. With encryption, it includes KDF configuration. + +### Initializing a Vault + +```js +// Plain vault (no encryption) +await cas.initVault(); + +// Vault with passphrase-based encryption +await cas.initVault({ + passphrase: 'my vault passphrase', + kdfOptions: { algorithm: 'pbkdf2' }, +}); +``` + +When initialized with a passphrase, the vault generates a salt and stores +the KDF parameters in `.vault.json`. The passphrase itself is never stored. + +### Adding Entries + +```js +// Store a file and add it to the vault +const manifest = await cas.storeFile({ + filePath: './vacation.jpg', + slug: 'photos/vacation', +}); +const treeOid = await cas.createTree({ manifest }); +await cas.addToVault({ slug: 'photos/vacation', treeOid }); +``` + +If the vault does not exist yet, `addToVault` auto-initializes it with +`{ version: 1 }` metadata (no encryption). If the slug already exists, it +throws `VAULT_ENTRY_EXISTS` unless you pass `force: true`. 
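+
+To overwrite an existing entry in place, pass `force: true`. A minimal sketch,
+where `newTreeOid` stands in for a tree created the same way as above:
+
+```js
+// Point the existing slug at a newer tree instead of throwing VAULT_ENTRY_EXISTS
+await cas.addToVault({ slug: 'photos/vacation', treeOid: newTreeOid, force: true });
+```
+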
+ +### Listing and Resolving Entries + +```js +// List all entries (sorted by slug) +const entries = await cas.listVault(); +for (const { slug, treeOid } of entries) { + console.log(`${slug}\t${treeOid}`); +} + +// Resolve a slug to its tree OID +const treeOid = await cas.resolveVaultEntry({ slug: 'photos/vacation' }); +const manifest = await cas.readManifest({ treeOid }); +``` + +### Removing Entries + +```js +const { removedTreeOid } = await cas.removeFromVault({ slug: 'photos/vacation' }); +``` + +After removing the last entry, the vault remains (with an empty tree + +`.vault.json`). The ref stays alive. + +### Vault-Level Encryption + +When a vault is initialized with a passphrase, the CLI handles key +derivation automatically: + +```bash +# Initialize an encrypted vault +git cas vault init --vault-passphrase "secret" + +# Store with vault-level encryption (key derived from vault config) +git cas store ./vacation.jpg --slug photos/vacation --tree --vault-passphrase "secret" + +# Restore using vault slug +git cas restore --slug photos/vacation --out ./restored.jpg --vault-passphrase "secret" +``` + +The vault stores the KDF policy (algorithm, salt, iterations). The actual +encryption is still per-entry AES-256-GCM via the existing `store()`/`restore()` +paths -- the vault just provides the key-derivation policy. + +### CLI Vault Commands + +```bash +# Initialize vault (optionally with encryption) +git cas vault init +git cas vault init --vault-passphrase "secret" --algorithm pbkdf2 + +# List all vault entries (tab-separated slug + tree OID) +git cas vault list + +# Remove an entry +git cas vault remove photos/vacation +``` + +### CLI Restore with Vault + +The `restore` command now uses explicit flags instead of a positional argument: + +```bash +# Restore from a vault slug +git cas restore --slug photos/vacation --out ./restored.jpg + +# Restore from a direct tree OID (existing behavior) +git cas restore --oid a1b2c3d4... --out ./restored.jpg +``` + +### GC Survival + +Because `refs/cas/vault` points to a commit whose tree references all stored +asset trees, every blob in the chain is reachable. `git gc --prune=now` will +not touch any vault data: + +```bash +git cas vault list # entries exist +git gc --prune=now # aggressive garbage collection +git cas vault list # entries still intact +``` + +### Concurrent Write Safety + +The vault uses compare-and-swap (CAS) semantics on `git update-ref`. If +another process updates the vault between your read and write, the operation +retries automatically (up to 3 times with exponential backoff). If all +retries fail, a `VAULT_CONFLICT` error is thrown. + +### Slug Validation + +Slugs are validated strictly: + +- Must be a non-empty string +- No leading/trailing `/` +- No empty segments (`a//b`), `.`, or `..` segments +- No control characters (NUL, tabs, newlines) +- Each segment <= 255 bytes, total <= 1024 bytes + +Invalid slugs throw `INVALID_SLUG`. + +--- + +## 14. Architecture `git-cas` follows a hexagonal (ports and adapters) architecture. The domain logic in `CasService` has zero direct dependencies on Node.js, Git, or any @@ -1094,7 +1252,7 @@ const cas = new ContentAddressableStore({ --- -## 14. Codec System +## 15. Codec System ### JSON Codec @@ -1183,7 +1341,7 @@ The manifest will be stored in the tree as `manifest.msgpack`. --- -## 15. Error Handling +## 16. Error Handling All errors thrown by `git-cas` are instances of `CasError`, which extends `Error` with two additional properties: @@ -1266,7 +1424,7 @@ try { --- -## 16. 
FAQ / Troubleshooting +## 17. FAQ / Troubleshooting ### Q: Does this work with bare repositories? @@ -1380,7 +1538,7 @@ Every Git plumbing command is wrapped in a policy from `@git-stunts/alfred`. The default policy applies a 30-second timeout and retries up to 2 times with exponential backoff (100ms, then up to 2s). This handles transient filesystem errors and lock contention gracefully. You can override the policy at -construction time (see Section 13). +construction time (see Section 14). --- diff --git a/README.md b/README.md index 2588b5d..1f4a04a 100644 --- a/README.md +++ b/README.md @@ -28,6 +28,7 @@ We use the object database. - **Tree output** generates standard Git trees so assets snap into commits cleanly. - **Full round-trip** store, tree, and restore β€” get your bytes back, verified. - **Lifecycle management** `readManifest`, `deleteAsset`, `findOrphanedChunks` β€” inspect trees, plan deletions, audit storage. +- **Vault** GC-safe ref-based storage. One ref (`refs/cas/vault`) indexes all assets by slug. No more silent data loss from `git gc`. **Use it for:** binary assets, build artifacts, model weights, data packs, secret bundles, weird experiments, etc. @@ -87,18 +88,24 @@ const manifest2 = await cas.storeFile({ # Store a file β€” prints manifest JSON git cas store ./image.png --slug my-image -# Store and get a tree OID directly +# Store and vault the tree OID (GC-safe) git cas store ./image.png --slug my-image --tree -# Create a tree from an existing manifest -git cas tree --manifest manifest.json +# Restore from a vault slug +git cas restore --slug my-image --out ./restored.png -# Restore from a tree OID -git cas restore --out ./restored.png +# Restore from a direct tree OID +git cas restore --oid --out ./restored.png -# Encrypted round-trip (32-byte raw key file) -git cas store ./secret.bin --slug vault --key-file ./my.key --tree -git cas restore --out ./decrypted.bin --key-file ./my.key +# Vault management +git cas vault init +git cas vault list +git cas vault remove my-image + +# Encrypted vault round-trip +git cas vault init --vault-passphrase "secret" +git cas store ./secret.bin --slug vault-entry --tree --vault-passphrase "secret" +git cas restore --slug vault-entry --out ./decrypted.bin --vault-passphrase "secret" ``` ## Why not Git LFS? diff --git a/ROADMAP.md b/ROADMAP.md index 9dc8be6..8ee815b 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -9,7 +9,9 @@ This roadmap is structured as: 3. **Contracts** β€” Return/throw semantics for all public methods 4. **Version Plan** β€” Table mapping versions to milestones 5. **Milestone Dependency Graph** β€” ASCII diagram -6. **Milestones & Task Cards** β€” 7 milestones, 26 tasks (uniform task card template) +6. **Milestones & Task Cards** β€” 5 milestones, 20 tasks (uniform task card template) +7. **Feature Matrix** β€” Competitive landscape vs. Git LFS, git-annex, Restic, Age, DVC +8. **Competitive Analysis** β€” When to use git-cas and when not to, with concrete scenarios --- @@ -36,16 +38,23 @@ This roadmap is structured as: Single registry of all error codes used across the codebase. Each code is a string passed as the `code` argument to `new CasError(message, code, meta)`. -| Code | Description | Introduced By | -|------|-------------|---------------| -| `INVALID_KEY_LENGTH` | Encryption key is not exactly 32 bytes (AES-256 requirement). Error meta includes `{ expected: 32, actual: }`. | Task 1.3 | -| `INVALID_KEY_TYPE` | Encryption key is not a Buffer. 
| Task 1.3 | -| `INTEGRITY_ERROR` | Decryption auth-tag verification failed (wrong key, tampered ciphertext, or tampered tag), or chunk digest mismatch on restore. | Exists (decrypt); extended by Task 1.6, Task 2.1 | -| `STREAM_ERROR` | Read stream failed during `storeFile`. Partial chunks may have been written to Git ODB (unreachable; handled by `git gc`). Meta includes `{ chunksWritten: }`. | Task 2.4 | -| `MISSING_KEY` | Encryption key required to restore encrypted content but none was provided. | Task 2.1 | -| `TREE_PARSE_ERROR` | `git ls-tree` output could not be parsed into valid entries. | Task 2.2 | -| `MANIFEST_NOT_FOUND` | No manifest entry (e.g. `manifest.json` / `manifest.cbor`) found in the Git tree. | Task 4.1 | -| `GIT_ERROR` | Underlying Git plumbing command failed. Wraps the original error from the plumbing layer. | Task 2.2, Task 4.1 | +| Code | Description | Planned By | +|------|-------------|------------| +| `INVALID_KEY_LENGTH` | Encryption key is not exactly 32 bytes (AES-256 requirement). Error meta includes `{ expected: 32, actual: }`. | v1.1.0 | +| `INVALID_KEY_TYPE` | Encryption key is not a Buffer. | v1.1.0 | +| `INTEGRITY_ERROR` | Decryption auth-tag verification failed (wrong key, tampered ciphertext, or tampered tag), or chunk digest mismatch on restore. | v1.1.0 | +| `STREAM_ERROR` | Read stream failed during `storeFile`. Partial chunks may have been written to Git ODB (unreachable; handled by `git gc`). Meta includes `{ chunksWritten: }`. | v1.2.0 | +| `MISSING_KEY` | Encryption key required to restore encrypted content but none was provided. | v1.2.0 | +| `TREE_PARSE_ERROR` | `git ls-tree` output could not be parsed into valid entries. | v1.2.0 | +| `MANIFEST_NOT_FOUND` | No manifest entry (e.g. `manifest.json` / `manifest.cbor`) found in the Git tree. | v1.4.0 | +| `GIT_ERROR` | Underlying Git plumbing command failed. Wraps the original error from the plumbing layer. | v1.2.0 | +| `INVALID_CHUNKING_STRATEGY` | Manifest contains unrecognized chunking strategy (not `fixed` or `cdc`). | Task 10.3 | +| `NO_MATCHING_RECIPIENT` | No recipient entry matches the provided KEK. Caller's key is not in the recipient list. | Task 11.1 | +| `DEK_UNWRAP_FAILED` | Failed to unwrap DEK with the provided KEK. Wrong key or tampered wrappedDek. | Task 11.1 | +| `RECIPIENT_NOT_FOUND` | Recipient label not found in manifest recipient list. | Task 11.2 | +| `RECIPIENT_ALREADY_EXISTS` | Recipient label already exists in manifest. | Task 11.2 | +| `CANNOT_REMOVE_LAST_RECIPIENT` | Cannot remove the last recipient β€” at least one must remain. | Task 11.2 | +| `ROTATION_NOT_SUPPORTED` | Key rotation requires envelope encryption (DEK/KEK model). Legacy manifests must be re-stored. | Task 12.1 | --- @@ -102,7 +111,7 @@ Return and throw semantics for every public method (current and planned). - **Throws:** `CasError('MANIFEST_NOT_FOUND')` if any `treeOid` lacks a manifest (fail closed). - **Side effects:** None. Analysis only. -### `deriveKey({ passphrase, salt?, algorithm?, iterations? })` *(planned β€” Task 7.2)* +### `deriveKey({ passphrase, salt?, algorithm?, iterations? })` - **Returns:** `Promise<{ key: Buffer, salt: Buffer, params: object }>`. - **Algorithms:** `pbkdf2` (default), `scrypt` β€” both Node.js built-ins. - **Throws:** Standard Node.js crypto errors on invalid parameters. @@ -122,41 +131,65 @@ Return and throw semantics for every public method (current and planned). - **Exit 0:** Restore succeeded, prints bytes written to stdout. 
- **Exit 1:** Integrity error, missing manifest, or I/O error (message to stderr). +### `restoreStream({ manifest, encryptionKey?, passphrase? })` *(planned β€” Task 8.1)* +- **Returns:** `AsyncIterable` β€” verified, decrypted, decompressed chunks in index order. +- **Throws:** `CasError('INTEGRITY_ERROR')` if any chunk fails verification (iteration stops). +- **Throws:** `CasError('MISSING_KEY')` if encrypted and no key provided. +- **Memory:** O(chunkSize) β€” never buffers full file. + +### `rotateKey({ manifest, oldKey, newKey, label? })` *(planned β€” Task 12.1)* +- **Returns:** `Promise` β€” updated manifest with re-wrapped DEK and incremented `keyVersion`. +- **Throws:** `CasError('DEK_UNWRAP_FAILED')` if `oldKey` cannot unwrap the DEK. +- **Throws:** `CasError('ROTATION_NOT_SUPPORTED')` if manifest uses legacy (non-envelope) encryption. +- **Side effects:** None. Caller must persist via `createTree()`. + +### `addRecipient({ manifest, existingKey, newRecipientKey, label })` *(planned β€” Task 11.2)* +- **Returns:** `Promise` β€” updated manifest with additional recipient entry. +- **Throws:** `CasError('DEK_UNWRAP_FAILED')` if `existingKey` is wrong. +- **Throws:** `CasError('RECIPIENT_ALREADY_EXISTS')` if `label` already exists. +- **Side effects:** None. Caller must persist. + +### `removeRecipient({ manifest, label })` *(planned β€” Task 11.2)* +- **Returns:** `Promise` β€” updated manifest without the named recipient. +- **Throws:** `CasError('RECIPIENT_NOT_FOUND')` if `label` not in recipient list. +- **Throws:** `CasError('CANNOT_REMOVE_LAST_RECIPIENT')` if only 1 recipient remains. + +### CLI: `git cas verify --oid | --slug ` *(planned β€” Task 9.2)* +- **Output:** `ok` on success, `fail` on failure. +- **Exit 0:** All chunks verified. +- **Exit 1:** Verification failed or error. + +### CLI: `git cas rotate --slug --old-key-file --new-key-file ` *(planned β€” Task 12.3)* +- **Output:** New tree OID on success. +- **Exit 0:** Rotation succeeded, vault updated. +- **Exit 1:** Wrong old key, unsupported manifest, or vault error. 
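+
+As a rough sketch of how the planned `restoreStream()` contract (Task 8.1) could be
+consumed once it lands; `cas`, `treeOid`, and `encryptionKey` are assumed to already
+be in scope, and the method itself does not exist yet:
+
+```js
+// Illustrative only: stream a restore to disk with O(chunkSize) memory.
+import { createWriteStream } from 'node:fs';
+import { pipeline } from 'node:stream/promises';
+import { Readable } from 'node:stream';
+
+const manifest = await cas.readManifest({ treeOid });          // existing API
+const chunks = cas.restoreStream({ manifest, encryptionKey }); // planned (Task 8.1)
+await pipeline(Readable.from(chunks), createWriteStream('./restored.bin'));
+```
+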
+ --- ## 4) Version Plan | Version | Milestone | Codename | Theme | Status | |--------:|-----------|----------|-------|--------| -| v1.1.0 | M1 | Bedrock | Foundation hardening | βœ… | -| v1.2.0 | M2 | Boomerang| File retrieval round trip + CLI | βœ… | -| v1.3.0 | M3 | Launchpad| CI/CD pipeline | βœ… | -| v1.4.0 | M4 | Compass | Lifecycle management | βœ… | -| v1.5.0 | M5 | Sonar | Observability | βœ… | -| v1.6.0 | M6 | Cartographer | Documentation | βœ… | -| v2.0.0 | M7 | Horizon | Advanced features | βœ… | +| v2.1.0 | M8 | Spit Shine | Review fixups | | +| v2.2.0 | M9 | Cockpit | CLI improvements | | +| v3.0.0 | M10 | Hydra | Content-defined chunking | | +| v3.1.0 | M11 | Locksmith | Multi-recipient encryption | | +| v3.2.0 | M12 | Carousel | Key rotation | | --- ## 5) Milestone Dependency Graph ```text -M1 Bedrock (v1.1.0) -β”‚ -v -M2 Boomerang (v1.2.0) ───┐ -β”‚ β”‚ -v v -M3 Launchpad (v1.3.0) M4 Compass (v1.4.0) - β”‚ - v - M5 Sonar (v1.5.0) - β”‚ - v - M6 Cartographer (v1.6.0) - β”‚ - v - M7 Horizon (v2.0.0) +M7 Horizon (v2.0.0) βœ… ──────────────────────────┐ + β”‚ β”‚ + β”œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ + v v v v +M8 Spit M9 Cockpit M10 Hydra (v3.0.0) M11 Locksmith (v3.1.0) +Shine (v2.2.0) β”‚ β”‚ +(v2.1.0) β”‚ v + v M12 Carousel (v3.2.0) + (CDC benchmarks) ``` --- @@ -167,265 +200,292 @@ M3 Launchpad (v1.3.0) M4 Compass (v1.4.0) | # | Codename | Theme | Version | Tasks | ~LoC | ~Hours | |---:|--------------|----------------------------|:-------:|------:|-------:|------:| -| M1 | Bedrock | Foundation hardening | v1.1.0 | 7 | ~475 | ~6.5h | -| M2 | Boomerang | File retrieval round trip + CLI | v1.2.0 | 6 | ~435 | ~14h | -| M3 | Launchpad | CI/CD pipeline | v1.3.0 | 2 | ~110 | ~4h | -| M4 | Compass | Lifecycle management | v1.4.0 | 3 | ~180 | ~5.5h | -| M5 | Sonar | Observability | v1.5.0 | 2 | ~210 | ~5.5h | -| M6 | Cartographer | Documentation | v1.6.0 | 3 | ~750 | ~10h | -| M7 | Horizon | Advanced features | v2.0.0 | 3 | ~450 | ~17h | -| | **Total** | | | **26**| **~2,610** | **~62.5h** | +| M8 | Spit Shine | Review fixups | v2.1.0 | 3 | ~290 | ~7h | +| M9 | Cockpit | CLI improvements | v2.2.0 | 5 | ~260 | ~7h | +| M10| Hydra | Content-defined chunking | v3.0.0 | 4 | ~690 | ~22h | +| M11| Locksmith | Multi-recipient encryption | v3.1.0 | 4 | ~580 | ~20h | +| M12| Carousel | Key rotation | v3.2.0 | 4 | ~400 | ~13h | +| | **Total** | | | **20**| **~2,220** | **~69h** | --- -# M1 β€” Bedrock (v1.1.0) βœ… -**Theme:** Close compliance gaps, harden validation, expand test coverage. No new features. +# M8 β€” Spit Shine (v2.1.0) +**Theme:** Polish and harden based on code review findings. Fix asymmetries, eliminate duplication, improve docs. No new features. --- -## Task 1.1: Add LICENSE file (Apache-2.0) +## Task 8.1: Streaming restore **User Story** -As an open-source consumer, I want an Apache-2.0 LICENSE in the repo root so I can verify licensing terms quickly. +As a developer restoring large files, I want a streaming restore path so I don't buffer the entire file in memory. **Requirements** -- R1: Add full Apache-2.0 license text at `LICENSE` in repository root. -- R2: Include copyright line: `Copyright 2026 James Ross `. -- R3: No code changes required if `package.json` already declares Apache-2.0. +- R1: Add `CasService.restoreStream({ manifest, encryptionKey, passphrase })` returning `AsyncIterable`. +- R2: Each yielded buffer is one verified, decrypted, decompressed chunk β€” ready to write. 
+- R3: Integrity verified per-chunk before yield (not after full reassembly). +- R4: Decompression and decryption applied per-chunk in streaming fashion. +- R5: `restoreFile()` in the facade uses `restoreStream()` internally with `createWriteStream()` instead of `writeFileSync()`. +- R6: Existing `restore()` method remains unchanged (returns `{ buffer, bytesWritten }`) for backward compat. **Acceptance Criteria** -- AC1: `LICENSE` exists in repo root and matches Apache-2.0 full text. -- AC2: Copyright line is present and correct. +- AC1: `restoreStream()` yields chunks that, when concatenated, match the original file byte-for-byte. +- AC2: Memory usage during streaming restore is O(chunkSize), not O(fileSize). +- AC3: `restoreFile()` writes via stream and does not call `writeFileSync()`. +- AC4: Encrypted + compressed files round-trip correctly via streaming restore. +- AC5: Existing `restore()` method behavior unchanged. **Scope** -- In scope: `LICENSE` file creation only. -- Out of scope: Adding license headers to source files (defer to M6). +- In scope: `restoreStream()` on CasService + facade, refactor `restoreFile()` to use streaming writes. +- Out of scope: Parallel chunk reads, resume/partial restore, streaming decrypt rearchitecture. **Est. Complexity (LoC)** -- Prod: ~200 -- Tests: ~0 -- Total: ~200 +- Prod: ~60 +- Tests: ~80 +- Total: ~140 **Est. Human Working Hours** -- ~0.25h +- ~4h **Test Plan** - Golden path: - - Verify file exists and is included in `npm pack` output. + - Store 10KB β†’ restoreStream β†’ collect β†’ byte-compare original. + - Store encrypted + compressed β†’ restoreStream β†’ collect β†’ compare. + - restoreFile writes correct file via streaming (spy confirms no writeFileSync). - Failures: - - Missing file fails CI lint step (added in M3). + - Corrupted chunk mid-stream β†’ throws INTEGRITY_ERROR, iteration stops. + - Wrong key β†’ throws INTEGRITY_ERROR on first encrypted chunk. - Edges: - - None. + - 0-byte manifest yields empty iterable. + - Single-chunk file yields exactly 1 buffer. + - Exact multiple of chunkSize yields expected count. - Fuzz/stress: - - None. + - 50 random file sizes (seeded) β€” streaming restore matches buffered restore byte-for-byte. + - Memory profiling: restoreStream on 10MB file stays under 2Γ— chunkSize peak. **Definition of Done** -- DoD1: LICENSE file added at repo root. -- DoD2: `npm pack` includes LICENSE. +- DoD1: `restoreStream()` implemented on CasService and exposed via facade. +- DoD2: `restoreFile()` refactored to use streaming writes. +- DoD3: All existing restore tests still pass. +- DoD4: New streaming tests added and green. **Blocking** -- Blocks: Task 3.2 +- Blocks: None **Blocked By** - Blocked by: None --- -## Task 1.2: Add CHANGELOG.md (Keep a Changelog) +## Task 8.2: Extract shared crypto helpers to CryptoPort base class **User Story** -As a consumer upgrading versions, I want a changelog so I can assess upgrade impact and risk. +As a maintainer, I want duplicated crypto helpers consolidated so changes to validation or metadata format are made in one place. **Requirements** -- R1: Add `CHANGELOG.md` following Keep a Changelog v1.1.0 format. -- R2: Include `[Unreleased]` section. -- R3: Retroactively add v1.0.0 entry based on git history. -- R4: Use sections: Added, Changed, Fixed, Security. +- R1: Move key validation to `CryptoPort` as concrete `_validateKey(key)`. Adapters call `super._validateKey(key)` or inherit directly. 
+- R2: Move `buildMeta(nonce, tag)` to `CryptoPort` as concrete `_buildMeta(nonce, tag)`. Returns `{ algorithm: 'aes-256-gcm', nonce: string, tag: string, encrypted: true }`. +- R3: Move KDF parameter defaults to `CryptoPort.deriveKey()` as a concrete method that normalizes parameters, then calls abstract `_doDeriveKey(passphrase, salt, normalizedParams)` template method. +- R4: Remove `CasService._validateKey()` β€” service delegates to `crypto._validateKey()`. +- R5: All 3 adapters use inherited helpers. No behavioral change. **Acceptance Criteria** -- AC1: `CHANGELOG.md` exists and follows the required format. -- AC2: v1.0.0 entry exists with at least one "Added" item. -- AC3: `[Unreleased]` section exists. +- AC1: `CryptoPort` has concrete `_validateKey()`, `_buildMeta()`, and `deriveKey()` methods. +- AC2: `NodeCryptoAdapter`, `BunCryptoAdapter`, `WebCryptoAdapter` no longer duplicate these methods. +- AC3: `CasService._validateKey()` is removed; key validation delegates to crypto port. +- AC4: All existing tests pass without modification (behavior unchanged). **Scope** -- In scope: Manual changelog file creation. -- Out of scope: Automated changelog tooling. +- In scope: Refactor crypto helpers into base class + remove CasService duplication. +- Out of scope: Changing validation rules, adding new key types. **Est. Complexity (LoC)** -- Prod: ~40 -- Tests: ~0 -- Total: ~40 +- Prod: ~40 (add to base, remove from 4 sites) +- Tests: ~20 (base class unit tests) +- Total: ~60 **Est. Human Working Hours** -- ~0.5h +- ~2h **Test Plan** - Golden path: - - Ensure release workflow (M3) can extract excerpt from changelog. + - All existing crypto round-trip tests pass unchanged. + - All existing KDF tests pass unchanged. - Failures: - - Missing changelog fails release gating (M3). + - Invalid key type/length still throws same CasError codes. - Edges: - - None. + - NodeCryptoAdapter strict Buffer validation still enforced (override `_validateKey` if needed). - Fuzz/stress: - - None. + - Run full existing crypto fuzz suite β€” no regressions. **Definition of Done** -- DoD1: CHANGELOG.md created and reviewed for format compliance. -- DoD2: v1.0.0 entry populated. +- DoD1: Shared helpers live on CryptoPort. +- DoD2: All duplicated code removed from adapters and CasService. +- DoD3: Full test suite green. **Blocking** -- Blocks: Task 3.2 +- Blocks: None **Blocked By** - Blocked by: None --- -## Task 1.3: Validate encryption key length (32 bytes for AES-256) +## Task 8.3: README polish and architectural decision record **User Story** -As a developer, I want invalid encryption keys rejected immediately so I don't get cryptic crypto errors later. +As a new user, I want the README to get me started quickly. As a contributor, I want to understand why vault lives in the facade. **Requirements** -- R1: `storeFile({ encryptionKey })` throws `CasError` with code `INVALID_KEY_LENGTH` if key length β‰  32 bytes. -- R2: `encrypt({ key })` enforces identical validation and error contract. -- R3: Error includes expected vs actual length (message or metadata). -- R4: Validation occurs before any I/O (no persistence calls on failure). +- R1: Add installation instructions to README.md (`npm install @git-stunts/git-cas @git-stunts/plumbing`). +- R2: Add links to GUIDE.md and API.md in README.md. +- R3: Add `docs/ADR-001-vault-in-facade.md` documenting the decision to place vault logic in `ContentAddressableStore` rather than `CasService`, including rationale, alternatives considered, and trade-offs. 
**Acceptance Criteria** -- AC1: 32-byte Buffer key passes for both `storeFile` and `encrypt`. -- AC2: Any non-32 length throws `CasError.code === 'INVALID_KEY_LENGTH'`. -- AC3: Error includes expected=32 and actual length. -- AC4: No persistence calls occur on validation failure. +- AC1: README contains install command. +- AC2: README links to GUIDE.md ("Getting Started") and API.md ("API Reference"). +- AC3: ADR exists and explains the vault-in-facade decision with alternatives considered. **Scope** -- In scope: Key length validation + tests. -- Out of scope: Key format rules (hex vs base64), KDF (M7). +- In scope: README edits + ADR document. +- Out of scope: Full README rewrite, new documentation pages. **Est. Complexity (LoC)** -- Prod: ~15 -- Tests: ~30 -- Total: ~45 +- Prod: ~0 +- Docs: ~90 (README edits ~30, ADR ~60) +- Total: ~90 **Est. Human Working Hours** - ~1h **Test Plan** - Golden path: - - 32-byte key accepted in both code paths. + - Verify install command is correct by running it in a fresh project. + - Verify links resolve to existing files. - Failures: - - 16-byte key throws INVALID_KEY_LENGTH. - - 64-byte key throws INVALID_KEY_LENGTH. - - 0-byte key throws INVALID_KEY_LENGTH. - - non-Buffer key throws typed error (INVALID_KEY_TYPE). + - Dead link in README β†’ fix before merge. - Edges: - - `crypto.randomBytes(32)` passes. + - None. - Fuzz/stress: - - Test lengths 0..128 (deterministic seed), assert only 32 passes. + - None (documentation). **Definition of Done** -- DoD1: Shared validation helper exists and is used by both call sites. -- DoD2: Tests cover all required cases. -- DoD3: Error contract documented in API docs stub or inline comments. +- DoD1: README updated with install instructions and doc links. +- DoD2: ADR-001 created in `docs/` directory. **Blocking** -- Blocks: Task 1.6, Task 2.1, Task 6.2, Task 7.2 +- Blocks: None **Blocked By** - Blocked by: None --- -## Task 1.4: Handle empty file edge case (0 bytes) +# M9 β€” Cockpit (v2.2.0) +**Theme:** CLI polish β€” progress feedback, structured output, better errors, and new commands. Make the terminal experience match the API's capability. + +--- + +## Task 9.1: CLI progress feedback **User Story** -As a developer, I want storing a zero-byte file to produce a valid manifest so that empty assets are supported. +As a CLI user storing or restoring large files, I want visible progress so I know the operation is working and not hung. **Requirements** -- R1: `storeFile()` on 0-byte file returns Manifest with `size: 0` and `chunks: []`. -- R2: No chunk blob writes occur for empty content. -- R3: Works with and without encryption option enabled. +- R1: Wire `CasService` events (`chunk:stored`, `chunk:restored`, `file:stored`, `file:restored`) to CLI output. +- R2: Display a progress counter during store/restore: `Storing chunk 5/12…` or similar. +- R3: Progress output goes to stderr (stdout reserved for structured output). +- R4: Progress suppressed when stdout is not a TTY (piped mode) or when `--quiet` is passed. +- R5: Add `--quiet` global flag to suppress progress output. **Acceptance Criteria** -- AC1: Manifest returned has `size=0` and `chunks.length=0`. -- AC2: Persistence `writeBlob` is not called for chunk content. -- AC3: Behavior is identical with encryption enabled (manifest may include encryption metadata; chunks remain empty). +- AC1: `git cas store` shows per-chunk progress on stderr in TTY mode. +- AC2: `git cas restore` shows per-chunk progress on stderr in TTY mode. 
+- AC3: Piped mode (`git cas store … | jq`) shows no progress. +- AC4: `--quiet` suppresses all progress output. **Scope** -- In scope: Ensure empty-file store is correct + tests. -- Out of scope: Empty directory handling. +- In scope: Progress display for store and restore. +- Out of scope: Progress bars with ETA, spinners, color output, verbose debug logging. **Est. Complexity (LoC)** -- Prod: ~5 -- Tests: ~25 -- Total: ~30 +- Prod: ~50 +- Tests: ~20 +- Total: ~70 **Est. Human Working Hours** -- ~0.75h +- ~2h **Test Plan** - Golden path: - - Store 0-byte file β†’ manifest size 0, chunks []. - - Store 0-byte file with encryption option β†’ manifest valid, chunks []. + - Store 3-chunk file in TTY mode β†’ stderr shows 3 progress messages. + - Restore β†’ stderr shows 3 progress messages. - Failures: - - Nonexistent input path (covered more fully in Task 1.7). + - None expected (progress is best-effort, non-blocking). - Edges: - - Ensure no chunk writes happen (spy/mock). + - 0-chunk file (empty) β†’ no progress messages. + - 1-chunk file β†’ exactly 1 progress message. + - Non-TTY mode β†’ no progress on stderr. + - `--quiet` β†’ no progress on stderr. - Fuzz/stress: - - Run repeated empty-file stores (e.g., 100) to ensure no state leakage. + - None (thin display layer). **Definition of Done** -- DoD1: Unit tests added confirming behavior and persistence call counts. -- DoD2: No regression in non-empty file paths. +- DoD1: Progress feedback visible in CLI during store and restore. +- DoD2: `--quiet` flag implemented and functional. +- DoD3: Non-TTY detection works correctly. **Blocking** -- Blocks: Task 2.1 +- Blocks: None **Blocked By** - Blocked by: None --- -## Task 1.5: Use realistic deterministic test digests +## Task 9.2: CLI `verify` command **User Story** -As a maintainer, I want tests to use realistic SHA-256 digests so digest-length and format bugs can't hide. +As an operator, I want to verify stored asset integrity from the command line without restoring the file. **Requirements** -- R1: Replace placeholder digests (e.g., `'a'.repeat(64)`) with deterministic realistic digests. -- R2: Add helper `digestOf(seed: string): string` that returns `sha256(seed).hex`. -- R3: Ensure tests remain deterministic (no random digests). +- R1: Add `git cas verify` subcommand. +- R2: Accept `--oid ` or `--slug ` (exactly one required, same mutual-exclusion validation as `restore`). +- R3: Read manifest from tree, call `verifyIntegrity(manifest)`. +- R4: Print `ok` and exit 0 on success. Print `fail` with details and exit 1 on failure. +- R5: Supports `--cwd` and `--json` (if Task 9.3 is complete) flags. **Acceptance Criteria** -- AC1: No remaining `'a'.repeat(64)` / `'b'.repeat(64)` patterns in tests. -- AC2: Tests pass consistently across repeated runs. -- AC3: Digests produced are exactly 64 hex chars. +- AC1: Valid asset β†’ prints `ok`, exits 0. +- AC2: Corrupted asset β†’ prints `fail`, exits 1. +- AC3: Nonexistent OID/slug β†’ prints error, exits 1. **Scope** -- In scope: Test data improvements only. -- Out of scope: Large test refactors. +- In scope: `verify` subcommand wired to existing `verifyIntegrity()`. +- Out of scope: Repair, per-chunk corruption report, re-verification against original file. **Est. Complexity (LoC)** -- Prod: ~0 +- Prod: ~25 - Tests: ~15 -- Total: ~15 +- Total: ~40 **Est. Human Working Hours** -- ~0.5h +- ~1h **Test Plan** - Golden path: - - All unit tests pass using deterministic digests. + - Store file, verify via CLI β†’ exit 0. 
- Failures: - - Helper returns wrong length β†’ tests should fail schema validation. + - Verify with bad OID β†’ exit 1. + - Verify with both --slug and --oid β†’ exit 1 (mutual exclusion). + - Neither --slug nor --oid β†’ exit 1. - Edges: - - Multiple seeds yield distinct digests. + - 0-chunk manifest verifies successfully (vacuously true). - Fuzz/stress: - - Generate 100 digests from different seeds and validate length/hex format. + - None (thin wrapper over tested API). **Definition of Done** -- DoD1: Digest helper added and used across tests. -- DoD2: All tests deterministic and green. +- DoD1: `verify` subcommand added and functional. +- DoD2: Unit tests cover pass and fail paths. **Blocking** - Blocks: None @@ -435,112 +495,111 @@ As a maintainer, I want tests to use realistic SHA-256 digests so digest-length --- -## Task 1.6: Add encryption round-trip unit tests (encrypt/decrypt) +## Task 9.3: CLI `--json` output mode **User Story** -As a maintainer, I want encrypt/decrypt tested as a pair (including tamper detection) so crypto refactors are safe. +As a CI/CD pipeline author, I want structured JSON output from the CLI so I can parse results programmatically. **Requirements** -- R1: Add unit tests ensuring encryptβ†’decrypt returns original plaintext. -- R2: Wrong key must throw `CasError('INTEGRITY_ERROR')`. -- R3: Tampered ciphertext must throw `CasError('INTEGRITY_ERROR')`. -- R4: Tampered auth tag must throw `CasError('INTEGRITY_ERROR')`. -- R5: If `meta.encrypted === false`, decrypt returns buffer unchanged. -- R6: If `meta` absent and decrypt supports passthrough, it must return unchanged (or explicitly throw; define contract). +- R1: Add `--json` global flag. +- R2: When `--json` is passed, all command output is valid JSON on stdout: + - `store`: `{ "manifest": {...} }` or `{ "treeOid": "..." }` (with `--tree`). + - `restore`: `{ "bytesWritten": N }`. + - `verify`: `{ "ok": true|false, "slug": "...", "chunks": N }`. + - `vault list`: `[{ "slug": "...", "treeOid": "..." }, ...]`. + - `vault init`: `{ "commitOid": "..." }`. + - `vault remove`: `{ "commitOid": "...", "removedTreeOid": "..." }`. +- R3: Errors in JSON mode: `{ "error": "...", "code": "..." }` on stderr with non-zero exit. +- R4: Non-JSON mode behavior unchanged. **Acceptance Criteria** -- AC1: Multiple plaintext sizes round-trip correctly. -- AC2: Wrong-key and tamper tests fail with INTEGRITY_ERROR. -- AC3: Passthrough behavior is documented and tested. +- AC1: `git cas store --json …` outputs parseable JSON. +- AC2: `git cas vault list --json` outputs JSON array. +- AC3: `git cas store --json … | jq .treeOid` works end-to-end. +- AC4: Error in JSON mode is valid JSON with error and code fields. **Scope** -- In scope: Unit tests only. -- Out of scope: storeFile encryption integration (M2). +- In scope: JSON output for all existing commands. +- Out of scope: NDJSON streaming, custom output format templates. **Est. Complexity (LoC)** -- Prod: ~0 -- Tests: ~60 -- Total: ~60 +- Prod: ~30 +- Tests: ~20 +- Total: ~50 **Est. Human Working Hours** - ~1.5h **Test Plan** - Golden path: - - plaintext sizes: 0B, 1B, 1KB, 1MB round-trip. + - Each command with `--json` β†’ output is valid JSON (`JSON.parse` succeeds). - Failures: - - wrong key throws INTEGRITY_ERROR. - - flip one bit in ciphertext throws INTEGRITY_ERROR. - - flip one bit in auth tag throws INTEGRITY_ERROR. - - swap nonce (if represented) throws INTEGRITY_ERROR. + - Error with `--json` β†’ valid JSON error object. - Edges: - - meta.encrypted=false passthrough. 
- - meta undefined behavior explicitly asserted. + - Empty vault list β†’ `[]`. + - 0-byte store β†’ valid JSON manifest with empty chunks array. - Fuzz/stress: - - 50 randomized plaintext buffers (seeded), assert round-trip holds. - - Tamper one random byte each run, assert failure. + - None (formatting layer). **Definition of Done** -- DoD1: New crypto test suite added and passing. -- DoD2: Crypto error behavior is stable and enforced by tests. +- DoD1: All commands support `--json`. +- DoD2: Tests validate JSON output is parseable. **Blocking** -- Blocks: Task 2.1, Task 6.2 +- Blocks: None **Blocked By** -- Blocked by: Task 1.3 +- Blocked by: None --- -## Task 1.7: Add error-path unit tests (constructors + core failures) +## Task 9.4: CLI error handler DRY cleanup + actionable error messages **User Story** -As a maintainer, I want error conditions covered by tests so regressions in validation and failure handling are caught. +As a CLI user, I want error messages that suggest what to do next. As a maintainer, I want error handling to live in one place. **Requirements** -- R1: Add tests for CasService constructor validation (chunkSize constraints). -- R2: `storeFile` on nonexistent path rejects with error (wrapped if contract exists). -- R3: `verifyIntegrity` returns false (or throws) on digest mismatch (define contract). -- R4: `createTree` rejects invalid manifest input. -- R5: Manifest constructor rejects invalid data (missing slug, negative size, etc.). -- R6: Chunk constructor rejects invalid data (negative index, invalid digest length, etc.). +- R1: Extract shared `runAction(fn)` wrapper that handles try/catch, stderr output, and `process.exit(1)`. +- R2: All 6 command actions use `runAction()` instead of inline try/catch. +- R3: Error messages include the CasError `code` when available: `error [INTEGRITY_ERROR]: message`. +- R4: Add actionable hints for common errors: + - `MISSING_KEY` β†’ "Provide --key-file or --vault-passphrase" + - `MANIFEST_NOT_FOUND` β†’ "Verify the tree OID contains a manifest" + - `VAULT_ENTRY_NOT_FOUND` β†’ "Run 'git cas vault list' to see available entries" + - `VAULT_ENTRY_EXISTS` β†’ "Use --force to overwrite" + - `INTEGRITY_ERROR` β†’ "Check that the correct key or passphrase was used" **Acceptance Criteria** -- AC1: Each listed error path is covered by a unit test. -- AC2: Error codes/messages are stable enough for consumers (typed where applicable). -- AC3: Tests fail if validation is removed or loosened. +- AC1: All command actions delegate to `runAction()`. +- AC2: Error output includes CasError code when present. +- AC3: At least 5 common errors include actionable hints. +- AC4: No behavioral change for non-error paths. **Scope** -- In scope: Unit-level error path tests. -- Out of scope: Integration error scenarios and retries (M2/M3). +- In scope: Error handler extraction + actionable hints. +- Out of scope: Verbose/debug mode, error logging to file. **Est. Complexity (LoC)** -- Prod: ~0–10 (if missing typed errors) -- Tests: ~80 -- Total: ~80–90 +- Prod: ~45 +- Tests: ~0 (existing tests cover error paths; hints verified manually) +- Total: ~45 **Est. Human Working Hours** -- ~2h +- ~1h **Test Plan** - Golden path: - - chunkSize=1024 passes; valid Manifest/Chunk constructors pass. + - All existing CLI tests pass unchanged. - Failures: - - chunkSize=0/512 throws. - - storeFile nonexistent path rejects. - - verifyIntegrity detects mismatch (returns false per contract). - - createTree invalid manifest throws. - - Manifest invalid fields throw. 
- - Chunk invalid fields throw. + - Trigger each hinted error β†’ verify hint appears in stderr. - Edges: - - boundary chunkSize=1024 exactly passes. - - digest length = 63/65 fails. + - Non-CasError (e.g., ENOENT) β†’ generic message, no hint. - Fuzz/stress: - - Generate malformed manifest objects (missing fields, wrong types) and ensure Zod rejects. + - None. **Definition of Done** -- DoD1: New unit test files added and passing. -- DoD2: Failure contracts (throw vs return false) documented and consistent. +- DoD1: `runAction()` wrapper used by all commands. +- DoD2: Error output includes codes and hints. **Blocking** - Blocks: None @@ -550,149 +609,157 @@ As a maintainer, I want error conditions covered by tests so regressions in vali --- -# M2 β€” Boomerang (v1.2.0) βœ… -**Theme:** Complete storeβ†’retrieve round trip + CLI. - ---- - -## Task 2.1: Implement restoreFile() on CasService +## Task 9.5: Vault list filtering and table formatting **User Story** -As a developer, I want to reconstruct a file from its manifest so I can retrieve previously stored assets reliably. +As a user with many vault entries, I want to filter and scan the list quickly. **Requirements** -- R1: Add `CasService.restoreFile({ manifest, encryptionKey, outputPath })`. -- R2: Read chunk blobs via `persistence.readBlob(chunk.blob)` in index order. -- R3: Verify SHA-256 digest per chunk before writing; on mismatch throw `CasError('INTEGRITY_ERROR')`. -- R4: If encrypted: concatenate ciphertext, decrypt with manifest metadata + key, then write plaintext. -- R5: Must handle empty manifests (0 chunks) by creating an empty file. -- R6: Return `{ bytesWritten: number }`. +- R1: Add `--filter ` option to `vault list`. Glob-style matching against slugs (e.g., `photos/*`, `*.bin`). +- R2: Default output is table-formatted (aligned columns) when stdout is a TTY. Header row: `SLUG TREE OID`. +- R3: Pipe-friendly: tab-separated output when stdout is not a TTY (existing behavior preserved). +- R4: `--json` mode outputs filtered JSON array (if Task 9.3 is complete). **Acceptance Criteria** -- AC1: Plaintext storeβ†’restore matches original bytes. -- AC2: Encrypted storeβ†’restore matches original bytes when correct key is provided. -- AC3: Wrong key throws INTEGRITY_ERROR. -- AC4: Corrupted chunk throws INTEGRITY_ERROR. -- AC5: Empty manifest produces 0-byte output. +- AC1: `vault list --filter "photos/*"` shows only matching entries. +- AC2: TTY output shows aligned table with headers. +- AC3: Non-TTY output is tab-separated (backward compatible). **Scope** -- In scope: Restore + integrity verification + writing output. -- Out of scope: Streaming decryption, resume/partial restore. +- In scope: Glob filtering + TTY-aware table formatting. +- Out of scope: Sort options, metadata columns (size, date), pagination. **Est. Complexity (LoC)** -- Prod: ~45 -- Tests: ~80 -- Total: ~125 +- Prod: ~35 +- Tests: ~20 +- Total: ~55 **Est. Human Working Hours** -- ~3h +- ~1.5h **Test Plan** - Golden path: - - store 10KB plaintext β†’ restore β†’ byte-for-byte compare. - - store 10KB encrypted β†’ restore with key β†’ compare. + - 5 entries, filter matches 2 β†’ 2 shown. + - TTY mode β†’ table with headers. - Failures: - - wrong key β†’ INTEGRITY_ERROR. - - digest mismatch β†’ INTEGRITY_ERROR. - - outputPath unwritable surfaces error (typed if contract added). + - No matches β†’ empty output, exit 0. + - Invalid glob syntax β†’ exit 1 with error. - Edges: - - empty manifest restores empty file. - - single-chunk file (< chunkSize). 
- - exact multiple of chunkSize. + - No `--filter` β†’ show all (default behavior preserved). + - Single entry β†’ table still formatted correctly. - Fuzz/stress: - - 200 file sizes (seeded) around boundaries (0..3*chunkSize) ensure correctness. - - Optional local-only stress: 50MB restore. + - None. **Definition of Done** -- DoD1: restoreFile implemented and exported via facade. -- DoD2: Unit/integration tests added and passing. -- DoD3: Encrypted restore memory behavior documented (SECURITY.md in M6; add stub note now). +- DoD1: `--filter` flag functional. +- DoD2: TTY-aware table formatting implemented. +- DoD3: Backward-compatible pipe behavior preserved. **Blocking** -- Blocks: Task 2.3, Task 5.1, Task 5.2, Task 4.1 +- Blocks: None **Blocked By** -- Blocked by: Task 1.3, Task 1.4 +- Blocked by: None + +--- + +# M10 β€” Hydra (v3.0.0) +**Theme:** Content-defined chunking for dramatically better dedup on versioned files. Fixed-size chunking invalidates every chunk after an edit; CDC limits the blast radius to 1–2 chunks. Major version bump for new chunking port and manifest metadata. --- -## Task 2.2: Add readTree() to GitPersistencePort and GitPersistenceAdapter +## Task 10.1: Buzhash rolling hash + CDC chunking engine **User Story** -As the CAS system, I want to parse a Git tree into entries so I can locate manifest and chunk blobs for lifecycle operations. +As a developer storing versioned files, I want content-defined chunk boundaries so incremental changes don't invalidate every chunk downstream of the edit point. **Requirements** -- R1: Add `readTree(treeOid)` to GitPersistencePort. -- R2: Implement adapter via `git ls-tree `. -- R3: Parse each line: ` \t` into `{ mode, type, oid, name }`. -- R4: Malformed output throws typed error `CasError('TREE_PARSE_ERROR')`. +- R1: Implement Buzhash rolling hash algorithm with a 256-entry random byte table (deterministic seed). +- R2: Implement CDC chunker that uses rolling hash to find chunk boundaries. +- R3: Configurable parameters: `minChunkSize` (default 64 KiB), `maxChunkSize` (default 1 MiB), `targetChunkSize` (default 256 KiB). +- R4: Chunk boundary determined when `hash & mask === 0`, where mask is derived from `targetChunkSize` (e.g., `targetChunkSize - 1` for power-of-2 targets). +- R5: Force boundary at `maxChunkSize` if no natural boundary found (prevent unbounded chunks). +- R6: Force minimum chunk size: never split below `minChunkSize` (prevent tiny chunks). +- R7: Deterministic: same input always produces same chunks regardless of runtime. +- R8: Streaming: operates on `AsyncIterable` with O(1) memory. **Acceptance Criteria** -- AC1: Typical ls-tree output parses correctly into expected fields. -- AC2: Empty output returns []. -- AC3: Malformed output throws TREE_PARSE_ERROR. +- AC1: CDC chunker produces variable-size chunks bounded by min/max. +- AC2: Identical input always produces identical chunks (deterministic). +- AC3: Inserting 10 bytes in the middle of a 1MB file changes only 1–2 chunks (not all downstream chunks). +- AC4: Average chunk size approximates `targetChunkSize`. +- AC5: No chunk smaller than `minChunkSize` (except final chunk of file). +- AC6: No chunk larger than `maxChunkSize`. **Scope** -- In scope: Non-recursive tree parsing. -- Out of scope: Tree walking / recursion. +- In scope: Rolling hash + CDC chunker implementation + unit tests. +- Out of scope: Integration with CasService (Task 10.2), Rabin fingerprinting (Buzhash is simpler and sufficient), gear-based CDC. **Est. 
Complexity (LoC)** -- Prod: ~20 -- Tests: ~25 -- Total: ~45 +- Prod: ~200 (Buzhash table + rolling hash + CDC logic) +- Tests: ~150 (determinism, boundary detection, size bounds, dedup) +- Total: ~350 **Est. Human Working Hours** -- ~1.5h +- ~12h **Test Plan** - Golden path: - - parse output containing manifest + 2 chunk blobs. + - 1MB buffer β†’ produces ~4 chunks (target 256KB). + - Same buffer β†’ same chunks every time. + - Modify 10 bytes at offset 500KB β†’ only 1–2 chunks differ vs. original. - Failures: - - malformed line triggers TREE_PARSE_ERROR. - - plumbing failure propagates or wraps as GIT_ERROR (define contract). + - minChunkSize > maxChunkSize β†’ throws configuration error. + - targetChunkSize outside [min, max] β†’ throws. - Edges: - - filename contains spaces (tab delimiter must be honored). + - File smaller than minChunkSize β†’ single chunk. + - File exactly maxChunkSize β†’ single chunk. + - All-zero file (degenerate hash behavior) β†’ chunks bounded by max. + - File = 1 byte β†’ single chunk. - Fuzz/stress: - - parse synthetic output with 1,000 entries. + - 100 random buffers (1KB–10MB, seeded): verify all chunks satisfy min/max bounds. + - Determinism: chunk same buffer 100 times, assert identical output. + - Dedup test: insert/delete 1–100 bytes at random offsets, measure % of chunks unchanged (expect >80% for small edits). **Definition of Done** -- DoD1: Port and adapter methods implemented. -- DoD2: Parser tests added and green. -- DoD3: No breaking API changes (additive only). +- DoD1: Buzhash + CDC chunker implemented as standalone module under `src/infrastructure/chunkers/`. +- DoD2: All boundary and determinism tests pass. +- DoD3: Performance: >100 MB/s throughput on chunking alone (no I/O). **Blocking** -- Blocks: Task 2.3, Task 4.1 +- Blocks: Task 10.2, Task 10.4 **Blocked By** - Blocked by: None --- -## Task 2.3: Integration tests (store + restore round trip) +## Task 10.2: ChunkingPort abstraction **User Story** -As a maintainer, I want end-to-end tests against real Git so the system is validated beyond mocks. +As an architect, I want chunking strategy behind a port so fixed-size and CDC can be swapped without modifying the domain service. **Requirements** -- R1: Add integration test suite that runs against a real Git repo. -- R2: Test uses a temp bare repo (`git init --bare`) as ODB. -- R3: Exercises: storeFile β†’ createTree β†’ readTree β†’ readManifest/restoreFile. -- R4: Test both JSON and CBOR codec paths. -- R5: Test encrypted and unencrypted paths. -- R6: Integration tests run in Docker to ensure consistent Git availability. +- R1: Add `src/ports/ChunkingPort.js` with abstract method `chunk(source: AsyncIterable): AsyncIterable`. +- R2: Implement `FixedChunker` adapter wrapping existing `_chunkAndStore` buffer-slicing logic. +- R3: Implement `CdcChunker` adapter wrapping Task 10.1's CDC engine. +- R4: `CasService` constructor accepts optional `chunker` port. Defaults to `FixedChunker(chunkSize)`. +- R5: Refactor `CasService._chunkAndStore()` to use the chunking port instead of inline buffer slicing. +- R6: `ContentAddressableStore` constructor accepts optional `chunking` config: `{ strategy: 'fixed' | 'cdc', …params }`. **Acceptance Criteria** -- AC1: Integration suite passes locally and in CI (M3). -- AC2: Round-trip comparisons are byte-for-byte equal. -- AC3: Both codecs validated end-to-end. +- AC1: `CasService({ chunker: new CdcChunker(…) })` uses CDC. 
+- AC2: Default behavior (no chunker specified) is identical to current fixed-size chunking. +- AC3: All existing store/restore tests pass without modification. +- AC4: CDC chunker plugs in and produces valid manifests that restore correctly. **Scope** -- In scope: Integration test harness + docker runner + scenarios. -- Out of scope: Performance benchmarks (M5). +- In scope: Port + 2 adapters + CasService refactor + facade config. +- Out of scope: Additional chunking strategies, auto-detection of optimal strategy. **Est. Complexity (LoC)** -- Prod: ~0 -- Tests: ~120 +- Prod: ~80 (port + 2 adapters + service refactor + facade config) +- Tests: ~40 (port contract tests, integration with both chunkers) - Total: ~120 **Est. Human Working Hours** @@ -700,926 +767,909 @@ As a maintainer, I want end-to-end tests against real Git so the system is valid **Test Plan** - Golden path: - - 10KB plaintext β†’ round trip. - - 10KB encrypted β†’ round trip with key. - - CBOR codec round trip. + - Store with FixedChunker β†’ same behavior as before (byte-identical manifests). + - Store with CdcChunker β†’ valid manifest, restore succeeds. - Failures: - - wrong key restore fails with INTEGRITY_ERROR. + - Chunker that yields empty buffers β†’ handled gracefully (skip empty). - Edges: - - 0-byte file round trip. - - exact chunkSize file round trip. - - exact 3*chunkSize file round trip. + - Switch chunker between store and restore β†’ restore still works (chunking strategy doesn't affect restore β€” chunks are self-describing via manifest). - Fuzz/stress: - - 50 random file sizes (seeded) around chunk boundaries. - - Optional local-only: 100MB store/restore smoke (not CI). + - 50 random files stored with both chunkers β†’ all restore correctly. **Definition of Done** -- DoD1: Integration tests runnable via npm script. -- DoD2: Docker harness documented in test README or comments. -- DoD3: Integration tests pass in CI once M3 lands. +- DoD1: ChunkingPort, FixedChunker, CdcChunker implemented. +- DoD2: CasService uses chunking port. +- DoD3: All existing tests pass (no regression). **Blocking** -- Blocks: Task 3.1 +- Blocks: Task 10.3 **Blocked By** -- Blocked by: Task 2.1, Task 2.2 +- Blocked by: Task 10.1 --- -## Task 2.4: Stream error recovery β€” wrap and document partial writes +## Task 10.3: CDC manifest metadata + backward compatibility **User Story** -As a developer, I want storeFile to fail safely on stream errors so partial stores don't produce misleading manifests. +As a user, I want CDC manifests to record their chunking strategy so future tools can understand or reproduce the chunk boundaries. **Requirements** -- R1: If stream errors mid-store, storeFile rejects and does not return a Manifest. -- R2: Wrap stream errors as `CasError('STREAM_ERROR')` including partial chunks written count. -- R3: Document that orphaned chunk blobs may remain, and are handled by Git GC if unreachable. -- R4: Ensure manifest is not written/returned on partial store. +- R1: Add optional `chunking` field to ManifestSchema: `{ strategy: 'fixed' | 'cdc', params: { … } }`. +- R2: Fixed-size manifests omit the field (backward compatible with all existing manifests). +- R3: CDC manifests include `{ strategy: 'cdc', params: { target: N, min: N, max: N } }`. +- R4: `readManifest()` handles manifests with or without `chunking` field. +- R5: v1 and v2 manifests remain valid (no migration required). +- R6: Add `INVALID_CHUNKING_STRATEGY` error code for unrecognized strategies. 
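+
+A minimal sketch of the optional `chunking` field described in R1–R3, assuming the existing Zod-based ManifestSchema; field and helper names beyond those listed above are illustrative, not final:
+
+```js
+import { z } from 'zod';
+
+// Hypothetical shape of the additive `chunking` manifest field (R1–R3).
+// Fixed-size manifests simply omit it, so existing manifests stay valid (R2, R5).
+const ChunkingSchema = z.object({
+  strategy: z.enum(['fixed', 'cdc']),
+  params: z
+    .object({
+      target: z.number().int().positive(),
+      min: z.number().int().positive(),
+      max: z.number().int().positive(),
+    })
+    .partial(),
+});
+
+// Applied additively, e.g.: ManifestSchema.extend({ chunking: ChunkingSchema.optional() });
+// a strategy outside the enum fails validation and could surface as INVALID_CHUNKING_STRATEGY (R6).
+```
+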
**Acceptance Criteria** -- AC1: Simulated stream failure returns STREAM_ERROR with metadata `{ chunksWritten }`. -- AC2: No manifest is returned/created on failure. -- AC3: Documentation note exists (inline or docs placeholder). +- AC1: CDC store produces manifest with `chunking` field. +- AC2: Fixed-size store produces manifests without `chunking` field (backward compatible). +- AC3: Old manifests (no `chunking` field) read correctly on new code. +- AC4: Unrecognized strategy in manifest throws `INVALID_CHUNKING_STRATEGY`. **Scope** -- In scope: Error wrapping + tests + documentation note. -- Out of scope: Deleting blobs, resume functionality. +- In scope: Schema extension, backward compat, error code. +- Out of scope: Migration tooling for old manifests, manifest version bump (chunking field is additive). **Est. Complexity (LoC)** -- Prod: ~15 -- Tests: ~20 -- Total: ~35 +- Prod: ~40 (schema + Manifest value object + error code) +- Tests: ~60 (round-trip, backward compat, unknown strategy) +- Total: ~100 **Est. Human Working Hours** -- ~1h +- ~3h **Test Plan** - Golden path: - - No change to successful stores. + - CDC store β†’ manifest includes `chunking.strategy === 'cdc'`. + - Fixed store β†’ manifest has no `chunking` field. + - Read old manifest without `chunking` β†’ works fine. - Failures: - - stream emits error after N chunks β†’ STREAM_ERROR and metadata correct. + - Manifest with `chunking.strategy === 'unknown'` β†’ throws INVALID_CHUNKING_STRATEGY. - Edges: - - error occurs before any chunks written β†’ chunksWritten=0. + - v1 manifest with compression + encryption + no chunking field β†’ still valid. + - v2 merkle manifest with CDC β†’ both `subManifests` and `chunking` fields present. - Fuzz/stress: - - randomized failure point across 0..N chunks (seeded) to ensure metadata correctness. + - Generate 100 manifests with random valid/invalid chunking fields β†’ validate schema behavior. **Definition of Done** -- DoD1: storeFile wraps stream errors consistently. -- DoD2: Tests prove manifest is not produced. -- DoD3: Partial-write behavior documented. +- DoD1: ManifestSchema extended with optional chunking field. +- DoD2: Backward compatibility verified across v1/v2 manifests. +- DoD3: Error code registered and tested. **Blocking** - Blocks: None **Blocked By** -- Blocked by: None +- Blocked by: Task 10.2 --- -## Task 2.5: CLI scaffold + `store` and `tree` subcommands +## Task 10.4: CDC benchmarks + dedup efficiency comparison **User Story** -As a developer, I want `git cas store` and `git cas tree` commands so I can use CAS from the terminal without writing Node scripts. +As a maintainer, I want empirical data comparing CDC vs fixed chunking so I can document trade-offs and tune defaults. **Requirements** -- R1: Add `bin/git-cas.js` entry point (Git discovers `git-cas` on PATH for `git cas` subcommands). -- R2: Add `"bin": { "git-cas": "./bin/git-cas.js" }` to `package.json`. -- R3: Use a lightweight CLI framework (e.g., `commander`) for subcommand routing. -- R4: `git cas store --slug [--key-file ] [--tree]`: - - Reads the file, calls `storeFile()`. - - Prints manifest JSON to stdout by default. - - If `--tree` is passed, also calls `createTree()` and prints tree OID. - - `--key-file` reads a 32-byte raw key from a file for encryption. -- R5: `git cas tree --manifest `: - - Reads a manifest JSON from file/stdin, calls `createTree()`. - - Prints tree OID to stdout. -- R6: Exit 0 on success, exit 1 on error with message to stderr. 
-- R7: `--cwd` flag to set Git working directory (defaults to `.`). +- R1: Add benchmark suite comparing fixed vs CDC chunking across file sizes (1MB, 10MB, 100MB). +- R2: Measure chunking throughput (MB/s) for both strategies. +- R3: Measure dedup efficiency: for a file modified by N random byte insertions, what % of chunks remain unchanged? +- R4: Output results as a comparison table (console). **Acceptance Criteria** -- AC1: `npx git-cas store ./test.txt --slug test` prints manifest JSON. -- AC2: `npx git-cas store ./test.txt --slug test --tree` prints tree OID. -- AC3: `npx git-cas tree --manifest manifest.json` prints tree OID. -- AC4: Invalid arguments produce helpful usage message and exit 1. -- AC5: `--key-file` with valid 32-byte file encrypts successfully. -- AC6: `--key-file` with wrong-size file exits 1 with clear error. +- AC1: Benchmark suite runs without errors. +- AC2: CDC shows significantly better dedup for incrementally modified files (>80% chunk reuse for small edits vs. ~0% for fixed). +- AC3: CDC throughput is within 2Γ— of fixed chunking (rolling hash overhead is bounded). **Scope** -- In scope: CLI scaffold, store subcommand, tree subcommand, key-file reading. -- Out of scope: `restore` subcommand (Task 2.6), shell completions, config files. +- In scope: Synthetic benchmarks with in-memory data. +- Out of scope: CI benchmark tracking, real-world file corpus, regression detection. **Est. Complexity (LoC)** -- Prod: ~80 -- Tests: ~30 -- Total: ~110 +- Prod: ~0 +- Tests/Bench: ~120 +- Total: ~120 **Est. Human Working Hours** - ~3h **Test Plan** - Golden path: - - store a file via CLI β†’ valid manifest JSON on stdout. - - store with `--tree` β†’ tree OID on stdout. - - tree from manifest file β†’ tree OID on stdout. + - Bench suite completes and prints results table. - Failures: - - missing file β†’ exit 1 with error. - - missing `--slug` β†’ exit 1 with usage message. - - bad key file β†’ exit 1 with INVALID_KEY_LENGTH/TYPE error. + - N/A (benchmarks are informational). - Edges: - - 0-byte file store. - - manifest piped via stdin (if supported). + - Include 0-byte and 1-byte files in benchmark. - Fuzz/stress: - - None (thin wrapper over tested API). + - Run 3 times; verify <20% variance in throughput measurements. **Definition of Done** -- DoD1: `bin/git-cas.js` exists with store and tree subcommands. -- DoD2: `package.json` declares bin entry. -- DoD3: `npx git-cas --help` prints usage. -- DoD4: Integration smoke test passes against real Git repo. +- DoD1: Benchmark suite added to `test/benchmark/`. +- DoD2: Results documented in commit message or GUIDE.md addendum. +- DoD3: Default CDC parameters tuned based on results if needed. **Blocking** -- Blocks: Task 2.6 +- Blocks: None **Blocked By** -- Blocked by: None +- Blocked by: Task 10.1 + +--- + +# M11 β€” Locksmith (v3.1.0) +**Theme:** Multi-recipient encryption via envelope encryption (DEK/KEK model). Each file is encrypted with a random Data Encryption Key; the DEK is wrapped per-recipient. Adding or removing access never re-encrypts the data. --- -## Task 2.6: CLI `restore` subcommand +## Task 11.1: Envelope encryption (DEK/KEK model) **User Story** -As a developer, I want `git cas restore --out ` so I can retrieve stored assets from the terminal. +As a team member, I want each file encrypted with a random data key so that access control is managed by wrapping that key, not by re-encrypting the file. 
**Requirements** -- R1: `git cas restore --out [--key-file ]`: - - Reads the tree, extracts the manifest, restores the file to `--out`. - - Prints bytes written to stdout on success. - - `--key-file` supplies decryption key for encrypted assets. -- R2: Exit 0 on success, exit 1 on error (INTEGRITY_ERROR, MANIFEST_NOT_FOUND, etc.) with message to stderr. -- R3: Requires `restoreFile()` (Task 2.1) and `readManifest()` or equivalent tree-reading capability. +- R1: On encrypted store, generate a random 32-byte Data Encryption Key (DEK). +- R2: Encrypt file content with the DEK using existing AES-256-GCM pipeline. +- R3: Wrap (encrypt) the DEK with each recipient's Key Encryption Key (KEK) using AES-256-GCM key-wrapping. +- R4: Store wrapped DEKs in manifest under `encryption.recipients: [{ label, wrappedDek, nonce, tag }]`. +- R5: On restore, caller provides their KEK; system tries each recipient entry, unwraps DEK, then decrypts content. +- R6: Single-recipient mode (existing behavior) remains a special case: 1 recipient, no label required. +- R7: Backward compatible: old manifests (direct key encryption, no `recipients` field) still restore correctly using the existing code path. **Acceptance Criteria** -- AC1: `npx git-cas restore --out ./restored.txt` writes correct file. -- AC2: Encrypted asset with `--key-file` restores correctly. -- AC3: Wrong key exits 1 with INTEGRITY_ERROR message. -- AC4: Invalid tree OID exits 1 with clear error. +- AC1: Multi-recipient store β†’ restore with any recipient's KEK succeeds. +- AC2: Restore with a non-recipient KEK throws `NO_MATCHING_RECIPIENT`. +- AC3: Old-style manifests (no `recipients` field) restore as before. +- AC4: DEK never appears in plaintext in the manifest. **Scope** -- In scope: restore subcommand wired to restoreFile API. -- Out of scope: Streaming output to stdout, partial restore, resume. +- In scope: DEK/KEK model, wrap/unwrap, manifest schema changes, backward compat. +- Out of scope: Asymmetric KEKs (X25519), key exchange protocols, KMS integration, HSM support. **Est. Complexity (LoC)** -- Prod: ~30 -- Tests: ~20 -- Total: ~50 +- Prod: ~120 (envelope encrypt/decrypt + CasService changes + schema) +- Tests: ~100 (multi-recipient round-trip, wrong key, backward compat) +- Total: ~220 **Est. Human Working Hours** -- ~1.5h +- ~8h **Test Plan** - Golden path: - - store β†’ tree β†’ restore β†’ byte-compare original. - - encrypted store β†’ tree β†’ restore with key β†’ byte-compare. + - Store with 2 recipients β†’ restore with recipient A β†’ byte-compare original. + - Store with 2 recipients β†’ restore with recipient B β†’ byte-compare original. + - Single-recipient store β†’ restore as before. - Failures: - - wrong key β†’ exit 1 INTEGRITY_ERROR. - - nonexistent tree OID β†’ exit 1. - - missing `--out` β†’ exit 1 with usage. + - Restore with non-recipient key β†’ NO_MATCHING_RECIPIENT. + - Tampered wrappedDek β†’ DEK_UNWRAP_FAILED. - Edges: - - 0-byte file round-trip via CLI. + - 1 recipient (degenerate multi-recipient = current behavior). + - 10 recipients β†’ all can restore. + - Old manifest without recipients field β†’ restore unchanged. - Fuzz/stress: - - None (thin wrapper over tested API). + - 50 random plaintexts Γ— 3 random KEKs β†’ all round-trip correctly. + - Tamper each recipient entry independently β†’ correct error for each. **Definition of Done** -- DoD1: `restore` subcommand added to `bin/git-cas.js`. -- DoD2: Full CLI round-trip (store β†’ tree β†’ restore) documented and tested. 
-- DoD3: README CLI section is now accurate and deliverable. +- DoD1: Envelope encryption implemented in CasService. +- DoD2: Schema updated with recipients field. +- DoD3: Backward compatibility tested with v1/v2 manifests. +- DoD4: Security design documented in SECURITY.md addendum. **Blocking** -- Blocks: None +- Blocks: Task 11.2, Task 11.4, Task 12.1 **Blocked By** -- Blocked by: Task 2.1, Task 2.5 - ---- - -# M3 β€” Launchpad (v1.3.0) βœ… -**Theme:** Automated quality gates and release process. +- Blocked by: None --- -## Task 3.1: GitHub Actions CI workflow +## Task 11.2: Recipient management API **User Story** -As a maintainer, I want CI to run lint + unit + integration tests on every push/PR so regressions are caught early. +As a developer, I want to add and remove recipients from an existing encrypted asset without re-encrypting the data. **Requirements** -- R1: Add `.github/workflows/ci.yml`. -- R2: Triggers on push to main and pull_request to main. -- R3: Uses Node 22. -- R4: Steps: checkout, install, lint, unit tests, integration tests. -- R5: Integration tests run via Docker harness. -- R6: Cache dependencies for speed. +- R1: Add `CasService.addRecipient({ manifest, existingKey, newRecipientKey, label })`. + - Unwrap DEK with `existingKey`, re-wrap with `newRecipientKey`, append to recipients list. + - Return updated Manifest (new value object β€” manifests are immutable). +- R2: Add `CasService.removeRecipient({ manifest, label })`. + - Remove recipient entry by label. + - Return updated Manifest. +- R3: Removing last recipient throws `CasError('CANNOT_REMOVE_LAST_RECIPIENT')`. +- R4: Adding duplicate label throws `CasError('RECIPIENT_ALREADY_EXISTS')`. +- R5: Updated manifest must be re-persisted (`createTree` + vault update) by the caller. **Acceptance Criteria** -- AC1: CI runs automatically on PRs and pushes. -- AC2: CI fails if lint/tests fail. -- AC3: Integration tests execute in CI and pass. +- AC1: addRecipient β†’ new manifest has additional recipient entry. +- AC2: removeRecipient β†’ manifest has one fewer recipient entry. +- AC3: Data is never re-encrypted (only DEK is re-wrapped). +- AC4: All existing recipients can still restore after addRecipient. **Scope** -- In scope: CI workflow only. -- Out of scope: CD publishing (Task 3.2), multi-OS matrix. +- In scope: Add/remove recipient methods + manifest mutation + validation. +- Out of scope: Batch operations, per-recipient permissions, key escrow. **Est. Complexity (LoC)** -- Prod: ~60 -- Tests: ~0 -- Total: ~60 +- Prod: ~100 (add/remove methods + validation) +- Tests: ~80 (add, remove, edge cases, round-trips) +- Total: ~180 **Est. Human Working Hours** -- ~2h +- ~6h **Test Plan** - Golden path: - - Push branch β†’ CI green. + - Store with 1 recipient β†’ addRecipient β†’ both can restore. + - Store with 2 recipients β†’ removeRecipient β†’ remaining recipient restores. - Failures: - - Intentionally break a unit test β†’ CI red. - - Intentionally break integration test β†’ CI red. + - addRecipient with wrong existingKey β†’ DEK_UNWRAP_FAILED. + - Add duplicate label β†’ RECIPIENT_ALREADY_EXISTS. + - Remove last recipient β†’ CANNOT_REMOVE_LAST_RECIPIENT. + - Remove nonexistent label β†’ RECIPIENT_NOT_FOUND. - Edges: - - Cache miss still succeeds. + - Add 100 recipients β†’ all can restore. + - Remove all but 1 β†’ that 1 still works. - Fuzz/stress: - - Run CI twice with different dependency states (lockfile change) to validate caching behavior. 
+ - Repeatedly add/remove recipients (100 cycles) β†’ final recipient set is correct. **Definition of Done** -- DoD1: CI workflow merged and green on main. -- DoD2: CI clearly reports which step failed. +- DoD1: addRecipient and removeRecipient implemented and exposed via facade. +- DoD2: Edge cases tested. +- DoD3: API documented in API.md. **Blocking** -- Blocks: Task 3.2 +- Blocks: Task 11.4 **Blocked By** -- Blocked by: Task 2.3 +- Blocked by: Task 11.1 --- -## Task 3.2: npm publish workflow (tag-driven releases) +## Task 11.3: Manifest schema for multi-recipient metadata **User Story** -As a maintainer, I want releases published automatically on version tags so publishing is reproducible and low-friction. +As a maintainer, I want the multi-recipient manifest structure validated by Zod schema so malformed recipient entries are caught early. **Requirements** -- R1: Add `.github/workflows/release.yml`. -- R2: Trigger on tag push matching `v*`. -- R3: Run full CI gates before publish. -- R4: Publish to npm with `--access public`. -- R5: Requires `NPM_TOKEN` secret. -- R6: Create GitHub Release from tag and include CHANGELOG excerpt. +- R1: Add `RecipientSchema` to ManifestSchema.js: `{ label: string, wrappedDek: base64 string, nonce: base64 string, tag: base64 string, kekType?: string }`. +- R2: Extend `EncryptionSchema` with optional `recipients: z.array(RecipientSchema)`. +- R3: Existing encryption metadata (nonce, tag on the outer level) represents the DEK encryption of the file content. +- R4: Validate: if `recipients` is present and non-empty, at least one entry must exist. +- R5: Register error codes: `NO_MATCHING_RECIPIENT`, `DEK_UNWRAP_FAILED`, `RECIPIENT_NOT_FOUND`, `RECIPIENT_ALREADY_EXISTS`, `CANNOT_REMOVE_LAST_RECIPIENT`. **Acceptance Criteria** -- AC1: Tag push triggers release workflow. -- AC2: Workflow fails if CI fails. -- AC3: Successful workflow publishes package and creates GitHub Release. +- AC1: Manifest with valid recipients passes schema validation. +- AC2: Manifest with malformed recipient (missing label, bad wrappedDek) fails validation. +- AC3: Manifest without recipients field passes (backward compat). **Scope** -- In scope: Tag-based publish workflow. -- Out of scope: Auto version bumping, changelog generation tooling. +- In scope: Schema definitions + error code registration. +- Out of scope: Runtime encryption logic (covered by Tasks 11.1 and 11.2). **Est. Complexity (LoC)** -- Prod: ~50 -- Tests: ~0 -- Total: ~50 +- Prod: ~40 (schema definitions) +- Tests: ~50 (schema validation positive/negative) +- Total: ~90 **Est. Human Working Hours** -- ~2h +- ~3h **Test Plan** - Golden path: - - Create tag in test repo β†’ workflow runs through dry-run or publish to test namespace. + - Valid manifest with 2 recipients β†’ schema passes. - Failures: - - Missing NPM_TOKEN β†’ workflow fails with clear message. - - CHANGELOG missing β†’ workflow fails. + - Missing label β†’ schema rejects. + - Missing wrappedDek β†’ schema rejects. + - Non-string wrappedDek β†’ schema rejects. - Edges: - - Tag format mismatch does not trigger. + - Empty recipients array β†’ passes schema (runtime validates separately). + - Recipients with unknown extra fields β†’ stripped by schema. - Fuzz/stress: - - Tag multiple versions sequentially (v1.1.0, v1.1.1) in fork to ensure idempotent behavior. + - 100 random malformed recipient objects β†’ all correctly rejected. **Definition of Done** -- DoD1: Release workflow exists and passes in a fork/test environment. 
-- DoD2: Release notes include changelog excerpt. +- DoD1: RecipientSchema and error codes added. +- DoD2: Schema tests cover positive and negative paths. **Blocking** - Blocks: None **Blocked By** -- Blocked by: Task 3.1, Task 1.1, Task 1.2 +- Blocked by: None (can be done in parallel with Task 11.1) --- -# M4 β€” Compass (v1.4.0) βœ… -**Theme:** Read manifests from Git, manage stored assets, analyze storage. - ---- - -## Task 4.1: Implement readManifest() on CasService βœ… +## Task 11.4: CLI multi-recipient support **User Story** -As a developer, I want to reconstruct a Manifest from a Git tree OID so I can inspect and restore assets without holding manifests in memory. +As a CLI user, I want to encrypt assets for multiple recipients and manage the recipient list from the terminal. **Requirements** -- R1: Add `CasService.readManifest({ treeOid })`. -- R2: Use `persistence.readTree(treeOid)` to list entries. -- R3: Locate manifest entry based on codec (e.g., `manifest.json` / `manifest.cbor`). -- R4: Read manifest blob via `persistence.readBlob(oid)`. -- R5: Decode via `codec.decode(blob)` and validate via Manifest schema. -- R6: Throw `CasError('MANIFEST_NOT_FOUND')` if missing. +- R1: Add `--recipient ` repeatable flag to `git cas store`. Each occurrence adds a recipient KEK. +- R2: Add `git cas recipient add --label