23 changes: 23 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [2.0.0] — M7 Horizon (2026-02-07)

### Added
- **Compression support** (Task 7.1): Optional gzip compression pipeline via `compression: { algorithm: 'gzip' }` option on `store()`. Compression is applied before encryption when both are enabled. Manifests include a new optional `compression` field. Decompression on `restore()` is automatic.
- **KDF support** (Task 7.2): Passphrase-based encryption using PBKDF2 or scrypt via `deriveKey()` method and `passphrase` option on `store()`/`restore()`. KDF parameters are stored in `manifest.encryption.kdf` for deterministic re-derivation. All three crypto adapters (Node, Bun, Web) implement `deriveKey()`.
- **Merkle tree manifests** (Task 7.3): Large manifests (chunk count exceeding `merkleThreshold`, default 1000) are automatically split into sub-manifests stored as separate blobs. Root manifest uses `version: 2` with `subManifests` references. `readManifest()` transparently reconstitutes v2 manifests into flat chunk lists. Full backward compatibility with v1 manifests.
- New schema fields: `version`, `compression`, `subManifests` on `ManifestSchema`; `kdf` on `EncryptionSchema`.
- 52 new unit tests across three new test suites (compression, KDF, Merkle).
- Updated API reference (`docs/API.md`), guide (`GUIDE.md`), and README with v2.0.0 feature documentation.

### Changed
- **BREAKING**: Manifest schema now includes `version` field (defaults to 1). Existing v1 manifests are fully backward-compatible.
- `CasService` constructor accepts new `merkleThreshold` option.
- `ContentAddressableStore` constructor now accepts and forwards `merkleThreshold` to `CasService`.
- `store()` and `storeFile()` accept `passphrase`, `kdfOptions`, and `compression` options.
- `restore()` accepts `passphrase` option.

### Fixed
- `storeFile()` now forwards `passphrase`, `kdfOptions`, and `compression` options to `store()` (previously silently dropped).
- `NodeCryptoAdapter.deriveKey()` uses `Buffer.from(salt)` for base64 encoding, preventing corrupt output when salt is a `Uint8Array`.
- `WebCryptoAdapter.deriveKey()` now validates KDF algorithm and throws for unsupported values instead of silently falling through to scrypt.
- `WebCryptoAdapter` scrypt derivation now throws a descriptive error when `node:crypto` is unavailable (e.g. in browsers).

## [1.6.2] — OIDC publishing + JSR docs coverage (2026-02-07)

### Added
223 changes: 214 additions & 9 deletions GUIDE.md
@@ -18,10 +18,13 @@ along from first principles to full mastery.
7. [The CLI](#7-the-cli)
8. [Lifecycle Management](#8-lifecycle-management)
9. [Observability](#9-observability)
10. [Architecture](#10-architecture)
11. [Codec System](#11-codec-system)
12. [Error Handling](#12-error-handling)
13. [FAQ / Troubleshooting](#13-faq--troubleshooting)
10. [Compression](#10-compression)
11. [Passphrase Encryption (KDF)](#11-passphrase-encryption-kdf)
12. [Merkle Manifests](#12-merkle-manifests)
13. [Architecture](#13-architecture)
14. [Codec System](#14-codec-system)
15. [Error Handling](#15-error-handling)
16. [FAQ / Troubleshooting](#16-faq--troubleshooting)

---

@@ -756,7 +759,208 @@ await cas.verifyIntegrity(manifest);

---

## 10. Architecture
## 10. Compression

*New in v2.0.0.*

`git-cas` supports optional gzip compression. When enabled, file content is
compressed before encryption (if any) and before chunking. This reduces storage
size for compressible data without changing the round-trip contract.

### Storing with Compression

Pass the `compression` option when storing:

```js
const manifest = await cas.storeFile({
  filePath: './server.log',
  slug: 'logs/server',
  compression: { algorithm: 'gzip' },
});

console.log(manifest.compression);
// { algorithm: 'gzip' }
```

The manifest gains an optional `compression` field recording the algorithm used.

### Compression + Encryption

Compression and encryption compose. Compression runs first, on plaintext, and
encryption then runs on the compressed bytes. The order matters: ciphertext is
effectively random, so compressing after encryption would save nothing:

```js
const manifest = await cas.storeFile({
  filePath: './data.csv',
  slug: 'reports/q4',
  compression: { algorithm: 'gzip' },
  encryptionKey,
});
```
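
To make the ordering concrete, here is a minimal sketch of the same pipeline built only from Node's `node:zlib` and `node:crypto` modules. It is not the `git-cas` implementation: `compressThenEncrypt` and `decryptThenDecompress` are hypothetical helper names, and the 12-byte nonce is an assumption.

```js
// Sketch only: mirrors the documented order (gzip, then AES-256-GCM).
const { gzipSync, gunzipSync } = require('node:zlib');
const { createCipheriv, createDecipheriv, randomBytes } = require('node:crypto');

// Hypothetical helper: compress the plaintext, then encrypt the compressed bytes.
function compressThenEncrypt(plaintext, key) {
  const compressed = gzipSync(plaintext);
  const iv = randomBytes(12); // assumed 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(compressed), cipher.final()]);
  return { ciphertext, iv, tag: cipher.getAuthTag() };
}

// Hypothetical helper: reverse the pipeline (decrypt first, then decompress).
function decryptThenDecompress({ ciphertext, iv, tag }, key) {
  const decipher = createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  const compressed = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
  return gunzipSync(compressed);
}

const key = randomBytes(32);
const original = Buffer.from('region,revenue\n'.repeat(500));
const sealed = compressThenEncrypt(original, key);
const restored = decryptThenDecompress(sealed, key);

console.log('stored bytes:', sealed.ciphertext.length, 'of', original.length);
console.log('round trip ok:', restored.equals(original));
```

Because GCM does not pad, the ciphertext is exactly as long as the compressed input, so the size saving from gzip carries through to what gets stored.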

### Restoring Compressed Content

Decompression on `restore()` is automatic. If the manifest includes a
`compression` field, the restored bytes are decompressed after decryption
(if encrypted) and after chunk reassembly:

```js
await cas.restoreFile({
  manifest,
  outputPath: './restored.csv',
});
// restored.csv is byte-identical to the original data.csv
```

### When to Use Compression

Compression pays off for text, CSV, JSON, XML, and other compressible
formats. For already-compressed data (JPEG, PNG, MP4, ZIP), it adds CPU cost
without a meaningful size reduction, so leave it off for those.
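
If you are unsure which category a file falls into, a quick measurement settles it. The sketch below is one way to check, using only `node:zlib`; `gzipRatio` is a hypothetical helper, and random bytes stand in for already-compressed data.

```js
const { gzipSync } = require('node:zlib');
const { randomBytes } = require('node:crypto');

// Hypothetical helper: compressed size divided by original size.
// Values well below 1 mean gzip is worth it; values near 1 mean it is not.
function gzipRatio(buffer) {
  return gzipSync(buffer).length / buffer.length;
}

const csvLike = Buffer.from('2026-02-07,ok,200\n'.repeat(1000));
const randomLike = randomBytes(csvLike.length); // stand-in for JPEG/ZIP-style content

console.log('text ratio:  ', gzipRatio(csvLike).toFixed(3));
console.log('random ratio:', gzipRatio(randomLike).toFixed(3));
```

Running the same check on a representative sample of your own files is usually enough to decide whether to pass `compression` at all.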

---

## 11. Passphrase Encryption (KDF)

*New in v2.0.0.*

Instead of managing raw 32-byte encryption keys, you can derive keys from
passphrases using standard key derivation functions (KDFs). `git-cas` supports
PBKDF2 (default) and scrypt.

### Storing with a Passphrase

Pass `passphrase` instead of `encryptionKey`:

```js
const manifest = await cas.storeFile({
  filePath: './vacation.jpg',
  slug: 'photos/vacation',
  passphrase: 'my secret passphrase',
});

console.log(manifest.encryption.kdf);
// {
//   algorithm: 'pbkdf2',
//   salt: 'base64-encoded-salt',
//   iterations: 100000,
//   keyLength: 32
// }
```

KDF parameters (salt, iterations, algorithm) are stored in the manifest's
`encryption.kdf` field. The salt is generated randomly for each store
operation.
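
The following sketch shows why storing these parameters is enough to get the same key back. It uses `pbkdf2Sync` from `node:crypto` rather than the `git-cas` adapters; `derivePbkdf2` is a hypothetical helper, the 32-byte salt length is an assumption, and the object shape simply copies the `manifest.encryption.kdf` example above.

```js
const { pbkdf2Sync, randomBytes } = require('node:crypto');

// Hypothetical helper: derive a key, generating fresh parameters unless
// previously stored ones are supplied.
function derivePbkdf2(passphrase, storedKdf) {
  const kdf = storedKdf ?? {
    algorithm: 'pbkdf2',
    salt: randomBytes(32).toString('base64'), // assumed salt length; fresh per store
    iterations: 100000,
    keyLength: 32,
  };
  const key = pbkdf2Sync(
    passphrase,
    Buffer.from(kdf.salt, 'base64'),
    kdf.iterations,
    kdf.keyLength,
    'sha512',
  );
  return { key, kdf };
}

const first = derivePbkdf2('my secret passphrase');
const again = derivePbkdf2('my secret passphrase', first.kdf); // reuse stored params
const other = derivePbkdf2('my secret passphrase');            // new random salt

console.log('same params, same key:', first.key.equals(again.key));
console.log('new salt, same key:   ', first.key.equals(other.key));
```

The salt is not secret. Its job is to make every stored asset use a different key even when the passphrase is reused.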

### Restoring with a Passphrase

Provide the same passphrase on restore. The KDF parameters in the manifest
are used to re-derive the key:

```js
await cas.restoreFile({
  manifest,
  passphrase: 'my secret passphrase',
  outputPath: './restored.jpg',
});
```

A wrong passphrase derives a different key. Decryption then fails with
`INTEGRITY_ERROR`, because the AES-256-GCM authentication tag no longer
verifies.
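
The detection comes from GCM itself, which the sketch below reproduces with `node:crypto` alone. `keyFor` and `tryDecrypt` are hypothetical helpers, and returning `null` is a stand-in for the `INTEGRITY_ERROR` that `git-cas` raises.

```js
const { pbkdf2Sync, randomBytes, createCipheriv, createDecipheriv } = require('node:crypto');

const salt = randomBytes(16); // assumed salt length for this sketch
const keyFor = (passphrase) => pbkdf2Sync(passphrase, salt, 100000, 32, 'sha512');

// Encrypt a small message under the correct passphrase.
const iv = randomBytes(12);
const cipher = createCipheriv('aes-256-gcm', keyFor('my secret passphrase'), iv);
const ciphertext = Buffer.concat([cipher.update('hello'), cipher.final()]);
const tag = cipher.getAuthTag();

// Hypothetical helper: returns the plaintext, or null when authentication fails.
function tryDecrypt(passphrase) {
  const decipher = createDecipheriv('aes-256-gcm', keyFor(passphrase), iv);
  decipher.setAuthTag(tag);
  try {
    return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
  } catch {
    return null; // final() throws when the authentication tag does not verify
  }
}

console.log(tryDecrypt('my secret passphrase')); // hello
console.log(tryDecrypt('wrong passphrase'));     // null
```

No plaintext is returned in the failure case, so a wrong passphrase cannot silently yield corrupted output.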

### Using scrypt

Pass `kdfOptions` to select scrypt:

```js
const manifest = await cas.storeFile({
  filePath: './secret.bin',
  slug: 'vault',
  passphrase: 'strong passphrase',
  kdfOptions: { algorithm: 'scrypt', cost: 16384 },
});
```

### Manual Key Derivation

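For advanced workflows, derive the key yourself. `deriveKey()` returns the key
together with the `salt` and `params` it used; re-deriving the same key later
requires both, so persist them if you do not keep the key itself: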

```js
const { key, salt, params } = await cas.deriveKey({
  passphrase: 'my secret passphrase',
  algorithm: 'pbkdf2',
  iterations: 200000,
});

// Use the derived key directly
const manifest = await cas.storeFile({
  filePath: './vacation.jpg',
  slug: 'photos/vacation',
  encryptionKey: key,
});
```

### Supported KDF Algorithms

| Algorithm | Default Params | Notes |
|-----------|---------------|-------|
| `pbkdf2` (default) | 100,000 iterations, SHA-512 | Widely supported, good baseline |
| `scrypt` | N=16384, r=8, p=1 | Memory-hard, stronger against GPU attacks |
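
For reference, the scrypt row maps directly onto the options of `scryptSync` in `node:crypto`. This is a sketch under assumptions, not the adapter code: the 32-byte salt and 32-byte key length are assumed, and whether `git-cas` forwards `kdfOptions.cost` as `N` is not confirmed by this guide.

```js
const { scryptSync, randomBytes } = require('node:crypto');

const salt = randomBytes(32);             // assumed salt length
const options = { N: 16384, r: 8, p: 1 }; // defaults listed in the table above

// scrypt needs roughly 128 * N * r bytes of memory: 16 MiB with these values.
const key = scryptSync('strong passphrase', salt, 32, options);
const sameKey = scryptSync('strong passphrase', salt, 32, options);

console.log('key bytes:', key.length);
console.log('deterministic:', key.equals(sameKey));
```

Raising `N` increases both CPU and memory cost. In Node, values much above the default also require raising the `maxmem` option, which is 32 MiB unless overridden.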

---

## 12. Merkle Manifests

*New in v2.0.0.*

When storing very large files, the manifest (which lists every chunk) can
itself become large. Merkle manifests solve this by splitting the chunk list
into sub-manifests, each stored as a separate Git blob. The root manifest
references sub-manifests by OID.

### How It Works

When the chunk count exceeds `merkleThreshold` (default: 1000), `git-cas`
automatically:

1. Groups chunks into sub-manifests (each containing up to `merkleThreshold`
   chunks).
2. Stores each sub-manifest as a Git blob.
3. Writes a root manifest with `version: 2` and a `subManifests` array
   referencing the sub-manifest blob OIDs.
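
The grouping in step 1 can be sketched in a few lines. `groupChunks` is a hypothetical helper written for illustration, not a function exported by `git-cas`; it returns `null` when the chunk count does not exceed the threshold and the flat format still applies.

```js
// Hypothetical helper: split a chunk list into sub-manifest-sized groups.
// Returns null when the chunk count does not exceed the threshold.
function groupChunks(chunks, merkleThreshold = 1000) {
  if (chunks.length <= merkleThreshold) return null;
  const groups = [];
  for (let i = 0; i < chunks.length; i += merkleThreshold) {
    groups.push(chunks.slice(i, i + merkleThreshold));
  }
  return groups;
}

const chunks = Array.from({ length: 2500 }, (_, index) => ({ index }));
const groups = groupChunks(chunks);

console.log(groups.map((group) => group.length)); // [ 1000, 1000, 500 ]
console.log(groupChunks(chunks.slice(0, 1000)));  // null
```

Each group would then be serialized and written as its own blob, and the resulting OIDs collected into the root manifest's `subManifests` array.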

### Configuring the Threshold

Set `merkleThreshold` at construction time:

```js
const cas = new ContentAddressableStore({
  plumbing: git,
  merkleThreshold: 500, // Split once the chunk count exceeds 500 (default: 1000)
});
```

### Transparent Reconstitution

`readManifest()` transparently handles both v1 (flat) and v2 (Merkle)
manifests. When it encounters a v2 manifest, it reads all sub-manifests,
concatenates their chunk lists, and returns a flat `Manifest` object:

```js
const manifest = await cas.readManifest({ treeOid });
// Works identically whether the manifest is v1 or v2
console.log(manifest.chunks.length); // Full chunk list, regardless of structure
```
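
Conceptually, reconstitution is a read-and-concatenate pass over the referenced blobs. The sketch below illustrates the idea with an in-memory `Map` standing in for Git blob storage. `flattenManifest` is a hypothetical helper, and the `{ oid }` shape of each `subManifests` entry is an assumption, not the documented schema.

```js
// In-memory stand-in for Git blob storage: OID -> serialized sub-manifest.
const blobs = new Map([
  ['oid-a', JSON.stringify({ chunks: [{ index: 0 }, { index: 1 }] })],
  ['oid-b', JSON.stringify({ chunks: [{ index: 2 }] })],
]);

// Hypothetical helper: return a flat manifest for both v1 and v2 inputs.
function flattenManifest(manifest, readBlob) {
  if ((manifest.version ?? 1) < 2) return manifest; // v1 is already flat
  const chunks = manifest.subManifests.flatMap(
    (ref) => JSON.parse(readBlob(ref.oid)).chunks, // preserves sub-manifest order
  );
  const { subManifests, ...rest } = manifest;
  return { ...rest, chunks };
}

const root = {
  version: 2,
  slug: 'big-file',
  subManifests: [{ oid: 'oid-a' }, { oid: 'oid-b' }], // assumed reference shape
};
const flat = flattenManifest(root, (oid) => blobs.get(oid));

console.log(flat.chunks.map((chunk) => chunk.index)); // [ 0, 1, 2 ]
```

Callers only ever see the flat `chunks` array, which is why restore and verification code does not need to know which manifest version it was given.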

### Backward Compatibility

- v2 code reads v1 manifests without any changes.
- Assets whose chunk count does not exceed the threshold continue to use the flat v1 format.
- The `version` field defaults to `1` for existing manifests.

---

## 13. Architecture

`git-cas` follows a hexagonal (ports and adapters) architecture. The domain
logic in `CasService` has zero direct dependencies on Node.js, Git, or any
@@ -824,6 +1028,7 @@ class CryptoPort {
  encryptBuffer(buffer, key) {} // Returns { buf, meta }
  decryptBuffer(buffer, key, meta) {} // Returns Buffer
  createEncryptionStream(key) {} // Returns { encrypt, finalize }
  deriveKey(options) {} // Returns { key, salt, params } (v2.0.0)
}
```

@@ -889,7 +1094,7 @@ const cas = new ContentAddressableStore({

---

## 11. Codec System
## 14. Codec System

### JSON Codec

@@ -978,7 +1183,7 @@ The manifest will be stored in the tree as `manifest.msgpack`.

---

## 12. Error Handling
## 15. Error Handling

All errors thrown by `git-cas` are instances of `CasError`, which extends
`Error` with two additional properties:
@@ -1061,7 +1266,7 @@ try {

---

## 13. FAQ / Troubleshooting
## 16. FAQ / Troubleshooting

### Q: Does this work with bare repositories?

@@ -1175,7 +1380,7 @@ Every Git plumbing command is wrapped in a policy from `@git-stunts/alfred`.
The default policy applies a 30-second timeout and retries up to 2 times with
exponential backoff (100ms, then up to 2s). This handles transient filesystem
errors and lock contention gracefully. You can override the policy at
construction time (see Section 10).
construction time (see Section 13).

---

21 changes: 21 additions & 0 deletions README.md
@@ -21,13 +21,26 @@ We use the object database.
- **Dedupe for free** Git already hashes objects. We just lean into it.
- **Chunked storage** big files become stable, reusable blobs.
- **Optional AES-256-GCM encryption** store secrets without leaking plaintext into the ODB.
- **Compression** gzip before encryption — smaller blobs, same round-trip.
- **Passphrase encryption** derive keys from passphrases via PBKDF2 or scrypt — no raw key management.
- **Merkle manifests** large files auto-split into sub-manifests for scalability.
- **Manifests** a tiny explicit index of chunks + metadata (JSON/CBOR).
- **Tree output** generates standard Git trees so assets snap into commits cleanly.
- **Full round-trip** store, tree, and restore — get your bytes back, verified.
- **Lifecycle management** `readManifest`, `deleteAsset`, `findOrphanedChunks` — inspect trees, plan deletions, audit storage.

**Use it for:** binary assets, build artifacts, model weights, data packs, secret bundles, weird experiments, etc.

## What's new in v2.0.0

**Compression** — `compression: { algorithm: 'gzip' }` on `store()`. Compression runs before encryption. Decompression on `restore()` is automatic.

**Passphrase-based encryption** — Pass `passphrase` instead of `encryptionKey`. Keys are derived via PBKDF2 (default) or scrypt. KDF parameters are stored in the manifest for deterministic re-derivation. Use `deriveKey()` directly for manual control.

**Merkle tree manifests** — When chunk count exceeds `merkleThreshold` (default: 1000), manifests are automatically split into sub-manifests stored as separate blobs. `readManifest()` transparently reconstitutes them. Full backward compatibility with v1 manifests.

See [CHANGELOG.md](./CHANGELOG.md) for the full list of changes.

## Usage (Node API)

```js
@@ -56,6 +69,14 @@ const m = await cas.readManifest({ treeOid });
// Lifecycle: inspect deletion impact, find orphaned chunks
const { slug, chunksOrphaned } = await cas.deleteAsset({ treeOid });
const { referenced, total } = await cas.findOrphanedChunks({ treeOids: [treeOid] });

// v2.0.0: Compressed + passphrase-encrypted store
const manifest2 = await cas.storeFile({
  filePath: './image.png',
  slug: 'my-image',
  passphrase: 'my secret passphrase',
  compression: { algorithm: 'gzip' },
});
```

## CLI (git plugin)
4 changes: 2 additions & 2 deletions ROADMAP.md
@@ -134,7 +134,7 @@ Return and throw semantics for every public method (current and planned).
| v1.4.0 | M4 | Compass | Lifecycle management | ✅ |
| v1.5.0 | M5 | Sonar | Observability | ✅ |
| v1.6.0 | M6 | Cartographer | Documentation | ✅ |
| v2.0.0 | M7 | Horizon | Advanced features | |
| v2.0.0 | M7 | Horizon | Advanced features | |

---

@@ -1461,7 +1461,7 @@ As a new user, I want runnable examples so I can integrate quickly and correctly

---

# M7 — Horizon (v2.0.0)
# M7 — Horizon (v2.0.0)
**Theme:** Advanced capabilities that may change manifest format; major version bump.

---