diff --git a/CHANGELOG.md b/CHANGELOG.md index c793d79..20df9cc 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,7 +5,7 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). -## [Unreleased] — M4 Compass +## [Unreleased] — M4 Compass + M5 Sonar + M6 Cartographer ### Added - `CasService.readManifest({ treeOid })` — reads a Git tree, locates and decodes the manifest, returns a validated `Manifest` value object. @@ -14,6 +14,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - Facade pass-throughs for `readManifest`, `deleteAsset`, and `findOrphanedChunks` on `ContentAddressableStore`. - New error codes: `MANIFEST_NOT_FOUND`, `GIT_ERROR`. - 42 new unit tests across three new test suites. +- `CasService` now extends `EventEmitter` with lifecycle events: + `chunk:stored`, `chunk:restored`, `file:stored`, `file:restored`, + `integrity:pass`, `integrity:fail`, and `error` (guarded). +- Comprehensive benchmark suite (`test/benchmark/cas.bench.js`) covering + store, restore, encrypt/decrypt, createTree, verifyIntegrity, and + JsonCodec vs CborCodec at multiple data sizes. +- 14 new unit tests for EventEmitter integration. +- `docs/API.md` — full API reference for all public methods, events, value objects, ports, and error codes. +- `docs/SECURITY.md` — threat model, AES-256-GCM design, key handling, limitations. +- `GUIDE.md` — progressive-disclosure guide from zero knowledge to mastery. +- `examples/` directory with runnable scripts: `store-and-restore.js`, `encrypted-workflow.js`, `progress-tracking.js`. +- ESLint config now ignores `examples/` directory (runnable scripts use `console.log`). 
## [1.3.0] — M3 Launchpad (2026-02-06) diff --git a/GUIDE.md b/GUIDE.md new file mode 100644 index 0000000..ce8eecf --- /dev/null +++ b/GUIDE.md @@ -0,0 +1,1173 @@ +# git-cas: The Complete Guide + +A progressive guide to content-addressed storage backed by Git. Every section +builds on the same running example -- storing, managing, and restoring a photo +called `vacation.jpg` under the slug `photos/vacation` -- so you can follow +along from first principles to full mastery. + +--- + +## Table of Contents + +1. [What is git-cas?](#1-what-is-git-cas) +2. [Quick Start](#2-quick-start) +3. [Core Concepts](#3-core-concepts) +4. [Storing Files](#4-storing-files) +5. [Restoring Files](#5-restoring-files) +6. [Encryption](#6-encryption) +7. [The CLI](#7-the-cli) +8. [Lifecycle Management](#8-lifecycle-management) +9. [Observability](#9-observability) +10. [Architecture](#10-architecture) +11. [Codec System](#11-codec-system) +12. [Error Handling](#12-error-handling) +13. [FAQ / Troubleshooting](#13-faq--troubleshooting) + +--- + +## 1. What is git-cas? + +Git is, at its core, a content-addressed object database. Every object -- +blob, tree, commit, tag -- is stored by the SHA-1 hash of its content. When +two files share the same bytes, Git stores them once. `git-cas` takes this +property seriously: it turns Git's object database into a general-purpose +content-addressed storage (CAS) system for arbitrary binary files. + +The problem `git-cas` solves is straightforward. You have large binary assets +-- images, model weights, data packs, build artifacts, encrypted secret +bundles -- and you want to store them in a way that is deterministic, +deduplicated, integrity-verified, and committable. Git LFS solves this by +moving blobs to an external server, but that introduces a separate +infrastructure dependency and breaks the self-contained nature of a Git +repository. `git-cas` keeps everything inside Git's own object database. + +The approach works as follows. 
A file is split into fixed-size chunks, each +chunk is written as a Git blob via `git hash-object -w`, and a manifest +(a small JSON or CBOR document listing every chunk's hash, size, and blob OID) +is written alongside them into a Git tree via `git mktree`. That tree OID can +then be committed, tagged, or referenced like any other Git object. Restoring +the file means reading the tree, parsing the manifest, fetching each blob, +verifying SHA-256 digests, and concatenating the bytes back together. Optional +AES-256-GCM encryption can be applied before chunking, so ciphertext is what +lands in the object database -- plaintext never touches disk or the ODB. + +--- + +## 2. Quick Start + +### Prerequisites + +- Node.js >= 22.0.0 (Bun and Deno are also supported) +- A Git repository (bare or working tree) + +### Install + +```bash +npm install @git-stunts/git-cas @git-stunts/plumbing +``` + +### Minimal Working Example + +```js +import GitPlumbing from '@git-stunts/plumbing'; +import ContentAddressableStore from '@git-stunts/git-cas'; + +// Point at a Git repository +const git = new GitPlumbing({ cwd: './my-repo' }); +const cas = new ContentAddressableStore({ plumbing: git }); + +// Store vacation.jpg under the slug "photos/vacation" +const manifest = await cas.storeFile({ + filePath: './vacation.jpg', + slug: 'photos/vacation', +}); + +console.log(manifest.slug); // "photos/vacation" +console.log(manifest.filename); // "vacation.jpg" +console.log(manifest.size); // total bytes stored +console.log(manifest.chunks.length); // number of chunks + +// Create a Git tree from the manifest +const treeOid = await cas.createTree({ manifest }); +console.log(treeOid); // e.g. "a1b2c3d4..." + +// Restore the file later +await cas.restoreFile({ manifest, outputPath: './restored.jpg' }); +``` + +That is the full round-trip: store, tree, restore. The rest of this guide +unpacks what happens at each step. + +--- + +## 3. 
Core Concepts + +### Slugs + +A slug is a logical identifier for your asset. It is a freeform, non-empty +string -- typically a path-like name such as `photos/vacation` or +`models/v3-weights`. The slug is stored inside the manifest and is how you +refer to the asset in your application logic. It does not affect where +data lives in Git's object database. + +### Chunks + +Large files are split into fixed-size pieces called chunks. Each chunk is +stored as a Git blob. A chunk has four properties: + +| Field | Type | Description | +|---------|--------|----------------------------------------------| +| `index` | number | Zero-based position in the file | +| `size` | number | Byte length of this chunk | +| `digest`| string | SHA-256 hex digest of the chunk's raw bytes | +| `blob` | string | Git OID (the SHA-1 hash Git uses to store it) | + +Because Git is itself content-addressed, if two chunks happen to contain +identical bytes, Git stores them only once. This gives you deduplication +for free. + +### Manifests + +A manifest is the index that ties everything together. After storing +`vacation.jpg`, the manifest looks like this: + +```json +{ + "slug": "photos/vacation", + "filename": "vacation.jpg", + "size": 524288, + "chunks": [ + { + "index": 0, + "size": 262144, + "digest": "e3b0c44298fc1c149afbf4c8996fb924...", + "blob": "a1b2c3d4e5f6..." + }, + { + "index": 1, + "size": 262144, + "digest": "d7a8fbb307d7809469ca9abcb0082e4f...", + "blob": "f6e5d4c3b2a1..." + } + ] +} +``` + +Manifests are immutable value objects validated by a Zod schema at +construction time. If you try to create a `Manifest` with missing or +malformed fields, an error is thrown immediately. + +When encryption is used, the manifest gains an additional `encryption` field: + +```json +{ + "slug": "photos/vacation", + "filename": "vacation.jpg", + "size": 524288, + "chunks": [ ... 
], + "encryption": { + "algorithm": "aes-256-gcm", + "nonce": "base64-encoded-nonce", + "tag": "base64-encoded-auth-tag", + "encrypted": true + } +} +``` + +### Git Trees + +When you call `createTree({ manifest })`, `git-cas` serializes the manifest +using the configured codec (JSON by default), writes it as a blob, then +builds a Git tree that looks like this: + +``` +100644 blob <oid> manifest.json +100644 blob <oid> e3b0c44298fc1c149afbf4c8996fb924... +100644 blob <oid> d7a8fbb307d7809469ca9abcb0082e4f... +``` + +The tree contains one entry for the manifest file (named `manifest.json` or +`manifest.cbor` depending on the codec) and one entry per chunk, named by +its SHA-256 digest. This tree OID is a standard Git object -- you can commit +it, tag it, push it, or embed it in a larger tree. + +### Codecs + +The codec controls how the manifest is serialized before being written to +Git. Two codecs ship with `git-cas`: + +- **JsonCodec** -- human-readable, produces `manifest.json`. Default. +- **CborCodec** -- compact binary format, produces `manifest.cbor`. Smaller manifests. + +Both implement the same `CodecPort` interface: `encode(data)`, `decode(buffer)`, +and `get extension()`. + +--- + +## 4. Storing Files + +### The Store Flow + +When you call `cas.storeFile()`, the following happens: + +1. The file at `filePath` is opened as a readable stream. +2. The stream is consumed in chunks of `chunkSize` bytes (default: 256 KiB). +3. Each chunk is SHA-256 hashed and written to Git as a blob via + `git hash-object -w --stdin`. +4. A manifest is assembled from the chunk metadata. +5. The manifest is returned as a frozen `Manifest` value object. + +### Configuring Chunk Size + +The default chunk size is 256 KiB (262,144 bytes). You can change it at +construction time. The minimum is 1,024 bytes. + +```js +const cas = new ContentAddressableStore({ + plumbing: git, + chunkSize: 1024 * 1024, // 1 MiB chunks +}); +``` + +Larger chunks mean fewer Git objects but coarser deduplication. 
Smaller chunks +improve deduplication but increase object count and manifest size. For most +use cases, the default is a good balance. + +### Storing Our Example File + +```js +import GitPlumbing from '@git-stunts/plumbing'; +import ContentAddressableStore from '@git-stunts/git-cas'; + +const git = new GitPlumbing({ cwd: './assets-repo' }); +const cas = new ContentAddressableStore({ plumbing: git }); + +const manifest = await cas.storeFile({ + filePath: './vacation.jpg', + slug: 'photos/vacation', +}); + +// Inspect the result +console.log(`Stored ${manifest.filename} (${manifest.size} bytes)`); +console.log(`Split into ${manifest.chunks.length} chunks`); + +for (const chunk of manifest.chunks) { + console.log(` chunk[${chunk.index}]: ${chunk.size} bytes, blob ${chunk.blob}`); +} +``` + +For a 500 KiB (512,000-byte) file with the default 256 KiB chunk size, you +would see two chunks: the first at 262,144 bytes and the second at the +remaining 249,856 bytes. + +### Storing from an Async Iterable + +If you already have data in memory or coming from a non-file source, use +`store()` directly instead of `storeFile()`: + +```js +async function* generateData() { + yield Buffer.from('first batch of bytes...'); + yield Buffer.from('second batch of bytes...'); +} + +const manifest = await cas.store({ + source: generateData(), + slug: 'photos/vacation', + filename: 'vacation.jpg', +}); +``` + +### Creating a Git Tree + +Once you have the manifest, persist it as a Git tree: + +```js +const treeOid = await cas.createTree({ manifest }); +console.log(`Tree OID: ${treeOid}`); + +// You can now commit this tree: +// git commit-tree <tree-oid> -m "Store vacation.jpg" +``` + +--- + +## 5. Restoring Files + +### Restoring to Disk + +Given a manifest, `restoreFile()` reads every chunk from Git, verifies each +chunk's SHA-256 digest, concatenates the buffers, and writes the result to +the specified output path. 
+ +```js +await cas.restoreFile({ + manifest, + outputPath: './restored-vacation.jpg', +}); +// restored-vacation.jpg is now byte-identical to the original +``` + +### Restoring to a Buffer + +If you need the bytes in memory rather than on disk, use `restore()`: + +```js +const { buffer, bytesWritten } = await cas.restore({ manifest }); +console.log(`Restored ${bytesWritten} bytes into memory`); +``` + +### Byte-Level Integrity Verification + +During restore, each chunk is re-hashed with SHA-256 and compared against the +digest recorded in the manifest. If any chunk has been corrupted or tampered +with, an `INTEGRITY_ERROR` is thrown immediately: + +``` +CasError: Chunk 0 integrity check failed + code: 'INTEGRITY_ERROR' + meta: { chunkIndex: 0, expected: '...', actual: '...' } +``` + +You can also verify integrity without restoring: + +```js +const isValid = await cas.verifyIntegrity(manifest); +if (isValid) { + console.log('All chunks intact'); +} else { + console.log('Corruption detected'); +} +``` + +### Restoring from a Tree OID + +In many workflows you do not have the manifest object in memory -- you have a +Git tree OID that was committed earlier. To restore, you need to read the tree, +extract the manifest, and then restore from it: + +```js +const service = await cas.getService(); + +// Read the tree entries +const entries = await service.persistence.readTree(treeOid); + +// Find the manifest entry (named manifest.json or manifest.cbor) +const manifestEntry = entries.find(e => e.name.startsWith('manifest.')); +const manifestBlob = await service.persistence.readBlob(manifestEntry.oid); + +// Decode the manifest using the configured codec +import Manifest from '@git-stunts/git-cas/src/domain/value-objects/Manifest.js'; +const manifest = new Manifest(service.codec.decode(manifestBlob)); + +// Restore the file +await cas.restoreFile({ manifest, outputPath: './restored-vacation.jpg' }); +``` + +The CLI (Section 7) handles this entire flow with a single command. 
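To make the restore flow in this section concrete, here is a minimal sketch of the verify-then-concatenate logic with no Git involved. It is an illustration under stated assumptions, not the `CasService` implementation: the `blobs` map stands in for Git's object database, and `storeChunks`, `restoreChunks`, and the `blob-N` identifiers are hypothetical names invented for this example.

```javascript
import { createHash } from 'node:crypto';

// Hypothetical in-memory stand-in for Git's object database: blob id -> bytes.
const blobs = new Map();

const sha256 = (buf) => createHash('sha256').update(buf).digest('hex');

// Split `data` into fixed-size chunks and record { index, size, digest, blob } for each.
function storeChunks(data, chunkSize) {
  const chunks = [];
  for (let offset = 0, index = 0; offset < data.length; offset += chunkSize, index++) {
    const piece = data.subarray(offset, offset + chunkSize);
    const blob = `blob-${index}`; // placeholder id, not a real Git OID
    blobs.set(blob, Buffer.from(piece));
    chunks.push({ index, size: piece.length, digest: sha256(piece), blob });
  }
  return chunks;
}

// Read every chunk, re-hash it, compare against the recorded digest, then concatenate.
function restoreChunks(chunks) {
  const parts = chunks.map((chunk) => {
    const bytes = blobs.get(chunk.blob);
    const actual = sha256(bytes);
    if (actual !== chunk.digest) {
      const err = new Error(`Chunk ${chunk.index} integrity check failed`);
      err.code = 'INTEGRITY_ERROR';
      err.meta = { chunkIndex: chunk.index, expected: chunk.digest, actual };
      throw err;
    }
    return bytes;
  });
  return Buffer.concat(parts);
}

const original = Buffer.from('pretend these are the bytes of vacation.jpg');
const chunks = storeChunks(original, 16);
const restored = restoreChunks(chunks);
console.log(restored.equals(original)); // true

// Simulate corruption of the second chunk, then try to restore again.
blobs.set(chunks[1].blob, Buffer.from('tampered bytes!!'));
let caught = null;
try {
  restoreChunks(chunks);
} catch (err) {
  caught = err;
}
console.log(caught.code, caught.meta.chunkIndex); // INTEGRITY_ERROR 1
```

The second restore fails at the first mismatching chunk, which mirrors the fail-fast behavior described under Byte-Level Integrity Verification.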
+ +--- + +## 6. Encryption + +`git-cas` supports optional AES-256-GCM encryption. When enabled, the file +content is encrypted via a streaming cipher before chunking, so only +ciphertext is stored in Git's object database. Plaintext never touches the +ODB. + +### Generating a Key + +An encryption key must be exactly 32 bytes (256 bits). Generate one with +OpenSSL: + +```bash +openssl rand -out vacation.key 32 +``` + +Or in Node.js: + +```js +import { randomBytes } from 'node:crypto'; +import { writeFileSync } from 'node:fs'; + +const key = randomBytes(32); +writeFileSync('./vacation.key', key); +``` + +### Encrypted Store + +Pass the `encryptionKey` option when storing: + +```js +import { readFileSync } from 'node:fs'; + +const encryptionKey = readFileSync('./vacation.key'); + +const manifest = await cas.storeFile({ + filePath: './vacation.jpg', + slug: 'photos/vacation', + encryptionKey, +}); + +console.log(manifest.encryption); +// { +// algorithm: 'aes-256-gcm', +// nonce: 'dGhpcyBpcyBhIG5vbmNl', +// tag: 'YXV0aGVudGljYXRpb24gdGFn', +// encrypted: true +// } +``` + +The manifest now carries an `encryption` field containing the algorithm, +a base64-encoded nonce, a base64-encoded authentication tag, and a flag +indicating the content is encrypted. The nonce and tag are generated fresh +for every store operation. 
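If you want to see where the `nonce` and `tag` values come from, the following sketch uses `node:crypto` directly. It is a simplified, whole-buffer illustration and not the streaming code path `git-cas` uses; the function names `encryptOnce` and `decryptOnce` are hypothetical, and the 12-byte nonce length is an assumption (it is the conventional choice for GCM, but this guide does not state what `git-cas` uses).

```javascript
import { randomBytes, createCipheriv, createDecipheriv } from 'node:crypto';

// Encrypt a buffer with AES-256-GCM and return manifest-style metadata.
function encryptOnce(plaintext, key) {
  const nonce = randomBytes(12); // assumed length; fresh for every call
  const cipher = createCipheriv('aes-256-gcm', key, nonce);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  const meta = {
    algorithm: 'aes-256-gcm',
    nonce: nonce.toString('base64'),
    tag: cipher.getAuthTag().toString('base64'), // only available after final()
    encrypted: true,
  };
  return { ciphertext, meta };
}

// Decrypt and authenticate; final() throws if the key, nonce, tag, or data is wrong.
function decryptOnce(ciphertext, key, meta) {
  const decipher = createDecipheriv('aes-256-gcm', key, Buffer.from(meta.nonce, 'base64'));
  decipher.setAuthTag(Buffer.from(meta.tag, 'base64'));
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

const key = randomBytes(32);
const plaintext = Buffer.from('vacation.jpg bytes');

const first = encryptOnce(plaintext, key);
const second = encryptOnce(plaintext, key);
const roundTrip = decryptOnce(first.ciphertext, key, first.meta);

// A different key fails authentication instead of returning garbage.
let wrongKeyFailed = false;
try {
  decryptOnce(first.ciphertext, randomBytes(32), first.meta);
} catch {
  wrongKeyFailed = true;
}

console.log(roundTrip.equals(plaintext), first.meta.nonce !== second.meta.nonce, wrongKeyFailed);
```

Because the nonce is random, encrypting the same bytes twice produces different `nonce` values, which is the "generated fresh for every store operation" behavior described above.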
+ +### Encrypted Restore + +To restore encrypted content, provide the same key: + +```js +await cas.restoreFile({ + manifest, + encryptionKey, + outputPath: './decrypted-vacation.jpg', +}); +// decrypted-vacation.jpg is byte-identical to the original vacation.jpg +``` + +### What Happens with the Wrong Key + +If you attempt to restore with an incorrect key, AES-256-GCM's authenticated +encryption detects the mismatch and throws: + +``` +CasError: Decryption failed: Integrity check error + code: 'INTEGRITY_ERROR' +``` + +If you attempt to restore encrypted content without providing any key at all: + +``` +CasError: Encryption key required to restore encrypted content + code: 'MISSING_KEY' +``` + +### Key Validation + +Keys must be a `Buffer` or `Uint8Array` of exactly 32 bytes. Violations +produce clear errors: + +- Non-buffer key: `INVALID_KEY_TYPE` +- Wrong length: `INVALID_KEY_LENGTH` (includes expected and actual lengths) + +### Encrypted Tree Round-Trip + +The full encrypted workflow, from store to tree to restore: + +```js +import { readFileSync } from 'node:fs'; +import GitPlumbing from '@git-stunts/plumbing'; +import ContentAddressableStore from '@git-stunts/git-cas'; + +const git = new GitPlumbing({ cwd: './assets-repo' }); +const cas = new ContentAddressableStore({ plumbing: git }); +const encryptionKey = readFileSync('./vacation.key'); + +// Store with encryption +const manifest = await cas.storeFile({ + filePath: './vacation.jpg', + slug: 'photos/vacation', + encryptionKey, +}); + +// Persist as a Git tree +const treeOid = await cas.createTree({ manifest }); + +// Later: restore from tree OID (see Section 5 for readTree pattern) +// ...pass encryptionKey to restoreFile() +``` + +--- + +## 7. The CLI + +`git-cas` installs as a Git subcommand. After installation, `git cas` is +available in any Git repository. 
+ +### Store a File + +```bash +# Store vacation.jpg and print the manifest JSON +git cas store ./vacation.jpg --slug photos/vacation +``` + +Output (manifest JSON): + +```json +{ + "slug": "photos/vacation", + "filename": "vacation.jpg", + "size": 524288, + "chunks": [ + { + "index": 0, + "size": 262144, + "digest": "e3b0c44298fc1c149afbf4c8996fb924...", + "blob": "a1b2c3d4e5f6..." + }, + { + "index": 1, + "size": 262144, + "digest": "d7a8fbb307d7809469ca9abcb0082e4f...", + "blob": "f6e5d4c3b2a1..." + } + ] +} +``` + +### Store and Get a Tree OID + +```bash +# The --tree flag creates a tree and prints its OID instead of the manifest +git cas store ./vacation.jpg --slug photos/vacation --tree +# Output: a1b2c3d4e5f67890... +``` + +### Create a Tree from an Existing Manifest + +If you saved the manifest JSON to a file, you can create a tree from it later: + +```bash +git cas store ./vacation.jpg --slug photos/vacation > manifest.json +git cas tree --manifest manifest.json +# Output: a1b2c3d4e5f67890... +``` + +### Restore from a Tree OID + +```bash +git cas restore a1b2c3d4e5f67890... --out ./restored-vacation.jpg +# Output: 524288 (bytes written) +``` + +The `restore` command reads the tree, finds the manifest entry, decodes it, +reads and verifies all chunks, and writes the reassembled file. + +### Encrypted CLI Round-Trip + +```bash +# Generate a 32-byte key +openssl rand -out vacation.key 32 + +# Store with encryption, get a tree OID +git cas store ./vacation.jpg --slug photos/vacation --key-file ./vacation.key --tree +# Output: a1b2c3d4e5f67890... + +# Restore with the same key +git cas restore a1b2c3d4e5f67890... --out ./decrypted-vacation.jpg --key-file ./vacation.key +# Output: 524288 +``` + +### Working Directory + +By default the CLI operates in the current directory. Use `--cwd` to point at +a different repository: + +```bash +git cas store ./vacation.jpg --slug photos/vacation --cwd /path/to/assets-repo --tree +``` + +--- + +## 8. 
Lifecycle Management + +### Reading a Manifest from a Tree + +Given a tree OID (from a commit, tag, or ref), you can reconstruct the +manifest object: + +```js +import Manifest from '@git-stunts/git-cas/src/domain/value-objects/Manifest.js'; + +const service = await cas.getService(); +const entries = await service.persistence.readTree(treeOid); + +const manifestEntry = entries.find(e => e.name.startsWith('manifest.')); +if (!manifestEntry) { + throw new Error('No manifest found in tree'); +} + +const blob = await service.persistence.readBlob(manifestEntry.oid); +const manifest = new Manifest(service.codec.decode(blob)); + +console.log(manifest.slug); // "photos/vacation" +console.log(manifest.chunks); // array of Chunk objects +``` + +### Verifying Integrity Over Time + +Stored assets can be verified at any time without restoring them. This is +useful for periodic integrity checks or auditing: + +```js +const ok = await cas.verifyIntegrity(manifest); +if (!ok) { + console.error(`Asset ${manifest.slug} has corrupted chunks`); +} +``` + +The `verifyIntegrity` method reads each chunk blob from Git, recomputes its +SHA-256 digest, and compares it against the manifest. It emits either +`integrity:pass` or `integrity:fail` events (see Section 9). + +### Finding Orphaned Chunks + +When you store the same file multiple times with different chunk sizes, or +store overlapping files, some chunk blobs may no longer be referenced by any +manifest. 
Identifying these orphans is a matter of collecting all blob OIDs +referenced by your manifests and comparing them against what exists in the +tree: + +```js +const service = await cas.getService(); +const entries = await service.persistence.readTree(treeOid); + +// The manifest entry is not a chunk +const chunkEntries = entries.filter(e => !e.name.startsWith('manifest.')); + +// Cross-reference with manifest +const referencedBlobs = new Set(manifest.chunks.map(c => c.blob)); +const orphaned = chunkEntries.filter(e => !referencedBlobs.has(e.oid)); +``` + +### Working with Multiple Assets + +A common pattern is to store multiple assets and assemble their trees into +a larger Git tree structure using standard Git plumbing: + +```js +const photoManifest = await cas.storeFile({ + filePath: './vacation.jpg', + slug: 'photos/vacation', +}); +const photoTree = await cas.createTree({ manifest: photoManifest }); + +const videoManifest = await cas.storeFile({ + filePath: './clip.mp4', + slug: 'videos/clip', +}); +const videoTree = await cas.createTree({ manifest: videoManifest }); + +// Now photoTree and videoTree are standard Git tree OIDs +// You can compose them into a parent tree, commit them, etc. +``` + +--- + +## 9. Observability + +`CasService` extends `EventEmitter`. Every significant operation emits an +event you can listen to for progress tracking, logging, or monitoring. 
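As a rough illustration of the emitter pattern (not the actual `CasService` source), the sketch below shows a class that extends `EventEmitter`, emits per-chunk and per-file events, and only emits `error` when a listener is attached. Node's `EventEmitter` throws if `error` is emitted with no listeners, which is why a guard is useful. `DemoService`, `storeChunks`, and `emitGuardedError` are hypothetical names for this example.

```javascript
import { EventEmitter } from 'node:events';

class DemoService extends EventEmitter {
  // Emit 'error' only when someone is listening; returns whether it was emitted.
  emitGuardedError(code, message) {
    if (this.listenerCount('error') > 0) {
      this.emit('error', { code, message });
      return true;
    }
    return false;
  }

  // Pretend to store chunks of the given sizes, emitting progress events.
  storeChunks(sizes) {
    sizes.forEach((size, index) => this.emit('chunk:stored', { index, size }));
    const total = sizes.reduce((sum, size) => sum + size, 0);
    this.emit('file:stored', { size: total, chunkCount: sizes.length });
  }
}

const service = new DemoService();

const seen = [];
service.on('chunk:stored', ({ index, size }) => seen.push(`${index}:${size}`));

let summary = null;
service.on('file:stored', (payload) => {
  summary = payload;
});

service.storeChunks([262144, 262144]);

// No 'error' listener yet, so the guard skips the emit instead of throwing.
const emittedWithoutListener = service.emitGuardedError('STREAM_ERROR', 'demo failure');

service.on('error', ({ code, message }) => console.error(`[demo] ${code}: ${message}`));
const emittedWithListener = service.emitGuardedError('STREAM_ERROR', 'demo failure');

console.log(seen, summary, emittedWithoutListener, emittedWithListener);
```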
+ +### Available Events + +| Event | Emitted When | Payload | +|--------------------|-------------------------------------------|----------------------------------------------------------| +| `chunk:stored` | A chunk is written to Git | `{ index, size, digest, blob }` | +| `chunk:restored` | A chunk is read back from Git | `{ index, size, digest }` | +| `file:stored` | All chunks for a file have been stored | `{ slug, size, chunkCount, encrypted }` | +| `file:restored` | A file has been fully restored | `{ slug, size, chunkCount }` | +| `integrity:pass` | All chunks pass integrity verification | `{ slug }` | +| `integrity:fail` | A chunk fails integrity verification | `{ slug, chunkIndex, expected, actual }` | +| `error` | An error occurs (guarded) | `{ code, message }` | + +The `error` event is guarded: it is only emitted if there is at least one +listener attached. This prevents unhandled `error` event crashes from +`EventEmitter`. + +### Building a Progress Bar + +```js +const service = await cas.getService(); + +let chunksStored = 0; +service.on('chunk:stored', ({ index, size }) => { + chunksStored++; + console.log(` Stored chunk ${index} (${size} bytes)`); +}); + +service.on('file:stored', ({ slug, size, chunkCount }) => { + console.log(`Finished: ${slug} -- ${size} bytes in ${chunkCount} chunks`); +}); + +// Now store -- events fire as chunks are written +const manifest = await cas.storeFile({ + filePath: './vacation.jpg', + slug: 'photos/vacation', +}); +``` + +### Monitoring Restores + +```js +service.on('chunk:restored', ({ index, size, digest }) => { + console.log(` Restored chunk ${index} (${size} bytes, digest: ${digest.slice(0, 8)}...)`); +}); + +service.on('file:restored', ({ slug, size, chunkCount }) => { + console.log(`Restored: ${slug} -- ${size} bytes from ${chunkCount} chunks`); +}); + +await cas.restoreFile({ manifest, outputPath: './restored-vacation.jpg' }); +``` + +### Logging Errors + +```js +service.on('error', ({ code, message }) => { + 
console.error(`[CAS ERROR] ${code}: ${message}`); +}); +``` + +### Integrity Monitoring + +```js +service.on('integrity:pass', ({ slug }) => { + console.log(`Integrity OK: ${slug}`); +}); + +service.on('integrity:fail', ({ slug, chunkIndex, expected, actual }) => { + console.error(`CORRUPT: ${slug} chunk ${chunkIndex}`); + console.error(` expected: ${expected}`); + console.error(` actual: ${actual}`); +}); + +await cas.verifyIntegrity(manifest); +``` + +--- + +## 10. Architecture + +`git-cas` follows a hexagonal (ports and adapters) architecture. The domain +logic in `CasService` has zero direct dependencies on Node.js, Git, or any +specific crypto library. All platform-specific behavior is injected through +ports. + +### Layers + +``` +Facade (ContentAddressableStore) + | + +-- Domain Layer + | +-- CasService (core logic, EventEmitter) + | +-- Manifest (value object, Zod-validated) + | +-- Chunk (value object, Zod-validated) + | +-- CasError (structured errors) + | +-- ManifestSchema (Zod schemas) + | + +-- Ports (interfaces) + | +-- GitPersistencePort (writeBlob, writeTree, readBlob, readTree) + | +-- CodecPort (encode, decode, extension) + | +-- CryptoPort (sha256, randomBytes, encryptBuffer, decryptBuffer, createEncryptionStream) + | + +-- Infrastructure (adapters) + +-- GitPersistenceAdapter (Git plumbing commands) + +-- JsonCodec (JSON serialization) + +-- CborCodec (CBOR serialization) + +-- NodeCryptoAdapter (node:crypto) + +-- BunCryptoAdapter (Bun.CryptoHasher) + +-- WebCryptoAdapter (crypto.subtle) +``` + +### Ports + +Each port is an abstract base class with methods that throw `Not implemented`. +Adapters extend these classes and provide concrete implementations. 
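The base-class-plus-adapter pattern can be sketched in a few lines. This is a simplified illustration, assuming a hypothetical two-method `BlobPort` rather than the real ports listed below; `MemoryBlobAdapter` and its SHA-256 keys are also inventions for the example (Git computes its own OIDs).

```javascript
import { createHash } from 'node:crypto';

// Hypothetical port: every method rejects until an adapter overrides it.
class BlobPort {
  async writeBlob(_content) {
    throw new Error('Not implemented');
  }

  async readBlob(_oid) {
    throw new Error('Not implemented');
  }
}

// In-memory adapter. Content-addressed, so identical bytes are stored once.
class MemoryBlobAdapter extends BlobPort {
  #blobs = new Map();

  async writeBlob(content) {
    const oid = createHash('sha256').update(content).digest('hex');
    this.#blobs.set(oid, Buffer.from(content));
    return oid;
  }

  async readBlob(oid) {
    const blob = this.#blobs.get(oid);
    if (!blob) throw new Error(`Unknown blob ${oid}`);
    return blob;
  }

  get count() {
    return this.#blobs.size;
  }
}

const adapter = new MemoryBlobAdapter();
const oidA = await adapter.writeBlob(Buffer.from('chunk'));
const oidB = await adapter.writeBlob(Buffer.from('chunk')); // same bytes, same id
const readBack = await adapter.readBlob(oidA);

// Calling the base class directly surfaces the "Not implemented" error.
let baseError = null;
try {
  await new BlobPort().writeBlob(Buffer.alloc(0));
} catch (err) {
  baseError = err;
}

console.log(oidA === oidB, adapter.count, readBack.toString(), baseError.message);
```

Because domain code only depends on the port's method names, an adapter like this can be swapped in for tests without touching the logic that calls it.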
+ +**GitPersistencePort** -- the storage interface: + +```js +class GitPersistencePort { + async writeBlob(content) {} // Returns Git OID + async writeTree(entries) {} // Returns tree OID + async readBlob(oid) {} // Returns Buffer + async readTree(treeOid) {} // Returns array of tree entries +} +``` + +**CodecPort** -- the serialization interface: + +```js +class CodecPort { + encode(data) {} // Returns Buffer or string + decode(buffer) {} // Returns object + get extension() {} // Returns 'json', 'cbor', etc. +} +``` + +**CryptoPort** -- the cryptographic operations interface: + +```js +class CryptoPort { + sha256(buf) {} // Returns hex digest + randomBytes(n) {} // Returns Buffer + encryptBuffer(buffer, key) {} // Returns { buf, meta } + decryptBuffer(buffer, key, meta) {} // Returns Buffer + createEncryptionStream(key) {} // Returns { encrypt, finalize } +} +``` + +### Writing a Custom Persistence Adapter + +To store chunks somewhere other than Git (e.g., S3, a database, or the local +filesystem), implement `GitPersistencePort`: + +```js +import GitPersistencePort from '@git-stunts/git-cas/src/ports/GitPersistencePort.js'; + +class S3PersistenceAdapter extends GitPersistencePort { + async writeBlob(content) { + const hash = computeHash(content); + await s3.putObject({ Key: hash, Body: content }); + return hash; + } + + async readBlob(oid) { + const response = await s3.getObject({ Key: oid }); + return Buffer.from(await response.Body.transformToByteArray()); + } + + async writeTree(entries) { + // Implement tree assembly for your storage backend + } + + async readTree(treeOid) { + // Implement tree reading for your storage backend + } +} +``` + +Then inject it: + +```js +import CasService from '@git-stunts/git-cas/service'; + +const service = new CasService({ + persistence: new S3PersistenceAdapter(), + codec: new JsonCodec(), + crypto: new NodeCryptoAdapter(), +}); +``` + +### Resilience Policy + +The `GitPersistenceAdapter` wraps every Git command in a 
resilience policy +(provided by `@git-stunts/alfred`). The default policy is a 30-second timeout +wrapping an exponential-backoff retry (2 retries, 100ms initial delay, 2s max +delay). You can override this: + +```js +import { Policy } from '@git-stunts/alfred'; + +const cas = new ContentAddressableStore({ + plumbing: git, + policy: Policy.timeout(60_000).wrap( + Policy.retry({ retries: 5, backoff: 'exponential', delay: 200 }) + ), +}); +``` + +--- + +## 11. Codec System + +### JSON Codec + +The default codec. Produces human-readable manifest files with pretty-printed +indentation. + +```js +import { JsonCodec } from '@git-stunts/git-cas'; + +const codec = new JsonCodec(); +const encoded = codec.encode({ slug: 'photos/vacation', chunks: [] }); +// '{\n "slug": "photos/vacation",\n "chunks": []\n}' + +codec.extension; // 'json' +``` + +Manifests are stored in the tree as `manifest.json`. + +### CBOR Codec + +A binary codec that produces smaller manifests. Useful when you are storing +many assets and want to minimize overhead, or when the manifest does not +need to be human-readable. + +```js +import { CborCodec } from '@git-stunts/git-cas'; + +const cas = new ContentAddressableStore({ + plumbing: git, + codec: new CborCodec(), +}); + +// Or use the factory method: +const cas2 = ContentAddressableStore.createCbor({ plumbing: git }); +``` + +Manifests are stored in the tree as `manifest.cbor`. + +### When to Use Which + +| Consideration | JSON | CBOR | +|-----------------------|--------------------|---------------------| +| Human-readable | Yes | No | +| Manifest size | Larger | Smaller | +| Debugging ease | Easy to inspect | Requires tooling | +| Parse performance | Good | Slightly better | +| Default | Yes | No | + +For most use cases, JSON is the right choice. Switch to CBOR if you are +storing thousands of assets and the manifest size difference matters, or if +you are in a pipeline where human readability is irrelevant. 
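To get a feel for how JSON manifest size grows with chunk count, you can run a quick estimate like the one below. It is only a sketch: `PrettyJsonCodec` and `CompactJsonCodec` are hypothetical classes that follow the `encode` / `decode` / `extension` shape, the manifest is synthetic, and CBOR is not measured here because that would need a third-party encoder.

```javascript
// Hypothetical codec with pretty-printed output, similar in spirit to a JSON codec.
class PrettyJsonCodec {
  encode(data) {
    return Buffer.from(JSON.stringify(data, null, 2));
  }

  decode(buffer) {
    return JSON.parse(buffer.toString('utf8'));
  }

  get extension() {
    return 'json';
  }
}

// Same interface, but without indentation or newlines.
class CompactJsonCodec extends PrettyJsonCodec {
  encode(data) {
    return Buffer.from(JSON.stringify(data));
  }
}

// Synthetic manifest: a 1 GiB file at the default 256 KiB chunk size is 4096 chunks.
const chunkSize = 256 * 1024;
const chunkCount = 4096;
const manifest = {
  slug: 'photos/vacation',
  filename: 'vacation.jpg',
  size: chunkSize * chunkCount,
  chunks: Array.from({ length: chunkCount }, (_, index) => ({
    index,
    size: chunkSize,
    digest: 'ab'.repeat(32), // placeholder 64-character hex digest
    blob: 'cd'.repeat(20), // placeholder 40-character hex OID
  })),
};

const pretty = new PrettyJsonCodec();
const compact = new CompactJsonCodec();

const prettyBytes = pretty.encode(manifest).length;
const compactBytes = compact.encode(manifest).length;
const decoded = compact.decode(compact.encode(manifest));

console.log(`pretty JSON:  ${prettyBytes} bytes`);
console.log(`compact JSON: ${compactBytes} bytes`);
console.log(`chunks after round trip: ${decoded.chunks.length}`);
```

If the numbers you see here are small relative to the assets themselves, manifest size is unlikely to be the deciding factor between codecs.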
+ +### Implementing a Custom Codec + +To implement your own codec (e.g., MessagePack, Protobuf), extend `CodecPort`: + +```js +import CodecPort from '@git-stunts/git-cas/src/ports/CodecPort.js'; +import msgpack from 'msgpack-lite'; + +class MsgPackCodec extends CodecPort { + encode(data) { + return msgpack.encode(data); + } + + decode(buffer) { + return msgpack.decode(buffer); + } + + get extension() { + return 'msgpack'; + } +} +``` + +Then pass it to the constructor: + +```js +const cas = new ContentAddressableStore({ + plumbing: git, + codec: new MsgPackCodec(), +}); +``` + +The manifest will be stored in the tree as `manifest.msgpack`. + +--- + +## 12. Error Handling + +All errors thrown by `git-cas` are instances of `CasError`, which extends +`Error` with two additional properties: + +- `code` -- a machine-readable string identifier +- `meta` -- an object with additional context + +### Error Codes Reference + +| Code | Meaning | Typical `meta` | +|----------------------|------------------------------------------------------|----------------------------------------------| +| `INVALID_KEY_TYPE` | Encryption key is not a Buffer or Uint8Array | -- | +| `INVALID_KEY_LENGTH` | Encryption key is not 32 bytes | `{ expected: 32, actual: N }` | +| `MISSING_KEY` | Encrypted content restored without a key | -- | +| `INTEGRITY_ERROR` | Chunk digest mismatch or decryption auth failure | `{ chunkIndex, expected, actual }` or `{ originalError }` | +| `STREAM_ERROR` | Error reading from source stream during store | `{ chunksWritten, originalError }` | +| `TREE_PARSE_ERROR` | Malformed `ls-tree` output from Git | `{ rawEntry }` | + +### Catching and Handling Errors + +```js +import { CasError } from '@git-stunts/git-cas/src/domain/errors/CasError.js'; + +try { + await cas.restoreFile({ + manifest, + outputPath: './restored.jpg', + // Oops, forgot the encryption key + }); +} catch (err) { + if (err.code === 'MISSING_KEY') { + console.error('This asset is encrypted. 
Please provide the encryption key.'); + } else if (err.code === 'INTEGRITY_ERROR') { + console.error('Data corruption detected:', err.meta); + } else { + throw err; // unexpected error, re-throw + } +} +``` + +### Structured Error Pattern + +Because every `CasError` has a `code`, you can build exhaustive error +handlers: + +```js +function handleCasError(err) { + switch (err.code) { + case 'INVALID_KEY_TYPE': + case 'INVALID_KEY_LENGTH': + return { status: 400, message: 'Invalid encryption key' }; + case 'MISSING_KEY': + return { status: 401, message: 'Encryption key required' }; + case 'INTEGRITY_ERROR': + return { status: 500, message: 'Data integrity check failed' }; + case 'STREAM_ERROR': + return { status: 502, message: `Stream failed after ${err.meta.chunksWritten} chunks` }; + case 'TREE_PARSE_ERROR': + return { status: 500, message: 'Corrupted Git tree' }; + default: + return { status: 500, message: err.message }; + } +} +``` + +### Manifest Validation Errors + +Constructing a `Manifest` or `Chunk` with invalid data throws a plain `Error` +(not a `CasError`) with a descriptive message from Zod validation: + +```js +import Manifest from '@git-stunts/git-cas/src/domain/value-objects/Manifest.js'; + +try { + new Manifest({ slug: '', filename: 'test.jpg', size: 0, chunks: [] }); +} catch (err) { + // Error: Invalid manifest data: String must contain at least 1 character(s) +} +``` + +--- + +## 13. FAQ / Troubleshooting + +### Q: Does this work with bare repositories? + +Yes. `git-cas` uses Git plumbing commands (`hash-object`, `mktree`, `cat-file`, +`ls-tree`) that work identically in bare and non-bare repositories. Point +`GitPlumbing` at the bare repo path. + +### Q: What happens if I store the same file twice? + +You get two manifests, but Git deduplicates the underlying blobs. If the file +content has not changed, the blob OIDs will be identical. You are not wasting +storage. + +### Q: Can I change the chunk size after storing? 
+ +Yes, but the new store will produce different chunks and different blob OIDs. +The old manifest remains valid -- its chunks are still in Git. You will have +two sets of blobs: one for each chunk size. + +### Q: Is the encryption key stored anywhere? + +No. The manifest stores only the algorithm, nonce, and authentication tag. +The key is never stored in Git. If you lose the key, you cannot decrypt the +content. Treat your key files like any other secret. + +### Q: What encryption algorithm is used? + +AES-256-GCM (Galois/Counter Mode). This is an authenticated encryption +algorithm -- it provides both confidentiality and integrity. The authentication +tag in the manifest ensures that any tampering with the ciphertext is detected +during decryption. + +### Q: Can I use this with Bun or Deno? + +Yes. `git-cas` v1.3.0+ includes runtime detection that automatically selects +the appropriate crypto adapter: + +- **Node.js**: `NodeCryptoAdapter` (uses `node:crypto`) +- **Bun**: `BunCryptoAdapter` (uses `Bun.CryptoHasher`) +- **Deno**: `WebCryptoAdapter` (uses `crypto.subtle`) + +### Q: How do I commit a tree OID? + +Use standard Git plumbing: + +```bash +TREE_OID=$(git cas store ./vacation.jpg --slug photos/vacation --tree) +COMMIT_OID=$(git commit-tree "$TREE_OID" -m "Store vacation.jpg") +git update-ref refs/heads/assets "$COMMIT_OID" +``` + +### Q: What is the maximum file size? + +There is no hard limit imposed by `git-cas`. The practical limit is determined +by your Git repository's object database and available memory. Files are +streamed in chunks, so memory usage is proportional to `chunkSize`, not to +file size. However, the restore operation currently concatenates all chunks +into a single buffer, so restoring very large files requires enough memory +to hold the entire file. + +### Q: I get "Chunk size must be at least 1024 bytes" + +The minimum chunk size is 1 KiB. This prevents pathologically small chunks +that would create excessive Git objects. 
Increase your `chunkSize` parameter. + +### Q: I get "Encryption key must be 32 bytes, got N" + +AES-256 requires exactly a 256-bit (32-byte) key. Ensure your key file +contains exactly 32 raw bytes. A common mistake is to store the key as a +hex string (64 characters) rather than raw bytes. + +```bash +# Correct: 32 raw bytes +openssl rand -out my.key 32 + +# Wrong: this creates a hex-encoded file (64 ASCII hex characters plus a newline) +openssl rand -hex 32 > my.key +``` + +### Q: The manifest JSON contains "blob" OIDs -- what are those? + +The `blob` field in each chunk is the Git SHA-1 OID returned by +`git hash-object -w`. It is the address of that chunk in Git's object +database. You can inspect any chunk directly: + +```bash +git cat-file blob <blob-oid> | sha256sum +``` + +The output should match the `digest` field in the manifest. + +### Q: Can I use git-cas in a CI/CD pipeline? + +Yes. A typical pattern: + +```bash +# In your build step: +TREE=$(git cas store ./dist/artifact.tar.gz --slug builds/latest --tree) +git commit-tree "$TREE" -p HEAD -m "Build $(date +%s)" | xargs git update-ref refs/builds/latest +git push origin refs/builds/latest + +# In your deploy step: +git fetch origin refs/builds/latest +TREE=$(git log -1 --format='%T' FETCH_HEAD) +git cas restore "$TREE" --out ./artifact.tar.gz +``` + +### Q: How does the resilience policy work? + +Every Git plumbing command is wrapped in a policy from `@git-stunts/alfred`. +The default policy applies a 30-second timeout and retries up to 2 times with +exponential backoff (100ms, then up to 2s). This handles transient filesystem +errors and lock contention gracefully. You can override the policy at +construction time (see Section 10). + +--- + +*Copyright 2026 James Ross. Licensed under Apache-2.0.* diff --git a/ROADMAP.md b/ROADMAP.md index 59d79bc..8fc3d2b 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -126,15 +126,15 @@ Return and throw semantics for every public method (current and planned).
## 4) Version Plan -| Version | Milestone | Codename | Theme | -|--------:|-----------|----------|-------| +| Version | Milestone | Codename | Theme | Status | +|--------:|-----------|----------|-------|--------| | v1.1.0 | M1 | Bedrock | Foundation hardening | ✅ | | v1.2.0 | M2 | Boomerang| File retrieval round trip + CLI | ✅ | | v1.3.0 | M3 | Launchpad| CI/CD pipeline | ✅ | | v1.4.0 | M4 | Compass | Lifecycle management | ✅ | -| v1.5.0 | M5 | Sonar | Observability | -| v1.6.0 | M6 | Cartographer | Documentation | -| v2.0.0 | M7 | Horizon | Advanced features | +| v1.5.0 | M5 | Sonar | Observability | ✅ | +| v1.6.0 | M6 | Cartographer | Documentation | ✅ | +| v2.0.0 | M7 | Horizon | Advanced features | | --- @@ -1175,12 +1175,12 @@ As an operator, I want to identify referenced chunks across many assets so I can --- -# M5 — Sonar (v1.5.0) +# M5 — Sonar (v1.5.0) ✅ **Theme:** Events, hooks, and benchmarks. --- -## Task 5.1: EventEmitter integration (progress + observability hooks) +## Task 5.1: EventEmitter integration (progress + observability hooks) ✅ **User Story** As an application developer, I want progress and lifecycle events so I can build logging, progress bars, and monitoring. @@ -1240,7 +1240,7 @@ As an application developer, I want progress and lifecycle events so I can build --- -## Task 5.2: Comprehensive benchmark suite +## Task 5.2: Comprehensive benchmark suite ✅ **User Story** As a maintainer, I want benchmarks for critical operations so I can detect regressions and make optimization decisions. @@ -1298,12 +1298,12 @@ As a maintainer, I want benchmarks for critical operations so I can detect regre --- -# M6 — Cartographer (v1.6.0) +# M6 — Cartographer (v1.6.0) ✅ **Theme:** Documentation that makes the library usable and trustworthy. --- -## Task 6.1: API reference documentation +## Task 6.1: API reference documentation ✅ **User Story** As a developer evaluating this library, I want complete API docs so I can integrate without reading source. 
@@ -1353,7 +1353,7 @@ As a developer evaluating this library, I want complete API docs so I can integr --- -## Task 6.2: Security model documentation +## Task 6.2: Security model documentation ✅ **User Story** As a security reviewer, I want a clear threat model and crypto design description so I can assess safety and limitations. @@ -1407,7 +1407,7 @@ As a security reviewer, I want a clear threat model and crypto design descriptio --- -## Task 6.3: Usage examples (cookbook) +## Task 6.3: Usage examples (cookbook) ✅ **User Story** As a new user, I want runnable examples so I can integrate quickly and correctly. diff --git a/docs/API.md b/docs/API.md new file mode 100644 index 0000000..5a9d758 --- /dev/null +++ b/docs/API.md @@ -0,0 +1,1014 @@ +# API Reference + +This document provides the complete API reference for git-cas. + +## Table of Contents + +1. [ContentAddressableStore](#contentaddressablestore) +2. [CasService](#casservice) +3. [Events](#events) +4. [Value Objects](#value-objects) +5. [Ports](#ports) +6. [Codecs](#codecs) +7. [Error Codes](#error-codes) + +## ContentAddressableStore + +The main facade class providing high-level API for content-addressable storage. 
+ +### Constructor + +```javascript +new ContentAddressableStore(options) +``` + +**Parameters:** + +- `options.plumbing` (required): Plumbing instance from `@git-stunts/plumbing` +- `options.chunkSize` (optional): Chunk size in bytes (default: 262144 / 256 KiB) +- `options.codec` (optional): CodecPort implementation (default: JsonCodec) +- `options.crypto` (optional): CryptoPort implementation (default: auto-detected) +- `options.policy` (optional): Resilience policy from `@git-stunts/alfred` for Git I/O + +**Example:** + +```javascript +import ContentAddressableStore from 'git-cas'; +import Plumbing from '@git-stunts/plumbing'; + +const plumbing = await Plumbing.create({ repoPath: '/path/to/repo' }); +const cas = new ContentAddressableStore({ plumbing }); +``` + +### Factory Methods + +#### createJson + +```javascript +ContentAddressableStore.createJson({ plumbing, chunkSize, policy }) +``` + +Creates a CAS instance with JSON codec. + +**Parameters:** + +- `plumbing` (required): Plumbing instance +- `chunkSize` (optional): Chunk size in bytes +- `policy` (optional): Resilience policy + +**Returns:** `ContentAddressableStore` + +**Example:** + +```javascript +const cas = ContentAddressableStore.createJson({ plumbing }); +``` + +#### createCbor + +```javascript +ContentAddressableStore.createCbor({ plumbing, chunkSize, policy }) +``` + +Creates a CAS instance with CBOR codec. + +**Parameters:** + +- `plumbing` (required): Plumbing instance +- `chunkSize` (optional): Chunk size in bytes +- `policy` (optional): Resilience policy + +**Returns:** `ContentAddressableStore` + +**Example:** + +```javascript +const cas = ContentAddressableStore.createCbor({ plumbing }); +``` + +### Methods + +#### getService + +```javascript +await cas.getService() +``` + +Lazily initializes and returns the underlying CasService instance. 
+ +**Returns:** `Promise` + +**Example:** + +```javascript +const service = await cas.getService(); +``` + +#### store + +```javascript +await cas.store({ source, slug, filename, encryptionKey }) +``` + +Stores content from an async iterable source. + +**Parameters:** + +- `source` (required): `AsyncIterable` - Content stream +- `slug` (required): `string` - Unique identifier for the asset +- `filename` (required): `string` - Original filename +- `encryptionKey` (optional): `Buffer` - 32-byte encryption key + +**Returns:** `Promise` + +**Throws:** + +- `CasError` with code `INVALID_KEY_TYPE` if encryptionKey is not a Buffer +- `CasError` with code `INVALID_KEY_LENGTH` if encryptionKey is not 32 bytes +- `CasError` with code `STREAM_ERROR` if the source stream fails + +**Example:** + +```javascript +import { createReadStream } from 'node:fs'; + +const stream = createReadStream('/path/to/file.txt'); +const manifest = await cas.store({ + source: stream, + slug: 'my-asset', + filename: 'file.txt' +}); +``` + +#### storeFile + +```javascript +await cas.storeFile({ filePath, slug, filename, encryptionKey }) +``` + +Convenience method that opens a file and stores it. + +**Parameters:** + +- `filePath` (required): `string` - Path to file +- `slug` (required): `string` - Unique identifier for the asset +- `filename` (optional): `string` - Filename (defaults to basename of filePath) +- `encryptionKey` (optional): `Buffer` - 32-byte encryption key + +**Returns:** `Promise` + +**Throws:** Same as `store()` + +**Example:** + +```javascript +const manifest = await cas.storeFile({ + filePath: '/path/to/file.txt', + slug: 'my-asset' +}); +``` + +#### restore + +```javascript +await cas.restore({ manifest, encryptionKey }) +``` + +Restores content from a manifest and returns the buffer. 
+ +**Parameters:** + +- `manifest` (required): `Manifest` - Manifest object +- `encryptionKey` (optional): `Buffer` - 32-byte encryption key (required if content is encrypted) + +**Returns:** `Promise<{ buffer: Buffer, bytesWritten: number }>` + +**Throws:** + +- `CasError` with code `MISSING_KEY` if content is encrypted but no key provided +- `CasError` with code `INVALID_KEY_TYPE` if encryptionKey is not a Buffer +- `CasError` with code `INVALID_KEY_LENGTH` if encryptionKey is not 32 bytes +- `CasError` with code `INTEGRITY_ERROR` if chunk digest verification fails +- `CasError` with code `INTEGRITY_ERROR` if decryption fails + +**Example:** + +```javascript +const { buffer, bytesWritten } = await cas.restore({ manifest }); +``` + +#### restoreFile + +```javascript +await cas.restoreFile({ manifest, encryptionKey, outputPath }) +``` + +Restores content from a manifest and writes it to a file. + +**Parameters:** + +- `manifest` (required): `Manifest` - Manifest object +- `encryptionKey` (optional): `Buffer` - 32-byte encryption key +- `outputPath` (required): `string` - Path to write the restored file + +**Returns:** `Promise<{ bytesWritten: number }>` + +**Throws:** Same as `restore()` + +**Example:** + +```javascript +await cas.restoreFile({ + manifest, + outputPath: '/path/to/output.txt' +}); +``` + +#### createTree + +```javascript +await cas.createTree({ manifest }) +``` + +Creates a Git tree object from a manifest. + +**Parameters:** + +- `manifest` (required): `Manifest` - Manifest object + +**Returns:** `Promise` - Git tree OID + +**Example:** + +```javascript +const treeOid = await cas.createTree({ manifest }); +``` + +#### verifyIntegrity + +```javascript +await cas.verifyIntegrity(manifest) +``` + +Verifies the integrity of stored content by re-hashing all chunks. 
+ +**Parameters:** + +- `manifest` (required): `Manifest` - Manifest object + +**Returns:** `Promise` - True if all chunks pass verification + +**Example:** + +```javascript +const isValid = await cas.verifyIntegrity(manifest); +if (!isValid) { + console.log('Integrity check failed'); +} +``` + +#### encrypt + +```javascript +await cas.encrypt({ buffer, key }) +``` + +Encrypts a buffer using AES-256-GCM. + +**Parameters:** + +- `buffer` (required): `Buffer` - Data to encrypt +- `key` (required): `Buffer` - 32-byte encryption key + +**Returns:** `Promise<{ buf: Buffer, meta: Object }>` + +**Throws:** + +- `CasError` with code `INVALID_KEY_TYPE` if key is not a Buffer +- `CasError` with code `INVALID_KEY_LENGTH` if key is not 32 bytes + +**Example:** + +```javascript +const { buf, meta } = await cas.encrypt({ + buffer: Buffer.from('secret data'), + key: crypto.randomBytes(32) +}); +``` + +#### decrypt + +```javascript +await cas.decrypt({ buffer, key, meta }) +``` + +Decrypts a buffer using AES-256-GCM. + +**Parameters:** + +- `buffer` (required): `Buffer` - Encrypted data +- `key` (required): `Buffer` - 32-byte encryption key +- `meta` (required): `Object` - Encryption metadata (from encrypt result) + +**Returns:** `Promise` - Decrypted data + +**Throws:** + +- `CasError` with code `INTEGRITY_ERROR` if decryption fails + +**Example:** + +```javascript +const decrypted = await cas.decrypt({ buffer: buf, key, meta }); +``` + +### Properties + +#### chunkSize + +```javascript +cas.chunkSize +``` + +Returns the configured chunk size in bytes. + +**Type:** `number` + +**Example:** + +```javascript +console.log(cas.chunkSize); // 262144 +``` + +## CasService + +Core domain service implementing CAS operations. Usually accessed via ContentAddressableStore, but can be used directly for advanced scenarios. 
+ +### Constructor + +```javascript +new CasService({ persistence, codec, crypto, chunkSize }) +``` + +**Parameters:** + +- `persistence` (required): `GitPersistencePort` implementation +- `codec` (required): `CodecPort` implementation +- `crypto` (required): `CryptoPort` implementation +- `chunkSize` (optional): `number` - Chunk size in bytes (default: 262144, minimum: 1024) + +**Throws:** `Error` if chunkSize is less than 1024 bytes + +**Example:** + +```javascript +import CasService from 'git-cas/src/domain/services/CasService.js'; +import GitPersistenceAdapter from 'git-cas/src/infrastructure/adapters/GitPersistenceAdapter.js'; +import JsonCodec from 'git-cas/src/infrastructure/codecs/JsonCodec.js'; +import NodeCryptoAdapter from 'git-cas/src/infrastructure/adapters/NodeCryptoAdapter.js'; + +const service = new CasService({ + persistence: new GitPersistenceAdapter({ plumbing }), + codec: new JsonCodec(), + crypto: new NodeCryptoAdapter(), + chunkSize: 512 * 1024 +}); +``` + +### Methods + +All methods from ContentAddressableStore delegate to CasService. See ContentAddressableStore documentation above for: + +- `store({ source, slug, filename, encryptionKey })` +- `restore({ manifest, encryptionKey })` +- `createTree({ manifest })` +- `verifyIntegrity(manifest)` +- `encrypt({ buffer, key })` +- `decrypt({ buffer, key, meta })` + +### EventEmitter + +CasService extends Node.js EventEmitter. See [Events](#events) section for all emitted events. + +## Events + +CasService emits the following events. Listen using standard EventEmitter API: + +```javascript +const service = await cas.getService(); +service.on('chunk:stored', (payload) => { + console.log('Chunk stored:', payload); +}); +``` + +### chunk:stored + +Emitted when a chunk is successfully stored. 
+ +**Payload:** + +```javascript +{ + index: number, // Chunk index (0-based) + size: number, // Chunk size in bytes + digest: string, // SHA-256 hex digest (64 chars) + blob: string // Git blob OID +} +``` + +### chunk:restored + +Emitted when a chunk is successfully restored and verified. + +**Payload:** + +```javascript +{ + index: number, // Chunk index (0-based) + size: number, // Chunk size in bytes + digest: string // SHA-256 hex digest (64 chars) +} +``` + +### file:stored + +Emitted when a complete file is successfully stored. + +**Payload:** + +```javascript +{ + slug: string, // Asset slug + size: number, // Total file size in bytes + chunkCount: number, // Number of chunks + encrypted: boolean // Whether content was encrypted +} +``` + +### file:restored + +Emitted when a complete file is successfully restored. + +**Payload:** + +```javascript +{ + slug: string, // Asset slug + size: number, // Total file size in bytes + chunkCount: number // Number of chunks +} +``` + +### integrity:pass + +Emitted when integrity verification passes for all chunks. + +**Payload:** + +```javascript +{ + slug: string // Asset slug +} +``` + +### integrity:fail + +Emitted when integrity verification fails for a chunk. + +**Payload:** + +```javascript +{ + slug: string, // Asset slug + chunkIndex: number, // Failed chunk index + expected: string, // Expected SHA-256 digest + actual: string // Actual SHA-256 digest +} +``` + +### error + +Emitted when an error occurs during streaming operations (if listeners are registered). + +**Payload:** + +```javascript +{ + code: string, // CasError code + message: string // Error message +} +``` + +## Value Objects + +### Manifest + +Immutable value object representing a file manifest. 
+ +#### Constructor + +```javascript +new Manifest(data) +``` + +**Parameters:** + +- `data.slug` (required): `string` - Unique identifier (min length: 1) +- `data.filename` (required): `string` - Original filename (min length: 1) +- `data.size` (required): `number` - Total file size in bytes (>= 0) +- `data.chunks` (required): `Array` - Chunk metadata array +- `data.encryption` (optional): `Object` - Encryption metadata + +**Throws:** `Error` if data does not match ManifestSchema + +**Example:** + +```javascript +const manifest = new Manifest({ + slug: 'my-asset', + filename: 'file.txt', + size: 1024, + chunks: [ + { + index: 0, + size: 1024, + digest: 'a'.repeat(64), + blob: 'abc123def456' + } + ] +}); +``` + +#### Fields + +- `slug`: `string` - Asset identifier +- `filename`: `string` - Original filename +- `size`: `number` - Total file size +- `chunks`: `Array` - Array of Chunk objects +- `encryption`: `Object | undefined` - Encryption metadata + +#### Methods + +##### toJSON + +```javascript +manifest.toJSON() +``` + +Returns a plain object representation suitable for serialization. + +**Returns:** `Object` + +**Example:** + +```javascript +const json = manifest.toJSON(); +console.log(JSON.stringify(json, null, 2)); +``` + +### Chunk + +Immutable value object representing a content chunk. 
+ +#### Constructor + +```javascript +new Chunk(data) +``` + +**Parameters:** + +- `data.index` (required): `number` - Chunk index (>= 0) +- `data.size` (required): `number` - Chunk size in bytes (> 0) +- `data.digest` (required): `string` - SHA-256 hex digest (exactly 64 chars) +- `data.blob` (required): `string` - Git blob OID (min length: 1) + +**Throws:** `Error` if data does not match ChunkSchema + +**Example:** + +```javascript +const chunk = new Chunk({ + index: 0, + size: 262144, + digest: 'a'.repeat(64), + blob: 'abc123def456' +}); +``` + +#### Fields + +- `index`: `number` - Chunk index (0-based) +- `size`: `number` - Chunk size in bytes +- `digest`: `string` - SHA-256 hex digest +- `blob`: `string` - Git blob OID + +## Ports + +Ports define the interfaces for pluggable adapters. Implementations are provided but you can create custom adapters. + +### GitPersistencePort + +Interface for Git persistence operations. + +#### Methods + +##### writeBlob + +```javascript +await port.writeBlob(content) +``` + +Writes content as a Git blob. + +**Parameters:** + +- `content`: `Buffer | string` - Content to store + +**Returns:** `Promise` - Git blob OID + +##### writeTree + +```javascript +await port.writeTree(entries) +``` + +Creates a Git tree object. + +**Parameters:** + +- `entries`: `Array` - Git mktree format lines (e.g., `"100644 blob \t"`) + +**Returns:** `Promise` - Git tree OID + +##### readBlob + +```javascript +await port.readBlob(oid) +``` + +Reads a Git blob. + +**Parameters:** + +- `oid`: `string` - Git blob OID + +**Returns:** `Promise` - Blob content + +##### readTree + +```javascript +await port.readTree(treeOid) +``` + +Reads a Git tree object. 
+ +**Parameters:** + +- `treeOid`: `string` - Git tree OID + +**Returns:** `Promise>` + +**Example Implementation:** + +```javascript +import GitPersistencePort from 'git-cas/src/ports/GitPersistencePort.js'; + +class CustomGitAdapter extends GitPersistencePort { + async writeBlob(content) { + // Implementation + } + + async writeTree(entries) { + // Implementation + } + + async readBlob(oid) { + // Implementation + } + + async readTree(treeOid) { + // Implementation + } +} +``` + +### CodecPort + +Interface for encoding/decoding manifest data. + +#### Methods + +##### encode + +```javascript +port.encode(data) +``` + +Encodes data to Buffer or string. + +**Parameters:** + +- `data`: `Object` - Data to encode + +**Returns:** `Buffer | string` - Encoded data + +##### decode + +```javascript +port.decode(buffer) +``` + +Decodes data from Buffer or string. + +**Parameters:** + +- `buffer`: `Buffer | string` - Encoded data + +**Returns:** `Object` - Decoded data + +#### Properties + +##### extension + +```javascript +port.extension +``` + +File extension for this codec (e.g., 'json', 'cbor'). + +**Returns:** `string` + +**Example Implementation:** + +```javascript +import CodecPort from 'git-cas/src/ports/CodecPort.js'; + +class XmlCodec extends CodecPort { + encode(data) { + return convertToXml(data); + } + + decode(buffer) { + return parseXml(buffer.toString('utf8')); + } + + get extension() { + return 'xml'; + } +} +``` + +### CryptoPort + +Interface for cryptographic operations. + +#### Methods + +##### sha256 + +```javascript +port.sha256(buf) +``` + +Computes SHA-256 hash. + +**Parameters:** + +- `buf`: `Buffer` - Data to hash + +**Returns:** `string` - 64-character hex digest + +##### randomBytes + +```javascript +port.randomBytes(n) +``` + +Generates cryptographically random bytes. 
+ +**Parameters:** + +- `n`: `number` - Number of bytes + +**Returns:** `Buffer` - Random bytes + +##### encryptBuffer + +```javascript +port.encryptBuffer(buffer, key) +``` + +Encrypts a buffer using AES-256-GCM. + +**Parameters:** + +- `buffer`: `Buffer` - Data to encrypt +- `key`: `Buffer` - 32-byte encryption key + +**Returns:** `{ buf: Buffer, meta: { algorithm: string, nonce: string, tag: string, encrypted: boolean } }` + +##### decryptBuffer + +```javascript +port.decryptBuffer(buffer, key, meta) +``` + +Decrypts a buffer using AES-256-GCM. + +**Parameters:** + +- `buffer`: `Buffer` - Encrypted data +- `key`: `Buffer` - 32-byte encryption key +- `meta`: `Object` - Encryption metadata with `algorithm`, `nonce`, `tag`, `encrypted` + +**Returns:** `Buffer` - Decrypted data + +**Throws:** On authentication failure + +##### createEncryptionStream + +```javascript +port.createEncryptionStream(key) +``` + +Creates a streaming encryption context. + +**Parameters:** + +- `key`: `Buffer` - 32-byte encryption key + +**Returns:** `{ encrypt: Function, finalize: Function }` + +- `encrypt`: `(source: AsyncIterable) => AsyncIterable` - Transform function +- `finalize`: `() => { algorithm: string, nonce: string, tag: string, encrypted: boolean }` - Get metadata + +**Example Implementation:** + +```javascript +import CryptoPort from 'git-cas/src/ports/CryptoPort.js'; + +class CustomCryptoAdapter extends CryptoPort { + sha256(buf) { + // Implementation + } + + randomBytes(n) { + // Implementation + } + + encryptBuffer(buffer, key) { + // Implementation + } + + decryptBuffer(buffer, key, meta) { + // Implementation + } + + createEncryptionStream(key) { + // Implementation + } +} +``` + +## Codecs + +Built-in codec implementations. + +### JsonCodec + +JSON codec for manifest serialization. 
+ +```javascript +import { JsonCodec } from 'git-cas'; + +const codec = new JsonCodec(); +const encoded = codec.encode({ key: 'value' }); +const decoded = codec.decode(encoded); +console.log(codec.extension); // 'json' +``` + +### CborCodec + +CBOR codec for compact binary serialization. + +```javascript +import { CborCodec } from 'git-cas'; + +const codec = new CborCodec(); +const encoded = codec.encode({ key: 'value' }); +const decoded = codec.decode(encoded); +console.log(codec.extension); // 'cbor' +``` + +## Error Codes + +All errors thrown by git-cas are instances of `CasError`. + +### CasError + +```javascript +import CasError from 'git-cas/src/domain/errors/CasError.js'; +``` + +#### Constructor + +```javascript +new CasError(message, code, meta) +``` + +**Parameters:** + +- `message`: `string` - Error message +- `code`: `string` - Error code (see below) +- `meta`: `Object` - Additional error context (default: `{}`) + +#### Fields + +- `name`: `string` - Always "CasError" +- `message`: `string` - Error message +- `code`: `string` - Error code +- `meta`: `Object` - Additional context +- `stack`: `string` - Stack trace + +### Error Codes + +| Code | Description | Thrown By | +|------|-------------|-----------| +| `INVALID_KEY_TYPE` | Encryption key must be a Buffer or Uint8Array | `encrypt()`, `decrypt()`, `store()`, `restore()` | +| `INVALID_KEY_LENGTH` | Encryption key must be exactly 32 bytes | `encrypt()`, `decrypt()`, `store()`, `restore()` | +| `MISSING_KEY` | Encryption key required to restore encrypted content but none was provided | `restore()` | +| `INTEGRITY_ERROR` | Chunk digest verification failed or decryption authentication failed | `restore()`, `verifyIntegrity()`, `decrypt()` | +| `STREAM_ERROR` | Stream error occurred during store operation | `store()` | + +### Error Handling + +**Example:** + +```javascript +import CasError from 'git-cas/src/domain/errors/CasError.js'; + +try { + await cas.restore({ manifest, encryptionKey }); +} catch 
(err) { + if (err instanceof CasError) { + console.error('CAS Error:', err.code); + console.error('Message:', err.message); + console.error('Meta:', err.meta); + + switch (err.code) { + case 'MISSING_KEY': + console.log('Content is encrypted - please provide a key'); + break; + case 'INTEGRITY_ERROR': + console.log('Content verification failed - may be corrupted'); + break; + case 'INVALID_KEY_LENGTH': + console.log('Key must be 32 bytes'); + break; + } + } else { + throw err; + } +} +``` + +### Error Metadata + +Different error codes include different metadata: + +**INVALID_KEY_LENGTH:** + +```javascript +{ + expected: 32, + actual: +} +``` + +**INTEGRITY_ERROR (chunk verification):** + +```javascript +{ + chunkIndex: , + expected: , // Expected SHA-256 digest + actual: // Actual SHA-256 digest +} +``` + +**INTEGRITY_ERROR (decryption):** + +```javascript +{ + originalError: +} +``` + +**STREAM_ERROR:** + +```javascript +{ + chunksWritten: , + originalError: +} +``` diff --git a/docs/SECURITY.md b/docs/SECURITY.md new file mode 100644 index 0000000..68cf5dd --- /dev/null +++ b/docs/SECURITY.md @@ -0,0 +1,617 @@ +# Security Model + +This document describes the security architecture, cryptographic design, and limitations of git-cas's content-addressable storage system with optional encryption. + +## Table of Contents + +1. [Threat Model](#threat-model) +2. [Cryptographic Design](#cryptographic-design) +3. [Key Handling](#key-handling) +4. [Encryption Flow](#encryption-flow) +5. [Decryption Flow](#decryption-flow) +6. [Chunk Digest Verification](#chunk-digest-verification) +7. [Limitations](#limitations) +8. [Git Object Immutability](#git-object-immutability) +9. [Error Codes for Security Operations](#error-codes-for-security-operations) + +--- + +## Threat Model + +### What git-cas Protects Against + +git-cas provides defense against the following threat scenarios: + +1. 
**At-rest confidentiality**: When encryption is enabled, stored content is protected from unauthorized reading by anyone who gains access to the Git object database without the encryption key. + +2. **Data integrity**: All stored content (encrypted or not) is protected by SHA-256 digests per chunk. Any corruption, tampering, or bit-rot is detected during restore or integrity verification. + +3. **Authentication of ciphertext**: AES-256-GCM's built-in authentication tag ensures that encrypted data has not been modified or tampered with. Any modification to ciphertext will cause decryption to fail. + +### What git-cas Does NOT Protect Against + +git-cas does NOT provide protection in the following scenarios: + +1. **Key management**: git-cas does not store, manage, or rotate encryption keys. Key storage and lifecycle management are entirely the caller's responsibility. + +2. **Access control**: git-cas does not implement access control lists or authorization policies. If an attacker can access the Git repository and has the encryption key, they can read all content. + +3. **Side-channel attacks**: No protection against timing attacks, power analysis, or other side-channel attacks on the cryptographic operations. + +4. **Memory safety**: Decryption of encrypted content loads the entire ciphertext into memory. No protection against memory dumps or swap file exposure. + +5. **Key recovery**: If an encryption key is lost, there is no key recovery mechanism. Encrypted data becomes permanently inaccessible. + +6. **Metadata privacy**: The following metadata is NOT encrypted: + - Manifest structure (slug, filename, chunk count) + - Chunk sizes and indices + - SHA-256 digests of encrypted chunks + - Git tree and blob object IDs + +7. **Deletion guarantees**: Logical deletion from the manifest does not physically remove data from Git's object database. See [Git Object Immutability](#git-object-immutability). + +8. 
**Concurrent key rotation**: There is no support for re-encrypting content with a different key while maintaining availability. + +--- + +## Cryptographic Design + +### AES-256-GCM + +git-cas uses **AES-256-GCM** (Galois/Counter Mode) for authenticated encryption: + +- **Algorithm**: `aes-256-gcm` via runtime-specific adapters (Node.js `node:crypto`, Bun `CryptoHasher` + `node:crypto`, Deno/Web `crypto.subtle`) +- **Key size**: 256 bits (32 bytes) +- **Nonce size**: 96 bits (12 bytes), cryptographically random +- **Authentication tag**: 128 bits (16 bytes) + +### Why AES-256-GCM? + +AES-256-GCM was chosen because: + +1. **Authenticated Encryption with Associated Data (AEAD)**: Provides both confidentiality and integrity/authenticity in a single operation. +2. **Nonce-based**: Does not require unique per-message keys, only unique nonces. +3. **Industry standard**: Widely deployed, well-studied, and supported by hardware acceleration on modern CPUs. +4. **Streaming-friendly**: GCM mode allows incremental encryption without padding requirements. + +### Nonce Generation + +Each encryption operation generates a fresh 96-bit (12-byte) nonce using `crypto.randomBytes(12)`: + +- **Uniqueness requirement**: The same key must NEVER be used with the same nonce twice. +- **Random generation**: git-cas uses cryptographically secure random number generation from Node.js's `crypto.randomBytes()`, which sources from the OS entropy pool. +- **Collision probability**: With 96-bit random nonces, the birthday bound puts the collision probability at about 2^-33 after 2^32 encryptions with the same key, which is negligible for practical use cases. It rises to roughly 40% near 2^48 encryptions, so a single key must stay far below that count. + +**CRITICAL**: Callers must NOT reuse encryption keys across a large number of operations (approaching 2^32 encryptions with a single key). While collision is unlikely, best practice is to rotate keys periodically.
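
The arithmetic behind the nonce guidance above can be checked with a short sketch. It uses the standard birthday-bound approximation `p ≈ 1 - exp(-n² / 2^(bits + 1))`; the function name `nonceCollisionProbability` is hypothetical and not part of git-cas.

```javascript
// Minimal sketch (hypothetical helper, not a git-cas API).
// Birthday-bound approximation for the chance that at least two of
// 2^log2Encryptions random nonces of `nonceBits` bits are equal.
function nonceCollisionProbability(log2Encryptions, nonceBits = 96) {
  // n^2 / 2^(bits + 1), computed in the exponent to avoid overflow.
  const exponent = 2 ** (2 * log2Encryptions - (nonceBits + 1));
  // -expm1(-x) equals 1 - exp(-x) but keeps precision for tiny x.
  return -Math.expm1(-exponent);
}

// About 1.16e-10 (2^-33) at 2^32 encryptions under one key.
console.log(nonceCollisionProbability(32));
// About 0.39 at 2^48 encryptions under one key.
console.log(nonceCollisionProbability(48));
```

The gap between those two values is why the 2^32 figure, rather than 2^48, is the practical ceiling for a single key with random 96-bit nonces.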
+ +### Authentication Tag + +After encryption completes, AES-256-GCM produces a 128-bit authentication tag: + +- The tag is stored in the manifest's `encryption.tag` field (base64-encoded). +- During decryption, the tag is verified by `createDecipheriv()` via `setAuthTag()`. +- If the ciphertext or tag has been modified, `decipher.final()` will throw an error. + +### Encryption Wraps Around Chunked Storage + +The encryption layer wraps the chunking layer: + +``` +[Plain source stream] → [Encrypt stream] → [Chunk into 256KB blocks] → [Store as Git blobs] +``` + +This means: + +- **Encrypted chunks are not individually authenticated**: The entire ciphertext is authenticated as a single unit by the GCM tag. +- **Chunk digests are computed on ciphertext**: The SHA-256 digest stored in each chunk entry is the hash of the encrypted data, not the plaintext. +- **Chunking is deterministic**: Given the same plaintext and key/nonce, the encrypted chunks will be identical (because nonce is fixed at encryption time). + +--- + +## Key Handling + +### Caller Responsibility + +git-cas **does not store encryption keys**. All key management responsibilities fall on the caller: + +1. **Key generation**: The caller must generate cryptographically secure 256-bit (32-byte) keys. +2. **Key storage**: The caller must securely store keys (e.g., in environment variables, key management systems, hardware security modules). +3. **Key distribution**: If keys need to be shared across systems, the caller must implement secure key distribution. +4. **Key rotation**: The caller must implement key rotation policies. git-cas does not support re-encrypting content with a new key. 
+ +### Key Validation + +git-cas validates keys before use: + +```javascript +_validateKey(key) { + if (!Buffer.isBuffer(key) && !(key instanceof Uint8Array)) { + throw new CasError( + 'Encryption key must be a Buffer or Uint8Array', + 'INVALID_KEY_TYPE', + ); + } + if (key.length !== 32) { + throw new CasError( + `Encryption key must be 32 bytes, got ${key.length}`, + 'INVALID_KEY_LENGTH', + { expected: 32, actual: key.length }, + ); + } +} +``` + +**Accepted types**: `Buffer` or `Uint8Array` +**Required length**: Exactly 32 bytes (256 bits) + +If validation fails: +- **INVALID_KEY_TYPE**: Key is not a Buffer or Uint8Array +- **INVALID_KEY_LENGTH**: Key is not 32 bytes + +### Key Best Practices + +1. **Generate keys using a CSPRNG**: Use `crypto.randomBytes(32)` or equivalent. +2. **Never hardcode keys**: Store keys in secure configuration, not in source code. +3. **Use unique keys per project/environment**: Do not reuse the same key across different repositories or environments. +4. **Rotate keys periodically**: Establish a key rotation policy (e.g., every 90 days). +5. **Secure key backups**: If keys are backed up, encrypt the backup with a separate master key. + +--- + +## Encryption Flow + +### High-Level Overview + +When storing content with encryption enabled: + +1. Caller provides `source` (async iterable of Buffers), `slug`, `filename`, and `encryptionKey`. +2. git-cas validates the key. +3. git-cas creates a streaming encryption context with a random nonce. +4. The source stream is encrypted incrementally. +5. Encrypted chunks are buffered to 256KB boundaries. +6. Each 256KB encrypted chunk is hashed (SHA-256) and written as a Git blob. +7. After encryption completes, the GCM authentication tag is retrieved. +8. Encryption metadata (algorithm, nonce, tag) is stored in the manifest. 
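
Steps 5 and 6 above regroup the encrypted stream into fixed-size blocks before hashing and writing. A minimal sketch of that buffering follows, assuming Node.js. `rechunk` is a hypothetical stand-in and may differ from what `_chunkAndStore()` does internally; hashing and blob writes are left out.

```javascript
// Hypothetical helper: regroup an async stream of byte buffers into
// fixed-size blocks. The final block may be shorter than `chunkSize`.
async function* rechunk(source, chunkSize = 256 * 1024) {
  let pending = Buffer.alloc(0);
  for await (const piece of source) {
    pending = Buffer.concat([pending, piece]);
    // Emit every full block currently available.
    while (pending.length >= chunkSize) {
      yield pending.subarray(0, chunkSize);
      pending = pending.subarray(chunkSize);
    }
  }
  // Flush whatever is left once the source ends.
  if (pending.length > 0) yield pending;
}

// Usage: three uneven pieces (10 + 3 + 7 bytes) regrouped into 8-byte blocks.
async function demo() {
  async function* pieces() {
    yield Buffer.alloc(10, 1);
    yield Buffer.alloc(3, 2);
    yield Buffer.alloc(7, 3);
  }
  const sizes = [];
  for await (const block of rechunk(pieces(), 8)) sizes.push(block.length);
  return sizes;
}

demo().then((sizes) => console.log(sizes)); // [ 8, 8, 4 ]
```

Because the block boundaries depend only on the byte stream and the block size, the same ciphertext always yields the same chunks regardless of how the source happened to split its writes.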
+ +### Step-by-Step: `store({ source, slug, filename, encryptionKey })` + +**Step 1: Key Validation** +```javascript +if (encryptionKey) { + this._validateKey(encryptionKey); +} +``` +- If `encryptionKey` is provided, validate it is a 32-byte Buffer/Uint8Array. +- If validation fails, throw `CasError` with code `INVALID_KEY_TYPE` or `INVALID_KEY_LENGTH`. + +**Step 2: Initialize Manifest Data** +```javascript +const manifestData = { + slug, + filename, + size: 0, + chunks: [], +}; +``` + +**Step 3: Create Encryption Stream** +```javascript +const { encrypt, finalize } = this.crypto.createEncryptionStream(encryptionKey); +``` +- `createEncryptionStream()` generates a 12-byte random nonce. +- Creates an `aes-256-gcm` cipher with the key and nonce. +- Returns: + - `encrypt`: an async generator function that yields encrypted chunks + - `finalize`: a function that returns encryption metadata after encryption completes + +**Step 4: Chunk and Store Encrypted Stream** +```javascript +await this._chunkAndStore(encrypt(source), manifestData); +``` +- The `encrypt(source)` async generator reads from the source, encrypts data incrementally, and yields encrypted buffers. +- `_chunkAndStore()` buffers encrypted data to 256KB boundaries. +- Each 256KB chunk is SHA-256 hashed and written as a Git blob. +- Chunk metadata (index, size, digest, blob OID) is appended to `manifestData.chunks`. + +**Step 5: Finalize Encryption Metadata** +```javascript +manifestData.encryption = finalize(); +``` +- `finalize()` retrieves the GCM authentication tag. +- Returns an object: + ```javascript + { + algorithm: 'aes-256-gcm', + nonce: '', + tag: '', + encrypted: true, + } + ``` +- This metadata is stored in the manifest's `encryption` field. + +**Step 6: Create Manifest** +```javascript +const manifest = new Manifest(manifestData); +``` + +### Important Properties + +- **Streaming encryption**: Data is encrypted incrementally. The entire plaintext does NOT need to fit in memory during encryption. 
+- **Deterministic chunking**: For the same plaintext and key/nonce, the chunk boundaries and digests are deterministic. +- **No plaintext leakage**: The plaintext source is never written to disk. Only encrypted chunks are persisted. + +--- + +## Decryption Flow + +### High-Level Overview + +When restoring content with encryption: + +1. Caller provides `manifest` and `encryptionKey`. +2. git-cas validates the key. +3. git-cas reads all chunk blobs from Git. +4. Each chunk's SHA-256 digest is verified against the stored digest in the manifest. +5. All encrypted chunks are concatenated into a single ciphertext buffer. +6. The ciphertext is decrypted using AES-256-GCM with the stored nonce and tag. +7. If the tag verification fails, decryption throws an integrity error. +8. The plaintext buffer is returned to the caller. + +### Step-by-Step: `restore({ manifest, encryptionKey })` + +**Step 1: Key Validation** +```javascript +if (encryptionKey) { + this._validateKey(encryptionKey); +} +``` + +**Step 2: Check if Key is Required** +```javascript +if (manifest.encryption?.encrypted && !encryptionKey) { + throw new CasError( + 'Encryption key required to restore encrypted content', + 'MISSING_KEY', + ); +} +``` +- If the manifest indicates content is encrypted but no key is provided, throw `MISSING_KEY`. + +**Step 3: Read and Verify Chunks** +```javascript +const chunks = await this._readAndVerifyChunks(manifest.chunks); +``` +- For each chunk in the manifest: + 1. Read the Git blob by OID. + 2. Compute SHA-256 digest of the blob. + 3. Compare computed digest with stored digest in manifest. + 4. If mismatch, throw `CasError` with code `INTEGRITY_ERROR`. + 5. If match, append blob to `buffers` array. + +**Step 4: Concatenate Encrypted Chunks** +```javascript +let buffer = Buffer.concat(chunks); +``` +- All encrypted chunk buffers are concatenated into a single ciphertext buffer. + +**CRITICAL**: This operation loads the entire ciphertext into memory. 
For large files, this may cause memory exhaustion. See [Limitations](#limitations). + +**Step 5: Decrypt Buffer** +```javascript +if (manifest.encryption?.encrypted) { + buffer = await this.decrypt({ + buffer, + key: encryptionKey, + meta: manifest.encryption, + }); +} +``` +- Extract nonce and tag from `manifest.encryption`. +- Create `aes-256-gcm` decipher with key and nonce. +- Set authentication tag via `setAuthTag()`. +- Decrypt the ciphertext: + ```javascript + const nonce = Buffer.from(meta.nonce, 'base64'); + const tag = Buffer.from(meta.tag, 'base64'); + const decipher = createDecipheriv('aes-256-gcm', key, nonce); + decipher.setAuthTag(tag); + return Buffer.concat([decipher.update(buffer), decipher.final()]); + ``` +- If `decipher.final()` throws (due to tag mismatch or corrupted ciphertext), catch and re-throw as `CasError` with code `INTEGRITY_ERROR`. + +**Step 6: Return Plaintext** +```javascript +return { buffer, bytesWritten: buffer.length }; +``` + +### Important Properties + +- **No streaming decryption**: The entire ciphertext must be loaded into memory before decryption. This is a limitation of the current implementation. +- **Authentication before decryption**: GCM mode ensures that ciphertext integrity is verified before any plaintext is returned. If the tag check fails, no plaintext is leaked. +- **Chunk integrity before decryption**: SHA-256 verification of encrypted chunks occurs before decryption. This detects corruption at the chunk level. + +--- + +## Chunk Digest Verification + +### SHA-256 Per Chunk + +Every chunk (encrypted or unencrypted) is protected by a SHA-256 digest: + +- **Digest computation**: When a chunk is stored, `crypto.createHash('sha256').update(buf).digest('hex')` is computed and stored in the manifest. +- **Digest verification**: When a chunk is read during `restore()` or `verifyIntegrity()`, the digest is recomputed and compared. + +### When Digests Are Verified + +1. 
**During restore** (`restore()` method): + - Every chunk is read from Git and its SHA-256 digest is verified. + - If any digest mismatch is detected, `restore()` throws `CasError` with code `INTEGRITY_ERROR`. + +2. **During integrity verification** (`verifyIntegrity()` method): + - All chunks are read and their SHA-256 digests are verified. + - If any digest mismatch is detected, `verifyIntegrity()` returns `false` and emits an `integrity:fail` event. + +### What Digests Protect Against + +- **Bit-rot**: Silent corruption of Git objects on disk. +- **Storage errors**: Corruption during disk writes or reads. +- **Tampering**: Intentional modification of chunk blobs. +- **Incomplete writes**: Partial writes during storage failures. + +### What Digests Do NOT Protect Against + +- **Manifest tampering**: If an attacker rewrites the manifest so that it points to different blobs and updates the stored digests to match, the chunk verification will pass. However: + - For unencrypted content, this results in incorrect data being restored. + - For encrypted content, GCM tag verification will fail unless the attacker also forges the authentication tag (which is computationally infeasible). + +- **Rollback attacks**: If an attacker replaces a newer manifest with an older one, chunk digests will still verify. Application-level versioning or commit signing is required to prevent rollback. + +--- + +## Limitations + +### 1. Encrypted Restore Loads Full Ciphertext into Memory + +**Issue**: The `restore()` method concatenates all encrypted chunks into a single buffer before decryption: + +```javascript +let buffer = Buffer.concat(chunks); +``` + +**Impact**: +- For large encrypted files (e.g., 1GB+), this can cause memory exhaustion. +- Node.js caps the size of a single `Buffer` at `buffer.constants.MAX_LENGTH`, which varies by Node.js version and architecture; available memory is usually the tighter limit. + +**Workaround**: +- Avoid encrypting extremely large files with git-cas. 
+- If large encrypted files are required, implement application-level chunking (e.g., split a 10GB file into 10 separate 1GB files before storing). + +**Future improvement**: Implement streaming decryption to process ciphertext in chunks without full concatenation. + +### 2. No Streaming Decryption + +**Issue**: AES-256-GCM decryption is currently performed on the entire ciphertext as a single operation. The authentication tag is verified only at the end of decryption. + +**Impact**: +- Cannot stream decrypted plaintext to the caller incrementally. +- Cannot detect tampering until the entire ciphertext is processed. + +**Future improvement**: Investigate chunked AEAD modes or encrypt-then-MAC schemes that allow incremental authentication. + +### 3. No Key Rotation + +**Issue**: git-cas does not support re-encrypting content with a new key while maintaining the same manifest structure. + +**Impact**: +- If a key is compromised, all content encrypted with that key must be manually re-encrypted by: + 1. Restoring content with the old key. + 2. Storing content again with the new key. + 3. Updating all references to the old manifest tree to the new manifest tree. + +**Workaround**: +- Implement application-level key rotation by maintaining a key version identifier alongside each manifest. + +**Future improvement**: Add a `reencrypt()` method that re-encrypts content with a new key without requiring full restore. + +### 4. Nonce Collision Risk After 2^32 Operations + +**Issue**: While 96-bit nonces have negligible collision probability for practical use cases, the GCM security proof degrades after ~2^32 encryptions with the same key. + +**Impact**: +- If the same key is used to encrypt more than 2^32 files, nonce reuse becomes more likely. +- Nonce reuse with AES-GCM is catastrophic: it allows attackers to recover the plaintext and authentication key. 
+ +**Mitigation**: +- Rotate encryption keys after a reasonable number of operations (e.g., every 1 million encryptions, or every 90 days, whichever comes first). + +### 5. Metadata Not Encrypted + +**Issue**: The following metadata is stored in plaintext in the manifest: +- `slug` (file identifier) +- `filename` +- `size` (total size of encrypted content) +- `chunks` array (chunk indices, sizes, digests, blob OIDs) + +**Impact**: +- An attacker with access to the repository can infer file structure, sizes, and access patterns. +- Chunk digests may leak information about plaintext content if chunks are small or predictable. + +**Mitigation**: +- If metadata privacy is required, implement application-level encryption of the entire manifest before storing it as a Git blob. + +### 6. No Protection Against Replay or Rollback Attacks + +**Issue**: git-cas does not include versioning or timestamps in the encryption metadata. + +**Impact**: +- An attacker can replace a newer manifest tree with an older one (rollback attack). +- An attacker can duplicate encrypted content across different slugs (replay attack). + +**Mitigation**: +- Use Git commit signing to authenticate manifest trees. +- Implement application-level versioning or monotonic counters. + +--- + +## Git Object Immutability + +### Objects Are Immutable in Git's Object Database + +Git's object database (ODB) is **append-only** and **content-addressed**: + +- Once a blob, tree, or commit is written, its content is immutable. +- Objects are stored in `.git/objects/` and referenced by their SHA-1 (or SHA-256) hash. + +### Logical vs. Physical Deletion + +git-cas does NOT provide a `delete()` method because: + +1. **Logical deletion** is trivial: Remove the reference to a manifest tree from your application's index. +2. **Physical deletion** is a Git-level operation: Unreferenced objects remain in `.git/objects/` until garbage collection. 
+ +### Garbage Collection via `git gc` + +To physically remove unreferenced objects: + +```bash +git gc --aggressive --prune=now +``` + +**Important**: +- `git gc` only removes objects that are not reachable from any ref (such as a branch or tag). +- If a manifest tree is still referenced (e.g., in a commit or reflog), its chunks will NOT be pruned. + +### Security Implications + +1. **Deleted content may persist**: If you "delete" a file by removing its manifest reference, the encrypted chunks remain in `.git/objects/` until `git gc` prunes them. + +2. **Reflog prevents immediate pruning**: Git's reflog keeps references to old commits for 90 days by default. To prune immediately: + ```bash + git reflog expire --expire=now --all + git gc --prune=now + ``` + +3. **Force-pushing does not remove history**: Even if you force-push to remove a commit, the objects remain in the local repository until pruned. + +### Best Practices + +- **Do not rely on logical deletion for security**: If sensitive content was encrypted and stored, assume the ciphertext remains in the repository until `git gc` prunes it. +- **Prune after sensitive operations**: After removing sensitive content, run: + ```bash + git reflog expire --expire=now --all + git gc --aggressive --prune=now + ``` +- **Consider repository rotation**: For highly sensitive data, periodically create a new repository and migrate only non-sensitive content. + +--- + +## Error Codes for Security Operations + +git-cas defines the following error codes for security-related operations: + +### `INTEGRITY_ERROR` + +**Thrown when**: +- A chunk's SHA-256 digest does not match the stored digest in the manifest. +- AES-256-GCM authentication tag verification fails during decryption. + +**Example**: +```javascript +throw new CasError( + 'Chunk 2 integrity check failed', + 'INTEGRITY_ERROR', + { chunkIndex: 2, expected: 'abc123...', actual: 'def456...' }, +); +``` + +**Possible causes**: +- Corruption of Git objects on disk. 
+- Tampering with chunk blobs. +- Wrong encryption key used for decryption (GCM tag mismatch). +- Incomplete or interrupted writes. + +**Recommended action**: +- If this occurs during `restore()`, the file is corrupted and cannot be recovered without a backup. +- If this occurs during `verifyIntegrity()`, investigate storage hardware or Git repository health. + +### `INVALID_KEY_LENGTH` + +**Thrown when**: +- An encryption key is provided but is not exactly 32 bytes (256 bits). + +**Example**: +```javascript +throw new CasError( + 'Encryption key must be 32 bytes, got 16', + 'INVALID_KEY_LENGTH', + { expected: 32, actual: 16 }, +); +``` + +**Possible causes**: +- Incorrect key generation (e.g., using 128-bit AES key instead of 256-bit). +- Key truncation during storage or transmission. +- Encoding issues (e.g., base64 decoding resulting in wrong length). + +**Recommended action**: +- Verify key generation logic uses `crypto.randomBytes(32)` or equivalent. +- Check key storage/retrieval does not corrupt or truncate the key. + +### `INVALID_KEY_TYPE` + +**Thrown when**: +- An encryption key is provided but is not a `Buffer` or `Uint8Array`. + +**Example**: +```javascript +throw new CasError( + 'Encryption key must be a Buffer or Uint8Array', + 'INVALID_KEY_TYPE', +); +``` + +**Possible causes**: +- Passing a string instead of a Buffer (e.g., `"my-secret-key"` instead of `Buffer.from("my-secret-key")`). +- Passing a base64-encoded string without decoding it first. + +**Recommended action**: +- Ensure keys are stored as `Buffer` or `Uint8Array`. +- If keys are stored as hex/base64 strings, decode them before passing to git-cas: + ```javascript + const key = Buffer.from(keyBase64, 'base64'); + ``` + +### `MISSING_KEY` + +**Thrown when**: +- A manifest indicates content is encrypted (`manifest.encryption.encrypted === true`) but no `encryptionKey` is provided to `restore()`. 
+ +**Example**: +```javascript +throw new CasError( + 'Encryption key required to restore encrypted content', + 'MISSING_KEY', +); +``` + +**Possible causes**: +- Application logic error: Forgot to pass key to `restore()`. +- Key was lost or not available in the current environment. + +**Recommended action**: +- Verify the encryption key is available and passed to `restore()`. +- If the key is lost, the content is permanently inaccessible. + +--- + +## Conclusion + +git-cas provides strong at-rest encryption and integrity guarantees through AES-256-GCM and SHA-256 chunk verification. However, it is critical to understand the limitations and caller responsibilities: + +- **Key management is entirely your responsibility**. git-cas does not store or manage keys. +- **Encrypted restore is not streaming**. Large encrypted files may cause memory issues. +- **No key rotation support**. Re-encrypting content requires manual restore/store cycles. +- **Metadata is not encrypted**. File structure and sizes are visible to anyone with repository access. +- **Logical deletion does not physically remove data**. Use `git gc` to prune unreferenced objects. + +For questions or security concerns, please review the [ROADMAP](../ROADMAP.md) or file an issue. diff --git a/eslint.config.js b/eslint.config.js index 1584939..aecaebb 100644 --- a/eslint.config.js +++ b/eslint.config.js @@ -1,6 +1,7 @@ import js from "@eslint/js"; export default [ + { ignores: ["examples/"] }, js.configs.recommended, { languageOptions: { diff --git a/examples/README.md b/examples/README.md new file mode 100644 index 0000000..015462f --- /dev/null +++ b/examples/README.md @@ -0,0 +1,198 @@ +# git-cas Examples + +This directory contains runnable examples demonstrating the core features of `@git-stunts/git-cas`. 
+ +## Prerequisites + +- Node.js 22 or later +- Git installed and available in PATH +- `@git-stunts/git-cas` and `@git-stunts/plumbing` installed + +## Setup + +Before running the examples, ensure you have a Git repository initialized. The examples will create a temporary bare repository for demonstration purposes. + +```bash +# Install dependencies (from the repository root) +npm install + +# Navigate to the examples directory +cd examples +``` + +## Running the Examples + +Each example is a standalone Node.js script that can be run directly: + +```bash +node store-and-restore.js +node encrypted-workflow.js +node progress-tracking.js +``` + +## Examples Overview + +### store-and-restore.js + +**Demonstrates:** Basic CAS workflow with verification + +This example shows the complete lifecycle of storing and restoring a file: +1. Creates a temporary Git bare repository +2. Stores a file in the content-addressable store +3. Creates a Git tree to persist the manifest +4. Reads the manifest back from the tree +5. Restores the file to disk +6. Verifies the restored content matches the original +7. Runs integrity verification on the stored chunks + +**Key concepts:** +- `ContentAddressableStore.createJson()` factory +- `storeFile()` to store files +- `createTree()` to persist manifests in Git +- Reading manifests from Git trees +- `restoreFile()` to write files back to disk +- `verifyIntegrity()` to check chunk digests + +### encrypted-workflow.js + +**Demonstrates:** Encryption and decryption with AES-256-GCM + +This example shows how to work with encrypted content: +1. Generates a secure 32-byte encryption key +2. Stores a file with encryption enabled +3. Restores the file using the correct key +4. Demonstrates that using the wrong key causes an integrity error +5. 
Shows the encryption metadata stored in the manifest + +**Key concepts:** +- Generating encryption keys with `crypto.randomBytes(32)` +- Storing encrypted files with `encryptionKey` parameter +- Encryption metadata in manifests +- Decryption during restore +- Handling wrong key errors (INTEGRITY_ERROR) + +### progress-tracking.js + +**Demonstrates:** Event-driven progress monitoring + +This example shows how to track storage and restore operations using Node.js EventEmitter: +1. Accesses the CasService via `cas.getService()` +2. Attaches event listeners for various operations +3. Builds a progress logger that tracks: + - Chunk storage progress + - File storage completion + - Chunk restoration progress + - File restoration completion + - Integrity verification results + +**Key concepts:** +- Accessing the underlying CasService +- Event types: `chunk:stored`, `file:stored`, `chunk:restored`, `file:restored`, `integrity:pass`, `integrity:fail`, `error` +- Building real-time progress indicators +- Calculating percentages based on chunk counts + +## API Reference + +### Factory Methods + +```javascript +// JSON codec (default) +const cas = ContentAddressableStore.createJson({ plumbing }); + +// CBOR codec (binary) +const cas = ContentAddressableStore.createCbor({ plumbing }); +``` + +### Storage Operations + +```javascript +// Store a file +const manifest = await cas.storeFile({ + filePath: '/path/to/file', + slug: 'unique-identifier', + filename: 'optional-name.txt', + encryptionKey: optionalKeyBuffer // 32-byte Buffer +}); + +// Create a Git tree +const treeOid = await cas.createTree({ manifest }); +``` + +### Restore Operations + +```javascript +// Restore to disk +await cas.restoreFile({ + manifest, + encryptionKey: optionalKeyBuffer, + outputPath: '/path/to/output' +}); + +// Restore to memory (returns Buffer) +const { buffer, bytesWritten } = await cas.restore({ + manifest, + encryptionKey: optionalKeyBuffer +}); +``` + +### Verification + +```javascript +// 
Verify chunk integrity +const isValid = await cas.verifyIntegrity(manifest); +``` + +### Reading Manifests from Trees + +```javascript +// Get the service +const service = await cas.getService(); + +// Read tree entries +const entries = await service.persistence.readTree(treeOid); + +// Find manifest entry +const manifestEntry = entries.find(e => e.name === 'manifest.json'); + +// Read and decode manifest blob +const manifestBlob = await service.persistence.readBlob(manifestEntry.oid); +const manifestData = service.codec.decode(manifestBlob); +const manifest = new Manifest(manifestData); +``` + +## Encryption Keys + +Encryption keys must be 32-byte Buffers for AES-256-GCM: + +```javascript +import { randomBytes } from 'node:crypto'; + +// Generate a secure random key +const key = randomBytes(32); + +// Or use a key derived from a password +// (use a proper KDF like PBKDF2 or scrypt in production) +``` + +## Notes + +- All examples clean up temporary files and directories +- The examples use temporary Git bare repositories to avoid polluting your working directory +- Chunk size defaults to 256 KiB (262,144 bytes) +- File paths must be absolute paths, not relative paths +- The CAS service extends EventEmitter for progress tracking + +## Troubleshooting + +**Error: "Encryption key must be 32 bytes"** +- Ensure your encryption key is exactly 32 bytes +- Use `crypto.randomBytes(32)` or equivalent + +**Error: "INTEGRITY_ERROR"** +- Using wrong decryption key +- Chunk corruption in Git object database +- Run `verifyIntegrity()` to identify corrupted chunks + +**Error: "MISSING_KEY"** +- Attempting to restore encrypted content without providing the key +- Check if `manifest.encryption.encrypted === true` diff --git a/examples/encrypted-workflow.js b/examples/encrypted-workflow.js new file mode 100755 index 0000000..47bdf3c --- /dev/null +++ b/examples/encrypted-workflow.js @@ -0,0 +1,168 @@ +#!/usr/bin/env node +/** + * Encrypted workflow demonstration + * + * This example 
shows: + * 1. Generating a secure encryption key + * 2. Storing a file with encryption + * 3. Restoring with the correct key + * 4. Demonstrating what happens with the wrong key + * 5. Inspecting encryption metadata + */ + +import { mkdtempSync, writeFileSync, rmSync } from 'node:fs'; +import { randomBytes } from 'node:crypto'; +import { execSync } from 'node:child_process'; +import path from 'node:path'; +import os from 'node:os'; +import GitPlumbing from '@git-stunts/plumbing'; +import ContentAddressableStore from '@git-stunts/git-cas'; + +console.log('=== Encrypted Workflow Example ===\n'); + +// Create a temporary bare Git repository +const repoDir = mkdtempSync(path.join(os.tmpdir(), 'cas-encrypted-')); +console.log(`Created temporary repository: ${repoDir}`); +execSync('git init --bare', { cwd: repoDir, stdio: 'ignore' }); + +// Initialize plumbing and CAS +const plumbing = GitPlumbing.createDefault({ cwd: repoDir }); +const cas = ContentAddressableStore.createJson({ plumbing }); + +// Create a test file with sensitive content +const testDir = mkdtempSync(path.join(os.tmpdir(), 'cas-test-')); +const testFilePath = path.join(testDir, 'sensitive.txt'); +const secretContent = Buffer.from('This is sensitive information that should be encrypted.'); +writeFileSync(testFilePath, secretContent); + +console.log(`Created test file: ${testFilePath}`); +console.log(`Content: "${secretContent.toString()}"`); +console.log(`Size: ${secretContent.length} bytes`); + +// Step 1: Generate a secure encryption key +console.log('\n--- Step 1: Generating encryption key ---'); +const encryptionKey = randomBytes(32); +console.log(`Generated 32-byte encryption key: ${encryptionKey.toString('hex').substring(0, 16)}...`); +console.log(`Key size: ${encryptionKey.length} bytes (256 bits)`); +console.log('Encryption algorithm: AES-256-GCM'); + +// Step 2: Store the file with encryption +console.log('\n--- Step 2: Storing encrypted file ---'); +const manifest = await cas.storeFile({ + 
filePath: testFilePath, + slug: 'encrypted-secret', + filename: 'sensitive.txt', + encryptionKey +}); + +console.log('File stored with encryption!'); +console.log(` Slug: ${manifest.slug}`); +console.log(` Size: ${manifest.size} bytes`); +console.log(` Chunks: ${manifest.chunks.length}`); +console.log(` Encrypted: ${manifest.encryption?.encrypted ? 'YES' : 'NO'}`); + +// Step 3: Inspect encryption metadata +console.log('\n--- Step 3: Encryption metadata ---'); +if (manifest.encryption?.encrypted) { + console.log('Encryption details:'); + console.log(` Algorithm: ${manifest.encryption.algorithm || 'AES-256-GCM'}`); + const nonceBytes = manifest.encryption.nonce ? Buffer.from(manifest.encryption.nonce, 'base64') : null; + const tagBytes = manifest.encryption.tag ? Buffer.from(manifest.encryption.tag, 'base64') : null; + console.log(` Nonce length: ${nonceBytes?.length || 0} bytes`); + console.log(` Auth tag length: ${tagBytes?.length || 0} bytes`); + + if (nonceBytes) { + console.log(` Nonce (hex): ${nonceBytes.toString('hex')}`); + } +} else { + console.error('ERROR: File was not encrypted!'); + process.exit(1); +} + +// Create a Git tree to persist the manifest +const treeOid = await cas.createTree({ manifest }); +console.log(`\nGit tree created: ${treeOid}`); + +// Step 4: Restore with the correct key +console.log('\n--- Step 4: Restoring with correct key ---'); +try { + const { buffer } = await cas.restore({ + manifest, + encryptionKey + }); + + const decryptedContent = buffer.toString(); + console.log('Decryption successful!'); + console.log(`Restored content: "${decryptedContent}"`); + console.log(`Content matches: ${buffer.equals(secretContent) ? 
'YES' : 'NO'}`); + + if (!buffer.equals(secretContent)) { + console.error('ERROR: Decrypted content does not match original!'); + process.exit(1); + } +} catch (err) { + console.error(`Decryption failed: ${err.message}`); + process.exit(1); +} + +// Step 5: Demonstrate wrong key failure +console.log('\n--- Step 5: Attempting restore with wrong key ---'); +const wrongKey = randomBytes(32); +console.log(`Wrong key (hex): ${wrongKey.toString('hex').substring(0, 16)}...`); + +try { + await cas.restore({ + manifest, + encryptionKey: wrongKey + }); + + // If we get here, something is wrong + console.error('ERROR: Decryption should have failed with wrong key!'); + process.exit(1); +} catch (err) { + console.log('Decryption correctly failed!'); + console.log(`Error type: ${err.constructor.name}`); + console.log(`Error code: ${err.code}`); + console.log(`Error message: ${err.message}`); + + if (err.code !== 'INTEGRITY_ERROR') { + console.warn(`Warning: Expected error code 'INTEGRITY_ERROR', got '${err.code}'`); + } +} + +// Step 6: Demonstrate missing key error +console.log('\n--- Step 6: Attempting restore without key ---'); +try { + await cas.restore({ manifest }); + + console.error('ERROR: Restore should have failed without key!'); + process.exit(1); +} catch (err) { + console.log('Restore correctly failed!'); + console.log(`Error code: ${err.code}`); + console.log(`Error message: ${err.message}`); + + if (err.code !== 'MISSING_KEY') { + console.warn(`Warning: Expected error code 'MISSING_KEY', got '${err.code}'`); + } +} + +// Step 7: Verify integrity of encrypted chunks +console.log('\n--- Step 7: Verifying encrypted chunk integrity ---'); +const isValid = await cas.verifyIntegrity(manifest); +console.log(`Integrity check: ${isValid ? 
'PASSED' : 'FAILED'}`); +console.log('Note: Integrity check verifies chunk digests, not decryption'); + +// Cleanup +console.log('\n--- Cleanup ---'); +rmSync(testDir, { recursive: true, force: true }); +rmSync(repoDir, { recursive: true, force: true }); +console.log('Temporary files removed'); + +console.log('\n=== Example completed successfully! ==='); +console.log('\nKey takeaways:'); +console.log('- Encryption keys must be exactly 32 bytes (256 bits)'); +console.log('- Wrong keys produce INTEGRITY_ERROR during decryption'); +console.log('- Missing keys produce MISSING_KEY error'); +console.log('- Encryption metadata (nonce, auth tag) is stored in the manifest'); +console.log('- Chunk integrity is verified independently of encryption'); diff --git a/examples/progress-tracking.js b/examples/progress-tracking.js new file mode 100755 index 0000000..44b5dbb --- /dev/null +++ b/examples/progress-tracking.js @@ -0,0 +1,232 @@ +#!/usr/bin/env node +/** + * Progress tracking demonstration using EventEmitter + * + * This example shows: + * 1. Accessing the CasService to attach event listeners + * 2. Tracking chunk-by-chunk progress during store + * 3. Tracking chunk-by-chunk progress during restore + * 4. Building a real-time progress indicator + * 5. 
Monitoring integrity verification events + */ + +import { mkdtempSync, writeFileSync, rmSync } from 'node:fs'; +import { randomBytes } from 'node:crypto'; +import { execSync } from 'node:child_process'; +import path from 'node:path'; +import os from 'node:os'; +import GitPlumbing from '@git-stunts/plumbing'; +import ContentAddressableStore from '@git-stunts/git-cas'; + +console.log('=== Progress Tracking Example ===\n'); + +// Create a temporary bare Git repository +const repoDir = mkdtempSync(path.join(os.tmpdir(), 'cas-progress-')); +console.log(`Created temporary repository: ${repoDir}`); +execSync('git init --bare', { cwd: repoDir, stdio: 'ignore' }); + +// Initialize plumbing and CAS +const plumbing = GitPlumbing.createDefault({ cwd: repoDir }); +const cas = ContentAddressableStore.createJson({ plumbing, chunkSize: 128 * 1024 }); // 128 KB chunks + +// Create a larger test file to see multiple chunks +const testDir = mkdtempSync(path.join(os.tmpdir(), 'cas-test-')); +const testFilePath = path.join(testDir, 'large-file.bin'); +const fileSize = 1024 * 1024; // 1 MB (will be ~8 chunks at 128 KB) +const originalContent = randomBytes(fileSize); +writeFileSync(testFilePath, originalContent); + +console.log(`Created test file: ${testFilePath}`); +console.log(`File size: ${fileSize.toLocaleString()} bytes`); +console.log(`Chunk size: ${(128 * 1024).toLocaleString()} bytes`); +console.log(`Expected chunks: ~${Math.ceil(fileSize / (128 * 1024))}`); + +// Get the CasService to attach event listeners +const service = await cas.getService(); + +// Progress tracker state +const progress = { + store: { chunks: 0, bytes: 0 }, + restore: { chunks: 0, bytes: 0 } +}; + +// Event listeners for storage operations +console.log('\n--- Setting up event listeners ---'); + +service.on('chunk:stored', (event) => { + progress.store.chunks++; + progress.store.bytes += event.size; + console.log(`[STORE] Chunk ${event.index} stored: ${event.size.toLocaleString()} bytes (digest: 
${event.digest.substring(0, 8)}...)`); +}); + +service.on('file:stored', (event) => { + console.log(`[STORE] File complete: ${event.slug}`); + console.log(` Total size: ${event.size.toLocaleString()} bytes`); + console.log(` Total chunks: ${event.chunkCount}`); + console.log(` Encrypted: ${event.encrypted ? 'Yes' : 'No'}`); +}); + +service.on('chunk:restored', (event) => { + progress.restore.chunks++; + progress.restore.bytes += event.size; + console.log(`[RESTORE] Chunk ${event.index} restored: ${event.size.toLocaleString()} bytes (digest: ${event.digest.substring(0, 8)}...)`); +}); + +service.on('file:restored', (event) => { + console.log(`[RESTORE] File complete: ${event.slug}`); + console.log(` Total size: ${event.size.toLocaleString()} bytes`); + console.log(` Total chunks: ${event.chunkCount}`); +}); + +service.on('integrity:pass', (event) => { + console.log(`[INTEGRITY] Passed for: ${event.slug}`); +}); + +service.on('integrity:fail', (event) => { + console.error(`[INTEGRITY] FAILED for: ${event.slug}`); + console.error(` Chunk index: ${event.chunkIndex}`); + console.error(` Expected: ${event.expected}`); + console.error(` Actual: ${event.actual}`); +}); + +service.on('error', (event) => { + console.error(`[ERROR] ${event.code}: ${event.message}`); +}); + +console.log('Event listeners attached:'); +console.log(' - chunk:stored'); +console.log(' - file:stored'); +console.log(' - chunk:restored'); +console.log(' - file:restored'); +console.log(' - integrity:pass'); +console.log(' - integrity:fail'); +console.log(' - error'); + +// Step 1: Store the file with progress tracking +console.log('\n--- Step 1: Storing file (watch for chunk events) ---\n'); +const startStore = Date.now(); +const manifest = await cas.storeFile({ + filePath: testFilePath, + slug: 'progress-demo', + filename: 'large-file.bin' +}); +const storeTime = Date.now() - startStore; + +// Snapshot Step 1 counters before they accumulate further in Step 4 +const step1StoreChunks = 
progress.store.chunks; +const step1StoreBytes = progress.store.bytes; + +console.log(`\nStorage completed in ${storeTime}ms`); +console.log(`Chunks stored: ${step1StoreChunks}`); +console.log(`Bytes processed: ${step1StoreBytes.toLocaleString()}`); +console.log(`Average chunk size: ${Math.round(step1StoreBytes / step1StoreChunks).toLocaleString()} bytes`); + +// Calculate storage throughput +const storeThroughputMBps = (step1StoreBytes / 1024 / 1024) / (storeTime / 1000); +console.log(`Throughput: ${storeThroughputMBps.toFixed(2)} MB/s`); + +// Step 2: Restore the file with progress tracking +console.log('\n--- Step 2: Restoring file (watch for chunk events) ---\n'); +const startRestore = Date.now(); +const { buffer, bytesWritten } = await cas.restore({ manifest }); +const restoreTime = Date.now() - startRestore; + +console.log(`\nRestore completed in ${restoreTime}ms`); +console.log(`Chunks restored: ${progress.restore.chunks}`); +console.log(`Bytes processed: ${progress.restore.bytes.toLocaleString()}`); +console.log(`Bytes written: ${bytesWritten.toLocaleString()}`); + +// Calculate restore throughput +const restoreThroughputMBps = (progress.restore.bytes / 1024 / 1024) / (restoreTime / 1000); +console.log(`Throughput: ${restoreThroughputMBps.toFixed(2)} MB/s`); + +// Verify content +const contentMatches = buffer.equals(originalContent); +console.log(`Content verification: ${contentMatches ? 'PASSED' : 'FAILED'}`); + +if (!contentMatches) { + console.error('ERROR: Content mismatch!'); + process.exit(1); +} + +// Step 3: Run integrity verification with events +console.log('\n--- Step 3: Integrity verification (watch for events) ---\n'); +const startVerify = Date.now(); +const isValid = await cas.verifyIntegrity(manifest); +const verifyTime = Date.now() - startVerify; + +console.log(`\nIntegrity verification completed in ${verifyTime}ms`); +console.log(`Result: ${isValid ? 
'PASSED' : 'FAILED'}`); + +if (!isValid) { + console.error('ERROR: Integrity check failed!'); + process.exit(1); +} + +// Step 4: Build a more sophisticated progress indicator +console.log('\n--- Step 4: Advanced progress tracking ---'); +console.log('\nStoring another file with percentage progress...\n'); + +// Fresh counters for the percentage display (separate from the Step 1 totals) +let storeChunkCount = 0; +let totalChunks = 0; + +// Create a new event listener for percentage progress +const progressListener = (event) => { + storeChunkCount++; + if (totalChunks === 0) { + // First chunk - estimate total chunks + totalChunks = Math.ceil(fileSize / (128 * 1024)); + } + const percentage = Math.min(100, Math.round((storeChunkCount / totalChunks) * 100)); + const progressBar = '='.repeat(Math.floor(percentage / 5)) + ' '.repeat(20 - Math.floor(percentage / 5)); + process.stdout.write(`\rProgress: [${progressBar}] ${percentage}% (${storeChunkCount}/${totalChunks} chunks)`); +}; + +service.on('chunk:stored', progressListener); + +// Store another test file +const testFilePath2 = path.join(testDir, 'progress-demo.bin'); +writeFileSync(testFilePath2, randomBytes(fileSize)); + +const manifest2 = await cas.storeFile({ + filePath: testFilePath2, + slug: 'progress-demo-2', + filename: 'progress-demo.bin' +}); + +console.log('\n\nProgress tracking complete!'); +console.log(`Final chunk count: ${storeChunkCount}`); + +// Detach the percentage listener now that the second store has finished +service.removeListener('chunk:stored', progressListener); + +// Summary statistics +console.log('\n--- Performance Summary ---'); +console.log('Storage operation (Step 1):'); +console.log(` Time: ${storeTime}ms`); +console.log(` Throughput: ${storeThroughputMBps.toFixed(2)} MB/s`); +console.log(` Chunks: ${step1StoreChunks}`); + +console.log('\nRestore operation:'); +console.log(` Time: ${restoreTime}ms`); +console.log(` Throughput: ${restoreThroughputMBps.toFixed(2)} MB/s`); +console.log(` Chunks: ${progress.restore.chunks}`); + 
+console.log('\nIntegrity verification:'); +console.log(` Time: ${verifyTime}ms`); + +// Cleanup +console.log('\n--- Cleanup ---'); +rmSync(testDir, { recursive: true, force: true }); +rmSync(repoDir, { recursive: true, force: true }); +console.log('Temporary files removed'); + +console.log('\n=== Example completed successfully! ==='); +console.log('\nKey takeaways:'); +console.log('- Access CasService via cas.getService() for events'); +console.log('- chunk:stored fires for each chunk during storage'); +console.log('- chunk:restored fires for each chunk during restore'); +console.log('- file:stored and file:restored fire when operations complete'); +console.log('- Events can be used to build progress bars and monitors'); +console.log('- Remove listeners with removeListener() to avoid memory leaks'); diff --git a/examples/store-and-restore.js b/examples/store-and-restore.js new file mode 100755 index 0000000..4f0a111 --- /dev/null +++ b/examples/store-and-restore.js @@ -0,0 +1,128 @@ +#!/usr/bin/env node +/** + * Basic store-and-restore workflow demonstration + * + * This example shows the complete lifecycle: + * 1. Store a file in the CAS + * 2. Create a Git tree to persist the manifest + * 3. Read the manifest back from the tree + * 4. Restore the file to disk + * 5. Verify the restored content matches the original + * 6. 
Run integrity verification + */ + +import { mkdtempSync, writeFileSync, readFileSync, rmSync } from 'node:fs'; +import { randomBytes } from 'node:crypto'; +import { execSync } from 'node:child_process'; +import path from 'node:path'; +import os from 'node:os'; +import GitPlumbing from '@git-stunts/plumbing'; +import ContentAddressableStore, { Manifest } from '@git-stunts/git-cas'; + +console.log('=== Store and Restore Example ===\n'); + +// Create a temporary bare Git repository +const repoDir = mkdtempSync(path.join(os.tmpdir(), 'cas-example-')); +console.log(`Created temporary repository: ${repoDir}`); +execSync('git init --bare', { cwd: repoDir, stdio: 'ignore' }); + +// Initialize plumbing and CAS +const plumbing = GitPlumbing.createDefault({ cwd: repoDir }); +const cas = ContentAddressableStore.createJson({ plumbing }); + +// Create a test file with random content +const testDir = mkdtempSync(path.join(os.tmpdir(), 'cas-test-')); +const testFilePath = path.join(testDir, 'sample.bin'); +const originalContent = randomBytes(500 * 1024); // 500 KB +writeFileSync(testFilePath, originalContent); + +console.log(`\nCreated test file: ${testFilePath}`); +console.log(`File size: ${originalContent.length.toLocaleString()} bytes`); + +// Step 1: Store the file +console.log('\n--- Step 1: Storing file ---'); +const manifest = await cas.storeFile({ + filePath: testFilePath, + slug: 'example-file', + filename: 'sample.bin' +}); + +console.log(`Stored successfully!`); +console.log(` Slug: ${manifest.slug}`); +console.log(` Filename: ${manifest.filename}`); +console.log(` Size: ${manifest.size.toLocaleString()} bytes`); +console.log(` Chunks: ${manifest.chunks.length}`); +console.log(` Encrypted: ${manifest.encryption?.encrypted ? 
'Yes' : 'No'}`); + +// Step 2: Create a Git tree to persist the manifest +console.log('\n--- Step 2: Creating Git tree ---'); +const treeOid = await cas.createTree({ manifest }); +console.log(`Git tree created: ${treeOid}`); + +// Step 3: Read the manifest back from the tree +console.log('\n--- Step 3: Reading manifest from tree ---'); +const service = await cas.getService(); +const entries = await service.persistence.readTree(treeOid); + +console.log(`Tree contains ${entries.length} entries:`); +entries.forEach(entry => { + const label = entry.name.startsWith('manifest.') ? 'Manifest' : `Chunk ${entry.name.substring(0, 8)}...`; + console.log(` - ${label} (${entry.type}): ${entry.oid}`); +}); + +// Find and decode the manifest +const manifestEntry = entries.find(e => e.name === 'manifest.json'); +if (!manifestEntry) { + throw new Error('Manifest not found in tree'); +} + +const manifestBlob = await service.persistence.readBlob(manifestEntry.oid); +const manifestData = service.codec.decode(manifestBlob); +const restoredManifest = new Manifest(manifestData); + +console.log('\nManifest successfully read from tree'); +console.log(` Slug: ${restoredManifest.slug}`); +console.log(` Chunks: ${restoredManifest.chunks.length}`); + +// Step 4: Restore the file to disk +console.log('\n--- Step 4: Restoring file to disk ---'); +const outputPath = path.join(testDir, 'restored.bin'); +const { bytesWritten } = await cas.restoreFile({ + manifest: restoredManifest, + outputPath +}); + +console.log(`File restored to: ${outputPath}`); +console.log(`Bytes written: ${bytesWritten.toLocaleString()}`); + +// Step 5: Verify the restored content matches the original +console.log('\n--- Step 5: Verifying content ---'); +const restoredContent = readFileSync(outputPath); + +const contentMatches = originalContent.equals(restoredContent); +console.log(`Original size: ${originalContent.length.toLocaleString()} bytes`); +console.log(`Restored size: ${restoredContent.length.toLocaleString()} 
bytes`); +console.log(`Content matches: ${contentMatches ? 'YES' : 'NO'}`); + +if (!contentMatches) { + console.error('ERROR: Content mismatch!'); + process.exit(1); +} + +// Step 6: Run integrity verification +console.log('\n--- Step 6: Integrity verification ---'); +const isValid = await cas.verifyIntegrity(restoredManifest); +console.log(`Integrity check: ${isValid ? 'PASSED' : 'FAILED'}`); + +if (!isValid) { + console.error('ERROR: Integrity check failed!'); + process.exit(1); +} + +// Cleanup +console.log('\n--- Cleanup ---'); +rmSync(testDir, { recursive: true, force: true }); +rmSync(repoDir, { recursive: true, force: true }); +console.log('Temporary files removed'); + +console.log('\n=== Example completed successfully! ==='); diff --git a/index.js b/index.js index 9a35d51..2b6ac14 100644 --- a/index.js +++ b/index.js @@ -26,6 +26,7 @@ export { /** * Detects the best crypto adapter for the current runtime. + * @returns {Promise} A runtime-appropriate CryptoPort implementation. */ async function getDefaultCryptoAdapter() { if (globalThis.Bun) { @@ -40,16 +41,19 @@ async function getDefaultCryptoAdapter() { } /** - * Facade class for the CAS library. + * High-level facade for the Content Addressable Store library. + * + * Wraps {@link CasService} with lazy initialization, runtime-adaptive crypto + * selection, and convenience helpers for file I/O. */ export default class ContentAddressableStore { /** * @param {Object} options - * @param {import('../plumbing/index.js').default} options.plumbing - * @param {number} [options.chunkSize] - * @param {import('./src/ports/CodecPort.js').default} [options.codec] - * @param {import('./src/ports/CryptoPort.js').default} [options.crypto] - * @param {import('@git-stunts/alfred').Policy} [options.policy] - Resilience policy for Git I/O + * @param {import('@git-stunts/plumbing').default} options.plumbing - GitPlumbing instance for Git operations. 
+ * @param {number} [options.chunkSize] - Chunk size in bytes (default 256 KiB). + * @param {import('./src/ports/CodecPort.js').default} [options.codec] - Manifest codec (default JsonCodec). + * @param {import('./src/ports/CryptoPort.js').default} [options.crypto] - Crypto adapter (auto-detected if omitted). + * @param {import('@git-stunts/alfred').Policy} [options.policy] - Resilience policy for Git I/O. */ constructor({ plumbing, chunkSize, codec, policy, crypto }) { this.plumbing = plumbing; @@ -64,8 +68,9 @@ export default class ContentAddressableStore { #servicePromise = null; /** - * Lazily initializes the service to handle async adapter discovery. + * Lazily initializes the service, handling async adapter discovery. * @private + * @returns {Promise} */ async #getService() { if (!this.#servicePromise) { @@ -74,6 +79,11 @@ export default class ContentAddressableStore { return await this.#servicePromise; } + /** + * Constructs the persistence adapter, resolves crypto, and creates the CasService. + * @private + * @returns {Promise} + */ async #initService() { const persistence = new GitPersistenceAdapter({ plumbing: this.plumbing, @@ -90,7 +100,8 @@ export default class ContentAddressableStore { } /** - * Lazily initializes and returns the service. + * Lazily initializes and returns the underlying {@link CasService}. + * @returns {Promise} */ async getService() { return await this.#getService(); @@ -98,6 +109,11 @@ export default class ContentAddressableStore { /** * Factory to create a CAS with JSON codec. + * @param {Object} options + * @param {import('@git-stunts/plumbing').default} options.plumbing - GitPlumbing instance. + * @param {number} [options.chunkSize] - Chunk size in bytes. + * @param {import('@git-stunts/alfred').Policy} [options.policy] - Resilience policy. 
+ * @returns {ContentAddressableStore} */ static createJson({ plumbing, chunkSize, policy }) { return new ContentAddressableStore({ plumbing, chunkSize, codec: new JsonCodec(), policy }); @@ -105,28 +121,61 @@ export default class ContentAddressableStore { /** * Factory to create a CAS with CBOR codec. + * @param {Object} options + * @param {import('@git-stunts/plumbing').default} options.plumbing - GitPlumbing instance. + * @param {number} [options.chunkSize] - Chunk size in bytes. + * @param {import('@git-stunts/alfred').Policy} [options.policy] - Resilience policy. + * @returns {ContentAddressableStore} */ static createCbor({ plumbing, chunkSize, policy }) { return new ContentAddressableStore({ plumbing, chunkSize, codec: new CborCodec(), policy }); } + /** + * Returns the configured chunk size in bytes. + * @returns {number} + */ get chunkSize() { return this.service?.chunkSize || this.chunkSizeConfig || 256 * 1024; } + /** + * Encrypts a buffer using AES-256-GCM. + * @param {Object} options + * @param {Buffer} options.buffer - Plaintext data to encrypt. + * @param {Buffer} options.key - 32-byte encryption key. + * @returns {Promise<{ buf: Buffer, meta: { algorithm: string, nonce: string, tag: string, encrypted: boolean } }>} + */ async encrypt(options) { const service = await this.#getService(); return await service.encrypt(options); } + /** + * Decrypts a buffer. Returns it unchanged if `meta.encrypted` is falsy. + * @param {Object} options + * @param {Buffer} options.buffer - Ciphertext to decrypt. + * @param {Buffer} options.key - 32-byte encryption key. + * @param {{ encrypted: boolean, algorithm: string, nonce: string, tag: string }} options.meta - Encryption metadata. + * @returns {Promise} + */ async decrypt(options) { const service = await this.#getService(); return await service.decrypt(options); } /** - * Opens a file and delegates to CasService.store(). - * Backward-compatible API that accepts a filePath. 
+ * Reads a file from disk and stores it in Git as chunked blobs. + * + * Convenience wrapper that opens a read stream and delegates to + * {@link CasService#store}. + * + * @param {Object} options + * @param {string} options.filePath - Absolute or relative path to the file. + * @param {string} options.slug - Logical identifier for the stored asset. + * @param {string} [options.filename] - Override filename (defaults to basename of filePath). + * @param {Buffer} [options.encryptionKey] - 32-byte key for AES-256-GCM encryption. + * @returns {Promise} The resulting manifest. */ async storeFile({ filePath, slug, filename, encryptionKey }) { const source = createReadStream(filePath); @@ -140,7 +189,13 @@ export default class ContentAddressableStore { } /** - * Direct passthrough for callers who already have an async iterable source. + * Stores an async iterable source in Git as chunked blobs. + * @param {Object} options + * @param {AsyncIterable} options.source - Data to store. + * @param {string} options.slug - Logical identifier for the stored asset. + * @param {string} options.filename - Filename for the manifest. + * @param {Buffer} [options.encryptionKey] - 32-byte key for AES-256-GCM encryption. + * @returns {Promise} The resulting manifest. */ async store(options) { const service = await this.#getService(); @@ -148,7 +203,12 @@ export default class ContentAddressableStore { } /** - * Restores a file from its manifest and writes it to outputPath. + * Restores a file from its manifest and writes it to disk. + * @param {Object} options + * @param {import('./src/domain/value-objects/Manifest.js').default} options.manifest - The file manifest. + * @param {Buffer} [options.encryptionKey] - 32-byte key, required if manifest is encrypted. + * @param {string} options.outputPath - Destination file path. 
+ * @returns {Promise<{ bytesWritten: number }>} */ async restoreFile({ manifest, encryptionKey, outputPath }) { const service = await this.#getService(); @@ -162,17 +222,32 @@ export default class ContentAddressableStore { /** * Restores a file from its manifest, returning the buffer directly. + * @param {Object} options + * @param {import('./src/domain/value-objects/Manifest.js').default} options.manifest - The file manifest. + * @param {Buffer} [options.encryptionKey] - 32-byte key, required if manifest is encrypted. + * @returns {Promise<{ buffer: Buffer, bytesWritten: number }>} */ async restore(options) { const service = await this.#getService(); return await service.restore(options); } + /** + * Creates a Git tree object from a manifest. + * @param {Object} options + * @param {import('./src/domain/value-objects/Manifest.js').default} options.manifest - The file manifest. + * @returns {Promise} Git OID of the created tree. + */ async createTree(options) { const service = await this.#getService(); return await service.createTree(options); } + /** + * Verifies the integrity of a stored file by re-hashing its chunks. + * @param {import('./src/domain/value-objects/Manifest.js').default} manifest - The file manifest. + * @returns {Promise} `true` if all chunks pass verification. + */ async verifyIntegrity(manifest) { const service = await this.#getService(); return await service.verifyIntegrity(manifest); @@ -180,6 +255,9 @@ export default class ContentAddressableStore { /** * Reads a manifest from a Git tree OID. + * @param {Object} options + * @param {string} options.treeOid - Git tree OID to read the manifest from. + * @returns {Promise} */ async readManifest(options) { const service = await this.#getService(); @@ -188,6 +266,10 @@ export default class ContentAddressableStore { /** * Returns deletion metadata for an asset stored in a Git tree. + * Does not perform any destructive Git operations. 
+ * @param {Object} options + * @param {string} options.treeOid - Git tree OID of the asset. + * @returns {Promise<{ slug: string, chunksOrphaned: number }>} */ async deleteAsset(options) { const service = await this.#getService(); @@ -196,6 +278,10 @@ export default class ContentAddressableStore { /** * Aggregates referenced chunk blob OIDs across multiple stored assets. + * Analysis only — does not delete or modify anything. + * @param {Object} options + * @param {string[]} options.treeOids - Git tree OIDs to analyze. + * @returns {Promise<{ referenced: Set, total: number }>} */ async findOrphanedChunks(options) { const service = await this.#getService(); diff --git a/src/domain/errors/CasError.js b/src/domain/errors/CasError.js index 24b46bd..6acc1da 100644 --- a/src/domain/errors/CasError.js +++ b/src/domain/errors/CasError.js @@ -1,7 +1,15 @@ /** - * Base error for CAS operations. + * Base error class for CAS operations. + * + * Carries a machine-readable `code` and an optional `meta` bag for + * structured error context. */ export default class CasError extends Error { + /** + * @param {string} message - Human-readable error description. + * @param {string} code - Machine-readable error code (e.g. `'INTEGRITY_ERROR'`). + * @param {Object} [meta={}] - Arbitrary metadata for diagnostics. + */ constructor(message, code, meta = {}) { super(message); this.name = this.constructor.name; diff --git a/src/domain/schemas/ManifestSchema.js b/src/domain/schemas/ManifestSchema.js index 5a24d56..71c44e5 100644 --- a/src/domain/schemas/ManifestSchema.js +++ b/src/domain/schemas/ManifestSchema.js @@ -1,5 +1,10 @@ +/** + * @fileoverview Zod schemas for validating CAS manifest and chunk data. + */ + import z from 'zod'; +/** Validates a single chunk entry within a manifest. 
*/ export const ChunkSchema = z.object({ index: z.number().int().min(0), size: z.number().int().positive(), @@ -7,6 +12,7 @@ export const ChunkSchema = z.object({ blob: z.string().min(1), // Git OID }); +/** Validates the encryption metadata attached to an encrypted manifest. */ export const EncryptionSchema = z.object({ algorithm: z.string(), nonce: z.string(), @@ -14,6 +20,7 @@ export const EncryptionSchema = z.object({ encrypted: z.boolean().default(true), }); +/** Validates a complete file manifest. */ export const ManifestSchema = z.object({ slug: z.string().min(1), filename: z.string().min(1), diff --git a/src/domain/services/CasService.js b/src/domain/services/CasService.js index 513c2fa..34339db 100644 --- a/src/domain/services/CasService.js +++ b/src/domain/services/CasService.js @@ -1,10 +1,23 @@ +import { EventEmitter } from 'node:events'; import Manifest from '../value-objects/Manifest.js'; import CasError from '../errors/CasError.js'; /** * Domain service for Content Addressable Storage operations. + * + * Provides chunking, encryption, and integrity verification for storing + * arbitrary data in Git's object database. Extends {@link EventEmitter} to + * emit progress events during store/restore operations. 
+ * + * @fires CasService#chunk:stored + * @fires CasService#chunk:restored + * @fires CasService#file:stored + * @fires CasService#file:restored + * @fires CasService#integrity:pass + * @fires CasService#integrity:fail + * @fires CasService#error */ -export default class CasService { +export default class CasService extends EventEmitter { /** * @param {Object} options * @param {import('../../ports/GitPersistencePort.js').default} options.persistence @@ -13,6 +26,7 @@ export default class CasService { * @param {number} [options.chunkSize=262144] - 256 KiB */ constructor({ persistence, codec, crypto, chunkSize = 256 * 1024 }) { + super(); if (chunkSize < 1024) { throw new Error('Chunk size must be at least 1024 bytes'); } @@ -23,18 +37,36 @@ export default class CasService { } /** - * Generates a SHA-256 hash for a buffer. + * Generates a SHA-256 hex digest for a buffer. * @private + * @param {Buffer} buf - Data to hash. + * @returns {Promise} 64-character hex digest. */ async _sha256(buf) { return await this.crypto.sha256(buf); } /** - * Helper to process an async iterable into chunks and store them. + * Stores a single buffer chunk in Git and appends its metadata to the manifest. * @private - * @param {AsyncIterable} source - * @param {Object} manifestData + * @param {Buffer} buf - The chunk data to store. + * @param {Object} manifestData - Mutable manifest accumulator. + */ + async _storeChunk(buf, manifestData) { + const digest = await this._sha256(buf); + const blob = await this.persistence.writeBlob(buf); + const entry = { index: manifestData.chunks.length, size: buf.length, digest, blob }; + manifestData.chunks.push(entry); + manifestData.size += buf.length; + this.emit('chunk:stored', { index: entry.index, size: entry.size, digest, blob }); + } + + /** + * Reads an async iterable source, splits it into fixed-size chunks, and stores each in Git. + * @private + * @param {AsyncIterable} source - The data source to chunk. 
+ * @param {Object} manifestData - Mutable manifest accumulator. + * @throws {CasError} STREAM_ERROR if the source stream fails. */ async _chunkAndStore(source, manifestData) { let buffer = Buffer.alloc(0); @@ -42,49 +74,31 @@ export default class CasService { try { for await (const chunk of source) { buffer = Buffer.concat([buffer, chunk]); - while (buffer.length >= this.chunkSize) { - const chunkBuf = buffer.slice(0, this.chunkSize); + await this._storeChunk(buffer.slice(0, this.chunkSize), manifestData); buffer = buffer.slice(this.chunkSize); - - const digest = await this._sha256(chunkBuf); - const blob = await this.persistence.writeBlob(chunkBuf); - - manifestData.chunks.push({ - index: manifestData.chunks.length, - size: chunkBuf.length, - digest, - blob - }); - manifestData.size += chunkBuf.length; } } } catch (err) { - if (err instanceof CasError) {throw err;} - throw new CasError( + if (err instanceof CasError) { throw err; } + const casErr = new CasError( `Stream error during store: ${err.message}`, 'STREAM_ERROR', { chunksWritten: manifestData.chunks.length, originalError: err }, ); + if (this.listenerCount('error') > 0) { + this.emit('error', { code: casErr.code, message: casErr.message }); + } + throw casErr; } - // Process remaining buffer if (buffer.length > 0) { - const digest = await this._sha256(buffer); - const blob = await this.persistence.writeBlob(buffer); - - manifestData.chunks.push({ - index: manifestData.chunks.length, - size: buffer.length, - digest, - blob - }); - manifestData.size += buffer.length; + await this._storeChunk(buffer, manifestData); } } /** - * Validates that an encryption key is a 32-byte Buffer. + * Validates that an encryption key is a 32-byte Buffer or Uint8Array. * @private * @param {*} key * @throws {CasError} INVALID_KEY_TYPE if key is not a Buffer @@ -108,6 +122,11 @@ export default class CasService { /** * Encrypts a buffer using AES-256-GCM. 
+ * @param {Object} options + * @param {Buffer} options.buffer - Plaintext data to encrypt. + * @param {Buffer} options.key - 32-byte encryption key. + * @returns {Promise<{ buf: Buffer, meta: { algorithm: string, nonce: string, tag: string, encrypted: boolean } }>} + * @throws {CasError} INVALID_KEY_TYPE | INVALID_KEY_LENGTH if the key is invalid. */ async encrypt({ buffer, key }) { this._validateKey(key); @@ -115,7 +134,13 @@ export default class CasService { } /** - * Decrypts a buffer. + * Decrypts a buffer. Returns the buffer unchanged if `meta.encrypted` is falsy. + * @param {Object} options + * @param {Buffer} options.buffer - Ciphertext to decrypt. + * @param {Buffer} options.key - 32-byte encryption key. + * @param {{ encrypted: boolean, algorithm: string, nonce: string, tag: string }} options.meta - Encryption metadata from the manifest. + * @returns {Promise} Decrypted plaintext. + * @throws {CasError} INTEGRITY_ERROR if authentication tag verification fails. */ async decrypt({ buffer, key, meta }) { if (!meta?.encrypted) { @@ -162,11 +187,22 @@ export default class CasService { await this._chunkAndStore(source, manifestData); } - return new Manifest(manifestData); + const manifest = new Manifest(manifestData); + this.emit('file:stored', { + slug, size: manifest.size, chunkCount: manifest.chunks.length, encrypted: !!encryptionKey, + }); + return manifest; } /** - * Creates a Git tree from a manifest. + * Creates a Git tree object from a manifest. + * + * The tree contains the serialized manifest file and one blob entry per chunk, + * keyed by its SHA-256 digest. + * + * @param {Object} options + * @param {import('../value-objects/Manifest.js').default} options.manifest - The file manifest. + * @returns {Promise} Git OID of the created tree. 
*/ async createTree({ manifest }) { const serializedManifest = this.codec.encode(manifest.toJSON()); @@ -181,19 +217,11 @@ export default class CasService { } /** - * Restores a file from its manifest by reading and reassembling chunks. - * - * If the manifest has encryption metadata, decrypts the reassembled - * ciphertext using the provided key. - * - * @param {Object} options - * @param {import('../value-objects/Manifest.js').default} options.manifest - * @param {Buffer} [options.encryptionKey] - * @returns {Promise<{ buffer: Buffer, bytesWritten: number }>} - */ - /** - * Reads chunk blobs and verifies their SHA-256 digests. + * Reads chunk blobs from Git and verifies their SHA-256 digests. * @private + * @param {import('../value-objects/Chunk.js').default[]} chunks - Chunk metadata from the manifest. + * @returns {Promise<Buffer[]>} Verified chunk buffers in order. + * @throws {CasError} INTEGRITY_ERROR if any chunk digest does not match. */ async _readAndVerifyChunks(chunks) { const buffers = []; @@ -201,17 +229,35 @@ export default class CasService { const blob = await this.persistence.readBlob(chunk.blob); const digest = await this._sha256(blob); if (digest !== chunk.digest) { - throw new CasError( + const err = new CasError( `Chunk ${chunk.index} integrity check failed`, 'INTEGRITY_ERROR', { chunkIndex: chunk.index, expected: chunk.digest, actual: digest }, ); + if (this.listenerCount('error') > 0) { + this.emit('error', { code: err.code, message: err.message }); + } + throw err; } buffers.push(blob); + this.emit('chunk:restored', { index: chunk.index, size: blob.length, digest: chunk.digest }); } return buffers; } + /** + * Restores a file from its manifest by reading and reassembling chunks. + * + * If the manifest has encryption metadata, decrypts the reassembled + * ciphertext using the provided key. + * + * @param {Object} options + * @param {import('../value-objects/Manifest.js').default} options.manifest - The file manifest.
+ * @param {Buffer} [options.encryptionKey] - 32-byte key, required if manifest is encrypted. + * @returns {Promise<{ buffer: Buffer, bytesWritten: number }>} + * @throws {CasError} MISSING_KEY if manifest is encrypted but no key is provided. + * @throws {CasError} INTEGRITY_ERROR if chunk verification or decryption fails. + */ async restore({ manifest, encryptionKey }) { if (encryptionKey) { this._validateKey(encryptionKey); @@ -239,6 +285,9 @@ export default class CasService { }); } + this.emit('file:restored', { + slug: manifest.slug, size: buffer.length, chunkCount: manifest.chunks.length, + }); return { buffer, bytesWritten: buffer.length }; } @@ -342,9 +391,13 @@ export default class CasService { const blob = await this.persistence.readBlob(chunk.blob); const digest = await this._sha256(blob); if (digest !== chunk.digest) { + this.emit('integrity:fail', { + slug: manifest.slug, chunkIndex: chunk.index, expected: chunk.digest, actual: digest, + }); return false; } } + this.emit('integrity:pass', { slug: manifest.slug }); return true; } } diff --git a/src/domain/value-objects/Chunk.js b/src/domain/value-objects/Chunk.js index ded5589..27dacaf 100644 --- a/src/domain/value-objects/Chunk.js +++ b/src/domain/value-objects/Chunk.js @@ -2,9 +2,25 @@ import { ChunkSchema } from '../schemas/ManifestSchema.js'; import { ZodError } from 'zod'; /** - * Value object representing a content chunk. + * Immutable value object representing a single content chunk. + * + * Validated against {@link ChunkSchema} on construction. Properties are + * assigned via `Object.assign` and the instance is frozen. + * + * @property {number} index - Zero-based position within the manifest. + * @property {number} size - Chunk size in bytes. + * @property {string} digest - 64-character SHA-256 hex digest of the chunk data. + * @property {string} blob - Git OID of the stored blob. */ export default class Chunk { + /** + * @param {Object} data - Raw chunk data (validated via Zod). 
+ * @param {number} data.index - Zero-based chunk index. + * @param {number} data.size - Chunk size in bytes. + * @param {string} data.digest - SHA-256 hex digest. + * @param {string} data.blob - Git blob OID. + * @throws {Error} If data fails schema validation. + */ constructor(data) { try { ChunkSchema.parse(data); diff --git a/src/domain/value-objects/Manifest.js b/src/domain/value-objects/Manifest.js index c74196f..14b4412 100644 --- a/src/domain/value-objects/Manifest.js +++ b/src/domain/value-objects/Manifest.js @@ -3,9 +3,22 @@ import Chunk from './Chunk.js'; import { ZodError } from 'zod'; /** - * Value object representing a file manifest. + * Immutable value object representing a file manifest. + * + * Validated against {@link ManifestSchema} on construction. Contains the slug, + * filename, total size, an ordered array of {@link Chunk} objects, and optional + * encryption metadata. */ export default class Manifest { + /** + * @param {Object} data - Raw manifest data (validated via Zod). + * @param {string} data.slug - Logical identifier for the stored asset. + * @param {string} data.filename - Original filename. + * @param {number} data.size - Total size in bytes. + * @param {Array<{ index: number, size: number, digest: string, blob: string }>} data.chunks - Chunk metadata. + * @param {{ algorithm: string, nonce: string, tag: string, encrypted: boolean }} [data.encryption] - Encryption metadata. + * @throws {Error} If data fails schema validation. + */ constructor(data) { try { ManifestSchema.parse(data); @@ -23,6 +36,10 @@ export default class Manifest { } } + /** + * Serializes the manifest to a plain object suitable for JSON/CBOR encoding. 
+ * @returns {{ slug: string, filename: string, size: number, chunks: Array, encryption?: Object }} + */ toJSON() { return { slug: this.slug, diff --git a/src/infrastructure/adapters/BunCryptoAdapter.js b/src/infrastructure/adapters/BunCryptoAdapter.js index aa95fea..046889d 100644 --- a/src/infrastructure/adapters/BunCryptoAdapter.js +++ b/src/infrastructure/adapters/BunCryptoAdapter.js @@ -6,19 +6,25 @@ import CasError from '../../domain/errors/CasError.js'; import { createCipheriv, createDecipheriv } from 'node:crypto'; /** - * Bun-native implementation of CryptoPort. - * Uses Bun.CryptoHasher for fast SHA-256 and globalThis.crypto for random bytes. + * Bun-native {@link CryptoPort} implementation. + * + * Uses `Bun.CryptoHasher` for fast SHA-256 hashing, `globalThis.crypto` + * for random bytes, and Node's `createCipheriv`/`createDecipheriv` for + * AES-256-GCM (Bun's Node compat layer is heavily optimized for these APIs). */ export default class BunCryptoAdapter extends CryptoPort { + /** @override */ async sha256(buf) { return new CryptoHasher('sha256').update(buf).digest('hex'); } + /** @override */ randomBytes(n) { const uint8 = globalThis.crypto.getRandomValues(new Uint8Array(n)); return Buffer.from(uint8.buffer, uint8.byteOffset, uint8.byteLength); } + /** @override */ async encryptBuffer(buffer, key) { this.#validateKey(key); const nonce = this.randomBytes(12); @@ -31,6 +37,7 @@ export default class BunCryptoAdapter extends CryptoPort { }; } + /** @override */ async decryptBuffer(buffer, key, meta) { this.#validateKey(key); const nonce = Buffer.from(meta.nonce, 'base64'); @@ -40,6 +47,7 @@ export default class BunCryptoAdapter extends CryptoPort { return Buffer.concat([decipher.update(buffer), decipher.final()]); } + /** @override */ createEncryptionStream(key) { this.#validateKey(key); const nonce = this.randomBytes(12); @@ -74,6 +82,11 @@ export default class BunCryptoAdapter extends CryptoPort { return { encrypt, finalize }; } + /** + * Validates that 
a key is a 32-byte Buffer or Uint8Array. + * @param {Buffer|Uint8Array} key + * @throws {CasError} INVALID_KEY_TYPE | INVALID_KEY_LENGTH + */ #validateKey(key) { if (!Buffer.isBuffer(key) && !(key instanceof Uint8Array)) { throw new CasError( @@ -90,6 +103,12 @@ export default class BunCryptoAdapter extends CryptoPort { } } + /** + * Builds the encryption metadata object. + * @param {Buffer|Uint8Array} nonce - 12-byte AES-GCM nonce. + * @param {Buffer} tag - 16-byte GCM authentication tag. + * @returns {{ algorithm: string, nonce: string, tag: string, encrypted: boolean }} + */ #buildMeta(nonce, tag) { return { algorithm: 'aes-256-gcm', diff --git a/src/infrastructure/adapters/GitPersistenceAdapter.js b/src/infrastructure/adapters/GitPersistenceAdapter.js index c50c270..d6df72e 100644 --- a/src/infrastructure/adapters/GitPersistenceAdapter.js +++ b/src/infrastructure/adapters/GitPersistenceAdapter.js @@ -2,6 +2,7 @@ import { Policy } from '@git-stunts/alfred'; import GitPersistencePort from '../../ports/GitPersistencePort.js'; import CasError from '../../domain/errors/CasError.js'; +/** Default resilience policy: 30 s timeout wrapping 2 retries with exponential backoff. */ const DEFAULT_POLICY = Policy.timeout(30_000).wrap( Policy.retry({ retries: 2, @@ -12,13 +13,16 @@ const DEFAULT_POLICY = Policy.timeout(30_000).wrap( ); /** - * Implementation of GitPersistencePort using GitPlumbing. + * {@link GitPersistencePort} implementation backed by `@git-stunts/plumbing`. + * + * All Git I/O is wrapped with a configurable resilience {@link Policy} + * (timeout + retry by default). */ export default class GitPersistenceAdapter extends GitPersistencePort { /** * @param {Object} options - * @param {import('../../../plumbing/index.js').default} options.plumbing - * @param {import('@git-stunts/alfred').Policy} [options.policy] - Resilience policy for Git I/O + * @param {import('@git-stunts/plumbing').default} options.plumbing - GitPlumbing instance. 
+ * @param {import('@git-stunts/alfred').Policy} [options.policy] - Resilience policy (defaults to 30 s timeout + 2 retries). */ constructor({ plumbing, policy }) { super(); @@ -26,6 +30,7 @@ export default class GitPersistenceAdapter extends GitPersistencePort { this.policy = policy ?? DEFAULT_POLICY; } + /** @override */ async writeBlob(content) { return this.policy.execute(() => this.plumbing.execute({ @@ -35,6 +40,7 @@ export default class GitPersistenceAdapter extends GitPersistencePort { ); } + /** @override */ async writeTree(entries) { return this.policy.execute(() => this.plumbing.execute({ @@ -44,6 +50,7 @@ export default class GitPersistenceAdapter extends GitPersistencePort { ); } + /** @override */ async readBlob(oid) { return this.policy.execute(async () => { const stream = await this.plumbing.executeStream({ @@ -55,6 +62,7 @@ export default class GitPersistenceAdapter extends GitPersistencePort { }); } + /** @override */ async readTree(treeOid) { return this.policy.execute(async () => { const output = await this.plumbing.execute({ diff --git a/src/infrastructure/adapters/NodeCryptoAdapter.js b/src/infrastructure/adapters/NodeCryptoAdapter.js index d72391e..fc5d107 100644 --- a/src/infrastructure/adapters/NodeCryptoAdapter.js +++ b/src/infrastructure/adapters/NodeCryptoAdapter.js @@ -6,14 +6,17 @@ import CasError from '../../domain/errors/CasError.js'; * Node.js implementation of CryptoPort using node:crypto. 
*/ export default class NodeCryptoAdapter extends CryptoPort { + /** @override */ sha256(buf) { return createHash('sha256').update(buf).digest('hex'); } + /** @override */ randomBytes(n) { return randomBytes(n); } + /** @override */ encryptBuffer(buffer, key) { this.#validateKey(key); const nonce = randomBytes(12); @@ -26,6 +29,7 @@ export default class NodeCryptoAdapter extends CryptoPort { }; } + /** @override */ decryptBuffer(buffer, key, meta) { const nonce = Buffer.from(meta.nonce, 'base64'); const tag = Buffer.from(meta.tag, 'base64'); @@ -34,6 +38,7 @@ export default class NodeCryptoAdapter extends CryptoPort { return Buffer.concat([decipher.update(buffer), decipher.final()]); } + /** @override */ createEncryptionStream(key) { this.#validateKey(key); const nonce = randomBytes(12); @@ -61,7 +66,9 @@ export default class NodeCryptoAdapter extends CryptoPort { } /** + * Validates that a key is a 32-byte Buffer. * @param {Buffer} key + * @throws {CasError} INVALID_KEY_TYPE | INVALID_KEY_LENGTH */ #validateKey(key) { if (!Buffer.isBuffer(key)) { @@ -80,8 +87,10 @@ export default class NodeCryptoAdapter extends CryptoPort { } /** - * @param {Buffer} nonce - * @param {Buffer} tag + * Builds the encryption metadata object. + * @param {Buffer} nonce - 12-byte AES-GCM nonce. + * @param {Buffer} tag - 16-byte GCM authentication tag. + * @returns {{ algorithm: string, nonce: string, tag: string, encrypted: boolean }} */ #buildMeta(nonce, tag) { return { diff --git a/src/infrastructure/adapters/WebCryptoAdapter.js b/src/infrastructure/adapters/WebCryptoAdapter.js index a558471..c9c8cb9 100644 --- a/src/infrastructure/adapters/WebCryptoAdapter.js +++ b/src/infrastructure/adapters/WebCryptoAdapter.js @@ -2,10 +2,14 @@ import CryptoPort from '../../ports/CryptoPort.js'; import CasError from '../../domain/errors/CasError.js'; /** - * Web Crypto implementation of CryptoPort. - * Works in Deno and other environments supporting standard Web Crypto. 
+ * {@link CryptoPort} implementation using the Web Crypto API. + * + * Works in Deno, browsers, and other environments supporting `globalThis.crypto.subtle`. + * Note: streaming encryption buffers all data internally because Web Crypto's + * AES-GCM is a one-shot API (the GCM tag is computed over the entire plaintext). */ export default class WebCryptoAdapter extends CryptoPort { + /** @override */ async sha256(buf) { const hashBuffer = await globalThis.crypto.subtle.digest('SHA-256', buf); return Array.from(new Uint8Array(hashBuffer)) @@ -13,10 +17,16 @@ export default class WebCryptoAdapter extends CryptoPort { .join(''); } + /** @override */ randomBytes(n) { - return globalThis.crypto.getRandomValues(new Uint8Array(n)); + const uint8 = globalThis.crypto.getRandomValues(new Uint8Array(n)); + if (globalThis.Buffer) { + return Buffer.from(uint8.buffer, uint8.byteOffset, uint8.byteLength); + } + return uint8; } + /** @override */ async encryptBuffer(buffer, key) { this.#validateKey(key); const nonce = this.randomBytes(12); @@ -40,6 +50,7 @@ export default class WebCryptoAdapter extends CryptoPort { }; } + /** @override */ async decryptBuffer(buffer, key, meta) { const nonce = this.#fromBase64(meta.nonce); const tag = this.#fromBase64(meta.tag); @@ -62,6 +73,7 @@ export default class WebCryptoAdapter extends CryptoPort { } } + /** @override */ createEncryptionStream(key) { this.#validateKey(key); const nonce = this.randomBytes(12); @@ -105,6 +117,11 @@ export default class WebCryptoAdapter extends CryptoPort { return { encrypt, finalize }; } + /** + * Imports a raw key for use with Web Crypto AES-GCM operations. + * @param {Buffer|Uint8Array} rawKey - 32-byte raw key material. + * @returns {Promise<CryptoKey>} + */ async #importKey(rawKey) { return globalThis.crypto.subtle.importKey( 'raw', @@ -115,6 +132,11 @@ export default class WebCryptoAdapter extends CryptoPort { ); } + /** + * Validates that a key is a 32-byte Buffer or Uint8Array.
+ * @param {Buffer|Uint8Array} key + * @throws {CasError} INVALID_KEY_TYPE | INVALID_KEY_LENGTH + */ #validateKey(key) { if (!globalThis.Buffer?.isBuffer(key) && !(key instanceof Uint8Array)) { throw new CasError( @@ -131,6 +153,12 @@ export default class WebCryptoAdapter extends CryptoPort { } } + /** + * Builds the encryption metadata object. + * @param {Uint8Array} nonce - 12-byte AES-GCM nonce. + * @param {Uint8Array} tag - 16-byte GCM authentication tag. + * @returns {{ algorithm: string, nonce: string, tag: string, encrypted: boolean }} + */ #buildMeta(nonce, tag) { return { algorithm: 'aes-256-gcm', @@ -140,6 +168,11 @@ export default class WebCryptoAdapter extends CryptoPort { }; } + /** + * Encodes binary data to base64, using Buffer when available. + * @param {Uint8Array} buf + * @returns {string} + */ #toBase64(buf) { if (globalThis.Buffer) { return Buffer.from(buf).toString('base64'); @@ -147,6 +180,11 @@ export default class WebCryptoAdapter extends CryptoPort { return globalThis.btoa(String.fromCharCode(...new Uint8Array(buf))); } + /** + * Decodes a base64 string to binary, using Buffer when available. + * @param {string} str + * @returns {Buffer|Uint8Array} + */ #fromBase64(str) { if (globalThis.Buffer) { return Buffer.from(str, 'base64'); diff --git a/src/infrastructure/codecs/CborCodec.js b/src/infrastructure/codecs/CborCodec.js index fc52cdd..ec0d1b4 100644 --- a/src/infrastructure/codecs/CborCodec.js +++ b/src/infrastructure/codecs/CborCodec.js @@ -1,15 +1,21 @@ import CodecPort from '../../ports/CodecPort.js'; import { encode, decode } from 'cbor-x'; +/** + * {@link CodecPort} implementation that serializes manifests as CBOR (binary). 
+ */ export default class CborCodec extends CodecPort { + /** @override */ encode(data) { return encode(data); } + /** @override */ decode(buffer) { return decode(buffer); } + /** @override */ get extension() { return 'cbor'; } diff --git a/src/infrastructure/codecs/JsonCodec.js b/src/infrastructure/codecs/JsonCodec.js index d9fbd13..8f219f5 100644 --- a/src/infrastructure/codecs/JsonCodec.js +++ b/src/infrastructure/codecs/JsonCodec.js @@ -1,16 +1,22 @@ import CodecPort from '../../ports/CodecPort.js'; +/** + * {@link CodecPort} implementation that serializes manifests as pretty-printed JSON. + */ export default class JsonCodec extends CodecPort { + /** @override */ encode(data) { // Determine if we need to handle Buffers specially for JSON // For now, we assume data is JSON-safe or uses toJSON() methods return JSON.stringify(data, null, 2); } + /** @override */ decode(buffer) { return JSON.parse(buffer.toString('utf8')); } + /** @override */ get extension() { return 'json'; } diff --git a/src/ports/CodecPort.js b/src/ports/CodecPort.js index 6ad45d5..c7d1235 100644 --- a/src/ports/CodecPort.js +++ b/src/ports/CodecPort.js @@ -1,5 +1,6 @@ /** - * Interface for encoding and decoding manifest data. + * Abstract interface for encoding and decoding manifest data. + * @abstract */ export default class CodecPort { /** diff --git a/src/ports/CryptoPort.js b/src/ports/CryptoPort.js index 2386d0d..c898591 100644 --- a/src/ports/CryptoPort.js +++ b/src/ports/CryptoPort.js @@ -1,5 +1,6 @@ /** - * Port for cryptographic operations. + * Abstract port for cryptographic operations (hashing, random bytes, AES-256-GCM). + * @abstract */ export default class CryptoPort { /** diff --git a/src/ports/GitPersistencePort.js b/src/ports/GitPersistencePort.js index 97d5e6f..59352d2 100644 --- a/src/ports/GitPersistencePort.js +++ b/src/ports/GitPersistencePort.js @@ -1,9 +1,11 @@ /** - * Port for persisting data to Git. + * Abstract port for persisting data to Git's object database. 
+ * @abstract */ export default class GitPersistencePort { /** - * @param {Buffer|string} content + * Writes content as a Git blob object. + * @param {Buffer|string} content - Data to store. * @returns {Promise<string>} The Git OID of the stored blob. */ async writeBlob(_content) { @@ -11,7 +13,8 @@ export default class GitPersistencePort { } /** - * @param {string[]} entries - Lines for git mktree. + * Creates a Git tree object from formatted entries. + * @param {string[]} entries - Lines in `git mktree` format. * @returns {Promise<string>} The Git OID of the created tree. */ async writeTree(_entries) { @@ -19,16 +22,18 @@ export default class GitPersistencePort { } /** - * @param {string} oid - * @returns {Promise<Buffer>} + * Reads a Git blob by its OID. + * @param {string} oid - Git object ID. + * @returns {Promise<Buffer>} The blob content. */ async readBlob(_oid) { throw new Error('Not implemented'); } /** - * @param {string} treeOid - * @returns {Promise>} + * Reads and parses a Git tree object. + * @param {string} treeOid - Git tree OID. + * @returns {Promise>} Parsed tree entries.
*/ async readTree(_treeOid) { throw new Error('Not implemented'); diff --git a/test/benchmark/cas.bench.js b/test/benchmark/cas.bench.js index 4679d67..06d4365 100644 --- a/test/benchmark/cas.bench.js +++ b/test/benchmark/cas.bench.js @@ -1,16 +1,214 @@ import { bench, describe } from 'vitest'; +import { createHash, randomBytes } from 'node:crypto'; import CasService from '../../src/domain/services/CasService.js'; import NodeCryptoAdapter from '../../src/infrastructure/adapters/NodeCryptoAdapter.js'; import JsonCodec from '../../src/infrastructure/codecs/JsonCodec.js'; +import CborCodec from '../../src/infrastructure/codecs/CborCodec.js'; +import Manifest from '../../src/domain/value-objects/Manifest.js'; -const mockPersistence = { - writeBlob: async () => 'oid', - writeTree: async () => 'oid', - readBlob: async () => Buffer.alloc(0), -}; +const crypto = new NodeCryptoAdapter(); + +function digestOf(seed) { + return createHash('sha256').update(seed).digest('hex'); +} + +function createMockPersistence() { + const store = new Map(); + return { + writeBlob: async (content) => { + const buf = Buffer.isBuffer(content) ? 
content : Buffer.from(content); + const oid = await crypto.sha256(buf); + store.set(oid, buf); + return oid; + }, + writeTree: async () => 'mock-tree-oid', + readBlob: async (oid) => { + const buf = store.get(oid); + if (!buf) { throw new Error(`Blob not found: ${oid}`); } + return buf; + }, + }; +} + +async function storeBuffer(service, buf, opts = {}) { + async function* source() { yield buf; } + return service.store({ + source: source(), + slug: opts.slug || 'bench', + filename: opts.filename || 'bench.bin', + encryptionKey: opts.encryptionKey, + }); +} + +// Pre-generate test buffers +const buf1KB = randomBytes(1024); +const buf1MB = randomBytes(1024 * 1024); +const buf10MB = randomBytes(10 * 1024 * 1024); +const encryptionKey = randomBytes(32); + +// --------------------------------------------------------------------------- +// Store benchmarks +// --------------------------------------------------------------------------- +describe('store – plaintext', () => { + const service = new CasService({ persistence: createMockPersistence(), crypto, codec: new JsonCodec() }); + + bench('1MB', async () => { await storeBuffer(service, buf1MB); }); + bench('10MB', async () => { await storeBuffer(service, buf10MB); }); +}); + +describe('store – encrypted', () => { + const service = new CasService({ persistence: createMockPersistence(), crypto, codec: new JsonCodec() }); + + bench('1MB', async () => { await storeBuffer(service, buf1MB, { encryptionKey }); }); + bench('10MB', async () => { await storeBuffer(service, buf10MB, { encryptionKey }); }); +}); + +// --------------------------------------------------------------------------- +// Restore benchmarks +// --------------------------------------------------------------------------- +describe('restore – plaintext', () => { + const service = new CasService({ persistence: createMockPersistence(), crypto, codec: new JsonCodec() }); + let m1MB; + let m10MB; + + bench('1MB', async () => { + if (!m1MB) { m1MB = await 
storeBuffer(service, buf1MB); } + await service.restore({ manifest: m1MB }); + }); -describe('CasService Benchmarks', () => { - bench('service initialization', () => { - new CasService({ persistence: mockPersistence, crypto: new NodeCryptoAdapter(), codec: new JsonCodec() }); + bench('10MB', async () => { + if (!m10MB) { m10MB = await storeBuffer(service, buf10MB); } + await service.restore({ manifest: m10MB }); }); }); + +describe('restore – encrypted', () => { + const service = new CasService({ persistence: createMockPersistence(), crypto, codec: new JsonCodec() }); + let m1MB; + let m10MB; + + bench('1MB', async () => { + if (!m1MB) { m1MB = await storeBuffer(service, buf1MB, { encryptionKey }); } + await service.restore({ manifest: m1MB, encryptionKey }); + }); + + bench('10MB', async () => { + if (!m10MB) { m10MB = await storeBuffer(service, buf10MB, { encryptionKey }); } + await service.restore({ manifest: m10MB, encryptionKey }); + }); +}); + +// --------------------------------------------------------------------------- +// createTree benchmarks +// --------------------------------------------------------------------------- +function makeManifest(chunkCount) { + return new Manifest({ + slug: 'bench', + filename: 'bench.bin', + size: chunkCount * 1024, + chunks: Array.from({ length: chunkCount }, (_, i) => ({ + index: i, size: 1024, digest: digestOf(`chunk-${i}`), blob: `blob-oid-${i}`, + })), + }); +} + +describe('createTree', () => { + const service = new CasService({ persistence: createMockPersistence(), crypto, codec: new JsonCodec() }); + const m10 = makeManifest(10); + const m100 = makeManifest(100); + const m1000 = makeManifest(1000); + + bench('10 chunks', async () => { await service.createTree({ manifest: m10 }); }); + bench('100 chunks', async () => { await service.createTree({ manifest: m100 }); }); + bench('1000 chunks', async () => { await service.createTree({ manifest: m1000 }); }); +}); + +// 
--------------------------------------------------------------------------- +// verifyIntegrity benchmarks +// --------------------------------------------------------------------------- +describe('verifyIntegrity – 10 chunks', () => { + const service = new CasService({ persistence: createMockPersistence(), crypto, codec: new JsonCodec(), chunkSize: 1024 }); + let manifest; + + bench('10 chunks', async () => { + if (!manifest) { manifest = await storeBuffer(service, randomBytes(10 * 1024)); } + await service.verifyIntegrity(manifest); + }); +}); + +describe('verifyIntegrity – 100 chunks', () => { + const service = new CasService({ persistence: createMockPersistence(), crypto, codec: new JsonCodec(), chunkSize: 1024 }); + let manifest; + + bench('100 chunks', async () => { + if (!manifest) { manifest = await storeBuffer(service, randomBytes(100 * 1024)); } + await service.verifyIntegrity(manifest); + }); +}); + +// --------------------------------------------------------------------------- +// Encrypt/decrypt benchmarks +// --------------------------------------------------------------------------- +describe('encrypt', () => { + const service = new CasService({ persistence: createMockPersistence(), crypto, codec: new JsonCodec() }); + + bench('1KB', async () => { await service.encrypt({ buffer: buf1KB, key: encryptionKey }); }); + bench('1MB', async () => { await service.encrypt({ buffer: buf1MB, key: encryptionKey }); }); + bench('10MB', async () => { await service.encrypt({ buffer: buf10MB, key: encryptionKey }); }); +}); + +describe('decrypt – 1KB', () => { + const service = new CasService({ persistence: createMockPersistence(), crypto, codec: new JsonCodec() }); + let enc; + + bench('1KB', async () => { + if (!enc) { enc = await service.encrypt({ buffer: buf1KB, key: encryptionKey }); } + await service.decrypt({ buffer: enc.buf, key: encryptionKey, meta: enc.meta }); + }); +}); + +describe('decrypt – 1MB', () => { + const service = new CasService({ persistence: 
createMockPersistence(), crypto, codec: new JsonCodec() }); + let enc; + + bench('1MB', async () => { + if (!enc) { enc = await service.encrypt({ buffer: buf1MB, key: encryptionKey }); } + await service.decrypt({ buffer: enc.buf, key: encryptionKey, meta: enc.meta }); + }); +}); + +describe('decrypt – 10MB', () => { + const service = new CasService({ persistence: createMockPersistence(), crypto, codec: new JsonCodec() }); + let enc; + + bench('10MB', async () => { + if (!enc) { enc = await service.encrypt({ buffer: buf10MB, key: encryptionKey }); } + await service.decrypt({ buffer: enc.buf, key: encryptionKey, meta: enc.meta }); + }); +}); + +// --------------------------------------------------------------------------- +// Codec benchmarks +// --------------------------------------------------------------------------- +const codecData = { + slug: 'bench', filename: 'bench.bin', size: 1024000, + chunks: Array.from({ length: 100 }, (_, i) => ({ + index: i, size: 10240, digest: digestOf(`c-${i}`), blob: `oid-${i}`, + })), +}; + +describe('JsonCodec', () => { + const codec = new JsonCodec(); + const encoded = codec.encode(codecData); + + bench('encode (100 chunks)', () => { codec.encode(codecData); }); + bench('decode (100 chunks)', () => { codec.decode(encoded); }); +}); + +describe('CborCodec', () => { + const codec = new CborCodec(); + const encoded = codec.encode(codecData); + + bench('encode (100 chunks)', () => { codec.encode(codecData); }); + bench('decode (100 chunks)', () => { codec.decode(encoded); }); +}); diff --git a/test/unit/domain/services/CasService.events.test.js b/test/unit/domain/services/CasService.events.test.js new file mode 100644 index 0000000..021845a --- /dev/null +++ b/test/unit/domain/services/CasService.events.test.js @@ -0,0 +1,220 @@ +import { describe, it, expect, vi } from 'vitest'; +import { randomBytes } from 'node:crypto'; +import CasService from '../../../../src/domain/services/CasService.js'; +import NodeCryptoAdapter from 
'../../../../src/infrastructure/adapters/NodeCryptoAdapter.js';
+import JsonCodec from '../../../../src/infrastructure/codecs/JsonCodec.js';
+import CasError from '../../../../src/domain/errors/CasError.js';
+
+function setup() {
+  const crypto = new NodeCryptoAdapter();
+  const blobStore = new Map();
+
+  const mockPersistence = {
+    writeBlob: vi.fn().mockImplementation(async (content) => {
+      const buf = Buffer.isBuffer(content) ? content : Buffer.from(content);
+      const oid = await crypto.sha256(buf);
+      blobStore.set(oid, buf);
+      return oid;
+    }),
+    writeTree: vi.fn().mockResolvedValue('mock-tree-oid'),
+    readBlob: vi.fn().mockImplementation(async (oid) => {
+      const buf = blobStore.get(oid);
+      if (!buf) { throw new Error(`Blob not found: ${oid}`); }
+      return buf;
+    }),
+  };
+
+  const service = new CasService({
+    persistence: mockPersistence,
+    crypto,
+    codec: new JsonCodec(),
+    chunkSize: 1024,
+  });
+
+  return { crypto, blobStore, mockPersistence, service };
+}
+
+async function storeBuffer(svc, buf, opts = {}) {
+  async function* source() { yield buf; }
+  return svc.store({
+    source: source(),
+    slug: opts.slug || 'test',
+    filename: opts.filename || 'test.bin',
+    encryptionKey: opts.encryptionKey,
+  });
+}
+
+describe('CasService events – chunk:stored', () => {
+  it('emits chunk:stored per chunk with correct payload', async () => {
+    const { service } = setup();
+    const onChunkStored = vi.fn();
+    service.on('chunk:stored', onChunkStored);
+
+    await storeBuffer(service, randomBytes(2048));
+
+    expect(onChunkStored).toHaveBeenCalledTimes(2);
+    expect(onChunkStored).toHaveBeenNthCalledWith(1, expect.objectContaining({
+      index: 0, size: 1024, digest: expect.any(String), blob: expect.any(String),
+    }));
+    expect(onChunkStored).toHaveBeenNthCalledWith(2, expect.objectContaining({
+      index: 1, size: 1024, digest: expect.any(String), blob: expect.any(String),
+    }));
+  });
+});
+
+describe('CasService events – file:stored', () => {
+  it('emits file:stored once with correct payload', async () => {
+    const { service } = setup();
+    const onFileStored = vi.fn();
+    service.on('file:stored', onFileStored);
+
+    await storeBuffer(service, randomBytes(2048));
+
+    expect(onFileStored).toHaveBeenCalledTimes(1);
+    expect(onFileStored).toHaveBeenCalledWith(expect.objectContaining({
+      slug: 'test', size: 2048, chunkCount: 2, encrypted: false,
+    }));
+  });
+
+  it('emits encrypted=true when encryption used', async () => {
+    const { service } = setup();
+    const onFileStored = vi.fn();
+    service.on('file:stored', onFileStored);
+
+    await storeBuffer(service, randomBytes(1024), { encryptionKey: randomBytes(32) });
+
+    expect(onFileStored).toHaveBeenCalledWith(expect.objectContaining({ encrypted: true }));
+  });
+});
+
+describe('CasService events – chunk:restored', () => {
+  it('emits chunk:restored per chunk with correct payload', async () => {
+    const { service } = setup();
+    const manifest = await storeBuffer(service, randomBytes(2048));
+
+    const onChunkRestored = vi.fn();
+    service.on('chunk:restored', onChunkRestored);
+    await service.restore({ manifest });
+
+    expect(onChunkRestored).toHaveBeenCalledTimes(2);
+    expect(onChunkRestored).toHaveBeenNthCalledWith(1, expect.objectContaining({
+      index: 0, size: 1024, digest: expect.any(String),
+    }));
+    expect(onChunkRestored).toHaveBeenNthCalledWith(2, expect.objectContaining({
+      index: 1, size: 1024, digest: expect.any(String),
+    }));
+  });
+});
+
+describe('CasService events – file:restored', () => {
+  it('emits file:restored once with correct payload', async () => {
+    const { service } = setup();
+    const manifest = await storeBuffer(service, randomBytes(2048));
+
+    const onFileRestored = vi.fn();
+    service.on('file:restored', onFileRestored);
+    await service.restore({ manifest });
+
+    expect(onFileRestored).toHaveBeenCalledTimes(1);
+    expect(onFileRestored).toHaveBeenCalledWith(expect.objectContaining({
+      slug: 'test', size: 2048, chunkCount: 2,
+    }));
+  });
+});
+
+describe('CasService events – integrity:pass', () => {
+  it('emits integrity:pass on successful verification', async () => {
+    const { service } = setup();
+    const manifest = await storeBuffer(service, randomBytes(2048));
+
+    const onPass = vi.fn();
+    service.on('integrity:pass', onPass);
+    await service.verifyIntegrity(manifest);
+
+    expect(onPass).toHaveBeenCalledTimes(1);
+    expect(onPass).toHaveBeenCalledWith(expect.objectContaining({ slug: 'test' }));
+  });
+});
+
+describe('CasService events – integrity:fail', () => {
+  it('emits integrity:fail on chunk mismatch', async () => {
+    const { service, blobStore } = setup();
+    const manifest = await storeBuffer(service, randomBytes(2048));
+
+    blobStore.set(manifest.chunks[0].blob, Buffer.from('corrupted'));
+
+    const onFail = vi.fn();
+    service.on('integrity:fail', onFail);
+    await service.verifyIntegrity(manifest);
+
+    expect(onFail).toHaveBeenCalledTimes(1);
+    expect(onFail).toHaveBeenCalledWith(expect.objectContaining({
+      slug: 'test', chunkIndex: 0, expected: expect.any(String), actual: expect.any(String),
+    }));
+  });
+});
+
+describe('CasService events – error on restore integrity failure', () => {
+  it('emits error event on integrity failure during restore', async () => {
+    const { service, blobStore } = setup();
+    const manifest = await storeBuffer(service, randomBytes(1024));
+
+    blobStore.set(manifest.chunks[0].blob, Buffer.from('corrupted'));
+
+    const onError = vi.fn();
+    service.on('error', onError);
+
+    await expect(service.restore({ manifest })).rejects.toThrow(CasError);
+
+    expect(onError).toHaveBeenCalledTimes(1);
+    expect(onError).toHaveBeenCalledWith(expect.objectContaining({
+      code: expect.any(String), message: expect.any(String),
+    }));
+  });
+});
+
+describe('CasService events – no listeners attached', () => {
+  it('store succeeds without listeners', async () => {
+    const { service } = setup();
+    await expect(storeBuffer(service, randomBytes(2048))).resolves.toBeDefined();
+  });
+
+  it('restore succeeds without listeners', async () => {
+    const { service } = setup();
+    const manifest = await storeBuffer(service, randomBytes(1024));
+    await expect(service.restore({ manifest })).resolves.toBeDefined();
+  });
+
+  it('verifyIntegrity succeeds without listeners', async () => {
+    const { service } = setup();
+    const manifest = await storeBuffer(service, randomBytes(1024));
+    await expect(service.verifyIntegrity(manifest)).resolves.toBe(true);
+  });
+});
+
+describe('CasService events – event count verification', () => {
+  it('emits 3 chunk:stored for 3-chunk file', async () => {
+    const { service } = setup();
+    const listener = vi.fn();
+    service.on('chunk:stored', listener);
+    await storeBuffer(service, randomBytes(3072));
+    expect(listener).toHaveBeenCalledTimes(3);
+  });
+
+  it('emits 3 chunk:restored for 3-chunk file', async () => {
+    const { service } = setup();
+    const manifest = await storeBuffer(service, randomBytes(3072));
+    const listener = vi.fn();
+    service.on('chunk:restored', listener);
+    await service.restore({ manifest });
+    expect(listener).toHaveBeenCalledTimes(3);
+  });
+
+  it('emits 1 chunk:stored for sub-chunk file', async () => {
+    const { service } = setup();
+    const listener = vi.fn();
+    service.on('chunk:stored', listener);
+    await storeBuffer(service, randomBytes(512));
+    expect(listener).toHaveBeenCalledTimes(1);
+  });
+});