context-core — Implementation Progress

Status: v0 core complete — test consolidation and verification function remain

The core library compiles, all 37 tests pass, and the three-phase selection pipeline (score → order → budget) is fully operational. The document model, cache builder, and selection engine match the normative specs. Dead code and duplicates have been cleaned up. All spec compliance gaps have been resolved (either by code fix or spec clarification). What remains is duplicate test consolidation, edge case coverage, and a standalone cache verification function.

Completed

Document Model (`document/`)

Cache System (`cache/`)

Selection Engine (`selection/`)

Types (`types/`)

Query — normalized query with raw + terms
SelectedDocument — final output document with score, tokens, why
SelectionMetadata — query, budget, tokens_used, counts
SelectionResult — top-level result container
ScoredDocument — internal reference-based scored document (avoids premature cloning)
ScoreDetails — internal scoring components
SelectionError — InvalidBudget, CacheError
DocumentId, DocumentVersion — identity and versioning types

Error Handling

DocumentError — InvalidUtf8
DocumentIdError — OutsideRoot, InvalidUtf8
CacheBuildError — Io, Serialization, OutputExists, FilenameCollision, InvalidVersionFormat, DuplicateDocumentId
SelectionError — InvalidBudget, CacheError

Cleanup

Removed duplicate selection/selector.rs (inlined copy of selection/mod.rs logic)
Removed duplicate selection/types.rs (copy of types/context_bundle.rs)
Removed duplicate document/id.rs and document/version.rs (copies of types/identifiers.rs)
Removed 6 legacy re-export shims (cache/builder.rs, cache/manifest.rs, cache/config.rs, cache/index.rs, selection/scorer.rs, selection/tokenizer.rs)
Removed migrated mcp/ module (MCP error types live in mcp-context-server)
Removed unused errors.rs (CoreError wrapper had no consumers)
Removed empty tests/mcp_error_schema.rs

Spec Compliance Fixes

CacheIndex serialization: added #[serde(transparent)] so index.json serializes as flat map (was wrapped in {"entries": {...}})
Duplicate document ID detection: CacheBuilder::build() now rejects duplicate IDs after sorting
Document version verification on load: load_documents() recomputes content hash and compares against manifest entry
document_model.md: metadata extraction and frontmatter parsing scoped to post-v0
context_selection.md: output schema aligned to normative context.resolve.md
context_selection.md: removed documents_excluded_by_score (v0 doesn't exclude by score)
milestone_zero.md: output contract fixed (metadata → selection, added missing fields, referenced normative spec)
milestone_zero.md: changed false "provenance" claim to "version and scoring explanation"
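
The duplicate-ID check listed above (reject duplicates after sorting) can be illustrated with a minimal sketch. It is not the code in CacheBuilder::build(); the `DuplicateDocumentId` struct and the `reject_duplicates` function are hypothetical stand-ins, and IDs are modelled as plain strings. The idea is that once the IDs are sorted, any duplicates sit next to each other, so one pass over adjacent pairs is enough.

```rust
/// Hypothetical stand-in for the crate's `CacheBuildError::DuplicateDocumentId` variant.
#[derive(Debug, PartialEq)]
struct DuplicateDocumentId(String);

/// Sorts the IDs, then compares neighbours: after sorting, equal IDs are adjacent.
/// Returns the sorted IDs, or the first duplicated ID found.
fn reject_duplicates(mut ids: Vec<String>) -> Result<Vec<String>, DuplicateDocumentId> {
    ids.sort();
    for pair in ids.windows(2) {
        if pair[0] == pair[1] {
            return Err(DuplicateDocumentId(pair[0].clone()));
        }
    }
    Ok(ids)
}
```

Calling `reject_duplicates` with `["b.md", "a.md"]` returns the sorted IDs; adding a second `"a.md"` returns the error carrying that ID.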

Test Coverage (37 tests, all passing)

Remaining Work

P1 — Functional gaps

Cache verification function — context_cache.md specifies a verification operation that checks:
1. Manifest exists and is valid JSON
2. Cache version matches recomputed hash
3. Every document file exists
4. Every document file hash matches its filename
5. No orphan files in documents/
No standalone verify_cache() function exists yet. load_documents() partially covers checks 1, 3, and 4 through version verification, but no single function runs all 5 checks and reports the results. Both the CLI inspect --verify and the MCP inspect_cache tool need it.
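
As a rough illustration of the five checks above, here is a sketch that runs them over an in-memory model of a cache. Everything in it is an assumption made for illustration: `CacheSnapshot`, `Manifest`, `VerifyIssue`, and `build_snapshot` are hypothetical, the standard library's `DefaultHasher` stands in for SHA-256, document filenames are assumed to equal their content digest, and the cache version is assumed to be a digest of the sorted filenames. The real derivation is defined by context_cache.md, not by this sketch.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::Hasher;

/// Hypothetical manifest model: a version plus the document filenames it lists.
struct Manifest {
    cache_version: String,
    documents: Vec<String>,
}

/// Hypothetical in-memory view of a cache directory.
struct CacheSnapshot {
    /// `None` models check 1 failing (manifest missing or not valid JSON).
    manifest: Option<Manifest>,
    /// Contents of `documents/`, keyed by filename.
    files: BTreeMap<String, Vec<u8>>,
}

#[derive(Debug, PartialEq)]
enum VerifyIssue {
    ManifestInvalid,
    CacheVersionMismatch,
    MissingDocument(String),
    HashMismatch(String),
    OrphanFile(String),
}

/// Stand-in digest (assumption); the real crate hashes with SHA-256.
fn digest(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    format!("{:016x}", hasher.finish())
}

/// Assumed version derivation: digest of the sorted filenames joined by newlines.
fn cache_version(documents: &[String]) -> String {
    let mut names = documents.to_vec();
    names.sort();
    digest(names.join("\n").as_bytes())
}

/// Runs all five checks and returns every issue found (empty means valid).
fn verify_cache(snapshot: &CacheSnapshot) -> Vec<VerifyIssue> {
    // Check 1: without a usable manifest, nothing else can be verified.
    let Some(manifest) = &snapshot.manifest else {
        return vec![VerifyIssue::ManifestInvalid];
    };
    let mut issues = Vec::new();
    // Check 2: stored version must match the recomputed one.
    if manifest.cache_version != cache_version(&manifest.documents) {
        issues.push(VerifyIssue::CacheVersionMismatch);
    }
    // Checks 3 and 4: every listed file exists and hashes to its filename.
    for name in &manifest.documents {
        match snapshot.files.get(name) {
            None => issues.push(VerifyIssue::MissingDocument(name.clone())),
            Some(bytes) if digest(bytes) != *name => {
                issues.push(VerifyIssue::HashMismatch(name.clone()))
            }
            Some(_) => {}
        }
    }
    // Check 5: no file in documents/ that the manifest does not list.
    for name in snapshot.files.keys() {
        if !manifest.documents.contains(name) {
            issues.push(VerifyIssue::OrphanFile(name.clone()));
        }
    }
    issues
}

/// Test helper: builds a consistent snapshot from document contents.
fn build_snapshot(contents: &[&str]) -> CacheSnapshot {
    let files: BTreeMap<String, Vec<u8>> = contents
        .iter()
        .map(|c| (digest(c.as_bytes()), c.as_bytes().to_vec()))
        .collect();
    let documents: Vec<String> = files.keys().cloned().collect();
    CacheSnapshot {
        manifest: Some(Manifest { cache_version: cache_version(&documents), documents }),
        files,
    }
}
```

Returning a list of issues rather than failing on the first one is a design choice suited to an inspect-style command, which would probably want to report everything that is wrong in one pass.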

P1 — Enterprise Ingestion Foundation (see `context-specs/plans/enterprise_ingest_plan.md` Phase 0)

DocumentSource trait + RawDocument type — Define connector interface in document::source module. All enterprise connectors implement this trait. RawDocument carries pre-ingestion content + metadata.
ConnectorError type — Error variants: AuthenticationFailed, FetchFailed, InvalidContent, PartialFetch.
Canonicalization utilities — document::canonicalize module: line ending normalization, trailing whitespace trimming, trailing empty line removal, Unicode NFC normalization. Deterministic ordering of all transforms.
FilesystemSource reference connector — Migrate existing walkdir-based ingestion to DocumentSource trait. Must produce byte-identical caches to current build path.
ingest_from_source() pipeline — Orchestrates: source.fetch_documents() → UTF-8 validation → Document::ingest(). Configurable error policy (skip-and-warn vs abort-all).
unicode-normalization dependency — Add with default-features = false for NFC normalization.
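
To make the planned connector interface more concrete, the sketch below shows one possible shape for the DocumentSource trait and the ingest_from_source() pipeline described above. The names DocumentSource, RawDocument, ConnectorError (with its four variants), fetch_documents, and ingest_from_source come from this plan; everything else is an assumption: the field names, the variant payloads, the `ErrorPolicy` enum, the `IngestReport` and `IngestedDocument` types (a stand-in for the result of Document::ingest()), and the `InMemorySource` test connector.

```rust
/// Pre-ingestion content plus metadata. Field names are assumptions.
#[derive(Debug, Clone)]
struct RawDocument {
    source_path: String,
    bytes: Vec<u8>,
}

/// Variant names follow the plan; the payloads are assumptions.
#[derive(Debug, PartialEq)]
enum ConnectorError {
    AuthenticationFailed,
    FetchFailed(String),
    InvalidContent(String),
    PartialFetch,
}

/// Interface every connector would implement (signature is an assumption).
trait DocumentSource {
    fn fetch_documents(&self) -> Result<Vec<RawDocument>, ConnectorError>;
}

/// Hypothetical error policy for per-document failures.
#[derive(Clone, Copy)]
enum ErrorPolicy {
    SkipAndWarn,
    AbortAll,
}

/// Stand-in for the result of `Document::ingest()`.
#[derive(Debug, PartialEq)]
struct IngestedDocument {
    source_path: String,
    content: String,
}

struct IngestReport {
    documents: Vec<IngestedDocument>,
    warnings: Vec<String>,
}

/// fetch -> UTF-8 validation -> ingest, applying the error policy per document.
fn ingest_from_source(
    source: &dyn DocumentSource,
    policy: ErrorPolicy,
) -> Result<IngestReport, ConnectorError> {
    let raw = source.fetch_documents()?;
    let mut report = IngestReport { documents: Vec::new(), warnings: Vec::new() };
    for doc in raw {
        match String::from_utf8(doc.bytes) {
            Ok(content) => report.documents.push(IngestedDocument {
                source_path: doc.source_path,
                content,
            }),
            Err(_) => match policy {
                ErrorPolicy::SkipAndWarn => report
                    .warnings
                    .push(format!("skipped {}: invalid UTF-8", doc.source_path)),
                ErrorPolicy::AbortAll => {
                    return Err(ConnectorError::InvalidContent(doc.source_path))
                }
            },
        }
    }
    Ok(report)
}

/// Trivial connector used only to exercise the pipeline.
struct InMemorySource(Vec<RawDocument>);

impl DocumentSource for InMemorySource {
    fn fetch_documents(&self) -> Result<Vec<RawDocument>, ConnectorError> {
        Ok(self.0.clone())
    }
}
```

With one valid and one non-UTF-8 document, the skip-and-warn policy yields one ingested document and one warning, while abort-all returns `InvalidContent` for the offending path.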

P2 — Test gaps

Duplicate test consolidation — Several test files contain identical or near-identical tests:
- cache_lifecycle.rs and document_model.rs share 5+ identical tests
- cache_manifest.rs duplicates 2 tests from cache_lifecycle.rs
- context_selection.rs and selection_logic.rs contain the same 3 tests
- determinism.rs and golden_serialization.rs share tests
- end_to_end_golden.rs duplicates tests from cache_lifecycle.rs
Consider consolidating to avoid maintenance burden and test confusion.
Cache rebuild determinism — No test pins down what happens when a cache is built twice from the same documents. The cache_version field will match, but manifest.json will not be byte-identical because the created_at timestamp differs. This is spec-correct (created_at is informational) but should be tested explicitly, for example by asserting that cache_version is equal across two builds.
Duplicate document ID test — No test exercises the new DuplicateDocumentId error path.
Version verification test — No test exercises the version mismatch detection in load_documents() (e.g., corrupt a document file after build, verify load fails).
Edge cases not covered:
- Empty document set (build cache with 0 documents)
- Single document cache
- Very large document (multi-MB content)
- Document with empty content ("")
- Query with special characters, punctuation
- Budget of 1 (smaller than any document)
- All documents have score 0.0
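
Several of the selection edge cases above (empty document set, budget of 1, all scores 0.0, empty content) can be exercised against a small stand-in for the score → order → budget pipeline. The sketch below is not the crate's ContextSelector. The `Doc` and `Selected` types, the whitespace token counter, the term-fraction scorer, the tie-break by ascending ID, and the choice to skip documents that do not fit rather than stop are all assumptions made for illustration.

```rust
struct Doc {
    id: &'static str,
    content: &'static str,
}

#[derive(Debug, PartialEq)]
struct Selected {
    id: &'static str,
    score: f64,
    tokens: usize,
}

/// Assumed token counter: whitespace-separated words.
fn count_tokens(content: &str) -> usize {
    content.split_whitespace().count()
}

/// Assumed scorer: fraction of the document's words that match a query term.
/// An empty document scores 0.0 instead of dividing by zero.
fn score(content: &str, terms: &[&str]) -> f64 {
    let total = count_tokens(content);
    if total == 0 {
        return 0.0;
    }
    let hits = content
        .split_whitespace()
        .filter(|&w| terms.iter().any(|&t| t.eq_ignore_ascii_case(w)))
        .count();
    hits as f64 / total as f64
}

fn select(docs: &[Doc], terms: &[&str], budget: usize) -> Vec<Selected> {
    // Phase 1: score every document; none is excluded by score.
    let mut scored: Vec<Selected> = docs
        .iter()
        .map(|d| Selected {
            id: d.id,
            score: score(d.content, terms),
            tokens: count_tokens(d.content),
        })
        .collect();
    // Phase 2: order by score descending, then ID ascending (assumed tie-break).
    scored.sort_by(|a, b| b.score.total_cmp(&a.score).then_with(|| a.id.cmp(b.id)));
    // Phase 3: greedy budget; a document that does not fit is skipped (assumption).
    let mut used = 0;
    scored
        .into_iter()
        .filter(|s| {
            if used + s.tokens <= budget {
                used += s.tokens;
                true
            } else {
                false
            }
        })
        .collect()
}
```

Under these assumptions, a budget of 1 against three-word documents selects nothing, and a query that matches no document still returns every document that fits, ordered by ID.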

P3 — Nice to have

context inspect support — Expose an inspect_cache() function returning cache metadata (document count, total size, cache version, validity). Needed by the MCP inspect_cache tool and CLI.
Cache rebuild (force) — CacheBuilder rejects existing output dirs. A rebuild() method or --force equivalent that removes and rebuilds would match the spec's rebuild command.
Deserialize for Query — Query derives Clone and Debug but not Deserialize. Adding it would allow JSON deserialization of queries (useful for test fixtures).
Document field ordering guarantee — Spec says documents are serialized with fixed field order (id, version, source, content, metadata). Serde's default struct serialization preserves declaration order, which matches the spec. But this guarantee is implicit: reordering the struct fields would silently change the output order, and an attribute such as #[serde(rename_all)] would silently change the key names. Consider adding a golden test that asserts field order and key names explicitly.
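
For the "Cache rebuild (force)" item above, one possible shape is a helper that removes any existing output directory and then runs the normal build. This is only a sketch under assumptions: `rebuild` takes the build step as a closure rather than calling CacheBuilder directly, and `toy_build` is a hypothetical stand-in that, like CacheBuilder, refuses to write into an existing directory.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Removes `output` if it exists, then runs `build` against the now-free path.
/// A missing directory is not an error; any other removal failure is returned.
fn rebuild<F>(output: &Path, build: F) -> io::Result<()>
where
    F: FnOnce(&Path) -> io::Result<()>,
{
    match fs::remove_dir_all(output) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    build(output)
}

/// Hypothetical build step: fails if the output directory already exists.
fn toy_build(output: &Path) -> io::Result<()> {
    fs::create_dir(output)?; // returns an error when the directory exists
    fs::write(output.join("manifest.json"), "{}")
}
```

Note that remove-then-build is not atomic: if the build fails, the previous cache is already gone. Building into a temporary directory and renaming it into place would avoid that, at the cost of more code.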

Spec Issues — All Resolved

| # | Issue | Resolution |
|---|-------|------------|
| 1 | `documents_excluded_by_score` in selection output | Removed from `context_selection.md`; `context.resolve.md` is normative and doesn't include it |
| 2 | `metadata` vs `selection` key in output | Updated `milestone_zero.md` to use `"selection"` |
| 3 | `cache_version` in output | Updated `milestone_zero.md` to match normative spec (no `cache_version`) |
| 4 | Automatic metadata extraction scope | Deferred to post-v0 in `document_model.md` |
| 5 | MCP error types, single source of truth | Deleted from context-core; MCP types live in `mcp-context-server` |

File Inventory

context-core/
├── Cargo.toml
├── progress.md                      ← this file
├── spec_refs.md
├── src/
│   ├── lib.rs                       module declarations
│   │
│   ├── types/
│   │   ├── mod.rs                   re-exports
│   │   ├── identifiers.rs          DocumentId, DocumentVersion
│   │   └── context_bundle.rs       Query, SelectionResult, etc.
│   │
│   ├── document/
│   │   ├── mod.rs                   re-exports
│   │   ├── document.rs             Document struct + ingest()
│   │   ├── metadata.rs             Metadata, MetadataValue
│   │   └── parser.rs               placeholder (future parsing hooks)
│   │
│   ├── cache/
│   │   ├── mod.rs                   re-exports
│   │   ├── cache.rs                ContextCache (runtime read-only wrapper)
│   │   ├── versioning.rs           CacheManifest, CacheBuildConfig, CacheIndex
│   │   └── invalidation.rs         CacheBuilder (build logic)
│   │
│   ├── selection/
│   │   ├── mod.rs                   ContextSelector + three-phase pipeline
│   │   ├── ranking.rs              Scorer, TermFrequencyScorer, TokenCounter
│   │   ├── budgeting.rs            apply_budget (greedy selection)
│   │   └── filters.rs              placeholder (future filtering)
│   │
│   └── compression/
│       ├── mod.rs                   module declaration
│       └── summarizer.rs           placeholder (future compression)
│
└── tests/
    ├── cache_invariants.rs          2 tests — index sorting, collision
    ├── cache_lifecycle.rs           10 tests — determinism, config, corruption
    ├── cache_manifest.rs            2 tests — version determinism, config changes
    ├── context_selection.rs         3 tests — budget, sorting, ties
    ├── determinism.rs               4 tests — serialization + e2e determinism
    ├── document_model.rs            6 tests — document invariants
    ├── end_to_end_golden.rs         2 tests — manifest bytes, corruption
    ├── golden_selection_contract.rs 1 test — output structure
    ├── golden_selection_logic.rs    1 test — e2e selection determinism
    ├── golden_serialization.rs      2 tests — serialization snapshots
    ├── selection_invariants.rs      1 test — bounds + explainability
    └── selection_logic.rs           3 tests — budget, sorting, ties

Dependencies

[dependencies]
sha2 = "0.10"              # SHA-256 hashing
hex = "0.4"                # Hex encoding
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
thiserror = "1.0"          # Error derive macros
chrono = { version = "0.4", features = ["serde", "clock"], default-features = false }  # created_at timestamps

[dev-dependencies]
tempfile = "3.24.0"
