Bulky loads #52
- Fix chunk_meta ordering ledger mismatch (don’t auto-materialize tokenIds)
- Fix records embeddings file resolution
- Make tooling doctor treat missing pyright as warn unless explicitly enabled
- Avoid DEP0190 by not passing args arrays with shell:true
- Default dictionaries dir to unversioned cache root
- Reduce clangd log noise; tree-sitter batch preload adjustments
- Add BUILDLIST + build_index logs for repro
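For readers unfamiliar with the DEP0190 item above: Node.js deprecates passing an args array to `spawn`/`execFile` together with `shell: true`, because the arguments are concatenated into the shell command without escaping. The following is a minimal sketch of one way to avoid that combination, not the code from this PR. `quotePosix` and `planSpawn` are hypothetical names, and the quoting shown only covers POSIX shells (cmd.exe needs different rules).

```javascript
// Hypothetical helpers for illustration; these names do not come from the PR.

// Quote a single argument for a POSIX shell.
// Assumption: POSIX sh semantics only. Windows cmd.exe is not handled here.
function quotePosix(arg) {
  if (arg === '') return "''";
  // Arguments made only of "safe" characters need no quoting.
  if (/^[A-Za-z0-9_\/:=.,@%+-]+$/.test(arg)) return arg;
  // Wrap in single quotes; close, escape, and reopen around embedded quotes.
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Decide how a child process should be started so that an args array is
// never combined with shell: true.
function planSpawn(command, args = [], { shell = false } = {}) {
  if (!shell) {
    // Preferred path: no shell, arguments passed as an array.
    return { command, args: [...args], options: { shell: false } };
  }
  // Shell path: build one pre-quoted command line and pass no args array.
  const line = [command, ...args].map(quotePosix).join(' ');
  return { command: line, args: [], options: { shell: true } };
}
```

With a plan in hand, the non-shell case would be started as `spawn(plan.command, plan.args, plan.options)` and the shell case as `spawn(plan.command, plan.options)`, so no args array reaches `spawn` when a shell is involved.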
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8c03229976
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
* Centralize token classification to main thread
* Tree-sitter: global VFS batched scheduler
* Tree-sitter: add CSS config
* tree sitter and tree sitter swift package patches fuck you and the person who does a poor job of maintaining this
* tree-sitter native: enable javascript grammar smoke parse
* tree-sitter native: enable typescript and tsx grammar smoke parse
* tree-sitter native: enable python grammar smoke parse
* tree-sitter native: enable json grammar smoke parse
* tree-sitter native: enable yaml grammar smoke parse
* tree-sitter native: enable toml grammar smoke parse
* tree-sitter native: enable markdown grammar smoke parse
* tree-sitter native: enable kotlin grammar smoke parse
* tree-sitter native: enable csharp grammar smoke parse
* tree-sitter native: enable c/clike grammar smoke parse
* tree-sitter native: enable cpp grammar smoke parse
* tree-sitter native: enable objc grammar smoke parse
* tree-sitter native: enable go grammar smoke parse
* tree-sitter native: enable rust grammar smoke parse
* tree-sitter native: enable java grammar smoke parse
* tree-sitter native: enable css grammar smoke parse
* tree-sitter native: enable html grammar smoke parse
* woodroadmap N0: lock native-only scheduler decisions
* woodroadmap N1: migrate scheduler schema from wasmKey to grammarKey
* woodroadmap N2: route scheduler planning through native targets
* woodroadmap N3: switch scheduler executor to native-only parsing
* woodroadmap N4: enforce scheduler-only stage1 tree-sitter contract
* Add native tree-sitter scheduler N5 test suite
* Complete N6 native-only tree-sitter cutover
* tree-sitter: simplify native runtime and remove wasm-era caps
* tree-sitter: normalize scheduler and worker wording
* shared: harden regex serialization and subprocess arg quoting
* tests: restore ann backends and regenerate inventories
* tree-sitter: map jsx segments to javascript native grammar
* embeddings: defer backend loading during runtime init
* Finalize native tree-sitter scheduler fixes and CI bootstrap updates
* Fix native tree-sitter JS/JSX grammar export resolution
* Finalize native tree-sitter scheduler and CI probe updates
* Unify cache root paths and harden embeddings/runtime diagnostics
* Log unresolved import samples during relations resolution
  Collect unresolved import samples even when graph output is disabled and return them from resolveImportLinks. Emit bounded unresolved import sample lines in postScanImports alongside the aggregate import summary counts.
* Fix CI cache-hit patching and harden native tree-sitter contracts
  Workflows now include patch files in node_modules cache keys and skip npm run patch on cache hits while still rebuilding native modules. Replaced the tree-sitter patch with a minimal source-only binding.gyp diff to avoid cross-platform patch-package failures. Upgraded scheduler native smoke/language tests to a shared per-language metadata contract suite covering fixture routing and chunk metadata invariants.
* Reduce SourceKit LSP hover timeouts and honor provider config
  Removed the fixed 8s hover timeout cap in collectLspTypes and made hover timeout configurable per provider. Added sourcekit-specific timeout/retry/breaker/hover controls with safer defaults and enabled passthrough of tooling.sourcekit/pyright config in getToolingConfig. Validated with sourcekit provider fallback/output-shape and LSP enrichment tests.
* Split script-coverage into grouped tests and fix summary parity fallback
* Align CI/doc/tests with current native tree-sitter and cache behavior
* Allowlist script-coverage env vars for config budget
* cleanup
* Improve SourceKit hover resilience and Swift signature parsing
  - add hover throttling, adaptive timeout disable, per-file budgets, and metrics in LSP collector
  - decouple SourceKit hover timeout, prefer non-asserts binary resolution, and add host-level concurrency gate for test runs
  - improve Swift signature parsing coverage to reduce hover dependence and add focused parser tests
  - include atomic JSON stream replace-path hardening updates in src/shared/json-stream/atomic.js
* Ropiary hedge (#54)
* Add duplication verification and detailed DUPEMAP execution plan
  - add DUPEMAP.md with frontloaded, dependency-gated phase structure and granular subphase tasks
  - add refreshed duplication_consolidation_report.md verification section with confirmed cluster statuses and additional findings
  - include All_Findings.md in this snapshot as requested
* Expand findings execution plan and extract postinstall script
  Summary of included changes:
  - Added comprehensive findings expansion review in All_Findings.md (Part 5), including src/** full-coverage accounting, 9-batch parallel review split, and 18 confirmed path-level findings with severity/impact/fix direction.
  - Expanded DUPEMAP.md into a unified dedupe + findings remediation program with explicit F0-F9 phases, detailed subphases, touchpoints, tests, dependencies, and acceptance gates.
  - Reordered work into a unified, frontloaded wave sequence (U0-U5) so foundational cross-cutting work is executed first and downstream work is simplified.
  - Added mandatory D/F coupling (touch-once execution) to integrate deduplication and findings fixes in the same module families and avoid second-pass rewrites.
  - Added performance-focused planning refinements: perf budget artifact, baseline/delta capture, bounded-memory enforcement, hotspot-first prioritization, concurrency/backpressure contracts, and CI perf trend/top-offender gates.
  - Added explicit mapping for all Part 5 src/** findings to concrete remediation phases and touchpoints.
  - Updated indexer crash logger enablement to respect runtime.debugCrash gate in src/index/build/indexer/pipeline.js.
  - Replaced inline package.json postinstall command with dedicated tools/setup/postinstall.js script that preserves behavior while handling --omit=dev safely (skip when patch-package is unavailable).
  - Ran repository formatting (npm run format) before commit.
* D0.1: add dupemap migration manifest baseline
  - Added docs/tooling/dupemap-migration-manifest.json with schemaVersion, clusters, migrations, banPatterns, and exceptions sections.
  - Seeded all 27 roadmap clusters plus 4 additional verified clusters with concrete legacy/canonical paths.
  - Added explicit exception semantics requiring reason + expiry phase and forbidding permanent exceptions.
  - Updated DUPEMAP.md to mark D0 in progress and D0.1 tasks complete.
* Complete D0: shift to fix-first execution
  - Removed remaining D0 scanner/audit tooling work and finalized fix-first sequencing tasks (D0.2, D0.3).
  - Marked D0 exit criteria complete and recorded D0.DOC no-doc-change rationale with timestamp.
  - Updated phase summary status for D0 to completed.
  - Kept D0 focused on baseline mapping + execution kickoff, with no new scanner/audit scripts.
* Reopen prematurely completed D0 checklist items
  Restore D0 status/checklists to in-progress so only implemented work is marked complete.
* docs(dupemap): lock in D0.2 fix-first sequencing
* docs(dupemap): complete D0.3 lane-only enforcement and close D0
* docs(dupemap): complete F0.1 findings mapping baseline
* docs(dupemap): complete F0.2 ownership and closure criteria
* docs(dupemap): complete F0.3 discipline gates and close F0
* feat(d1.1): consolidate upward walkups and path containment helpers
* feat(d1.2): unify warn-once and lru cache primitives
* feat(d1.3): unify disk-space and watch normalization primitives
* feat(d1.4): centralize locks and misc primitives
* feat(d1.5): add atomic write cache policy lifecycle primitives
* feat(d2.1): unify merge helpers and shared JSONL readers
* feat(d2.2): unify writer scaffolding and jsonl extension helpers
* feat(d4.1): unify ann backend normalization and provider gating
* feat(d4.2): unify api+mcp search request and meta normalization
* feat(d4.3): consolidate api+mcp repo cache policy and manager
* feat(d5.1): centralize tooling binary and typescript loader helpers
* feat(d5.2): share signature splitters and read-signature helpers
* feat(d5.3): share js/ts relations callee and location helpers
* feat(d3.2): unify sqlite tool helpers and noop task factory
* feat(d3.1): extract shared sqlite build core for artifacts and bundles
* feat(d3.3): unify sqlite quantization and vocab helpers
* feat(d3.4): consolidate lmdb utility helpers
* feat(d6.1): extract shared chunking helper primitives
* feat(d6.2): unify risk shared helper primitives
* feat(d6.3): unify import candidates and map shared helpers
* test(d7.1): extract shared ann pipeline scenario helper
* test(d7.2): consolidate interprocedural flow cap scenarios
* test(d7.3): consolidate vfs/sqlite streaming fixtures
* build(bootstrap): rebuild optional native modules safely
* test(d7.4): extract shared graph symbol sqlite harness fixtures
* refactor(d7.5): dedupe map bench viewer and build options
* refactor(d8.1): consolidate Ajv validation scaffolding
* refactor(d8.2): consolidate download redirect fetch helper
* chore(d8.3): lock migration sweeps docs sync and ci lanes
* docs(dupemap): remove subphase d8.4
* Fix LSP param enrichment for clangd canonical signatures
* Fix subprocess quoting and parity fixture path regressions
* Preserve trailing-slash import semantics for relative resolution
* Chainsaw bird (#55)
* Complete F1 lifecycle/runtime correctness and contracts
* Complete F2 language and chunking correctness contracts
* Complete F3 artifact and storage I/O crash-safety
* Complete F4 retrieval ANN embeddings correctness and boundedness
* Complete F5 tooling LSP service resilience and diagnostics hygiene
* Complete F6 map graph context-pack correctness and cleanup safety
* Complete F7 security path and input hardening
* Complete F8 contract evidence and src coverage lock
* Enforce postinstall patching and relax native rebuild requirements
* Cleanup
* Close D7 roadmap gates and mark phase docs complete
* Fix ci-lite VFS disk safety contract and reorder lane timings
* Reorder ci lane from timings and refresh config inventory
* Accelerate stage3 embeddings pipeline for sparse-mode builds
* Reorder ci-long lane by latest timing data
* Allowlist service subprocess config env vars
* Reset ANN provider cooldown on empty successful queries
* Preserve legacy findings wrapper handling in triage ingest
* Skip stage4 promotion for explicit index-root runs
* perf(indexing): speed up html/swift import hot paths
* perf(lang): sweep collectors and relations hot paths
* Add artifact I/O tracing and harden HTML/CSS import collectors
* cleanup
* Allow trace artifact env var in config budget allowlist
* Route trace artifact flag through shared env config
* Always run stage3 artifact validation for empty outputs
* cleanup
* Do not enforce default fetch timeout for downloads
* Fix manifest-aware chunk metadata loading for triage builds
* Keep isomap client imports within served /isomap modules
* Harden manifest validation against transient file handoff
* Honor parse-skip in scheduler plan and relax postinstall for omit-dev
* Skip native rebuild on node_modules cache hits in CI workflows
* Consolidate workflow contract coverage into one test
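The "Log unresolved import samples during relations resolution" commit above describes emitting a bounded number of sample lines alongside an aggregate count. As a rough illustration of that idea only (this is not the repo's `resolveImportLinks` or `postScanImports` code), a collector could keep the first few samples while still counting every occurrence. `createBoundedSamples`, its `limit` parameter, and the log line format are all assumptions.

```javascript
// Hypothetical sketch: keep at most `limit` samples but count every event.
function createBoundedSamples(limit = 5) {
  const samples = [];
  let total = 0;
  return {
    // Record one unresolved import; only the first `limit` are retained.
    add(importer, specifier) {
      total += 1;
      if (samples.length < limit) samples.push({ importer, specifier });
    },
    // Produce a summary line followed by the retained samples.
    toLogLines() {
      const lines = [`unresolved imports: ${total}`];
      for (const s of samples) lines.push(`  ${s.importer} -> ${s.specifier}`);
      const hidden = total - samples.length;
      if (hidden > 0) lines.push(`  ... ${hidden} more not shown`);
      return lines;
    },
  };
}
```

Keeping the total separate from the retained samples means the aggregate count stays exact while log volume and memory stay bounded regardless of repository size.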
…eats into BULKY_LOADS

# Conflicts:
#	src/index/build/file-processor/process-chunks/index.js
#	src/index/build/workers/indexer-worker.js
@codex pls review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 749e9459f7
* Expand Phase 16.6 & 16.7
* Implement Stage1 token/postings core
* Add Stage1 postings backpressure
* Phase 16.6.3: Stage1 postings bench + regressions
  - Add postings-real + chargram benchmark contract tests
  - Add Stage1 chunk_meta/vocab_order determinism regression
  - Promote heap plateau + add Stage1 memory budget regression
  - Document Stage1 bench usage
  - Fix scheduler nested proc deadlock + vocab_order artifact payload
* docs: update config inventory
* Phase 16.7.1: Streamed graph_relations build
* Phase 16.7.2: Filter index bitmaps + repo_map safer writes
* Phase 16.7.3: Stage2 benches + regression tests
* roadmap: update 16.6/16.7 checkboxes
* Fix token postings meta schema and guard test
* Fix sqlite-vec extension path resolution
* roadmap: fix 16.8 embeddings touchpoints and tasks
* roadmap: tighten 16.9 sqlite build tasks
* roadmap: tighten 16.10 vfs throughput tasks
* Fix segment chunking cache reuse for VFS virtualRange
* roadmap: tighten 16.11 tree-sitter throughput tasks
* roadmap: tighten 16.12 graph/context-pack throughput tasks
* roadmap: tighten 16.9.3-16.12.3 bench/contract tasks
* fix(validate): honor vocab_order fields + ledger phrase/chargram hashes
* tests: exit after --help
* ci: write junit to .testLogs
* Bulky loads (#52)
* embeddings(cache): fast reject + safe flush
* embeddings: bounded writer queue + batch autotune
* embeddings: add bench + determinism/memory tests
* sqlite(build): bulk load transaction + multi-row inserts
* sqlite(schema): contentless fts + index plan
* sqlite(build): add bench + contract tests
* Phase 16.10.1: VFS segment IO
* Phase 16.10.2: VFS merge/compaction
* Phase 16.10.3: VFS tests + bench contracts
* Phase 16.11.1: tree-sitter runtime caching
* Phase 16.11.2: tree-sitter scheduling + cache reuse
* Phase 16.11.3: tree-sitter tests + bench
* Phase 16.12.1: GraphStore CSR load
* Phase 16.12.2: traversal + streaming context-pack
* Phase 16.12.3: tests + bench
* Fix ci-long timeouts + script coverage
* Phase 16.7: graph_relations dedupe + excluded-file reject
* Build triage: ordering drift, embeddings paths, tooling spawn
  - Fix chunk_meta ordering ledger mismatch (don’t auto-materialize tokenIds)
  - Fix records embeddings file resolution
  - Make tooling doctor treat missing pyright as warn unless explicitly enabled
  - Avoid DEP0190 by not passing args arrays with shell:true
  - Default dictionaries dir to unversioned cache root
  - Reduce clangd log noise; tree-sitter batch preload adjustments
  - Add BUILDLIST + build_index logs for repro
* Roadmap: mark 16.14/16.6 complete
* Stage2: schedule relations with build scheduler
* Stage2: schedule relations IO and harden filter_index reuse
* Phase 16.7: harden relations/filter_index and add regression tests
* tools: add bench runner harness
* Phase 16.15: bench harness + output contracts
* Phase 16.15: usage checklist
* tests: avoid env var in graph plateau gc child
* Centralize token classification to main thread
* Wood sitter (#53)
* Add shell metachar quoting regression coverage for subprocess wrapper
* Fix embeddings cache index lock/merge semantics
* remove parity from ci-long
* Fix postinstall patch/rebuild order and add install contract coverage
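The "sqlite(build): bulk load transaction + multi-row inserts" item above names a common bulk-load technique: batching many rows into one `INSERT ... VALUES (...), (...)` statement and running the batches inside a single transaction. The sketch below only builds the statements and is not taken from this PR. `buildMultiRowInserts` is a hypothetical helper, the `chunks` table in the usage note is made up, and the default of 999 bound parameters is a conservative assumption matching SQLite's historical `SQLITE_MAX_VARIABLE_NUMBER` default.

```javascript
// Hypothetical helper: split rows into multi-row INSERT statements that stay
// under a bound-parameter limit.
// Assumption: `table` and `columns` are trusted identifiers, not user input.
function buildMultiRowInserts(table, columns, rows, maxParams = 999) {
  // How many rows fit in one statement without exceeding the parameter cap.
  const rowsPerStatement = Math.max(1, Math.floor(maxParams / columns.length));
  const rowPlaceholder = `(${columns.map(() => '?').join(', ')})`;
  const statements = [];
  for (let i = 0; i < rows.length; i += rowsPerStatement) {
    const batch = rows.slice(i, i + rowsPerStatement);
    const values = batch.map(() => rowPlaceholder).join(', ');
    statements.push({
      sql: `INSERT INTO ${table} (${columns.join(', ')}) VALUES ${values}`,
      params: batch.flat(), // row values in placeholder order
    });
  }
  return statements;
}
```

A caller would typically wrap the resulting statements in one `BEGIN` / `COMMIT` pair, for example over `buildMultiRowInserts('chunks', ['id', 'text'], rows)`, so the whole load commits once rather than per statement.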
* Expand Phase 16.13 artifact pipeline tasks
* Expand Phase 16.14 index state/file meta/minhash tasks
* Phase 16.13 artifact IO offsets, swaps, and benches
* Phase 16.14: file_meta loader + bench contracts
* Roadmap: add full streaming tasks for 16.13/16.14
* Phase 16: streaming artifact IO + file_meta loaders
* Run sqlite stage in-process and stop legacy cache purges
* Fix artifact load materialization and cache roots
* Phase 16.14.5 streaming file_meta
* Fix JSONL array merge stack overflow
* Concurrent backshots (#51)
- Added performance-focused planning refinements: perf budget artifact, baseline/delta capture, bounded-memory enforcement, hotspot-first prioritization, concurrency/backpressure contracts, and CI perf trend/top-offender gates. - Added explicit mapping for all Part 5 src/** findings to concrete remediation phases and touchpoints. - Updated indexer crash logger enablement to respect runtime.debugCrash gate in src/index/build/indexer/pipeline.js. - Replaced inline package.json postinstall command with dedicated tools/setup/postinstall.js script that preserves behavior while handling --omit=dev safely (skip when patch-package is unavailable). - Ran repository formatting (npm run format) before commit. * D0.1: add dupemap migration manifest baseline - Added docs/tooling/dupemap-migration-manifest.json with schemaVersion, clusters, migrations, banPatterns, and exceptions sections. - Seeded all 27 roadmap clusters plus 4 additional verified clusters with concrete legacy/canonical paths. - Added explicit exception semantics requiring reason + expiry phase and forbidding permanent exceptions. - Updated DUPEMAP.md to mark D0 in progress and D0.1 tasks complete. * Complete D0: shift to fix-first execution - Removed remaining D0 scanner/audit tooling work and finalized fix-first sequencing tasks (D0.2, D0.3). - Marked D0 exit criteria complete and recorded D0.DOC no-doc-change rationale with timestamp. - Updated phase summary status for D0 to completed. - Kept D0 focused on baseline mapping + execution kickoff, with no new scanner/audit scripts. * Reopen prematurely completed D0 checklist items Restore D0 status/checklists to in-progress so only implemented work is marked complete. 
* docs(dupemap): lock in D0.2 fix-first sequencing * docs(dupemap): complete D0.3 lane-only enforcement and close D0 * docs(dupemap): complete F0.1 findings mapping baseline * docs(dupemap): complete F0.2 ownership and closure criteria * docs(dupemap): complete F0.3 discipline gates and close F0 * feat(d1.1): consolidate upward walkups and path containment helpers * feat(d1.2): unify warn-once and lru cache primitives * feat(d1.3): unify disk-space and watch normalization primitives * feat(d1.4): centralize locks and misc primitives * feat(d1.5): add atomic write cache policy lifecycle primitives * feat(d2.1): unify merge helpers and shared JSONL readers * feat(d2.2): unify writer scaffolding and jsonl extension helpers * feat(d4.1): unify ann backend normalization and provider gating * feat(d4.2): unify api+mcp search request and meta normalization * feat(d4.3): consolidate api+mcp repo cache policy and manager * feat(d5.1): centralize tooling binary and typescript loader helpers * feat(d5.2): share signature splitters and read-signature helpers * feat(d5.3): share js/ts relations callee and location helpers * feat(d3.2): unify sqlite tool helpers and noop task factory * feat(d3.1): extract shared sqlite build core for artifacts and bundles * feat(d3.3): unify sqlite quantization and vocab helpers * feat(d3.4): consolidate lmdb utility helpers * feat(d6.1): extract shared chunking helper primitives * feat(d6.2): unify risk shared helper primitives * feat(d6.3): unify import candidates and map shared helpers * test(d7.1): extract shared ann pipeline scenario helper * test(d7.2): consolidate interprocedural flow cap scenarios * test(d7.3): consolidate vfs/sqlite streaming fixtures * build(bootstrap): rebuild optional native modules safely * test(d7.4): extract shared graph symbol sqlite harness fixtures * refactor(d7.5): dedupe map bench viewer and build options * refactor(d8.1): consolidate Ajv validation scaffolding * refactor(d8.2): consolidate download redirect 
fetch helper * chore(d8.3): lock migration sweeps docs sync and ci lanes * docs(dupemap): remove subphase d8.4 * Fix LSP param enrichment for clangd canonical signatures * Fix subprocess quoting and parity fixture path regressions * Preserve trailing-slash import semantics for relative resolution * Chainsaw bird (#55) * Complete F1 lifecycle/runtime correctness and contracts * Complete F2 language and chunking correctness contracts * Complete F3 artifact and storage I/O crash-safety * Complete F4 retrieval ANN embeddings correctness and boundedness * Complete F5 tooling LSP service resilience and diagnostics hygiene * Complete F6 map graph context-pack correctness and cleanup safety * Complete F7 security path and input hardening * Complete F8 contract evidence and src coverage lock * Enforce postinstall patching and relax native rebuild requirements * Cleanup * Close D7 roadmap gates and mark phase docs complete * Fix ci-lite VFS disk safety contract and reorder lane timings * Reorder ci lane from timings and refresh config inventory * Accelerate stage3 embeddings pipeline for sparse-mode builds * Reorder ci-long lane by latest timing data * Allowlist service subprocess config env vars * Reset ANN provider cooldown on empty successful queries * Preserve legacy findings wrapper handling in triage ingest * Skip stage4 promotion for explicit index-root runs * perf(indexing): speed up html/swift import hot paths * perf(lang): sweep collectors and relations hot paths * Add artifact I/O tracing and harden HTML/CSS import collectors * cleanup * Allow trace artifact env var in config budget allowlist * Route trace artifact flag through shared env config * Always run stage3 artifact validation for empty outputs * cleanup * Do not enforce default fetch timeout for downloads * Fix manifest-aware chunk metadata loading for triage builds * Keep isomap client imports within served /isomap modules * Harden manifest validation against transient file handoff * Honor parse-skip in 
scheduler plan and relax postinstall for omit-dev * Skip native rebuild on node_modules cache hits in CI workflows * Consolidate workflow contract coverage into one test * Add shell metachar quoting regression coverage for subprocess wrapper * Fix embeddings cache index lock/merge semantics * remove parity from ci-long * Fix postinstall patch/rebuild order and add install contract coverage * Cleanup * Fix packed minhash row buffer aliasing in artifact loader
* Add Phase 16.0 specs - draft scheduler, artifact IO, cache key, ledger, spill/merge, byte budget, and ordering specs - mark Phase 16.0 spec tasks complete in SWEET16 roadmap * Add build scheduler core config and tests * Wire build scheduler into stages * Integrate scheduler into embeddings pipeline - disable max-lines lint rule to allow larger pipeline changes - feed raw argv into build-embeddings config for scheduler resolution - add embeddings scheduler helper + per-file embedding pipeline - wire Stage3 embeddings compute/IO through scheduler queues with backpressure fallback - gate cache/artifact IO via scheduler queues and log starvation counts - add perf test for embeddings scheduler backpressure and update SWEET16 roadmap - document embeddings scheduler queues in concurrency/runtime and perf audit specs * Add scheduler benchmarks and perf tests * Add JSONL reader fast paths and tests * Unify offsets metadata and validation * Unify JSONL writer guards and pipeline test * Cache manifest/meta reads in loaders * Add trusted JSONL validation mode * Add unified cache key builder schema * Unify embeddings cache keys with schema * Unify cache keys for file_meta/import/VFS - add file_meta cache key metadata and spec - apply unified cache keys to import-resolution and VFS caches - invalidate import cache on file set changes and add regression test - document cache key usage in VFS specs and roadmap touchpoints * Harden artifact loader fallbacks - pass bounded concurrency to JSONL array loads - fall back to full scan when per-file index is invalid - mark Phase 16.2.4 loader tasks complete * Add artifact-io read benchmarks - add artifact-io-read benchmark for parallel JSONL loads - extend jsonl-offset-index to support real index paths - mark Phase 16.2.5 bench tasks complete * Version cache roots and add clear-cache - version cache root by cache key schema and purge legacy layouts - add cache-rebuild CLI/env flag and cache migration test - add tooling cache eviction 
and clear-cache command - update config schema and cache specs * Unify local cache key construction * Complete cache key bench and sampling * Add ordering ledger core * Add ordering helpers * Wire ordering helpers and ledger hashes * Validate ordering ledger mismatches * Add ordering ledger bench and hash tests * Add shared merge core foundation * Adopt shared merge core for postings * Adopt shared merge core for vfs/relations * Add byte budget policy plumbing * Phase 16.5.5 spill/merge benches and byte budget tests * Route cache rebuild flag through env helper * Split stage1 scheduler queues and clamp token pools * Guard cache rebuild purge per process * Piping my artifact (#50) * Expand Phase 16.13 artifact pipeline tasks * Expand Phase 16.14 index state/file meta/minhash tasks * Phase 16.13 artifact IO offsets, swaps, and benches * Phase 16.14: file_meta loader + bench contracts * Roadmap: add full streaming tasks for 16.13/16.14 * Phase 16: streaming artifact IO + file_meta loaders * Run sqlite stage in-process and stop legacy cache purges * Fix artifact load materialization and cache roots * Phase 16.14.5 streaming file_meta * Fix JSONL array merge stack overflow * Concurrent backshots (#51) * Expand Phase 16.6 & 16.7 * Implement Stage1 token/postings core * Add Stage1 postings backpressure * Phase 16.6.3: Stage1 postings bench + regressions - Add postings-real + chargram benchmark contract tests - Add Stage1 chunk_meta/vocab_order determinism regression - Promote heap plateau + add Stage1 memory budget regression - Document Stage1 bench usage - Fix scheduler nested proc deadlock + vocab_order artifact payload * docs: update config inventory * Phase 16.7.1: Streamed graph_relations build * Phase 16.7.2: Filter index bitmaps + repo_map safer writes * Phase 16.7.3: Stage2 benches + regression tests * roadmap: update 16.6/16.7 checkboxes * Fix token postings meta schema and guard test * Fix sqlite-vec extension path resolution * roadmap: fix 16.8 embeddings 
touchpoints and tasks * roadmap: tighten 16.9 sqlite build tasks * roadmap: tighten 16.10 vfs throughput tasks * Fix segment chunking cache reuse for VFS virtualRange * roadmap: tighten 16.11 tree-sitter throughput tasks * roadmap: tighten 16.12 graph/context-pack throughput tasks * roadmap: tighten 16.9.3-16.12.3 bench/contract tasks * fix(validate): honor vocab_order fields + ledger phrase/chargram hashes * tests: exit after --help * ci: write junit to .testLogs * Bulky loads (#52) * embeddings(cache): fast reject + safe flush * embeddings: bounded writer queue + batch autotune * embeddings: add bench + determinism/memory tests * sqlite(build): bulk load transaction + multi-row inserts
* ci long reorder * Fail postinstall when required patches are unavailable * Refine SWEET16 phase 16.16 execution tasks * Add shared invariant helpers for ordering, tokens, and comparators * Route Stage1 token-id collision tracking through shared invariant helper * Use shared comparator invariant in merge validation path * Use shared determinism hashing helpers in index validation * Mark completed shared invariant wiring tasks in phase 16.16 * Add packed checksum framework and wire minhash write/read validation * Harden shard loading and add loader fallback/fuzz coverage * Document loader hardening and packed checksum behavior * Add phase 16.16 contract tests and mark roadmap progress * Implement versioned cache roots and coverage * Align truth ledger schema and ordering line hash validation * Align byte budget default overflow policy with spec * Enforce comparator invariants across merge adopters * Fail fast on Stage1 token-id collisions and surface in validation * Harden filter-index effectiveLang fallback handling * Use measured postings queue byte accounting with telemetry * Harden tree-sitter scheduler plan freshness checks * Add VFS fast-path telemetry and batched row loading * Stream and measure Stage4 chunk_meta ingest path * Index context-pack seed resolution over chunk_uid_map * Add parameterized bench contracts and per-bench schemas * Strengthen phase usage checklist phase-signal assertions * Close SWEET16 roadmap tasks with benchmark evidence * Update config inventory artifacts * Align filter-index 
segment-aware test with unknown-lang fallback * Reduce buildFilterIndex allocation overhead on hot path