Skip to content

Add SynixSearch output contract#91

Merged
marklubin merged 4 commits intomainfrom
mark/synix-synixsearch-contract
Mar 7, 2026
Merged

Add SynixSearch output contract#91
marklubin merged 4 commits intomainfrom
mark/synix-synixsearch-contract

Conversation

@marklubin
Copy link
Owner

Summary

  • add SynixSearch as the canonical local search output backed by declared SearchSurfaces
  • keep SearchIndex as compatibility-only while moving docs, templates, planner/runtime, and CLI messaging to the new contract
  • refresh validator-heavy demo goldens and case text so replayed demos match the shipped search/validation behavior

Verification

  • git diff --check
  • uv run python -c "from synix.dev import verify_demos; verify_demos()"
  • uv run pytest

@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Note

Red Team Review — OpenAI GPT-5.2 | Adversarial review (docs + diff only)

Threat assessment — Medium risk: mostly a refactor/rename, but it creates a new “canonical” search output abstraction while still writing build/search.db, and the semantics (progressive vs complete materialization) are changing in subtle ways.

One-way doors

  1. New public API types: SearchSurface + SynixSearch as the “new way”
    Hard to reverse because docs/templates push users onto these names/shapes, and they’ll become the mental model for “how search works.”
    Safe to merge only if: you commit to backward compatibility for SearchIndex, define deprecation policy/timeline, and document the contract boundary (what’s stable vs internal).

  2. Projection type string identifiers: "synix_search" in planning/materialization/verify output
    Hard to reverse because golden outputs, CLI UX, and any downstream tooling parsing plan output will start depending on these labels.
    Safe only if: output schemas are versioned (or explicitly “not stable”), and JSON outputs (if any) are updated and versioned accordingly.

  3. Implicit artifact contract: still defaulting to build/search.db while claiming it’s not the contract
    This is a trap: you say the DB/schema isn’t public, but your CLI (synix search) and verify checks still hardcode build/search.db and table search_index. Users will treat it as stable anyway.
    Safe only if: you actually introduce an indirection (manifest pointer, or a stable API to locate the DB), and stop hardcoding table names in CLI/verify.

Findings

  1. src/synix/core/models.py: SynixSearch copies surface.sources/modes/embedding_config at construction time
    Failure mode: surface mutates later (common in Python pipelines), but SynixSearch becomes stale and silently diverges from the surface. That’s nasty to debug. Prefer referencing surface dynamically, or freezing + hashing and validating at build time.
    Severity: [warning]

  2. src/synix/build/runner.py: changed materialization semantics (SearchIndex progressive, SynixSearch complete)
    Failure mode: any transform that expects local search to exist mid-build now depends on which projection class you used. That’s coupling-by-type, not by capability. If build-time retrieval is the point of SearchSurface, the runner should materialize the surface progressively (or explicitly when first needed), independent of the output adapter.
    Severity: [warning]

  3. src/synix/build/runner.py + src/synix/cli/search_commands.py + src/synix/build/verify.py: hardcoded build/search.db and table name
    Failure mode: SynixSearch(output_path=...) can write elsewhere, but synix search and verify still look only at build/search.db. That means user config produces a “working” build with a “broken” CLI. Also verify checks schema table search_index regardless of adapter name.
    Severity: [critical]

  4. src/synix/build/projections.py: get_projection() returns SearchIndexProjection for both "search_index" and "synix_search"
    Failure mode: naming suggests different adapters, but implementation is identical. That’s fine internally, but then don’t pretend it’s a separate abstraction unless you also isolate config/schema/paths. Right now it’s lipstick + extra complexity.
    Severity: [minor]

  5. src/synix/build/plan.py: treating SynixSearch as having embedding_config and hashing it
    Risk: you’ve duplicated embedding config state in multiple places (surface and projection). If they drift, plan/cache decisions become inconsistent. This is another symptom of “copy at init” in SynixSearch.
    Severity: [warning]

  6. src/synix/build/verify.py: aliasing search_indexsynix_search
    Risk: you changed the check name but kept legacy alias. Good. But the implementation is still “does search.db contain search_index table.” That’s not “SynixSearch,” it’s “SQLite adapter v1.” Mismatch between naming and actual contract.
    Severity: [minor]

  7. src/synix/build/validators.py: deterministic ordering added
    This is one of the few clear wins: stable ordering reduces golden churn. But the sort keys include violation.message and JSON-dumped metadata; if messages include nondeterministic text, you still get churn.
    Severity: [minor]

Missing

  • Tests proving SynixSearch(output_path=...) is respected by CLI search and verify (currently it is not).
  • Documentation for: how synix search locates the active search output when multiple surfaces/projections exist.
  • A migration/deprecation note: what happens to existing projects relying on progressive SearchIndex for mid-build retrieval?
  • Any explicit statement of stability for CLI table output vs --json (README mentions runs list JSON versioning; nothing similar for plan/search/verify).

Verdict

BlockSynixSearch.output_path is effectively ignored by CLI/verify, and the change introduces type-based behavior differences (progressive vs complete) without a capability-level contract. Fix those, then it’s shippable.

Review parameters
  • Model: gpt-5.2
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 10,000 lines (of 11,078 total — truncated)
  • Prompt: .github/prompts/openai_review.md
  • Timestamp: 2026-03-06T23:39:53Z

@github-actions
Copy link
Contributor

github-actions bot commented Mar 6, 2026

Note

Architectural Review — Claude Opus | Blind review (docs + diff only)

Summary — This PR introduces SynixSearch as the canonical local search output layer, separating it from the legacy SearchIndex. Pipelines now declare a SearchSurface (build-time capability) and wire a SynixSearch to it for the user-facing SQLite output. All templates, golden files, CLI labels, and verify checks are updated to match. A secondary change adds stable sorting to validator violations and artifact gathering.

Alignment — This is a clean step toward the Search Surface RFC separation described in the docs. DESIGN.md §6.1 distinguishes projection artifacts from build-time capabilities; this PR makes that distinction concrete in the type system. SynixSearch delegates its sources, search, and embedding_config to the underlying SearchSurface, keeping the surface as the single source of truth — consistent with the content-addressed, DAG-oriented model. The SearchIndex compatibility path is preserved, matching the pre-1.0 evolution stance.

Observations

  1. [positive] SynixSearch.__init__ copies config from the surface (list(surface.sources), list(surface.modes), dict(surface.embedding_config)), preventing shared-mutable-state bugs. Clean delegation. (models.py)

  2. [positive] The stable violation sort key (_violation_sort_key) and sorted artifact gathering in _gather_artifacts fix what was clearly non-deterministic golden output ordering. The massive golden file churn in template 04 is entirely re-ordering — the right fix.

  3. [concern] SynixSearch hardcodes self.embedding_config = dict(surface.embedding_config) but SearchSurface.__init__ sets self.embedding_config to {} when none is provided. This means SynixSearch will always have a truthy embedding_config dict even when empty. _compute_embedding_config_hash returns None only when not proj.embedding_config — an empty dict is falsy in Python, so this works, but it's fragile. If anyone adds a default key it breaks silently. (models.py:311, runner.py:1040)

  4. [concern] _search_projection_db_path for SynixSearch uses proj.output_path if set, but there's no validation that the path is inside build_dir. A user passing an absolute path could write the DB outside the build directory, which would survive synix clean. (runner.py:102)

  5. [question] SynixSearch waits for all source layers before materializing (unlike SearchIndex which materializes progressively). This changes behavior for pipelines migrating from SearchIndex — transforms using ctx.search(...) mid-build won't see partial results from SynixSearch. Is this intentional? The SearchSurface still materializes progressively so maybe it's fine, but the interaction isn't documented. (runner.py:816-818)

  6. [nit] _materialize_projection has the condition if _is_search_output_projection(proj) or isinstance(proj, SearchSurface) — but _is_search_output_projection already returns False for SearchSurface, so this reads oddly. A helper like _needs_search_indexer would be clearer. (runner.py:993)

  7. [positive] The verify check alias system (aliases = {"search_index": "synix_search"}) is a nice backward-compat touch for anyone with scripts calling synix verify --checks search_index.

  8. [nit] The README example still shows SearchIndex in the compatibility note but the import line removes it. Someone copy-pasting the example won't have SearchIndex imported if they need the compat path. (README.md:112)

  9. [positive] Golden test coverage is thorough — build, plan, explain, rebuild, verify, and validate outputs all updated across 6 templates. The demo normalizer regex was updated to match the new label.

  10. [question] Both synix_search and search_index resolve to the same SearchIndexProjection class via get_projection. Is there a plan for them to diverge, or is the naming split purely for CLI/plan labeling? (projections.py:42)

Verdict — Good incremental step: it cleanly separates build-time search capability from the user-facing search output, advances the Search Surface RFC, and the deterministic violation ordering fix is overdue. The db-path validation gap and progressive-vs-complete materialization behavior change deserve a follow-up.

Review parameters
  • Model: claude-opus-4-6
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 10,000 lines (of 11,078 total — truncated)
  • Prompt: .github/prompts/claude_review.md
  • Timestamp: 2026-03-06T23:39:56Z

@github-actions
Copy link
Contributor

github-actions bot commented Mar 7, 2026

Note

Red Team Review — OpenAI GPT-5.2 | Adversarial review (docs + diff only)

Threat assessment — Medium risk: this PR introduces a new public API concept (SynixSearch) and changes CLI behavior around search outputs; the direction is reasonable but there are a couple of correctness holes and a few “API one-way doors” being created without enough guardrails/tests.

One-way doors

  1. New public API types: SearchSurface + SynixSearch split

    • Why hard to reverse: Users will code against this graph shape (SynixSearch(depends_on=[surface]), plus default naming conventions like "search"). Undoing/renaming later becomes churn and forces migrations in docs/templates and pipelines.
    • Safe to merge if: You commit to these names/contracts through v1.0, or you explicitly mark them experimental in code/docstrings and provide a stable compatibility layer (not just “SearchIndex still works”).
  2. CLI contract: synix search default resolution + --projection <name>

    • Why hard to reverse: People will rely on implicit selection rules and flag naming. Also you’re calling it --projection, but it’s specifically “search output projection”; that naming will stick.
    • Safe to merge if: Resolution rules are documented in CLI help + docs, and you add tests that lock the selection precedence (single output, prefer synix_search named search, ambiguity error).
  3. Persisting db_path into .projection_cache.json

    • Why hard to reverse: You’re now treating cache metadata as a quasi-registry for later CLI resolution. Once written, old builds will be queried by new code.
    • Safe to merge if: The cache schema is versioned or at least tolerant (you did tolerate missing/invalid JSON), and you guarantee relative-path semantics and normalization across platforms.

Findings

  1. src/synix/core/models.py: SynixSearch copies surface.embedding_config unconditionally

    • Pattern: self.embedding_config = dict(surface.embedding_config)
    • Failure mode: If SearchSurface.embedding_config is None (which is plausible for fulltext-only), dict(None) raises TypeError. This is a runtime crash on a very normal configuration.
    • Severity: [critical]
  2. src/synix/cli/info_commands.py: still hardcodes build/search.db

    • Pattern: search_db = build_path / "search.db" and counts search_index table
    • Failure mode: With multiple outputs or non-default SynixSearch.output_path, status will lie (“not built”) even when a valid search output exists. This is exactly the scenario you introduced.
    • Severity: [warning]
  3. README.md: CLI flag mismatch (--projection vs text “pass --projection ” but earlier says “--projection” and one place says “--projection ”; also says “If ... pass --projection ” but earlier line says “pass --projection ” while the diff text says “pass --projection <name>” yet an earlier sentence says “pass --projection <name>” but at top it says “pass --projection <name>” and in one place it says --projection but the user-facing text in README says “pass --projection <name>”; ok. The real issue: it says “If a build has several local search outputs, pass --projection <name>” but the CLI option is added under synix search only; other commands (verify/status) don’t accept it.

    • Failure mode: Docs imply global concept; implementation is partial/inconsistent.
    • Severity: [minor]
  4. src/synix/build/runner.py: progressive vs complete materialization semantics changed

    • Pattern: SearchIndex progressive, SynixSearch waits for full closure.
    • Failure mode: If any transforms rely on querying local search during build, they likely need SearchSurface readiness, not SynixSearch. But you’re still materializing SearchSurface as a projection-like thing and using the same indexer. There’s hidden coupling here: if SearchSurface is supposed to be “build-time capability”, it shouldn’t depend on “projection cache” semantics at all.
    • Severity: [warning]
  5. src/synix/build/search_outputs.py: silent failure on corrupt cache

    • Pattern: _load_projection_cache swallows JSON/OSError and returns {}.
    • Failure mode: A corrupted cache makes the CLI fall back to legacy build/search.db if it exists, or “no search output” otherwise, with no warning. That can cause confusing misreads and incorrect querying when multiple outputs exist but metadata is unreadable.
    • Severity: [minor]
  6. src/synix/build/validators.py: ordering changes will reshuffle outputs

    • Pattern: Sorting violations/artifacts for stable output.
    • Risk: Fine, but this is behavioral; you updated golden files, but I don’t see unit tests asserting deterministic order beyond snapshot tests.
    • Severity: [minor]

Missing

  • Unit tests for:
    • SynixSearch with SearchSurface(modes=["fulltext"], embedding_config=None) (would catch the dict(None) crash).
    • resolve_search_output() precedence and ambiguity error cases.
    • Path traversal / escaping tests for output_path and db_path normalization (Windows paths especially).
  • CLI status (synix status) update to use resolve_search_output() instead of hardcoding search.db.
  • Doc update in CLI reference section for synix search to mention --projection explicitly (README has prose but the command table doesn’t).

Verdict

Block until the dict(surface.embedding_config) crash is fixed and synix status is made consistent with the new multi-output search resolution; otherwise you’re shipping a new public API that breaks on common configs and a CLI that misreports state.

Review parameters
  • Model: gpt-5.2
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 10,000 lines (of 11,600 total — truncated)
  • Prompt: .github/prompts/openai_review.md
  • Timestamp: 2026-03-07T00:07:22Z

@github-actions
Copy link
Contributor

github-actions bot commented Mar 7, 2026

Note

Architectural Review — Claude Opus | Blind review (docs + diff only)

Summary

This PR introduces SynixSearch as the canonical local search output layer, separating it from SearchSurface (build-time retrieval capability). All templates and docs migrate from the single SearchIndex pattern to a two-step SearchSurface → SynixSearch pattern. It also adds stable violation/artifact sort ordering for validators, a new search_outputs.py resolution module, and the --projection CLI flag for disambiguating multiple search outputs.

Alignment

Strong fit. DESIGN.md §6.1 distinguishes projection artifacts from search artifacts, and §1.5 emphasizes that the mutable build surface is not the source of truth. This PR cleanly separates the build-time search capability (SearchSurface) from the materialized output (SynixSearch), which is exactly the layering the Search Surface RFC and immutable snapshot architecture demand. The SynixSearch.output_path constraint (must stay under build dir) respects the mutable-surface boundary. SearchIndex remains as compatibility sugar — consistent with the project's pre-1.0 iteration stance.

Observations

  1. [positive] search_outputs.py — Clean resolution module with explicit disambiguation logic, legacy fallback to build/search.db, and a proper error type. The priority cascade (prefer synix_search named search, then singleton) is sensible and well-documented.

  2. [positive] Stable violation ordering in validators.py via _violation_sort_key — explains the massive golden file reorderings. This is the right fix; non-deterministic output ordering in validation JSON was a correctness problem for snapshot testing.

  3. [positive] Stable artifact gathering with _artifact_sort_key across all _gather_artifacts paths. Consistent with DESIGN.md's audit determinism principle.

  4. [concern] _search_projection_db_path in runner.py — the path-escape check uses resolved_build_dir = build_dir.resolve(strict=False). If build_dir doesn't exist yet (first build), resolve(strict=False) on some platforms may produce unexpected results with symlinks. This seems fine for the common case but worth a comment.

  5. [concern] SynixSearch.__init__ copies surface.embedding_config via dict(surface.embedding_config) — but SearchSurface.embedding_config defaults to {} (empty dict from the existing model). If surface.embedding_config is None somehow, this would raise TypeError. The existing SearchSurface constructor does default it, but defensive dict(surface.embedding_config or {}) would be safer.

  6. [question] The SynixSearch layer depends on its surface (depends_on=[surface]) but also copies sources from the surface. Is depends_on used for DAG ordering while sources drives materialization? The dual reference could confuse contributors — a docstring clarifying this would help.

  7. [nit] _is_search_output_projection excludes SearchSurface but _search_projection_type handles it. The asymmetry in these helpers is slightly confusing — the name suggests it covers all search projections, but it specifically means "output projections only."

  8. [positive] search_commands.py now properly closes both SearchIndexProjection and SearchIndex connections via .close() / finally blocks — fixes what looks like a pre-existing resource leak.

  9. [concern] The golden file changes for template 04 are enormous (~80% of the diff by line count) and are purely violation reordering. This is correct but makes the PR very hard to review visually. Consider whether golden files for non-deterministic LLM-dependent outputs should use looser matching.

  10. [positive] README.md explains the new pattern clearly: SearchSurface is the capability, SynixSearch is the output, SearchIndex is legacy. The --projection flag guidance is good.

  11. [nit] verify.py renames the check to synix_search but keeps _check_search_index as a legacy alias. The alias map approach is clean, but the docstring on the alias function could mention it's for backward compatibility with --checks search_index.

Verdict

This is a well-structured incremental step that correctly separates build-time search capability from output materialization, maintains backward compatibility, and fixes non-determinism in validation output — all aligned with the project's stated architecture.

Review parameters
  • Model: claude-opus-4-6
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 10,000 lines (of 11,600 total — truncated)
  • Prompt: .github/prompts/claude_review.md
  • Timestamp: 2026-03-07T00:07:30Z

@github-actions
Copy link
Contributor

github-actions bot commented Mar 7, 2026

Note

Red Team Review — OpenAI GPT-5.2 | Adversarial review (docs + diff only)

Threat assessment — Medium-high risk: this PR changes the public mental model and default CLI behavior around search outputs, and it persists new metadata that future versions will be stuck supporting.

One-way doors

  1. New public API types: SearchSurface + SynixSearch (vs SearchIndex)

    • Why hard to reverse: Users will bake these class names, constructor params (surface=, output_path=), and semantics (surface as capability, output as projection) into pipelines and docs.
    • Safe to merge if: you commit to a deprecation window for SearchIndex, add an explicit versioning story for projection metadata/schema, and document migration + compatibility guarantees.
  2. New CLI flag contract: synix search --projection <name> + default resolution rules

    • Why hard to reverse: CLI UX becomes documented and scripted; the resolution heuristic (“the one named search”, prefer synix_search) becomes implicit API.
    • Safe to merge if: resolution rules are fully specified, stable, and tested for ambiguous builds; and the flag name matches your terminology (“projection” vs “output” vs “surface”) consistently.
  3. .projection_cache.json becomes a cross-feature metadata source

    • Why hard to reverse: You’re now using this file not just for caching but also for discovery/routing (db_path, projection_type). Any change breaks CLI/search/verify.
    • Safe to merge if: you schema-version that file (top-level schema_version), validate reads strictly (don’t silently ignore JSON errors), and provide fallback/migration behavior.

Findings

  1. src/synix/build/search_outputs.py swallows JSON/IO errors and treats as “no metadata”

    • Risk: Corrupt cache causes silent fallback to legacy behavior; users get confusing “no search output” or query wrong DB without knowing metadata is broken.
    • Severity: [warning]
  2. src/synix/build/runner.py uses Path.resolve(...).is_relative_to(...) for containment

    • Risk: resolve(strict=False) does not protect against symlink tricks inside build_dir (a symlinked child can point outside). You’re claiming a security property (“must stay under build directory”) that isn’t actually enforced robustly.
    • Severity: [warning]
  3. Projection cache now stores db_path relative to build dir, but no normalization/consistency

    • File/pattern: runner._search_output_cache_metadata() + search_outputs._normalized_output_path()
    • Risk: Mixed absolute/relative formats across platforms and older caches will create brittle behavior; relative_to() in runner will throw if path isn’t under build dir (good) but will also differ depending on cwd/realpath.
    • Severity: [minor]
  4. src/synix/build/projections.py maps both search_index and synix_search to SearchIndexProjection

    • Risk: This is semantic theater: “SynixSearch” claims to decouple schema/file from contract, but it’s the same adapter + same table expectations. You’re not actually buying abstraction; you’re adding types and metadata complexity.
    • Severity: [warning]
  5. src/synix/build/runner.py changes materialization timing semantics

    • Pattern: progressive materialization only for SearchIndex, but SynixSearch waits for full closure.
    • Risk: Any transforms that previously depended (implicitly) on progressive availability of build/search.db will break if migrated to SynixSearch. The docs mention build-time retrieval via SearchSurface, but I don’t see enforcement preventing a transform from reading the DB directly.
    • Severity: [warning]
  6. Terminology mismatch in README

    • File: README.md
    • Risk: It says “pass --projection <name>” when the object is a projection output, but the rest of the system calls it projection/output/surface inconsistently. Also: synix search talking about “projection” while the class is SynixSearch and capability is SearchSurface is confusing and will calcify.
    • Severity: [minor]
  7. src/synix/build/verify.py still verifies table name search_index

    • Risk: If SynixSearch ever changes schema (the stated goal), verify will become wrong. You’re baking the legacy schema into “canonical” verification.
    • Severity: [warning]
  8. Validator ordering change may invalidate downstream golden expectations

    • File: src/synix/build/validators.py (ViolationReport.to_dict() sorting; artifact gathering sorting)
    • Risk: Any consumers expecting original ordering will see diffs; might be fine, but there’s no explicit mention in docs/changelog.
    • Severity: [minor]

Missing

  • Tests for resolve_search_output() resolution matrix: (0 outputs, 1 output, many outputs, includes both search_index + synix_search, name collisions, missing db_path, db_path escaping build dir).
  • A schema/version field for .projection_cache.json and explicit compatibility policy.
  • Documentation update on synix.dev/website copy (still references SearchIndex heavily) and DESIGN.md alignment (design doc uses “artifact surface=search”; now you have surface vs output split).
  • Clear migration guide: “old pipeline with SearchIndex -> new pipeline with SearchSurface + SynixSearch” including behavioral differences (progressive materialization).

Verdict

Ship with fixes — the direction is reasonable, but metadata/versioning and error handling around projection discovery are too flimsy to merge as-is, and the containment claim is weaker than advertised.

Review parameters
  • Model: gpt-5.2
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 10,000 lines (of 11,812 total — truncated)
  • Prompt: .github/prompts/openai_review.md
  • Timestamp: 2026-03-07T00:17:44Z

@github-actions
Copy link
Contributor

github-actions bot commented Mar 7, 2026

Note

Architectural Review — Claude Opus | Blind review (docs + diff only)

Summary

This PR introduces SynixSearch as the canonical local search output layer, separating the build-time search capability (SearchSurface) from the user-facing search materialization (SynixSearch). All templates and pipelines are migrated from SearchIndex to SearchSurface + SynixSearch. A new search_outputs.py module handles CLI resolution of which search database to query, including multi-output disambiguation. Validators gain stable sort ordering for deterministic output.

Alignment

Strong alignment with the Search Surface RFC direction implied by DESIGN.md's artifact/projection split. The separation between SearchSurface (build-time capability, DAG participant) and SynixSearch (materialization output) maps cleanly to DESIGN.md's distinction between Steps and Artifacts. SearchIndex is preserved as compatibility sugar — consistent with the "architecture evolution without data loss" principle. The output_path constraint (must stay inside build dir) respects the mutable-local-surface vs. immutable-snapshot split from §1.5. The stable violation ordering in validators advances audit determinism (Appendix C).

Observations

  1. [positive] search_outputs.py — Clean new module with well-defined resolution logic: single output auto-resolves, multiple outputs require --projection, legacy search.db fallback when no metadata exists. The SearchOutputSpec frozen dataclass and SearchOutputResolutionError are good API boundaries.

  2. [positive] The _search_projection_db_path enforces path containment (is_relative_to check), preventing path traversal. This is the right security posture for user-supplied output_path.

  3. [concern] runner.py:_materialize_projection line: if _is_search_output_projection(proj) or isinstance(proj, SearchSurface)_is_search_output_projection returns True for SearchIndex and SynixSearch but not SearchSurface. This condition is correct but the or clause is easy to misread. A comment would help.

  4. [concern] search_commands.pySearchIndexProjection is instantiated for keyword mode, SearchIndex (from search module) for semantic/hybrid. The keyword path now calls search_projection.close() explicitly, but SearchIndex.close() in the hybrid path is in a finally. The asymmetry is fine but the keyword path doesn't have a finally — if query() throws, the connection leaks.

  5. [question] SynixSearch copies surface.sources, surface.modes, and surface.embedding_config in __init__. If the surface is mutated after construction, the SynixSearch instance goes stale. Is there a reason to copy rather than delegate? Pipeline add() ordering could make this a subtle bug source.

  6. [positive] Stable violation ordering (_violation_sort_key) and stable artifact gathering (_artifact_sort_key) in validators — this directly enables deterministic golden file comparisons, explaining the massive golden file reordering in the diff. Good investment for test stability.

  7. [nit] PROJECTION_CACHE_FILE was moved from runner.py to search_outputs.py — runner imports it from there. Fine, but the constant is now in a module named search_outputs while it serves all projection types. Minor naming tension.

  8. [positive] SynixSearch materialization waits for all sources (complete mode), unlike SearchIndex which materializes progressively. This is the right default for a canonical output — you don't want partial search results as the final artifact.

  9. [question] The golden file changes for template 04 are enormous (thousands of lines of violation reordering). Are these regenerated automatically, or is there risk of manual editing drift?

  10. [nit] build_commands.py line 70: elif proj_type in {"synix_search", "flat_file"}SearchSurface would fall through to the else branch and get no trigger. Is that intentional? Surfaces are materialized separately, so probably yes, but worth confirming.

Verdict

This is a well-structured incremental step that cleanly separates build-time search capability from user-facing search output, advances deterministic validation output, maintains backward compatibility, and includes thorough golden file updates — a solid architectural refinement for the pre-1.0 phase.

Review parameters
  • Model: claude-opus-4-6
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 10,000 lines (of 11,812 total — truncated)
  • Prompt: .github/prompts/claude_review.md
  • Timestamp: 2026-03-07T00:17:55Z

@github-actions
Copy link
Contributor

github-actions bot commented Mar 7, 2026

Note

Red Team Review — OpenAI GPT-5.2 | Adversarial review (docs + diff only)

Threat assessment — Medium risk: mostly a refactor/rename, but it introduces a new “search output discovery” mechanism and new public API types that will be hard to unwind if wrong.

One-way doors

  1. New public API types: SearchSurface + SynixSearch as the taught path
    Hard to reverse because templates/docs/README now enshrine these names and mental model (“surface capability” vs “output”), and users will write pipelines around them.
    Safe to merge if: you’re confident this abstraction survives v1.0, and you publish a deprecation policy + timeline for SearchIndex (even if it’s “no earlier than 1.0”).

  2. CLI flag --projection and output-selection rules
    Hard to reverse because CLI UX becomes muscle memory and scripts depend on deterministic selection behavior.
    Safe to merge if: selection rules are exhaustively tested and documented in one canonical place (docs + synix search --help), and error messages are stable.

  3. .projection_cache.json becomes a quasi-registry for search outputs
    You say it’s internal, but you’re now depending on it for CLI behavior (resolve_search_output). That makes it de-facto a contract.
    Safe to merge if: you version this metadata (even a tiny schema_version), and you guarantee forward/backward parsing (ignore unknown fields, tolerate missing fields).

Findings

  1. src/synix/build/search_outputs.py — silent corruption fallback on invalid JSON
    _load_projection_cache() returns {} on JSON parse errors and IO errors, causing list_search_outputs() to fall back to build/search.db or “no outputs”. That can mask real breakage and query the wrong DB.
    Severity: [warning]

  2. src/synix/build/runner.py — progressive vs complete materialization changes semantics
    Comment says “SearchIndex progressive; SynixSearch waits for full source closure”. This changes build-time availability for retrieval during builds if anything depended (even accidentally) on early materialization. The docs imply SearchSurface is for build-time retrieval, but you still materialize it as SQLite under build/surfaces/—unclear when that’s ready relative to transforms that “uses=[surface]”.
    Severity: [warning]

  3. src/synix/build/runner.pyPath.is_relative_to portability
    _search_projection_db_path() and _normalized_output_path() use Path.is_relative_to, which is Python 3.9+. If Synix targets 3.8 anywhere, this is an instant runtime crash. No evidence of supported Python versions in diff.
    Severity: [critical] (if 3.8 supported), otherwise [minor]

  4. src/synix/build/projections.py — misleading adapter registry naming
    get_projection() returns SearchIndexProjection for both "search_index" and "synix_search". That means “SynixSearch” is not a different implementation, just a different config path/name. Fine, but the type split risks being pure ceremony. If the intent is future pluggability, you need a real boundary (interface/adapter) now, not just an alias string.
    Severity: [minor]

  5. src/synix/core/models.pySynixSearch duplicates surface fields at construction time
    SynixSearch.__init__ copies surface.sources, surface.modes, surface.embedding_config into its own fields. If the surface is mutated after creation (Python-first API encourages mutation), SynixSearch becomes stale silently.
    Severity: [warning]

  6. src/synix/cli/search_commands.py — inconsistent naming: --projection but help text says “pass --projection ”, README says “--projection”, but earlier text says “pass --projection otherwise, pass--projection ”; also README still says “otherwise, pass --projection <name>” but one spot says “pass --projection <name>” and another says “pass --projection”. Minor, but user-facing.
    Severity: [minor]

  7. src/synix/build/verify.py — check aliasing is half-done
    You map "search_index" -> "synix_search" but still keep _check_search_index() as an alias. This is messy and risks duplicated documentation/CLI output drift. Pick one canonical check name and keep aliasing only at CLI parsing boundary.
    Severity: [minor]

  8. src/synix/build/validators.py — ordering changes alter golden outputs broadly
    Sorting violations and artifacts is fine, but the PR includes massive golden churn. This raises regression risk: you might be masking real semantic diffs in validator behavior behind ordering changes.
    Severity: [warning]

Missing

  • Tests specifically for resolve_search_output() selection rules (single output, multiple outputs, name collisions, invalid metadata, explicit --projection, legacy fallback).
  • A versioned schema for .projection_cache.json (at least schema_version).
  • Clarification of supported Python versions (to justify Path.is_relative_to).
  • A doc/rationale for why SynixSearch is a distinct type if it’s currently the same SQLite projection implementation.

Verdict

Ship with fixes — concept is directionally OK, but block until output-resolution logic is tested and the SynixSearch/SearchSurface mutability/staleness issue is addressed (or explicitly forbidden/validated).

Review parameters
  • Model: gpt-5.2
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 10,000 lines (of 12,191 total — truncated)
  • Prompt: .github/prompts/openai_review.md
  • Timestamp: 2026-03-07T00:36:59Z

@github-actions
Copy link
Contributor

github-actions bot commented Mar 7, 2026

Note

Architectural Review — Claude Opus | Blind review (docs + diff only)

Summary — Introduces SynixSearch as the canonical local search output, decoupling it from SearchSurface (build-time capability) and deprecating SearchIndex to compatibility sugar. Adds a search_outputs module for CLI resolution of which search DB to query, and stabilizes violation/artifact ordering in validators. All templates and golden files are updated.

Alignment — Strong fit. DESIGN.md §1.5 explicitly states projection/release state lives in the mutable local surface pending an explicit release/adapter layer. SynixSearch is that adapter slice — it separates "what is searchable during the build" (SearchSurface) from "what the CLI queries after the build" (SynixSearch). This respects the immutable-vs-mutable split and moves toward the Search Surface RFC's vision. The SearchIndex compatibility path with explicit migration window follows the "no silent breaks" principle.

Observations

  1. [positive] search_outputs.py — Clean resolution logic with well-documented rules, path-escape validation, legacy fallback, and a proper error type. The 5-step resolution algorithm is documented identically in README, pipeline-api.md, and code.

  2. [positive] Stable violation ordering via _violation_sort_key and sorted artifact gathering in _gather_artifacts — this explains the massive golden file churn. Previously non-deterministic output order is now reproducible, which directly serves the "audit determinism" design decision.

  3. [concern] SynixSearch.__init__ copies surface.sources, surface.modes, and surface.embedding_config eagerly. If someone mutates the surface after constructing the SynixSearch, the two diverge silently. A @property delegation or a note in the docstring would prevent confusion.

  4. [concern] _search_projection_db_path resolves paths with strict=False and validates containment. But _search_output_cache_metadata stores relative paths via db_path.relative_to(build_dir.resolve()). If build_dir is a symlink, resolve() follows it, but relative_to on the read side in _normalized_output_path also resolves — this should be consistent, but symlink edge cases could cause cache misses.

  5. [question] SearchIndexProjection is instantiated with a db_path arg in the search command's keyword path (SearchIndexProjection(build_dir, db_path)), but the old code passed only build_dir. Is SearchIndexProjection.__init__ already updated to accept positional db_path? Not visible in this diff.

  6. [nit] _is_search_output_projection returns True for SearchIndex and SynixSearch but the _materialize_projection code also handles SearchSurface via isinstance(proj, SearchSurface) separately. The naming suggests "search output" means user-facing, but the helper is only used in some paths — consider renaming to _is_local_search_projection for clarity.

  7. [positive] The search_commands.py changes add search_projection.close() and search_index.close() in a finally block — fixing what looks like a pre-existing resource leak.

  8. [positive] Verify check renamed from search_index to synix_search with backward alias preserved. Good compatibility handling.

  9. [question] The golden file diffs for template 04 are enormous (thousands of lines of violation reordering). Are these generated deterministically now, or could they flip again if artifact ordering changes upstream? The sort key uses label → violation_type → field → claim_key → message → metadata_json, which looks stable.

  10. [nit] case.py in template 04 changes the fix output mask from r"." to r"^.*$" — both match everything, but the new one is more explicit. Minor improvement.

Verdict — This is a well-executed architectural step that cleanly separates build-time search capabilities from user-facing search outputs, with proper compatibility paths, and the violation-ordering fix removes a real source of non-determinism. Good incremental step.

Review parameters
  • Model: claude-opus-4-6
  • Context: README.md, DESIGN.md, synix.dev, PR diff
  • Diff size: 10,000 lines (of 12,191 total — truncated)
  • Prompt: .github/prompts/claude_review.md
  • Timestamp: 2026-03-07T00:37:13Z

@marklubin marklubin merged commit 44ef150 into main Mar 7, 2026
12 checks passed
@marklubin marklubin deleted the mark/synix-synixsearch-contract branch March 7, 2026 00:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant