feat(server): canonicalize transcripts at ingest v1 #618

Merged
wileland merged 3 commits into develop from codex/implement-transcript-canonicalization-at-ingest
Feb 20, 2026

@wileland
Owner

Motivation

  • Prevent transcript identity wobble by canonicalizing user transcripts at ingest and making the canonical form the hashing source-of-truth.
  • Preserve raw user text while storing a deterministic, idempotent canonical view and a stable transcriptHash so receipts and downstream workers remain stable.
  • Apply canonicalization consistently at all write boundaries (upload placeholder, GraphQL add/update, scribe worker) and let reflection worker consume the persisted canonical view.
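
To illustrate why hashing the canonical form keeps transcript identity stable, here is a minimal sketch. The rules in `canonicalizeSketch` (NFC normalization, line-ending and whitespace folding, trimming) are assumptions for illustration only; the actual `canonicalizeTranscriptV1` rules are defined in the PR and may differ. The body of `sha256Hex` is likewise an assumed implementation built on Node's `crypto` module.

```javascript
const { createHash } = require('node:crypto');

// Hypothetical stand-in for canonicalizeTranscriptV1 (assumed rules, not the PR's).
function canonicalizeSketch(raw) {
  return String(raw ?? '')
    .normalize('NFC')          // one Unicode form for equivalent characters
    .replace(/\r\n?/g, '\n')   // CRLF and lone CR become LF
    .replace(/[ \t]+/g, ' ')   // fold runs of spaces and tabs into one space
    .replace(/ *\n */g, '\n')  // drop spaces around line breaks
    .trim();                   // strip leading and trailing whitespace
}

// Assumed implementation of a sha256Hex-style helper.
function sha256Hex(text) {
  return createHash('sha256').update(text, 'utf8').digest('hex');
}

// Two raw inputs that differ only in whitespace and line endings.
const a = 'Hello   world\r\nsecond line  ';
const b = '  Hello world\nsecond line';

const hashA = sha256Hex(canonicalizeSketch(a));
const hashB = sha256Hex(canonicalizeSketch(b));

console.log(hashA === hashB); // true: same canonical text, same hash
// Idempotent: canonicalizing an already canonical string changes nothing.
console.log(canonicalizeSketch(canonicalizeSketch(a)) === canonicalizeSketch(a)); // true
```

Because the hash is computed over the canonical text rather than the raw text, cosmetic differences in the input do not change the transcript's identity, while the raw text can still be stored unchanged alongside it.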

Description

  • Add canonicalization v1 helpers in server/models/Entry.js: canonicalizeTranscriptV1, sha256Hex, buildCanonicalTranscriptPayload, and TRANSCRIPT_CANONICALIZATION_VERSION_V1.
  • Extend the Entry model schema with rawTranscript, canonicalTranscript, transcriptHash, and canonicalizationVersion and update setTranscriptState to populate these fields on transcript writes.
  • Wire ingest-time canonicalization into all write paths: server/routes/upload.js (placeholder entry creation), GraphQL addEntry/updateEntry (server/graphql/resolvers/index.js), and scribe worker persistence (server/src/workers/scribe.worker.js) to consistently store the canonical payload and version.
  • Update reflection worker (server/src/workers/reflection.worker.js) to prefer persisted canonicalTranscript when canonicalizationVersion === '1' and fall back to canonicalizeTranscriptV1 for legacy entries, and include canonical fields in recent-history queries.
  • Add/adjust deterministic tests and test mocks: canonicalization/idempotency and hashing tests in server/models/__tests__/Entry.test.js, and updated worker/route tests to mock or exercise the new named exports and behavior.
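
The write-side payload and the reflection worker's read-side fallback described above could be sketched roughly as follows. This is not the code from the PR: the function bodies, the canonicalization rules, the payload shape, the `resolveCanonicalTranscript` helper, and the legacy `transcript` field name are all assumptions made for illustration. Only the exported names and the four Entry fields come from the description.

```javascript
const { createHash } = require('node:crypto');

const TRANSCRIPT_CANONICALIZATION_VERSION_V1 = '1';

// Stand-in rules only; the real canonicalizeTranscriptV1 may differ.
function canonicalizeTranscriptV1(raw) {
  return String(raw ?? '')
    .normalize('NFC')
    .replace(/\r\n?/g, '\n')
    .replace(/[ \t]+/g, ' ')
    .replace(/ *\n */g, '\n')
    .trim();
}

// Assumed payload shape, mirroring the four new Entry fields.
function buildCanonicalTranscriptPayload(raw) {
  const rawTranscript = String(raw ?? '');
  const canonicalTranscript = canonicalizeTranscriptV1(rawTranscript);
  return {
    rawTranscript, // user text preserved exactly as received
    canonicalTranscript,
    transcriptHash: createHash('sha256').update(canonicalTranscript, 'utf8').digest('hex'),
    canonicalizationVersion: TRANSCRIPT_CANONICALIZATION_VERSION_V1,
  };
}

// Hypothetical read-side helper: prefer the persisted canonical view,
// fall back to canonicalizing on the fly for legacy entries.
function resolveCanonicalTranscript(entry) {
  if (
    entry.canonicalizationVersion === TRANSCRIPT_CANONICALIZATION_VERSION_V1 &&
    typeof entry.canonicalTranscript === 'string'
  ) {
    return entry.canonicalTranscript;
  }
  // `transcript` is an assumed legacy field name.
  return canonicalizeTranscriptV1(entry.transcript ?? entry.rawTranscript ?? '');
}

const modern = buildCanonicalTranscriptPayload('  went for a\twalk  \r\n today ');
const legacy = { transcript: '  went for a\twalk  \r\n today ' };

console.log(resolveCanonicalTranscript(modern) === resolveCanonicalTranscript(legacy)); // true
```

In this sketch, entries written after the change never pay the canonicalization cost on read, and legacy entries still resolve to the same canonical text without requiring a backfill.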

Testing

  • Ran the task sanity check: node -e "JSON.parse(require('fs').readFileSync('codex/tasks/latest.json','utf8')); console.log('latest.json ok')" which printed latest.json ok (success).
  • Ran preflight: node scripts/codex_preflight.mjs --ci which passed (Codex Web context; preflight warnings about missing remote were expected and non-blocking).
  • Ran server tests: pnpm -C server test and observed the test suite complete with the updated test surface (final run: 219 tests passed, 5 skipped), indicating the canonicalization behavior and worker integration are covered and green.
  • Ran the linter: pnpm -C server lint:ci failed in this environment with an ESLint/AJV runtime incompatibility (TypeError: Cannot set properties of undefined (setting 'defaultMeta')); the failure is environmental and not caused by the canonicalization edits.

Codex Task

@github-actions

github-actions bot commented Feb 20, 2026

Audit

Audit artifacts not found.

Schema

No drift detected ✅

Codex

No Codex outputs yet.

@wileland wileland changed the base branch from codex/phase1-ingest-canonicalization-exec-2026-02-20 to develop February 20, 2026 12:38

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5c9c9a888

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@wileland wileland merged commit c415346 into develop Feb 20, 2026
3 checks passed
@wileland wileland deleted the codex/implement-transcript-canonicalization-at-ingest branch February 20, 2026 15:23
wileland added a commit that referenced this pull request Feb 20, 2026
* phase0-spine-lockdown-2026-02-19: lock meaning-spine contracts, enforce unique offsets, harden emission (#617)

* chore(codex): phase0 spine lockdown task spec

* chore(codex): tighten phase0 spine lockdown spec for codex web

* fix(server): lock meaning spine contracts and receipt validation

* feat(server): canonicalize transcripts at ingest v1 (#618)

* codex(task): phase1 ingest transcript canonicalization v1

* feat(server): canonicalize transcripts at ingest v1

* fix(codex): align latest.json locks/scope; fold transcript whitespace safely