Merged
64 changes: 30 additions & 34 deletions codex/tasks/latest.json
@@ -1,49 +1,45 @@
  {
- "task_id": "meaning-contract-firewall-2026-02-19",
- "title": "Meaning Contract firewall + narrative toggle + telemetry determinism",
- "summary": "Harden the Meaning Output Contract at the reflection boundary: do not persist non-generic meaning without validated receipts; add an in-scope narrative toggle enforced deterministically; ensure Langfuse remains no-op/stubbed in tests and never produces outbound/flaky handles. Add/adjust tests to prevent regressions.",
+ "task_id": "phase0-spine-lockdown-2026-02-19",
+ "title": "Phase 0 Spine Lockdown: freeze contract vocab, kill ambiguous receipt offsets, harden emission + narrative policy",
  "base_branch": "develop",
- "branch_name": "codex/meaning-contract-firewall-exec-2026-02-19",
+ "branch_name": "codex/phase0-spine-lockdown-exec-2026-02-19",
+ "summary": "Seal the Meaning Spine by freezing contract reason codes, enforcing unique-match offset inference (ambiguity=poison), hardening validateReceipt (strict V1 never falls through), ensuring ENTRY_ANALYZED emits contract+sanitized cards only (no raw reflection text), and locking narrative toggle behind a shared policy utility that callers cannot override. Add/adjust regression tests to prevent drift.",
  "repo_scope": [
  "codex/tasks/latest.json",
- "server/src/workers/reflection.worker.js",
- "server/src/utils/truthValidator.js",
- "server/tests/receipt.v1.test.js",
- "server/tests/outboundPolicy.test.ts",
- "server/tests/msw/handlers/infra.ts",
- "server/utils/langfuse.js",
- "server/utils/withLangfuseTrace.js"
+ "server/src/utils/**",
+ "server/src/workers/__tests__/**",
+ "server/tests/**",
+ "docs/testing-doctrine.md"
  ],
- "agents_involved": ["codex-web"],
- "risk_level": "medium",
+ "agents_involved": ["codex_web"],
+ "risk_level": "low",
  "tests_to_run": [
  "node -e \"JSON.parse(require('fs').readFileSync('codex/tasks/latest.json','utf8')); console.log('latest.json ok')\"",
  "node scripts/codex_preflight.mjs --ci",
- "pnpm --filter server test tests/receipt.v1.test.js",
- "pnpm --filter server test tests/outboundPolicy.test.ts"
+ "pnpm -C server test"
  ],
  "constraints": [
- "Codex Web environment: do NOT run git network commands (fetch/pull/push/clone). Use the UI 'Create PR' button only if there is a real diff.",
- "No changes outside repo_scope. If a necessary fix is out of scope, STOP and report it with a minimal Repair Manifest (file list + reason).",
- "Avoid formatting-only churn. Changes must be functional and proven by tests.",
- "Determinism: tests must not access the real network. If a network attempt occurs, it must fail loudly and deterministically.",
- "Do not add new runtime dependencies unless absolutely required; prefer small pure functions and unit tests.",
- "If any file outside repo_scope becomes modified (e.g., pnpm-workspace.yaml), revert it immediately and continue. Do not include out-of-scope diffs in PR.",
- "Narrative toggle MUST be implemented in-scope as an env flag (NARRATIVE_ENABLED). Do not search for an existing toggle elsewhere; define and enforce it here."
+ "CODEX_WEB: Do NOT run git network commands (no git fetch/pull/push/clone). Use the UI “Create PR” button if a PR is needed.",
+ "CODEX_WEB_HEAD: In Codex Web, the checked-out branch name may be 'work'. Do NOT treat HEAD name mismatch as stale. Locks+canary are the source of truth.",
+ "ANTI-COP-OUT: No diff => no PR. If no actionable work exists, stop and report evidence.",
+ "SCOPE: Do not modify files outside repo_scope. If out-of-scope issues are found, produce a Repair Manifest instead of changing them.",
+ "ALIGNMENT: Print task_id/base_branch/branch_name/canary from latest.json before doing any work.",
+ "EVIDENCE_BUNDLE: Provide evidence in 4 phases: Alignment, Work-Exists Gate, Change Proof, Tests.",
+ "PR_BASE: Ensure PR base branch is develop (not another codex/* branch). Do not create draft PRs.",
+ "NO_PLACEHOLDERS: Do not create empty directories or placeholder files. Only create files with real content and tests.",
+ "NO_NETWORK: Tests must not touch real external network services."
  ],
  "acceptance_checks": [
- "Alignment Evidence: print task_id/base_branch/branch_name/canary/repo_scope/tests_to_run from codex/tasks/latest.json at the start.",
- "Work-Exists Gate: identify exact code paths enforcing receipts + storage/emission boundaries; cite file+line targets before editing.",
- "Meaning Contract: if bloom cards are empty after receipt validation, do NOT persist raw model reflectionText; persist a deterministic generic placeholder instead (no non-generic claims without receipts).",
- "Narrative toggle: implement NARRATIVE_ENABLED env flag in server/src/workers/reflection.worker.js. When NARRATIVE_ENABLED='false', drop NARRATIVE/narrative cards and ensure they are not persisted/emitted; add a focused test proving this.",
- "Telemetry determinism: Langfuse remains no-op in test/CI; outbound policy tests remain deterministic and green.",
- "Change Proof: show git status -sb and git diff --stat after edits.",
- "Tests: all commands in tests_to_run pass.",
- "No diff => no PR. If no changes are needed, stop and explain why with evidence."
+ "Alignment Evidence: show codex/tasks/latest.json values for task_id, base_branch, branch_name, and canary.",
+ "Alignment Evidence: print `git rev-parse --abbrev-ref HEAD` and `git rev-parse HEAD` for evidence; do NOT stop on SHA mismatch.",
+ "Work-Exists Gate: prove target symbols exist via grep or file navigation; if not found, stop and report: findReceiptOffsets (or equivalent), emitEntryAnalyzed callsite/payload, sanitizeBloomCardsWithContract boundary, validateReceipt in server/src/utils/truthValidator.js (or its imported helpers).",
+ "Freeze contract reason codes: add a shared constants module and replace raw string comparisons/assignments in Meaning Spine paths touched by this task.",
+ "Unique Match Rule: any transcript-search offset inference must return null on ambiguous multi-occurrence matches (firstIndex !== lastIndex). Ambiguity must drop the receipt/card safely and be reflected in contract/dropped reasons.",
+ "validateReceipt hardening: strict V1 path must not fall through to weaker matching if offsets fail; invalid shapes return explicit failure reasons and do not throw.",
+ "Emission hardening: ENTRY_ANALYZED payload must contain sanitized cards AND the Meaning Contract ledger; payload must not include raw reflection text anywhere.",
+ "Tests: add/adjust regression tests that fail if raw model output leaks into emission serialization; add/adjust tests verifying ambiguous quote matches are dropped.",
+ "Proof: include git status -sb and git diff --stat after changes; run tests_to_run and report results. (Run `pnpm -w test` locally after PR if desired.)"
  ],
  "_meta": {
- "canary": "meaning.contract.firewall.2026-02-19.canary.a1",
- "created_at": "2026-02-19",
- "source": "handoff+preflight-discovery"
- }
+ "canary": "CANARY_PHASE0_SPINE_LOCKDOWN_2026_02_19"
  }
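The Alignment Evidence checks above ask for task_id, base_branch, branch_name and canary to be printed from codex/tasks/latest.json before any work starts. A minimal sketch of such a reader follows. It assumes the canary stays nested under `_meta`, as in this version of the file; `readAlignment` and `formatAlignment` are hypothetical helpers, not part of this PR, and the sample object stands in for the real file.

```javascript
// Hypothetical helper (not part of this PR): extracts the fields that the
// Alignment Evidence check asks the agent to print before doing any work.
// JSON.parse throws on malformed input, which is the same failure the
// `node -e "JSON.parse(...)"` command in tests_to_run relies on.
const readAlignment = (jsonText) => {
  const task = JSON.parse(jsonText);
  return {
    task_id: task.task_id ?? null,
    base_branch: task.base_branch ?? null,
    branch_name: task.branch_name ?? null,
    // In this version of latest.json the canary is nested under _meta.
    canary: task._meta?.canary ?? null,
  };
};

// Renders one `key=value` line per field; missing values are made visible
// instead of being silently skipped.
const formatAlignment = (fields) =>
  Object.entries(fields)
    .map(([key, value]) => `${key}=${value ?? '<missing>'}`)
    .join('\n');

// Stand-in for the real file contents.
const sample = JSON.stringify({
  task_id: 'phase0-spine-lockdown-2026-02-19',
  base_branch: 'develop',
  branch_name: 'codex/phase0-spine-lockdown-exec-2026-02-19',
  _meta: { canary: 'CANARY_PHASE0_SPINE_LOCKDOWN_2026_02_19' },
});

console.log(formatAlignment(readAlignment(sample)));
```

Against the real file the call would be `readAlignment(fs.readFileSync('codex/tasks/latest.json', 'utf8'))`. The sketch only reports values and never compares them with HEAD, in line with the CODEX_WEB_HEAD constraint that a branch named 'work' is not stale.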
19 changes: 19 additions & 0 deletions server/src/utils/meaningSpineContracts.js
@@ -0,0 +1,19 @@
+ export const CONTRACT_REASONS = Object.freeze({
+ OK: 'OK',
+ NO_RECEIPTS: 'NO_RECEIPTS',
+ NARRATIVE_FILTERED: 'NARRATIVE_FILTERED',
+ MALFORMED_INPUT: 'MALFORMED_INPUT',
+ });
+
+ export const RECEIPT_VALIDATION_REASONS = Object.freeze({
+ EMPTY_TRANSCRIPT: 'EMPTY_TRANSCRIPT',
+ INVALID_RECEIPT_SHAPE: 'INVALID_RECEIPT_SHAPE',
+ MISSING_REQUIRED_FIELDS: 'MISSING_REQUIRED_FIELDS',
+ QUOTE_TOO_SHORT: 'QUOTE_TOO_SHORT',
+ TRANSCRIPT_HASH_MISMATCH: 'TRANSCRIPT_HASH_MISMATCH',
+ OFFSET_MISMATCH: 'OFFSET_MISMATCH',
+ OFFSET_AMBIGUOUS: 'OFFSET_AMBIGUOUS',
+ NOT_FOUND: 'NOT_FOUND',
+ QUOTE_HASH_MISMATCH: 'QUOTE_HASH_MISMATCH',
+ VALID: 'VALID',
+ });
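`Object.freeze` makes the reason vocabularies read-only, so a stray assignment cannot silently redefine a code that other modules compare against. The sketch below restates `CONTRACT_REASONS` from this file (with `export` dropped so it runs as a plain script) and shows two properties callers can rely on: writes throw in strict mode, and an unknown string can be rejected by checking membership in the frozen vocabulary. The `isContractReason` and `tryOverride` helpers are hypothetical illustrations, not something this PR adds.

```javascript
// Restated from meaningSpineContracts.js (export dropped so this runs standalone).
const CONTRACT_REASONS = Object.freeze({
  OK: 'OK',
  NO_RECEIPTS: 'NO_RECEIPTS',
  NARRATIVE_FILTERED: 'NARRATIVE_FILTERED',
  MALFORMED_INPUT: 'MALFORMED_INPUT',
});

// Hypothetical membership check: lets a boundary reject reason strings that
// are not part of the frozen vocabulary instead of passing them through.
const KNOWN_CONTRACT_REASONS = new Set(Object.values(CONTRACT_REASONS));
const isContractReason = (value) => KNOWN_CONTRACT_REASONS.has(value);

// ES modules always run in strict mode, where writing to a frozen object
// throws a TypeError. The directive reproduces that inside a plain script.
const tryOverride = (reasons) => {
  'use strict';
  try {
    reasons.OK = 'SOMETHING_ELSE';
    return 'mutated';
  } catch (err) {
    return err instanceof TypeError ? 'rejected' : 'unexpected';
  }
};

console.log(tryOverride(CONTRACT_REASONS)); // rejected
console.log(CONTRACT_REASONS.OK); // OK
console.log(isContractReason('NO_RECEIPTS'), isContractReason('no_receipts')); // true false
```

`Object.freeze` is shallow, which is sufficient here because every value is a string.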
10 changes: 10 additions & 0 deletions server/src/utils/narrativePolicy.js
@@ -0,0 +1,10 @@
+ const parseEnvFlag = (value, defaultEnabled = true) => {
+ if (value == null) return defaultEnabled;
+ const v = String(value).trim().toLowerCase();
+ if (v === '') return defaultEnabled;
+ if (v === 'false' || v === '0' || v === 'no' || v === 'off') return false;
+ if (v === 'true' || v === '1' || v === 'yes' || v === 'on') return true;
+ return defaultEnabled;
+ };
+
+ export const isNarrativeEnabled = () => parseEnvFlag(process.env.NARRATIVE_ENABLED, true);
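`parseEnvFlag` treats unset, empty, or unrecognised values as the default (enabled); only an explicit `false`/`0`/`no`/`off` disables narrative output. The sketch below restates that parser, passes the environment in explicitly so it can be exercised without touching `process.env`, and pairs it with a card filter. The filter `dropNarrativeCards` is hypothetical: the worker-side code that applies the policy is not shown in this diff, and the sketch assumes cards carry a string `type` such as `'reflection'` or `'NARRATIVE'`.

```javascript
// Restated from narrativePolicy.js: only explicit "off" spellings disable the flag.
const parseEnvFlag = (value, defaultEnabled = true) => {
  if (value == null) return defaultEnabled;
  const v = String(value).trim().toLowerCase();
  if (v === '') return defaultEnabled;
  if (v === 'false' || v === '0' || v === 'no' || v === 'off') return false;
  if (v === 'true' || v === '1' || v === 'yes' || v === 'on') return true;
  return defaultEnabled;
};

// Variant that takes the env object as a parameter; the PR's version reads
// process.env directly and takes no arguments.
const isNarrativeEnabled = (env = process.env) => parseEnvFlag(env.NARRATIVE_ENABLED, true);

// Hypothetical filter: drops cards whose `type` is "narrative" in any casing
// when the flag is off, and reports how many were dropped so the drop can be
// recorded (for example as NARRATIVE_FILTERED in the contract).
const dropNarrativeCards = (cards, narrativeEnabled) => {
  if (narrativeEnabled) return { cards, droppedCount: 0 };
  const kept = cards.filter((card) => String(card?.type ?? '').toLowerCase() !== 'narrative');
  return { cards: kept, droppedCount: cards.length - kept.length };
};

const cards = [
  { type: 'reflection', headline: 'Safe headline' },
  { type: 'NARRATIVE', headline: 'Story' },
  { type: 'narrative', headline: 'Another story' },
];

console.log(isNarrativeEnabled({ NARRATIVE_ENABLED: ' OFF ' })); // false
console.log(isNarrativeEnabled({ NARRATIVE_ENABLED: 'maybe' })); // true (falls back to default)
console.log(dropNarrativeCards(cards, false).droppedCount); // 2
```

Because unrecognised values fall back to the default, a typo such as `NARRATIVE_ENABLED=flase` leaves narratives enabled.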
32 changes: 22 additions & 10 deletions server/src/utils/truthValidator.js
@@ -1,5 +1,7 @@
  import { createHash } from 'node:crypto';

+ import { RECEIPT_VALIDATION_REASONS } from './meaningSpineContracts.js';
+
  /**
  * Escapes special characters for use in a regular expression.
  * This prevents injection attacks and ensures characters like ? or . are treated literally.
@@ -87,12 +89,22 @@ const hasV1RequiredFields = (receipt) => {
  export const validateReceipt = (transcript, receiptOrQuote) => {
  const normalizedTranscript = normalizeReceiptText(transcript);
  if (!normalizedTranscript) {
- return { ok: false, reason: 'EMPTY_TRANSCRIPT' };
+ return { ok: false, reason: RECEIPT_VALIDATION_REASONS.EMPTY_TRANSCRIPT };
  }

+ const isObjectReceipt = receiptOrQuote && typeof receiptOrQuote === 'object';
+ const isV1Receipt = isObjectReceipt && receiptOrQuote.version === 'v1';
+
+ if (isObjectReceipt && !isV1Receipt) {
+ return { ok: false, reason: RECEIPT_VALIDATION_REASONS.INVALID_RECEIPT_SHAPE };
+ }
+
+ if (isV1Receipt && receiptOrQuote.offsetInference === 'AMBIGUOUS_MATCH') {
+ return { ok: false, reason: RECEIPT_VALIDATION_REASONS.OFFSET_AMBIGUOUS };
+ }
+
- const isV1Receipt = receiptOrQuote && typeof receiptOrQuote === 'object' && receiptOrQuote.version === 'v1';
  if (isV1Receipt && !hasV1RequiredFields(receiptOrQuote)) {
- return { ok: false, reason: 'MISSING_REQUIRED_FIELDS' };
+ return { ok: false, reason: RECEIPT_VALIDATION_REASONS.MISSING_REQUIRED_FIELDS };
  }

  const quote = isV1Receipt ? resolveReceiptQuote(receiptOrQuote) : receiptOrQuote;
@@ -101,15 +113,15 @@ export const validateReceipt = (transcript, receiptOrQuote) => {
  // A quote must be at least 16 characters to be considered a valid receipt.
  // This strict limit prevents agents from anchoring on common phrases like "i feel" or "it was".
  if (normalizedQuote.length < MIN_QUOTE_LENGTH) {
- return { ok: false, reason: 'QUOTE_TOO_SHORT' };
+ return { ok: false, reason: RECEIPT_VALIDATION_REASONS.QUOTE_TOO_SHORT };
  }

  if (isV1Receipt) {
  const expectedTranscriptHash = String(receiptOrQuote.transcriptHash || '').toLowerCase();
  const actualTranscriptHash = sha256Hex(normalizedTranscript);

  if (actualTranscriptHash !== expectedTranscriptHash) {
- return { ok: false, reason: 'TRANSCRIPT_HASH_MISMATCH' };
+ return { ok: false, reason: RECEIPT_VALIDATION_REASONS.TRANSCRIPT_HASH_MISMATCH };
  }

  const offsets = resolveOffsets(receiptOrQuote);
@@ -120,13 +132,13 @@ export const validateReceipt = (transcript, receiptOrQuote) => {
  offsets.end > String(transcript || '').length;

  if (hasInvalidOffsets) {
- return { ok: false, reason: 'OFFSET_MISMATCH' };
+ return { ok: false, reason: RECEIPT_VALIDATION_REASONS.OFFSET_MISMATCH };
  }

  const transcriptSlice = String(transcript || '').slice(offsets.start, offsets.end);
  const normalizedSlice = normalizeReceiptText(transcriptSlice);
  if (normalizedSlice !== normalizedQuote) {
- return { ok: false, reason: 'OFFSET_MISMATCH' };
+ return { ok: false, reason: RECEIPT_VALIDATION_REASONS.OFFSET_MISMATCH };
  }
  }

@@ -136,19 +148,19 @@ export const validateReceipt = (transcript, receiptOrQuote) => {
  const pattern = new RegExp(`(?:^|\\s)${escapedQuote}(?:$|\\s)`, 'i');

  if (!pattern.test(normalizedTranscript)) {
- return { ok: false, reason: 'NOT_FOUND' };
+ return { ok: false, reason: RECEIPT_VALIDATION_REASONS.NOT_FOUND };
  }

  if (isV1Receipt) {
  const expectedQuoteHash = String(receiptOrQuote.quoteHash || '').toLowerCase();
  const actualQuoteHash = sha256Hex(normalizedQuote);

  if (actualQuoteHash !== expectedQuoteHash) {
- return { ok: false, reason: 'QUOTE_HASH_MISMATCH' };
+ return { ok: false, reason: RECEIPT_VALIDATION_REASONS.QUOTE_HASH_MISMATCH };
  }
  }

- return { ok: true, reason: 'VALID' };
+ return { ok: true, reason: RECEIPT_VALIDATION_REASONS.VALID };
  };

  export default validateReceipt;
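The Unique Match Rule in latest.json requires transcript-search offset inference to yield null when a quote occurs more than once (`firstIndex !== lastIndex`), and validateReceipt now rejects receipts marked `offsetInference === 'AMBIGUOUS_MATCH'` before any other v1 check. The sketch below illustrates both halves under simplifying assumptions. `inferUniqueOffsets`, `buildReceipt` and `checkV1Offsets` are hypothetical names (the task refers to `findReceiptOffsets` "or equivalent", whose body is not in this diff). The sketch also assumes v1 receipts carry `quote`, `start` and `end` at the top level, omits normalisation and hash checks, and uses raw string indices with an exclusive `end`, matching the `slice(offsets.start, offsets.end)` call above.

```javascript
// Hypothetical Unique Match Rule: infer offsets only when the quote occurs
// exactly once. Not-found and ambiguous matches both yield null offsets, so the
// caller drops the receipt instead of guessing which occurrence was meant.
const inferUniqueOffsets = (transcript, quote) => {
  const text = String(transcript ?? '');
  const needle = String(quote ?? '');
  if (!needle) return { offsets: null, inference: 'EMPTY_QUOTE' };
  const firstIndex = text.indexOf(needle);
  if (firstIndex === -1) return { offsets: null, inference: 'NOT_FOUND' };
  // lastIndexOf also catches overlapping repeats (e.g. 'aa' inside 'aaa').
  const lastIndex = text.lastIndexOf(needle);
  if (firstIndex !== lastIndex) return { offsets: null, inference: 'AMBIGUOUS_MATCH' };
  return {
    offsets: { start: firstIndex, end: firstIndex + needle.length },
    inference: 'UNIQUE_MATCH',
  };
};

// Hypothetical receipt builder: assumed field layout, see lead-in.
const buildReceipt = (transcript, quote) => {
  const { offsets, inference } = inferUniqueOffsets(transcript, quote);
  return { version: 'v1', quote, offsetInference: inference, ...(offsets ?? {}) };
};

// Hypothetical strict v1 offset check. A failed offset check is final: there
// is no fallback to a looser transcript-wide search. Unlike validateReceipt,
// which still accepts bare quote strings on a legacy path, this handles v1
// objects only.
const checkV1Offsets = (transcript, receipt) => {
  if (!receipt || typeof receipt !== 'object' || receipt.version !== 'v1') {
    return { ok: false, reason: 'INVALID_RECEIPT_SHAPE' };
  }
  if (receipt.offsetInference === 'AMBIGUOUS_MATCH') {
    return { ok: false, reason: 'OFFSET_AMBIGUOUS' };
  }
  const text = String(transcript ?? '');
  const { start, end } = receipt;
  const inRange =
    Number.isInteger(start) && Number.isInteger(end) && start >= 0 && end > start && end <= text.length;
  if (!inRange || text.slice(start, end) !== receipt.quote) {
    return { ok: false, reason: 'OFFSET_MISMATCH' };
  }
  return { ok: true, reason: 'VALID' };
};

const transcript = 'I said I was fine. Then I said I was fine again, louder.';
console.log(checkV1Offsets(transcript, buildReceipt(transcript, 'I said I was fine')).reason); // OFFSET_AMBIGUOUS
console.log(checkV1Offsets(transcript, buildReceipt(transcript, 'Then I said I was fine again')).reason); // VALID
console.log(checkV1Offsets(transcript, { ...buildReceipt(transcript, 'Then I said I was fine again'), start: 0 }).reason); // OFFSET_MISMATCH
```

Treating ambiguity as a hard failure means a receipt for a repeated phrase is dropped even if one of the occurrences was the intended one; the task accepts that trade-off ("ambiguity=poison").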
28 changes: 28 additions & 0 deletions server/src/workers/__tests__/reflection.worker.test.ts
@@ -316,6 +316,7 @@ describe('reflection.worker', () => {
  entryId: expect.any(String),
  userId: 'user-1',
  cardsCreatedCount: expect.any(Number),
+ contract: expect.objectContaining({ hasReceiptedMeaning: true }),
  meaning: expect.objectContaining({
  structuredData: expect.objectContaining({
  bloom_cards: expect.any(Array),
@@ -336,6 +337,33 @@ describe('reflection.worker', () => {
  expect(result).toEqual(doneTask);
  });

+
+
+ it('never emits raw reflection text in ENTRY_ANALYZED payload serialization', async () => {
+ const runningTask = { _id: taskId, entryId, status: 'running' };
+ const doneTask = { _id: taskId, entryId, status: 'done' };
+
+ mocks.findByIdAndUpdateMock.mockResolvedValueOnce(runningTask).mockResolvedValueOnce(doneTask);
+ mocks.findByIdMock.mockReturnValue(
+ mockLeanResult({ _id: entryId, userId: 'user-1', transcript: 'hello world transcript' }),
+ );
+
+ const rawModelText = 'RAW_REFLECTION_SHOULD_NOT_BE_EMITTED_12345';
+ mocks.reflectEntryWithContextMock.mockResolvedValueOnce({
+ reply: `${rawModelText}
+ ${JSON.stringify([{ type: 'reflection', headline: 'Safe headline', confidence: 0.9, receipts: [{ quote: 'hello world transcript' }] }])}`,
+ traceId: 'trace-reflect-raw',
+ });
+
+ await handleReflectionJob({ data: { entryId, taskId } } as any);
+
+ const emittedPayload = mocks.emitEntryAnalyzedMock.mock.calls.at(-1)?.[0];
+ expect(emittedPayload).toBeTruthy();
+ expect(JSON.stringify(emittedPayload)).not.toContain(rawModelText);
+ expect(emittedPayload?.meaning?.text).toBeUndefined();
+ expect(emittedPayload?.meaning?.summary).toBeUndefined();
+ });
+
  it('blocks observer context when calibrated confidence is below the threshold', async () => {
  mocks.findByIdAndUpdateMock
  .mockResolvedValueOnce({ _id: taskId, entryId, status: 'running' })
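The new test serialises the last ENTRY_ANALYZED payload and asserts that the raw model text is absent. One way to make that property hold by construction is to build the payload from an explicit allow-list of fields rather than spreading the worker's intermediate objects. The sketch below assumes a payload shape inferred from the test's expectations (`entryId`, `userId`, `cardsCreatedCount`, `contract.hasReceiptedMeaning`, `meaning.structuredData.bloom_cards`). The builder `buildEntryAnalyzedPayload`, the card fields it copies (`type`, `headline`, `confidence`) and the `contract.reason` field are hypothetical.

```javascript
// Hypothetical card sanitiser: copies only named fields, so anything else the
// model produced on a card (debug text, raw reply fragments) is left behind.
const sanitizeCard = (card) => ({
  type: String(card?.type ?? ''),
  headline: String(card?.headline ?? ''),
  confidence: Number.isFinite(card?.confidence) ? card.confidence : null,
});

// Hypothetical allow-list builder for the ENTRY_ANALYZED payload. Extra
// properties on the input (for example a raw `reply`) are never read.
const buildEntryAnalyzedPayload = ({ entryId, userId, cards = [], contract = {} }) => {
  const safeCards = cards.map(sanitizeCard);
  return {
    entryId,
    userId,
    cardsCreatedCount: safeCards.length,
    contract: {
      hasReceiptedMeaning: Boolean(contract.hasReceiptedMeaning),
      reason: typeof contract.reason === 'string' ? contract.reason : null,
    },
    meaning: { structuredData: { bloom_cards: safeCards } },
  };
};

// Mirrors the new test: serialise the payload and search for the raw text.
const leaksRawText = (payload, rawText) => JSON.stringify(payload).includes(rawText);

const rawModelText = 'RAW_REFLECTION_SHOULD_NOT_BE_EMITTED_12345';
const analysis = {
  entryId: 'entry-1',
  userId: 'user-1',
  reply: `${rawModelText}\n[]`,
  cards: [{ type: 'reflection', headline: 'Safe headline', confidence: 0.9, debug: rawModelText }],
  contract: { hasReceiptedMeaning: true, reason: 'OK' },
};

const payload = buildEntryAnalyzedPayload(analysis);
console.log(leaksRawText(payload, rawModelText)); // false
console.log(leaksRawText(analysis, rawModelText)); // true
```

An allow-list only guarantees that unlisted fields are excluded; it does not vet the contents of the fields it copies, so model-written strings such as headlines still rely on the card sanitisation upstream.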