From e7df1f9a64209bf713493e97a46251d98506c1e4 Mon Sep 17 00:00:00 2001
From: Aleksandr Lozhkovoi
Date: Sun, 22 Feb 2026 06:39:45 +0100
Subject: [PATCH] feat: improve semantic response quality with truthfulness guardrails

---
 .cursor-plugin/plugin.json                     | 2 +-
 CHANGELOG.md                                   | 4 ++++
 agents/flutter-app-builder.md                  | 8 ++++++++
 agents/flutter-code-reviewer.md                | 7 +++++++
 agents/flutter-mobile-release-manager.md       | 7 +++++++
 agents/flutter-test-writer.md                  | 7 +++++++
 plugin.json                                    | 2 +-
 rules/flutter-plugin-policy-priority.mdc       | 6 ++++++
 skills/build-flutter-features/SKILL.md         | 2 ++
 skills/debug-flutter-issues/SKILL.md           | 1 +
 skills/integrate-firebase/SKILL.md             | 2 ++
 skills/migrate-flutter-code/SKILL.md           | 2 ++
 skills/release-mobile-apps/SKILL.md            | 2 ++
 skills/review-flutter-code/SKILL.md            | 3 +++
 skills/setup-flutter-environment/SKILL.md      | 1 +
 skills/sync-official-flutter-ai-rules/SKILL.md | 2 ++
 skills/update-flutter-dependencies/SKILL.md    | 2 ++
 skills/write-flutter-tests/SKILL.md            | 2 ++
 18 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/.cursor-plugin/plugin.json b/.cursor-plugin/plugin.json
index cba6527..65db417 100644
--- a/.cursor-plugin/plugin.json
+++ b/.cursor-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "flutter-cursor-plugin",
   "displayName": "Flutter Cursor Plugin",
-  "version": "1.10.5",
+  "version": "1.10.6",
   "description": "Open-source Cursor plugin for end-to-end Flutter development and testing with Dart MCP, Figma MCP, practical architecture patterns, and reliable test workflows.",
   "author": {
     "name": "Aleksandr Lozhkovoi",
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 09ff9a8..f862440 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -29,6 +29,10 @@
   - command: `commands/setup-flutter-environment.md`
   - skill: `skills/setup-flutter-environment/SKILL.md`
 - Simplified command prompts by removing repeated guardrails boilerplate from canonical command files.
+- Strengthened semantic output quality across agents/skills:
+  - explicit truthfulness policy (`planned/not executed` wording when no command evidence exists)
+  - required missing-inputs/assumptions notes for partial context
+  - required next steps and confidence/residual risk coverage in output contracts
 
 ## 1.10.0
 
diff --git a/agents/flutter-app-builder.md b/agents/flutter-app-builder.md
index a8d6312..795af16 100644
--- a/agents/flutter-app-builder.md
+++ b/agents/flutter-app-builder.md
@@ -31,9 +31,17 @@ Primary agent for Flutter feature development.
 - Add/update tests proportionally to behavior changes.
 - Prefer incremental, reviewable changes over large rewrites.
 
+## Semantic quality defaults
+
+- Never claim commands were executed if no command output is available.
+- If context is missing, explicitly list the missing inputs before proposing deep changes.
+- Separate confirmed facts from assumptions.
+- End with 1-3 concrete next steps for the user.
+
 ## Output expectations
 
 1. Selected route/skill and reason.
 2. Scope and files touched.
 3. Validation commands and results.
 4. Risks or follow-up steps.
+5. Missing inputs or assumptions (if any).
diff --git a/agents/flutter-code-reviewer.md b/agents/flutter-code-reviewer.md
index 5932bdd..9bdba49 100644
--- a/agents/flutter-code-reviewer.md
+++ b/agents/flutter-code-reviewer.md
@@ -23,6 +23,12 @@ Dedicated agent for code review and conventions.
 - Test gaps and brittle assertions.
 - Accessibility and localization risks.
 
+## Semantic quality defaults
+
+- If review scope is missing, request it before deep findings.
+- Mark each finding as confirmed from evidence vs inferred from limited context.
+- Never imply security scans were executed unless command output is available.
+
 ## Output expectations
 
 1. Findings first, ordered by severity.
@@ -30,3 +36,4 @@ Dedicated agent for code review and conventions.
 3. Security findings included explicitly.
 4. Validation evidence (commands/scans/checks performed).
 5. Residual risks/testing gaps summary.
+6. Confidence/assumption note when evidence is partial.
diff --git a/agents/flutter-mobile-release-manager.md b/agents/flutter-mobile-release-manager.md
index b78d5b5..7ef10af 100644
--- a/agents/flutter-mobile-release-manager.md
+++ b/agents/flutter-mobile-release-manager.md
@@ -19,9 +19,16 @@ Dedicated agent for mobile app publishing readiness.
 - iOS App Store-ready archive and signing checks.
 - Versioning, release notes, privacy declarations, and submission gating.
 
+## Semantic quality defaults
+
+- Do not mark a platform "ready" without explicit build/check evidence.
+- If evidence is missing, return `BLOCKED` and list exact data needed.
+- Keep blockers actionable and ordered by release impact.
+
 ## Output expectations
 
 1. Android readiness status.
 2. iOS readiness status.
 3. Validation evidence (commands/artifacts/checklists).
 4. Blocking issues before submission.
+5. Next unblock steps.
diff --git a/agents/flutter-test-writer.md b/agents/flutter-test-writer.md
index fc3ee41..74390ba 100644
--- a/agents/flutter-test-writer.md
+++ b/agents/flutter-test-writer.md
@@ -26,9 +26,16 @@ Main router for Flutter test tasks.
 - For Patrol E2E tests, cover critical user journeys only (slow lane), keep unit/widget tests as fast lane.
 - Run only impacted tests before finishing.
 
+## Semantic quality defaults
+
+- Do not present pseudo-code as "implemented tests" unless files/patches were actually created.
+- If repository context is missing, provide a minimal test scaffold and explicitly mark assumptions.
+- Always include remaining coverage gaps, not only happy path suggestions.
+
 ## Output expectations
 
 1. Test type selected (widget/bloc/integration) and reason.
 2. Files changed and template used.
 3. Validation commands run and pass/fail result.
 4. Remaining coverage gaps.
+5. Next test step for the user.
diff --git a/plugin.json b/plugin.json
index 62374e6..ff9124e 100644
--- a/plugin.json
+++ b/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "flutter-cursor-plugin",
   "displayName": "Flutter Cursor Plugin",
-  "version": "1.10.5",
+  "version": "1.10.6",
   "description": "Open-source Cursor plugin for end-to-end Flutter development and testing with Dart MCP, Figma MCP, practical architecture patterns, and reliable test workflows.",
   "author": "Aleksandr Lozhkovoi",
   "license": "MIT",
diff --git a/rules/flutter-plugin-policy-priority.mdc b/rules/flutter-plugin-policy-priority.mdc
index 4d8b4e2..333a219 100644
--- a/rules/flutter-plugin-policy-priority.mdc
+++ b/rules/flutter-plugin-policy-priority.mdc
@@ -15,6 +15,12 @@ This file is the high-priority policy layer for this plugin.
 - Conflict rule: if official guidance conflicts with project policy, project policy wins.
 - Do not patch synced official files to enforce project policy.
 
+## Truthfulness policy
+
+- Never state or imply that actions are completed without command output or concrete file diff evidence.
+- In planning/simulation mode, use explicit wording: `planned`, `expected`, `not executed`.
+- If evidence is missing, return status as `PENDING` or `BLOCKED` instead of `DONE`.
+
 ## Architecture and state-management policy
 
 - Project First: follow the existing project architecture and state-management choice.
diff --git a/skills/build-flutter-features/SKILL.md b/skills/build-flutter-features/SKILL.md
index 7263760..6f5524b 100644
--- a/skills/build-flutter-features/SKILL.md
+++ b/skills/build-flutter-features/SKILL.md
@@ -31,6 +31,7 @@ Use this skill for non-test Flutter development tasks.
 
 - Restrict changes to the requested feature/module unless explicitly expanded.
 - Do not mix unrelated refactors with feature delivery.
+- Do not claim implementation is complete unless concrete file changes or command outputs are provided.
 
 ## Required output
 
@@ -38,6 +39,7 @@ Use this skill for non-test Flutter development tasks.
 2. Files changed by layer (presentation/domain/data).
 3. Validation commands run and results.
 4. Residual risks or follow-up TODOs.
+5. Missing inputs/assumptions (if context is incomplete).
 
 ## Required references
 
diff --git a/skills/debug-flutter-issues/SKILL.md b/skills/debug-flutter-issues/SKILL.md
index 2cfc118..520fcd1 100644
--- a/skills/debug-flutter-issues/SKILL.md
+++ b/skills/debug-flutter-issues/SKILL.md
@@ -24,6 +24,7 @@ Use for compiler/build/runtime failures.
 - Do not propose a fix without a reproducible command or clear log evidence.
 - Keep fixes minimal and limited to the failing layer unless a cross-layer root cause is proven.
 - Call out unknowns explicitly instead of guessing when logs are incomplete.
+- Include one preventive follow-up even when the fix is minimal.
 
 ## Output format
 
diff --git a/skills/integrate-firebase/SKILL.md b/skills/integrate-firebase/SKILL.md
index dfa0d7b..ea7053c 100644
--- a/skills/integrate-firebase/SKILL.md
+++ b/skills/integrate-firebase/SKILL.md
@@ -28,6 +28,8 @@ Use this skill for end-to-end Firebase integration in Flutter apps.
 - Keep service wrappers injectable and testable.
 - Add error handling and fallback behavior for remote dependencies.
 - Validate behavior in both debug and release-capable builds.
+- Do not claim Android/iOS integration is complete without naming changed config files.
+- In simulation/planning mode, never use `integrated/completed`; use `planned/not executed`.
 
 ## Required output
 
diff --git a/skills/migrate-flutter-code/SKILL.md b/skills/migrate-flutter-code/SKILL.md
index 6e11936..a0a1f29 100644
--- a/skills/migrate-flutter-code/SKILL.md
+++ b/skills/migrate-flutter-code/SKILL.md
@@ -21,6 +21,7 @@ Use for framework/API/state-management migrations.
 - Do not mix unrelated refactors with migration work.
 - Keep intermediate states buildable when possible.
 - Prefer codemod-like repetitive edits over ad hoc changes.
+- Attach validation status to each migration batch.
 
 ## Required output
 
@@ -28,3 +29,4 @@ Use for framework/API/state-management migrations.
 2. Batch-by-batch changes summary.
 3. Validation commands/results per batch.
 4. Breaking changes and rollback notes.
+5. Next batch recommendation.
diff --git a/skills/release-mobile-apps/SKILL.md b/skills/release-mobile-apps/SKILL.md
index 1da0e55..a07511f 100644
--- a/skills/release-mobile-apps/SKILL.md
+++ b/skills/release-mobile-apps/SKILL.md
@@ -32,6 +32,7 @@ Use this skill for Android/iOS store publishing preparation.
 - Do not mark release ready without artifact build evidence.
 - Keep Android/iOS signing and versioning checks explicit.
 - Flag missing compliance metadata as blockers, not warnings.
+- When evidence is missing, return `BLOCKED` instead of speculative readiness.
 
 ## Required output
 
@@ -39,6 +40,7 @@ Use this skill for Android/iOS store publishing preparation.
 2. iOS readiness status (+ artifact/archive status).
 3. Validation commands run and outcomes.
 4. Blocking gaps before submission.
+5. Immediate next actions to unblock release.
 
 ## Required references
 
diff --git a/skills/review-flutter-code/SKILL.md b/skills/review-flutter-code/SKILL.md
index d51a34e..6bbe626 100644
--- a/skills/review-flutter-code/SKILL.md
+++ b/skills/review-flutter-code/SKILL.md
@@ -32,12 +32,15 @@ Use for PR/diff/code review requests.
 - Do not provide a deep review without explicit target scope (PR diff, range, or file list).
 - Tie each finding to concrete code evidence and expected behavioral impact.
 - Keep findings prioritized by severity and user risk, not by style preference.
+- Distinguish confirmed findings from inferred risks when evidence is partial.
+- Do not claim scans/commands were run without output evidence.
 
 ## Output format
 
 - Findings first, ordered by severity.
 - File references for each finding.
 - Brief residual risk/testing gap summary.
+- Confidence/assumption note when applicable.
 
 ## Required references
 
diff --git a/skills/setup-flutter-environment/SKILL.md b/skills/setup-flutter-environment/SKILL.md
index ae51b42..caadf29 100644
--- a/skills/setup-flutter-environment/SKILL.md
+++ b/skills/setup-flutter-environment/SKILL.md
@@ -31,6 +31,7 @@ Use this skill when a project needs a clean, reproducible Flutter setup before i
 - Do not claim setup is complete while `flutter doctor` still has unresolved blockers for requested target platforms.
 - Keep setup changes minimal and reversible; avoid unrelated dependency upgrades.
 - If a required platform is out of scope (for example iOS on a non-iOS task), report it explicitly instead of forcing changes.
+- Do not say `done/completed` without command evidence.
 
 ## Required output
 
diff --git a/skills/sync-official-flutter-ai-rules/SKILL.md b/skills/sync-official-flutter-ai-rules/SKILL.md
index cb79fb7..6d7a29e 100644
--- a/skills/sync-official-flutter-ai-rules/SKILL.md
+++ b/skills/sync-official-flutter-ai-rules/SKILL.md
@@ -31,6 +31,8 @@ Use this workflow to keep plugin guidance aligned with upstream Flutter AI rules
 - Do not enforce plugin policy by patching official content after sync.
 - Use `rules/flutter-plugin-policy-priority.mdc` for higher-priority policy and conflict resolution.
 - Prefer `4k` unless there is a clear reason to switch to `10k` or `1k`.
+- Do not claim sync completed unless command output is available.
+- In simulation/planning mode, status must be `PENDING` and include `not executed` note.
 
 ## Required output
 
diff --git a/skills/update-flutter-dependencies/SKILL.md b/skills/update-flutter-dependencies/SKILL.md
index 0af5eaf..96f4299 100644
--- a/skills/update-flutter-dependencies/SKILL.md
+++ b/skills/update-flutter-dependencies/SKILL.md
@@ -39,6 +39,7 @@ Use this skill for SDK and package upgrades that must stay stable and reviewable
 - If failures cascade, split into two PRs:
   - Flutter SDK upgrade
   - package upgrade and fixes
+- Always include before/after version snapshot and explicit rollback trigger.
 
 ## Required output
 
@@ -47,3 +48,4 @@ Use this skill for SDK and package upgrades that must stay stable and reviewable
 3. Validation commands run and their result.
 4. Files changed for compatibility fixes.
 5. Rollback instructions.
+6. Known remaining risks after upgrade.
diff --git a/skills/write-flutter-tests/SKILL.md b/skills/write-flutter-tests/SKILL.md
index c08673b..cc1a554 100644
--- a/skills/write-flutter-tests/SKILL.md
+++ b/skills/write-flutter-tests/SKILL.md
@@ -46,6 +46,7 @@ Use this skill as the single entry point for Flutter test work.
 - Prefer deterministic tests over time-dependent assertions.
 - Keep test setup local unless shared helpers already exist.
 - Avoid broad snapshot/golden assertions unless explicitly requested.
+- Do not present sample test snippets as completed repository changes without file-level confirmation.
 
 ## Required output
 
@@ -53,3 +54,4 @@ Use this skill as the single entry point for Flutter test work.
 2. Files created/updated.
 3. Test commands run and results.
 4. Flakiness risks or missing coverage notes.
+5. Next test to add (single highest-value gap).