Handit-AI · handit-ai · Aug 20, 2025
diff --git a/docs/ai-assessment.md b/docs/ai-assessment.md
@@ -0,0 +1,74 @@
+# AI Review – Prompt Best Practices
+
+**Service:** Invoice Copilot  
+**Model:** gpt-4o-mini (prod)  
+**Repo scan date:** 2025-08-20  
+**Assessor:** handit assess (repo-only, no production data)
+
+## 1) Scorecard
+
+| Criterion (score 0–5)      | Data Analysis Assistant |
+|----------------------------|-------------------------|
+| Context & Rationale        | 2                       |
+| Format/Output Contract     | 1                       |
+| Examples & Edge Cases      | 0                       |
+| Determinism & Guardrails   | 1                       |
+| Testability                | 1                       |
+
+**Overall Risk:** 🔴 High (single prompt)
+
+## 2) Risk Heatmap (top issues)
+
+- **Lack of explicit output format** → May lead to inconsistent outputs (**🔴 High**)
+- **No examples or edge cases provided** → Reduces reliability in varied scenarios (**🔴 High**)
+- **Minimal context and rationale** → Limits alignment with user needs (**🔶 Medium**)
+
+## 3) Findings & Improvement Levers
+
+### A. Prompt: Data Analysis Assistant
+
+#### Original
+
+```
+You are a helpful assistant that specializes in data analysis.
+...
+```
+
+#### Strengths
+
+- Assigns a clear role to the model, which can help it maintain a consistent style and domain focus.
+
+#### Risks
+
+- The prompt specifies no output format, so the structure of responses can vary from call to call.
+- It provides no examples or edge cases, so the model has little guidance for unusual or ambiguous inputs.
+- It gives minimal context and rationale, which may limit the model's alignment with specific user needs.
+- It sets no guardrails (constraints on length, tone, or format), so outputs may be unpredictable.
+
+#### Improvement Levers (not fixes)
+
+- **Add explicit output format instructions**: Specify the structure or sections required in the response to improve consistency (Best Practice 5).
+- **Include examples and edge cases**: Provide few-shot examples to guide the model in handling different scenarios (Best Practice 3).
+- **Enhance context and rationale**: Offer more background information to better align the model's responses with user expectations (Best Practice 2).
+- **Implement guardrails**: Set explicit constraints on length, tone, and format to ensure predictable outputs (Best Practice 9).
+
+#### Evidence (static scan)
+The scanned prompt is minimal: the static scan found no output format instructions, examples, or explicit constraints, all of which matter for reliable and consistent model behavior.  
+→ Potential production impact: inconsistent data analysis outputs and reduced user satisfaction.
+
+## 4) Next Steps (with handit)
+
+1. **Connect handit** (1-line setup)
+2. **Run baseline evals** on your real logs to measure consistency and alignment with user needs.
+3. **handit will experiment** with improvements automatically.
+4. **If a candidate passes** tests on your real data, handit opens the PR.
+
+## 5) Business Impact (expected if improved)
+
+- **Increased consistency and reliability** in AI outputs, leading to higher user satisfaction and trust.
+- **Improved handling of diverse scenarios** once examples and edge cases are included, making the model more versatile.
+- **Better alignment with user needs** by providing more context and rationale, potentially increasing the adoption and effectiveness of the service.
+
+---
+
+*This PR was automatically generated by handit.ai Autonomous engineer*
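
The improvement levers in section 3 (explicit output format, examples, guardrails) and the Testability criterion could be sketched roughly as follows. This is an illustrative sketch only, not the Invoice Copilot implementation and not a change proposed by this PR: the field names `summary` and `findings`, the 400-character limit, the few-shot example, and both function names are hypothetical. Only the first line of the prompt comes from the scanned original.

```python
import json

# Hypothetical output contract: these field names are illustrative
# assumptions, not taken from the Invoice Copilot codebase.
REQUIRED_FIELDS = {"summary": str, "findings": list}
MAX_SUMMARY_CHARS = 400  # assumed guardrail on length

# One few-shot example covering an edge case (empty input).
FEW_SHOT = [
    {"input": "", "output": {"summary": "No data provided.", "findings": []}},
]


def build_system_prompt() -> str:
    """Combine role, output format, examples, and constraints into one prompt."""
    field_spec = ", ".join(f"{name} ({t.__name__})" for name, t in REQUIRED_FIELDS.items())
    lines = [
        "You are a helpful assistant that specializes in data analysis.",
        "Respond with a single JSON object and nothing else.",
        f"Fields: {field_spec}.",
        f"Keep summary under {MAX_SUMMARY_CHARS} characters.",
        "Examples:",
    ]
    for example in FEW_SHOT:
        lines.append(f"Input: {example['input']!r}")
        lines.append(f"Output: {json.dumps(example['output'])}")
    return "\n".join(lines)


def validate_response(raw: str) -> list[str]:
    """Return contract violations; an empty list means the reply passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            errors.append(f"wrong type for {name}")
    summary = data.get("summary")
    if isinstance(summary, str) and len(summary) > MAX_SUMMARY_CHARS:
        errors.append("summary too long")
    return errors
```

A validator of this kind gives a baseline eval something concrete to check: each model reply either satisfies the contract or yields a list of named violations that can be counted across logs.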