74 changes: 74 additions & 0 deletions docs/ai-assessment.md
@@ -0,0 +1,74 @@
# AI Review – Prompt Best Practices

**Service:** Invoice Copilot
**Model:** gpt-4o-mini (prod)
**Repo scan date:** 2025-08-20
**Assessor:** handit assess (repo-only, no production data)

## 1) Scorecard

| Criterion (score 0–5)      | Data Analysis Assistant |
|----------------------------|-------------------------|
| Context & Rationale        | 2                       |
| Format/Output Contract     | 1                       |
| Examples & Edge Cases      | 0                       |
| Determinism & Guardrails   | 1                       |
| Testability                | 1                       |

**Overall Risk:** 🔴 High (single prompt)

## 2) Risk Heatmap (top issues)

- **Lack of explicit output format** → May lead to inconsistent outputs (**🔴 High**)
- **No examples or edge cases provided** → Reduces reliability in varied scenarios (**🔴 High**)
- **Minimal context and rationale** → Limits alignment with user needs (**🔶 Medium**)

## 3) Findings & Improvement Levers

### A. Prompt: Data Analysis Assistant

#### Original

```
You are a helpful assistant that specializes in data analysis.
...
```

#### Strengths

- Assigns the model a clear role (data analysis specialist), which helps keep style and domain focus consistent.

#### Risks

- The prompt lacks explicit instructions on the desired output format, which can lead to inconsistent responses.
- No examples or edge cases are provided, reducing the model's ability to handle varied scenarios effectively.
- Minimal context and rationale are given, which may limit the model's alignment with specific user needs.
- No determinism controls or guardrails (such as constraints on length, tone, or format) are specified, so outputs may be unpredictable.

#### Improvement Levers (not fixes)

- **Add explicit output format instructions**: Specify the structure or sections required in the response to improve consistency (Best Practice 5).
- **Include examples and edge cases**: Provide few-shot examples to guide the model in handling different scenarios (Best Practice 3).
- **Enhance context and rationale**: Offer more background information to better align the model's responses with user expectations (Best Practice 2).
- **Implement guardrails**: Set explicit constraints on length, tone, and format to ensure predictable outputs (Best Practice 9).
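To make these levers concrete, here is a minimal Python sketch of what a prompt with an explicit output contract, few-shot examples, and guardrails could look like, together with a validator for the contract. Everything beyond the original role sentence is an assumption for illustration: the field names (`summary`, `findings`, `confidence`), the length limit, the context line, and the example inputs are hypothetical and do not come from the scanned repo.

```python
import json

# Hypothetical output contract: field names and limits are illustrative only.
REQUIRED_FIELDS = {"summary": str, "findings": list, "confidence": str}
ALLOWED_CONFIDENCE = {"low", "medium", "high"}
MAX_SUMMARY_CHARS = 400

# Hypothetical few-shot examples, including one edge case (empty input).
FEW_SHOT_EXAMPLES = [
    {
        "input": "Invoices: 120 paid, 30 overdue.",
        "output": {
            "summary": "20% of invoices are overdue.",
            "findings": ["30 of 150 invoices are overdue"],
            "confidence": "high",
        },
    },
    {
        "input": "",
        "output": {"summary": "No data was provided.", "findings": [], "confidence": "low"},
    },
]


def build_system_prompt() -> str:
    """Assemble role, context, output contract, guardrails, and few-shot examples."""
    parts = [
        "You are a helpful assistant that specializes in data analysis.",
        "Context: you answer questions for users of the Invoice Copilot service.",
        "Respond with a single JSON object and nothing else.",
        'Fields: "summary" (string), "findings" (list of strings), '
        '"confidence" ("low", "medium", or "high").',
        f"Keep the summary under {MAX_SUMMARY_CHARS} characters and use a neutral tone.",
        "If the input is empty or unclear, say so in the summary and return no findings.",
        "Examples:",
    ]
    for example in FEW_SHOT_EXAMPLES:
        parts.append(f"Input: {example['input'] or '(empty)'}")
        parts.append(f"Output: {json.dumps(example['output'])}")
    return "\n".join(parts)


def validate_response(raw: str) -> list[str]:
    """Return a list of contract violations; an empty list means the response passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    if not isinstance(data, dict):
        return ["response is not a JSON object"]

    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected_type):
            problems.append(f"wrong type for field: {name}")
    extra = set(data) - set(REQUIRED_FIELDS)
    if extra:
        problems.append(f"unexpected fields: {sorted(extra)}")
    if isinstance(data.get("summary"), str) and len(data["summary"]) > MAX_SUMMARY_CHARS:
        problems.append("summary too long")
    if isinstance(data.get("confidence"), str) and data["confidence"] not in ALLOWED_CONFIDENCE:
        problems.append("confidence not in allowed set")
    return problems
```

A validator like this also bears on the Testability criterion: once the format is written down as a contract, responses can be checked mechanically in unit tests rather than by eye.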

#### Evidence (static scan)

The prompt is minimal and lacks detailed instructions or examples, both of which are critical for reliable and consistent AI behavior.

→ Potential production impact: inconsistent data analysis outputs and reduced user satisfaction.

## 4) Next Steps (with handit)

1. **Connect handit** (1-line setup)
2. **Run baseline evals** on your real logs: Evaluate consistency and alignment with user needs.
3. **handit will experiment** with improvements automatically.
4. **If a candidate passes** tests on your real data, handit opens the PR.
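As a rough illustration of what "evaluate consistency" in step 2 could mean, the sketch below scores a set of responses to the same input by how often the most common answer appears. This is an assumed, simplified metric written for this document, not handit's evaluation method or API; the function names `normalize` and `consistency_rate` are hypothetical.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def consistency_rate(outputs: list[str]) -> float:
    """Fraction of outputs that match the most common normalized output."""
    if not outputs:
        raise ValueError("need at least one output")
    counts = Counter(normalize(o) for o in outputs)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(outputs)


# Three responses to the same question: two agree after normalization.
samples = ["Total: 42", "total:   42", "Total: 41"]
rate = consistency_rate(samples)  # 2 of 3 responses agree
```

Exact-match agreement is a deliberately strict choice; free-form analysis text would likely need a looser comparison, which is one reason an explicit output format makes baseline evals easier to run.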

## 5) Business Impact (expected if improved)

- **Increased consistency and reliability** in AI outputs, leading to higher user satisfaction and trust.
- **Improved handling of diverse scenarios** through the inclusion of examples and edge cases, enhancing the model's versatility.
- **Better alignment with user needs** by providing more context and rationale, potentially increasing the adoption and effectiveness of the service.

---

*This PR was automatically generated by handit.ai Autonomous engineer*