Make it easy to run evaluation directly from this repo #2233

pamelafox · 2024-12-13T01:13:19Z

Purpose

This PR makes it easier to run evaluations by bringing the evaluation SDK and tools directly into the repo. The scripts still use ai-rag-chat-evaluator for its custom evaluation metrics and evaluation review CLI tools, but I've moved ground truth generation into this repo, since I've found that it is often very specific to the needs of the repo.

This PR uses RAGAS for ground truth data generation, which works by constructing a knowledge graph and then generating scenarios from it. That's a different approach from azure-ai-generative, which we used previously, but that SDK is now deprecated, and the RAGAS approach seems to produce good questions.
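
To give a rough feel for the "knowledge graph and scenarios" idea, here is a minimal toy sketch. It does not use the RAGAS API and is not the code in this PR. The `Node` class, the key-phrase overlap rule, the `threshold` value, and the sample data are all assumptions made up for illustration; RAGAS itself relies on LLM-based extractors and richer relationship builders, and then prompts an LLM with each scenario to produce a question and reference answer.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """One document chunk plus the key phrases extracted from it (hypothetical)."""
    node_id: str
    text: str
    keyphrases: set[str] = field(default_factory=set)


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two key-phrase sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_edges(nodes: list[Node], threshold: float = 0.25) -> list[tuple[str, str, float]]:
    """Link every pair of nodes whose key-phrase overlap meets the threshold."""
    edges = []
    for left, right in combinations(nodes, 2):
        score = jaccard(left.keyphrases, right.keyphrases)
        if score >= threshold:
            edges.append((left.node_id, right.node_id, score))
    return edges


def build_scenarios(nodes: list[Node], edges: list[tuple[str, str, float]]) -> list[dict]:
    """Single-hop scenarios use one node; multi-hop scenarios use a linked pair."""
    scenarios = [{"kind": "single_hop", "nodes": [n.node_id]} for n in nodes]
    scenarios += [{"kind": "multi_hop", "nodes": [a, b]} for a, b, _ in edges]
    return scenarios


# Made-up sample chunks; a real run would load and chunk the repo's documents.
nodes = [
    Node("benefits", "Plan details", {"health plan", "deductible", "copay"}),
    Node("handbook", "Employee policies", {"vacation", "health plan", "deductible"}),
    Node("roles", "Role descriptions", {"job title", "manager"}),
]
edges = build_edges(nodes)
scenarios = build_scenarios(nodes, edges)
```

Each scenario would then seed one question: a single-hop scenario asks about one chunk, while a multi-hop scenario asks something that needs both linked chunks to answer.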

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[ ] Yes
[X] No

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial, which includes deployment, settings and usage instructions. If text or screenshots need to change in the tutorial, check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[X] Yes - I need to update the evaluation tutorial!
[ ] No

Type of change

[ ] Bugfix
[X] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

The current tests all pass (python -m pytest).
I added tests that prove my fix is effective or that my feature works
I ran python -m pytest --cov to verify 100% coverage of added lines
I ran python -m mypy to check for type errors
I either used the pre-commit hooks or ran ruff and black manually on my code.

pamelafox added 9 commits December 11, 2024 09:29

Updating docs

34cc860

Update requirements.txt

42cd5d1

Update diagram

98bc256

Add typing extensions explicitly

897f4bf

Adding ground truth generation

e757646

Merge branch 'main' into evals

8849656

Add evaluate flow as well

314e2f7

Add RAGAS

dd6780e

Add RAGAS

8a70c5b

mattgotteiner reviewed Jan 8, 2025

docs/evaluation.md (outdated, resolved)

mattgotteiner reviewed Jan 8, 2025

docs/evaluation.md (resolved)

mattgotteiner reviewed Jan 8, 2025

evals/generate_ground_truth.py (outdated, resolved)

mattgotteiner reviewed Jan 8, 2025

evals/ground_truth.jsonl (outdated, resolved)

pamelafox commented Jan 9, 2025

evals/requirements.txt (outdated, resolved)

pamelafox commented Jan 9, 2025

evals/generate_ground_truth.py (outdated, resolved)

pamelafox mentioned this pull request Jan 10, 2025

This AI sample lacks risk & safety evaluation implementation #2262

Open

3 tasks

pamelafox added 6 commits February 6, 2025 15:23

Merge branch 'main' into evals

9b5ca5c

Remove simulator

6471747

Improvements to RAGAS code

a839b33

More logging, save knowledge graph after transforms

2305fc1

Update baseline, add citations matched metric, use separate venv for …

24425ce

…eval

Update the requirements to latest tag

5e314ee

pamelafox changed the title ~~WIP: Bring evaluation more tightly into the repo~~ Make it easy to run evaluation directly from this repo Feb 8, 2025

mattgotteiner approved these changes Feb 8, 2025

Logger fixes

02f4ba8

pamelafox merged commit a7dfc64 into Azure-Samples:main Feb 10, 2025
18 checks passed
