Tiny, reproducible evaluation harness for RAG systems (golden set + metrics).
- Runs a golden dataset (JSONL)
- Computes retrieval metrics: recall@k, MRR
- Produces a shareable report: `report.json`, `report.md` (+ `report.png`)
This is meant to be a lightweight “regression test” for RAG: run it before/after changes to know if retrieval got better or worse.
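As a sketch of that before/after workflow, the snippet below diffs two `report.json` files. It assumes the report uses the same keys as the example output shown further down (`recall@k`, `mrr`); the script itself is not shipped with the harness:

```python
import json
import sys

def compare_reports(before_path, after_path, keys=("recall@k", "mrr")):
    """Print metric deltas between two report.json files (key names assumed)."""
    with open(before_path) as f:
        before = json.load(f)
    with open(after_path) as f:
        after = json.load(f)
    for key in keys:
        delta = after[key] - before[key]
        print(f"{key}: {before[key]:.3f} -> {after[key]:.3f} ({delta:+.3f})")

if __name__ == "__main__":
    # e.g. python compare_reports.py reports/before/report.json reports/after/report.json
    compare_reports(sys.argv[1], sys.argv[2])
```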
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
```bash
# Minimal metrics output
python -m rag_eval_harness.cli run --dataset data/golden.sample.jsonl --k 5

# Generate a report pack
python -m rag_eval_harness.cli run --dataset data/golden.sample.jsonl --k 5 --report-dir reports/latest
```

Example output:
{"recall@k": 1.0, "mrr": 0.75, "n": 5}- recall@k: did the expected chunk(s) show up in the top‑k results?
- MRR: how high was the first relevant chunk ranked? (higher is better)
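To sanity-check the numbers by hand, the metric definitions reduce to a few lines of Python. This is a minimal sketch for a single query, not necessarily the harness's internal implementation:

```python
def recall_at_k(gold_chunks, retrieved, k):
    """Fraction of gold chunks that appear in the top-k retrieved results."""
    if not gold_chunks:
        return 0.0
    top_k = set(retrieved[:k])
    return sum(1 for chunk in gold_chunks if chunk in top_k) / len(gold_chunks)

def reciprocal_rank(gold_chunks, retrieved):
    """1 / rank of the first relevant chunk, or 0.0 if none is retrieved."""
    for rank, chunk in enumerate(retrieved, start=1):
        if chunk in gold_chunks:
            return 1.0 / rank
    return 0.0

# Gold chunk ranked second: recall@5 = 1.0, reciprocal rank = 0.5
print(recall_at_k(["docA#3"], ["docX#1", "docA#3"], k=5))  # 1.0
print(reciprocal_rank(["docA#3"], ["docX#1", "docA#3"]))   # 0.5
```

MRR is then the mean of these reciprocal ranks across all queries in the dataset.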
Each line of the golden dataset is a JSON object:
{"id":"q1","question":"...","gold_chunks":["docA#3"],"retrieved":["docA#3","docX#1"]}Notes:
- This harness is intentionally vector-DB agnostic.
- Your ingestion/retrieval pipeline should write `retrieved` so we can score deterministically (see the sketch after this list).
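As a sketch of what that could look like, the hypothetical helper below reads a golden JSONL file, runs each question through a `retrieve(question, k)` function that you supply, and writes the `retrieved` chunk ids back out. The function name and output path are placeholders, not part of the harness:

```python
import json

def attach_retrieved(in_path, out_path, retrieve, k=5):
    """Add top-k retrieved chunk ids to each record of a golden JSONL file."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            record = json.loads(line)
            # `retrieve` is your own retrieval function returning chunk ids like "docA#3"
            record["retrieved"] = retrieve(record["question"], k)
            dst.write(json.dumps(record) + "\n")

# Example usage with your retriever of choice:
# attach_retrieved("data/golden.sample.jsonl", "data/golden.retrieved.jsonl",
#                  retrieve=lambda q, k: my_index.search(q, top_k=k))
```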
The included GitHub Action runs a smoke evaluation on each push/PR.