llm-evals

Because we should all have our own set of LLM evals. Blog post

Installation

Python stuff:

uv sync

Node stuff:

npm install

just (via Homebrew):

brew install just
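
Before running anything, it can help to confirm that the three tools above are actually on your PATH. This is a minimal sketch in POSIX shell, not part of this repo: the helper name `missing_tools` is hypothetical, and the tool names are taken from the installation steps above.

```shell
# Hypothetical helper (not part of this repo): print each named tool
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools named in the installation steps.
missing=$(missing_tools uv npm just)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  echo "All tools found"
fi
```

If anything is reported missing, install it before running the `just` commands.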

Run them all:

just eval-all

Run a specific one:

just eval CONFIG

where CONFIG is the name of an eval config, for example "social-media-insults".

To view the dashboard (the version published at https://kschaul.com/llm-evals/):

just dev
