74 commits
da895ea
Changed default storage to in-memory. SQLite still available via config.
mattpocock Oct 19, 2025
2efa48e
Remove problematic backend-only-constants imports
github-actions[bot] Oct 19, 2025
59677dd
Fixed CI properly
mattpocock Oct 19, 2025
9514115
Merge branch 'main' of https://github.com/mattpocock/evalite into v1
mattpocock Oct 19, 2025
48a5581
Huge move from evals -> suites, and results -> evals
mattpocock Oct 19, 2025
26073ea
Added changeset
mattpocock Oct 19, 2025
54c9ccb
Removed streaming text support from tasks.
mattpocock Oct 19, 2025
3bd27cf
feat: Support .env files by default via dotenv/config
github-actions[bot] Oct 19, 2025
8133042
feat: Support setupFiles from vitest.config.ts with evalite.config.ts…
github-actions[bot] Oct 19, 2025
4f8ec7a
Fixes after cherrypick
mattpocock Oct 19, 2025
c624aca
Formatting
mattpocock Oct 19, 2025
d135997
Docs updates
mattpocock Oct 19, 2025
15f3dc2
Docs updates
mattpocock Oct 20, 2025
5d8f0b2
feat: Add scorer utilities for LLM and embedding-based evaluations
cantemizyurek Oct 21, 2025
1b5787a
feat: Integrate new faithfulness scorer and update dependencies
cantemizyurek Oct 21, 2025
b48f5a7
Merge pull request #1 from cantemizyurek/faithfulness
cantemizyurek Oct 21, 2025
bd6e54f
feat: Add AnswerSimilarity scorer for evaluating semantic similarity
cantemizyurek Oct 22, 2025
b6bac1c
feat: Add evaluation script for Answer Similarity
cantemizyurek Oct 22, 2025
eafa00c
Merge pull request #2 from cantemizyurek/answer-similarity
cantemizyurek Oct 22, 2025
4f788c3
feat: Add Context Recall Scorer
cantemizyurek Oct 22, 2025
e6448d2
feat: Add evaluation script for RAG Context Recall
cantemizyurek Oct 22, 2025
652591e
refactor: Update scorers to use 'expected' instead of 'input.reference'
cantemizyurek Oct 22, 2025
156a06d
refactor: Remove failedToScore utility and replace with error in scorers
cantemizyurek Oct 22, 2025
a719796
refactor: Update scoring schemas to use jsonSchema and remove zod dep…
cantemizyurek Oct 22, 2025
efdffa4
refactor: Simplify answerSimilarity scorer by removing threshold logi…
cantemizyurek Oct 22, 2025
189ef3e
refactor: rename embedding to embeddingModel clearer
cantemizyurek Oct 22, 2025
d5f243d
refactor: update embedding property to embeddingModel for clarity
cantemizyurek Oct 22, 2025
d031484
refactor: Introduce Scorers namespace with types for LLM and embeddin…
cantemizyurek Oct 22, 2025
68bcde5
refactor: Move SingleTurnSample and EvaluationSample types to Scorers…
cantemizyurek Oct 22, 2025
3535516
refactor: Update Evalite types to support userInput structure. And ad…
cantemizyurek Oct 22, 2025
9280c09
refactor: Export utility functions for sample type checks in scorers
cantemizyurek Oct 22, 2025
136eda6
feat: Add sheet overlay backdrop for evaluation routes
cantemizyurek Oct 22, 2025
23e698c
fix: Update layout for ResultComponent to ensure minimum height is ma…
cantemizyurek Oct 22, 2025
39f0099
Create real-phones-join.md
mattpocock Oct 22, 2025
9423bf2
Merge pull request #255 from cantemizyurek/sheet-overlay-backdrop
mattpocock Oct 22, 2025
dff3a17
refactor: Rename retrievedContexts to groundTruth in scoring interfac…
cantemizyurek Oct 22, 2025
41fdd32
feat: Swap from React Markdown to Streamdown
cantemizyurek Oct 22, 2025
dc08a76
refactor: Consolidate scorer creation functions and enhance structure
cantemizyurek Oct 23, 2025
811ca8a
refactor: Enhance scorer options structure with SingleTurnFn and Mult…
cantemizyurek Oct 23, 2025
7abecd4
fix: Fix evaluation input types to fit new format
cantemizyurek Oct 23, 2025
f32d0c9
refactor: Simplify function signatures in contextRecall and faithfuln…
cantemizyurek Oct 23, 2025
f092e68
Changed default storage to in-memory. SQLite still available via config.
mattpocock Oct 19, 2025
751ed07
Remove problematic backend-only-constants imports
github-actions[bot] Oct 19, 2025
57883fb
Fixed CI properly
mattpocock Oct 19, 2025
54f9618
Huge move from evals -> suites, and results -> evals
mattpocock Oct 19, 2025
07541f6
Added changeset
mattpocock Oct 19, 2025
58cb7a6
Removed streaming text support from tasks.
mattpocock Oct 19, 2025
5c15a8d
feat: Support .env files by default via dotenv/config
github-actions[bot] Oct 19, 2025
2433940
feat: Support setupFiles from vitest.config.ts with evalite.config.ts…
github-actions[bot] Oct 19, 2025
6a25f86
Fixes after cherrypick
mattpocock Oct 19, 2025
938ef45
Formatting
mattpocock Oct 19, 2025
0519544
Docs updates
mattpocock Oct 19, 2025
43dbbd8
Docs updates
mattpocock Oct 20, 2025
53b2cd1
feat: Add sheet overlay backdrop for evaluation routes
cantemizyurek Oct 22, 2025
563791d
fix: Update layout for ResultComponent to ensure minimum height is ma…
cantemizyurek Oct 22, 2025
df9484b
Create real-phones-join.md
mattpocock Oct 22, 2025
a24889f
Merge branch 'v1' of https://github.com/mattpocock/evalite into scorers
mattpocock Oct 23, 2025
f67f215
Merge pull request #251 from cantemizyurek/scorers
mattpocock Oct 23, 2025
ef66bc9
refactor: change codeblocks theme to dark+ and light+
cantemizyurek Oct 24, 2025
adfb2a6
Merge pull request #257 from cantemizyurek/swap-reactmarkdown
mattpocock Oct 24, 2025
b719276
fix: round millisecond durations to avoid floating point precision di…
github-actions[bot] Oct 24, 2025
02fdd76
Merge pull request #267 from mattpocock/claude/issue-265-20251024-1549
mattpocock Oct 24, 2025
32f0dd2
refactor: Simplify scorer factory API (#262)
mattpocock Oct 25, 2025
7db742d
Fix mismatch between input and output types. (#263)
cantemizyurek Oct 25, 2025
403481b
Enhance dark theme (#274)
cantemizyurek Oct 27, 2025
491e80c
Add Search functionality (#277)
cantemizyurek Oct 28, 2025
0d5e185
Add Tool Call Accuracy Scorer (#269)
cantemizyurek Oct 28, 2025
26ed748
Add watch mode test infrastructure
mattpocock Oct 28, 2025
dfdb619
Merge branch 'main' of https://github.com/mattpocock/evalite into v1
mattpocock Oct 29, 2025
0d16521
Add .editorconfig file
tyom Oct 26, 2025
b42641a
Return vitest instance when returning with !shouldKeepRunning
tyom Oct 29, 2025
5f09dd9
Introduce ESLint and add typecheck npm script
tyom Oct 26, 2025
e99fc1c
Add missing break in switch case
tyom Oct 29, 2025
205a215
Fix ESLint issues
tyom Oct 29, 2025
5 changes: 5 additions & 0 deletions .changeset/0000-export-command-change.md
@@ -0,0 +1,5 @@
---
"evalite": major
---

Export command now uses the storage specified in the config and auto-runs if empty.
5 changes: 5 additions & 0 deletions .changeset/0000-in-memory-default.md
@@ -0,0 +1,5 @@
---
"evalite": major
---

Changed default storage to in-memory. SQLite still available via config.
5 changes: 5 additions & 0 deletions .changeset/0000-remove-streaming.md
@@ -0,0 +1,5 @@
---
"evalite": minor
---

Removed streaming text support from tasks. Process streams before returning from task() (e.g., await result.text for AI SDK).
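
As a rough illustration of "process streams before returning from task()", the sketch below drains an async-iterable text stream into a single string and returns that from a task-shaped function. `collectText`, `fakeModelStream`, and `task` are hypothetical names used only for this example; they are not part of evalite or the AI SDK. With the AI SDK, the changeset's own suggestion (`await result.text`) would take the place of the helper.

```typescript
// Hypothetical helper: drain any async-iterable text stream into one string.
async function collectText(stream: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk;
  }
  return text;
}

// Stand-in for a streaming model call (an assumption, not a real SDK function).
async function* fakeModelStream(prompt: string): AsyncGenerator<string> {
  for (const piece of ["Answer", " to: ", prompt]) {
    yield piece;
  }
}

// Task-shaped function: it resolves to the final text instead of
// handing the stream itself back to the eval runner.
async function task(input: string): Promise<string> {
  return collectText(fakeModelStream(input));
}
```

The point of the design is that the value returned from the task is already a plain string, so scorers never need to know a stream was involved.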
5 changes: 5 additions & 0 deletions .changeset/0234-auto-dotenv-support.md
@@ -0,0 +1,5 @@
---
"evalite": minor
---

Support .env files by default via dotenv/config. Environment variables from .env files are now automatically loaded without any configuration needed. Users no longer need to manually add `setupFiles: ["dotenv/config"]` to their evalite.config.ts.
5 changes: 5 additions & 0 deletions .changeset/long-olives-give.md
@@ -0,0 +1,5 @@
---
"evalite": major
---

Renamed storage API entities: evals -> suites, and results -> evals. This will likely break existing SQLite databases on release, so a migration will be needed.
5 changes: 5 additions & 0 deletions .changeset/real-phones-join.md
@@ -0,0 +1,5 @@
---
"evalite-ui": patch
---

Added an overlay to the backdrop when viewing a trace
5 changes: 5 additions & 0 deletions .changeset/wet-clocks-camp.md
@@ -0,0 +1,5 @@
---
"evalite-ui": minor
---

Add the ability to search and filter evals in the UI
21 changes: 21 additions & 0 deletions .editorconfig
@@ -0,0 +1,21 @@
# http://editorconfig.org

root = true

[*]
charset = utf-8
end_of_line = lf
indent_style = space
insert_final_newline = true
trim_trailing_whitespace = true
indent_size = 2

[*.md]
trim_trailing_whitespace = false

[*.py]
indent_size = 4

[Makefile]
indent_style = tab
indent_size = 2
120 changes: 89 additions & 31 deletions apps/evalite-docs/astro.config.mts
@@ -3,6 +3,27 @@ import starlight from "@astrojs/starlight";

 // https://astro.build/config
 export default defineConfig({
+  redirects: {
+    "/what-is-evalite": "/guides/what-is-evalite",
+    "/quickstart": "/guides/quickstart",
+    "/guides/environment-variables": "/guides/configuration",
+    "/guides/skipping": "/tips/only-run-certain-evals",
+    "/guides/customizing-the-ui": "/tips/customize-the-ui",
+    "/guides/ci": "/tips/run-evals-on-ci-cd",
+    "/guides/running-programmatically": "/tips/run-evals-programmatically",
+    "/examples/ai-sdk": "/tips/vercel-ai-sdk",
+    "/guides/traces": "/tips/adding-traces",
+    "/guides/variant-comparison": "/tips/comparing-different-approaches",
+    "/guides/multi-modal": "/tips/images-and-media",
+    "/guides/cli": "/tips/watch-mode",
+    "/tips/skip-evals-during-development": "/tips/only-run-certain-evals",
+    "/tips/track-individual-llm-calls": "/tips/adding-traces",
+    "/tips/integrate-with-ai-sdk": "/tips/vercel-ai-sdk",
+    "/tips/work-with-images-and-media": "/tips/images-and-media",
+    "/tips/use-watch-mode-effectively": "/tips/watch-mode",
+    "/tips/set-score-thresholds": "/tips/score-thresholds",
+    "/tips/run-specific-eval-files": "/tips/only-run-certain-evals",
+  },
   integrations: [
     starlight({
       title: "Evalite",
@@ -72,75 +93,112 @@ export default defineConfig({
       },
       sidebar: [
         {
-          label: "Getting Started",
+          label: "Guides",
           items: [
             {
               label: "What Is Evalite?",
-              slug: "what-is-evalite",
+              slug: "guides/what-is-evalite",
             },
             {
               label: "Quickstart",
-              slug: "quickstart",
+              slug: "guides/quickstart",
             },
+            {
+              label: "Scorers",
+              slug: "guides/scorers",
+            },
+            {
+              label: "Configuration",
+              slug: "guides/configuration",
+            },
           ],
         },
         {
-          label: "Guides",
+          label: "Tips",
           items: [
             {
-              label: "Environment Variables",
-              slug: "guides/environment-variables",
+              label: "Only Run Certain Evals",
+              slug: "tips/only-run-certain-evals",
             },
             {
-              label: "Scorers",
-              slug: "guides/scorers",
+              label: "Customize The UI",
+              slug: "tips/customize-the-ui",
             },
             {
-              label: "Traces",
-              slug: "guides/traces",
+              label: "CI/CD",
+              slug: "tips/run-evals-on-ci-cd",
             },
             {
-              label: "A/B Testing",
-              slug: "guides/variant-comparison",
+              label: "Adding Traces",
+              slug: "tips/adding-traces",
             },
             {
-              label: "Multi-Modal",
-              slug: "guides/multi-modal",
+              label: "Vercel AI SDK",
+              slug: "tips/vercel-ai-sdk",
             },
             {
-              label: "Configuration",
-              slug: "guides/configuration",
+              label: "Comparing Different Approaches",
+              slug: "tips/comparing-different-approaches",
             },
             {
+              label: "Run Evals Programmatically",
+              slug: "tips/run-evals-programmatically",
+            },
+            {
+              label: "Images And Media",
+              slug: "tips/images-and-media",
+            },
+            {
+              label: "Run Same Eval Multiple Times",
+              slug: "tips/run-same-eval-multiple-times",
+            },
+            {
+              label: "Watch Mode",
+              slug: "tips/watch-mode",
+            },
+            {
-              label: "Streams",
-              slug: "guides/streams",
+              label: "Score Thresholds",
+              slug: "tips/score-thresholds",
             },
+          ],
+        },
+        {
+          label: "Reference",
+          items: [
+            {
+              label: "evalite()",
+              slug: "api/evalite",
+            },
             {
               label: "CLI",
-              slug: "guides/cli",
+              slug: "api/cli",
             },
             {
-              label: "Running Programmatically",
-              slug: "guides/running-programmatically",
+              label: "defineConfig()",
+              slug: "api/define-config",
             },
             {
-              label: "CI/CD",
-              slug: "guides/ci",
+              label: "createScorer()",
+              slug: "api/create-scorer",
             },
             {
-              label: "Skipping Evals",
-              slug: "guides/skipping",
+              label: "EvaliteFile",
+              slug: "api/evalite-file",
             },
+            {
+              label: "Traces",
+              slug: "api/traces",
+            },
             {
-              label: "Customizing The UI",
-              slug: "guides/customizing-the-ui",
+              label: "runEvalite()",
+              slug: "api/run-evalite",
             },
+            {
+              label: "Storage",
+              slug: "api/storage",
+            },
           ],
         },
-        {
-          label: "Integrations",
-          items: [{ label: "Vercel AI SDK", slug: "examples/ai-sdk" }],
-        },
       ],
     }),
   ],
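
Because this change moves many doc pages and adds a `redirects` map for the old paths, a small sanity check can be useful when editing that map by hand. The sketch below flags redirect chains, where a destination is itself a redirect source and a visitor may be bounced twice. `findRedirectChains` is a hypothetical helper written for this illustration and is not part of Astro or Starlight; `sample` reuses three entries from the diff, and the extra entry in `chained` is invented to trigger the check.

```typescript
type RedirectMap = Record<string, string>;

// Hypothetical helper: list every redirect whose destination is itself
// a redirect source, formatted as "from -> to".
function findRedirectChains(redirects: RedirectMap): string[] {
  const isSource = (path: string): boolean =>
    Object.prototype.hasOwnProperty.call(redirects, path);
  return Object.entries(redirects)
    .filter(([, to]) => isSource(to))
    .map(([from, to]) => `${from} -> ${to}`);
}

// Three entries copied from the redirects map in the diff.
const sample: RedirectMap = {
  "/guides/cli": "/tips/watch-mode",
  "/tips/use-watch-mode-effectively": "/tips/watch-mode",
  "/guides/traces": "/tips/adding-traces",
};

// Invented entry (not in the PR): redirecting "/tips/watch-mode" onward
// turns the two redirects that point at it into chains.
const chained: RedirectMap = {
  ...sample,
  "/tips/watch-mode": "/api/cli",
};

const cleanResult = findRedirectChains(sample);
const chainedResult = findRedirectChains(chained);
```

Here `cleanResult` is empty, while `chainedResult` lists the two entries that now pass through `/tips/watch-mode`. A check like this could run in a unit test or a pre-commit script next to the config.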