Set up ESLint for the repo and fix lint issues #282
base: v1
- Remove unused DB_LOCATION import from test-utils.ts - Replace FILES_LOCATION import with local constant in files.test.ts Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com>
- Add dotenv as a dependency - Create env-setup-file module that imports dotenv/config - Export env-setup-file as 'evalite/env-setup-file' - Automatically prepend env-setup-file to setupFiles array - Update documentation to reflect automatic .env loading - Update example config to remove manual dotenv setup Fixes mattpocock#234 🤖 Generated with [Claude Code](https://claude.ai/code) Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com>
… precedence - Add loadVitestSetupFiles() to load setupFiles from vitest.config.ts - Merge setupFiles from both configs with evalite.config.ts taking precedence - Add tests for vitest.config.ts setupFiles support and precedence - setupFiles execution order: env-setup-file -> vitest -> evalite Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com>
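The setup-file ordering described in this commit can be sketched as follows. This is a minimal illustration, not Evalite's actual code: `mergeSetupFiles` and the example file paths are hypothetical, and it assumes that "precedence" means the evalite.config.ts entries run last so they can override earlier setup. Only the `evalite/env-setup-file` export name and the order env-setup-file -> vitest -> evalite come from the commit messages.

```ts
// Hypothetical sketch of the merge order; the real implementation may differ.
const ENV_SETUP_FILE = "evalite/env-setup-file";

function mergeSetupFiles(
  vitestSetupFiles: string[],
  evaliteSetupFiles: string[],
): string[] {
  // Order: env-setup-file -> vitest.config.ts -> evalite.config.ts.
  // Files listed later run later, so evalite.config.ts entries can
  // override anything set up by the earlier files (assumption).
  return [ENV_SETUP_FILE, ...vitestSetupFiles, ...evaliteSetupFiles];
}

const merged = mergeSetupFiles(["./vitest.setup.ts"], ["./evalite.setup.ts"]);
console.log(merged);
```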
- Export new `evalite/scorers` module with factory functions - Add `createLLMBasedScorer` for model-dependent scorers - Add `createEmbeddingBasedScorer` for embedding-dependent scorers - Introduce `EvaluationSample` type with query, contexts, and reference fields Part of mattpocock#250
- Added a new `faithfulness` scorer to evaluate model responses against retrieved contexts. - Introduced utility functions for scoring and context handling. - Updated `package.json` to include `zod` version 4.1.12 as a dependency. - Updated `pnpm-lock.yaml` to reflect changes in dependencies and versions. Part of mattpocock#250
Add Faithfulness
- Introduced a new `answerSimilarity` scorer to assess the semantic similarity between a ground truth answer and a generated answer. - The scorer utilizes embedding models to compute cosine similarity and includes an optional threshold for binary output. - Updated the `scorers` module to export the new `answerSimilarity` scorer. Part of mattpocock#250
Add Answer Similarity Scorer
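As a rough illustration of the scoring step described for `answerSimilarity`: cosine similarity between two embedding vectors, with an optional threshold that turns the score into a binary 0/1. This is a sketch under assumptions. The function names are hypothetical, the embeddings are plain number arrays rather than the output of a real embedding model, and the `>=` comparison against the threshold is a guess at the behavior.

```ts
// Hypothetical sketch; not the actual Evalite scorer.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  // A zero vector has no direction; treat similarity as 0 in that case.
  return denominator === 0 ? 0 : dot / denominator;
}

function similarityScore(
  expectedEmbedding: number[],
  outputEmbedding: number[],
  threshold?: number,
): number {
  const similarity = cosineSimilarity(expectedEmbedding, outputEmbedding);
  // Without a threshold, return the raw similarity; with one, return 0 or 1.
  if (threshold === undefined) return similarity;
  return similarity >= threshold ? 1 : 0;
}
```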
- Introduced a new `contextRecall` scorer to evaluate how much of a generated answer can be attributed to retrieved contexts. - Updated the `scorers` module to export the new `contextRecall` scorer. Part of mattpocock#250
…c and updating metadata format
…g based scorers, and context recall and faithfulness classifications
… namespace for better organization
…d Multi Turn Sample Type
…intained and grows as necessary
Add Scorers module
feat: Swap from React Markdown to Streamdown
…splay issues Fixes mattpocock#265 - durations were displaying with full floating point precision like '3.090249999999997ms' instead of rounding to '3ms'. Updated formatTime() to use Math.round() for millisecond values. Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com>
…251024-1549 fix: round millisecond durations to avoid floating point precision display issues
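The rounding fix can be pictured with a minimal sketch. It assumes a hypothetical `formatTime(ms: number): string` that shows sub-second durations in milliseconds and longer ones in seconds. The real signature and the seconds branch in Evalite may differ; only the use of `Math.round()` for millisecond values comes from the commit message.

```ts
// Hypothetical sketch; not the actual Evalite implementation.
function formatTime(ms: number): string {
  if (ms < 1000) {
    // Round so that 3.090249999999997 is shown as "3ms".
    return `${Math.round(ms)}ms`;
  }
  // Assumed formatting for longer durations: seconds with one decimal place.
  return `${(ms / 1000).toFixed(1)}s`;
}

console.log(formatTime(3.090249999999997));
```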
* refactor: Simplify scorer factory API - Remove `createBaseScorer`, consolidate to `createLLMScorer`/`createEmbeddingScorer` - Add generic `TExpected` type for type-safe expected data - Replace `singleTurn`/`multiTurn` with single `scorer` function - Rename utils to `isSingleTurnInput`/`isMultiTurnInput` - Update all scorers (faithfulness, answerSimilarity, contextRecall) to new API - Fix example.eval.ts: textStream -> text 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * refactor: Move inline scorer types to Evalite.Scorers namespace Move inline expected data types from answer-similarity, context-recall, and faithfulness scorers to the Evalite.Scorers namespace in types.ts for better type organization and discoverability. Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com> * Formatting --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com>
* refactor: Simplify scorer factory API - Remove `createBaseScorer`, consolidate to `createLLMScorer`/`createEmbeddingScorer` - Add generic `TExpected` type for type-safe expected data - Replace `singleTurn`/`multiTurn` with single `scorer` function - Rename utils to `isSingleTurnInput`/`isMultiTurnInput` - Update all scorers (faithfulness, answerSimilarity, contextRecall) to new API - Fix example.eval.ts: textStream -> text 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * refactor: Move inline scorer types to Evalite.Scorers namespace Move inline expected data types from answer-similarity, context-recall, and faithfulness scorers to the Evalite.Scorers namespace in types.ts for better type organization and discoverability. Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com> * refactor: Update scorer types and utility functions for output handling - Renamed input types in Evalite.Scorers namespace to reflect output handling: SingleTurnInput to SingleTurnOutput, MultiTurnInput to MultiTurnOutput, and updated related types accordingly. - Modified scorer implementations in context-recall and faithfulness to use new output types. - Updated utility functions to check for output types instead of input types, enhancing clarity and consistency in the scoring logic. * Formatting * Trigger CI re-check --------- Co-authored-by: Matt Pocock <mattpocockvoice@gmail.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com>
* feat: update tailwind.css with new dark theme colors Solves: mattpocock#272 * fix: remove unnecessary class from active sidebar item styling
* feat: integrate search functionality - Implemented search functionality in the main application layout, allowing users to filter evaluations based on search queries. - Updated routes to support search parameters using Zod for validation. Closes: mattpocock#271 * Create wet-clocks-camp.md --------- Co-authored-by: Matt Pocock <mattpocockvoice@gmail.com>
* refactor: Simplify scorer factory API - Remove `createBaseScorer`, consolidate to `createLLMScorer`/`createEmbeddingScorer` - Add generic `TExpected` type for type-safe expected data - Replace `singleTurn`/`multiTurn` with single `scorer` function - Rename utils to `isSingleTurnInput`/`isMultiTurnInput` - Update all scorers (faithfulness, answerSimilarity, contextRecall) to new API - Fix example.eval.ts: textStream -> text 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * refactor: Move inline scorer types to Evalite.Scorers namespace Move inline expected data types from answer-similarity, context-recall, and faithfulness scorers to the Evalite.Scorers namespace in types.ts for better type organization and discoverability. Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com> * refactor: Update scorer types and utility functions for output handling - Renamed input types in Evalite.Scorers namespace to reflect output handling: SingleTurnInput to SingleTurnOutput, MultiTurnInput to MultiTurnOutput, and updated related types accordingly. - Modified scorer implementations in context-recall and faithfulness to use new output types. - Updated utility functions to check for output types instead of input types, enhancing clarity and consistency in the scoring logic. * feat: Implement Tool Call Accuracy scorer Related mattpocock#250 --------- Co-authored-by: Matt Pocock <mattpocockvoice@gmail.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com>
- Extended loadFixture() to return Vitest instance with Symbol.asyncDispose - Added triggerWatchModeRerun() helper using vitest.waitForTestRunEnd() - Added disableServer option to runEvalite() to prevent port conflicts - runEvalite() now returns Vitest instance - Fixed ai-sdk-traces.test.ts to use await using 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This fixes the TS errors.
- Include ESLint 9 as root dependency
- Set up ESLint to lint the whole repo
- Extend the root config and add a few package-specific plugins for Evalite UI
- Add a consistent `typecheck` npm script for type checking across the repo

You can now use `pnpm lint` in the root and the UI app, and `pnpm typecheck` anywhere in the repo. Use `pnpm lint --fix` to attempt to fix the issues.
a5f098c to 8c4667c
Originally proposed in #276. Resubmitting against v1 branch as requested.
Fix errors (mostly removing unused imports and variables). I left some prefixed with `_` to serve as a reminder of function parameters.

Also add an EditorConfig file to help maintain consistent code style for those who use it.
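The `_` prefix convention mentioned above is commonly paired with an ignore pattern on the unused-variables rule. Below is a sketch of one flat-config entry, assuming the repo uses the `@typescript-eslint/no-unused-vars` rule with the typescript-eslint plugin already registered elsewhere in the config. This PR page does not show the actual rules, so treat the entry and the name `allowUnderscoreUnused` as hypothetical; `argsIgnorePattern` and `varsIgnorePattern` are existing options of that rule.

```ts
// Hypothetical flat-config entry; spread it into the array exported from
// eslint.config.js alongside the recommended configs.
const allowUnderscoreUnused = {
  rules: {
    "@typescript-eslint/no-unused-vars": [
      "error",
      {
        // Names starting with "_" are treated as intentionally unused.
        argsIgnorePattern: "^_",
        varsIgnorePattern: "^_",
      },
    ],
  },
} as const;
```

With an entry like this, a parameter named `_req` would not be reported, while an unused `req` still would be.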