
EPIC: LeapfrogAI Evaluations v1.1 #1171

Open
6 tasks
jalling97 opened this issue Oct 1, 2024 · 0 comments
Assignees: jalling97
Labels: EPIC ⚔️ EPIC issue to consolidate several sub-issues


jalling97 (Contributor) commented Oct 1, 2024

LeapfrogAI Evaluations v1.1

Description

Now that a baseline evaluations framework for LeapfrogAI exists, it needs to be further expanded to meet the needs of the product and mission-success teams.

Feedback has been provided, and the common themes that need to be addressed are:

  • Some evaluations (primarily NIAH) always pass at 100% and, as such, are not helpful for tracking growth over time
  • Some NIAH and QA evals do not leverage the full chunk data in RAG responses and therefore do not evaluate RAG to the extent they should
  • Evaluation results are not currently being stored anywhere
  • The current implementation of LFAI evals is specific to the OpenAI way of handling RAG, so the evaluations can't be run against custom RAG pipelines (a delivery concern); a rough sketch of a more pipeline-agnostic approach follows this list
  • MMLU results sometimes suspiciously return the same score for multiple topics, indicating a potential problem with the evaluation 🐛
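
To illustrate the chunk-data and custom-pipeline points above, one option is for the evals to call any RAG pipeline through a small interface rather than the OpenAI Assistants API directly, which would also expose the retrieved chunk text to NIAH/QA scoring. This is a minimal sketch only; the names `RAGPipeline`, `RAGResult`, and `run_niah_case` are hypothetical and not part of the existing LFAI evals code:

```python
# Minimal sketch (not existing LFAI code) of decoupling evals from the
# OpenAI Assistants API. All names here are hypothetical.
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class RAGResult:
    """What an evaluation needs back from a pipeline for a single query."""
    answer: str
    # Full text of every retrieved chunk, so NIAH/QA evals can score
    # retrieval itself rather than only the final generated answer.
    retrieved_chunks: list[str] = field(default_factory=list)


class RAGPipeline(Protocol):
    """Anything that can answer a question over an uploaded document set."""
    def query(self, question: str) -> RAGResult: ...


def run_niah_case(pipeline: RAGPipeline, question: str, needle: str) -> dict:
    """Score one needle-in-a-haystack case against an arbitrary pipeline."""
    result = pipeline.query(question)
    return {
        "question": question,
        "needle_retrieved": any(needle in chunk for chunk in result.retrieved_chunks),
        "needle_in_answer": needle in result.answer,
    }
```

A delivery team could then implement `query()` on top of their own retriever and reuse the same NIAH/QA cases unchanged, and the returned dicts give a natural record to persist for tracking results over time.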

Completion Criteria

jalling97 added the EPIC ⚔️ label on Oct 1, 2024
jalling97 self-assigned this on Oct 1, 2024