Athina is building monitoring and evaluation tools for LLM developers.
Documentation | Quick Start | Running Evals
We have a library of preset evaluators, but you can also write custom evaluators within the Athina framework. A usage sketch follows the list below.
- Context Contains Enough Information: Detect bad or insufficient retrievals.
- Does Response Answer Query: Detect incomplete or irrelevant responses.
- Response Faithfulness: Detect when responses are deviating from the provided context.
- Summarization Accuracy: Detect hallucinations and mistakes in summaries.
- Grading Criteria: Grade responses against custom rules of the form "If X, then fail; otherwise pass."
- Custom Prompt: Custom prompt for LLM-powered evaluation.
- RAGAS: A set of evaluators that return RAGAS metrics.
- and more...
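
For instance, running a preset evaluator takes only a few lines. The sketch below assumes the `athina-evals` Python package with the class and method names shown (`OpenAiApiKey.set_key`, `DoesResponseAnswerQuery().run`); check the documentation above for the exact interface.

```python
# Minimal sketch: running one preset evaluator locally.
# The imports and signatures below are assumptions based on the docs --
# verify them against the documentation linked above.
from athina.keys import OpenAiApiKey
from athina.evals import DoesResponseAnswerQuery

# Preset evaluators are LLM-powered, so an OpenAI key is required.
OpenAiApiKey.set_key("sk-...")

result = DoesResponseAnswerQuery().run(
    query="What is the capital of France?",
    response="Paris is the capital of France.",
)
print(result)  # pass/fail verdict plus the evaluator's reasoning
```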
Results can also be viewed and tracked on our platform.
Documentation | Demo Video | Sign Up
- UI for monitoring and gaining visibility into your LLM inferences.
- Run evals automatically against logged inferences in production (see the logging sketch after this list).
- Track cost, token usage, response times, feedback, pass rate, and other eval metrics.
- Analytics segmented by Customer ID, Model, Prompt, Environment, and more.
- Topic Classification
- Data Exports
- and more...
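
Logging is what feeds these features: each inference you record becomes available for automatic evals and analytics. The sketch below assumes an `athina-logger` Python SDK; the class, method, and parameter names (`InferenceLogger.log_inference`, `prompt_slug`, `customer_id`, and so on) are illustrative, not confirmed, so consult the documentation above for the real logging API.

```python
# Hedged sketch of logging a production inference for monitoring.
# All names here (athina_logger, InferenceLogger.log_inference, and its
# parameters) are assumed for illustration -- see the docs for the
# actual logging interface.
from athina_logger.api_key import AthinaApiKey
from athina_logger.inference_logger import InferenceLogger

AthinaApiKey.set_api_key("your-athina-api-key")

InferenceLogger.log_inference(
    prompt_slug="customer_support",        # groups inferences by prompt
    prompt="How do I reset my password?",  # input sent to the model
    response="Go to Settings > Security > Reset Password.",
    language_model_id="gpt-4",             # drives cost and token analytics
    customer_id="customer_123",            # enables per-customer segmentation
    environment="production",
)
```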
Contact hello@athina.ai if you have any questions.