From b6874b7fad41ac39149e2aa1bb807eaf8bc67149 Mon Sep 17 00:00:00 2001
From: Andrei Lopatenko
Date: Sat, 16 Nov 2024 22:06:23 -0800
Subject: [PATCH] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 7f041b7..f8ac1f3 100644
--- a/README.md
+++ b/README.md
@@ -83,6 +83,7 @@ My view on LLM Evaluation: [Deck](LLMEvaluation.pdf), and [SF Big Analytics and
 ---
 ### Evaluation Software
 - [EleutherAI LLM Evaluation Harness ](https://github.com/EleutherAI/lm-evaluation-harness)
+- Eureka, Microsoft, A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. [github](https://github.com/microsoft/eureka-ml-insights) Sep 2024 [arxiv](https://arxiv.org/abs/2409.10566)
 - [OpenAI Evals]( https://github.com/openai/evals)
 - [ConfidentAI DeepEval](https://github.com/confident-ai/deepeval)
 - [MTEB](https://huggingface.co/spaces/mteb/leaderboard)