From b6874b7fad41ac39149e2aa1bb807eaf8bc67149 Mon Sep 17 00:00:00 2001
From: Andrei Lopatenko <alopatenko@gmail.com>
Date: Sat, 16 Nov 2024 22:06:23 -0800
Subject: [PATCH] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 7f041b7..f8ac1f3 100644
--- a/README.md
+++ b/README.md
@@ -83,6 +83,7 @@ My view on LLM Evaluation: [Deck](LLMEvaluation.pdf), and [SF Big Analytics and
 ---
 ### Evaluation Software
 - [EleutherAI LLM Evaluation Harness ](https://github.com/EleutherAI/lm-evaluation-harness)
+- Eureka (Microsoft): a framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings. [github](https://github.com/microsoft/eureka-ml-insights), Sep 2024, [arxiv](https://arxiv.org/abs/2409.10566)
 - [OpenAI Evals]( https://github.com/openai/evals)
 - [ConfidentAI DeepEval](https://github.com/confident-ai/deepeval)
 - [MTEB](https://huggingface.co/spaces/mteb/leaderboard)