Fork of LM Evaluation Harness Suite for evaluating benchmarks in paper titled "PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference"
Updated Feb 25, 2025 - Python
Task definitions for evaluating the Korean professional/technical certification exam benchmarks KMMLU-Pro and KMMLU-Redux with lm-evaluation-harness [LG Aimers 8th cohort deliverable]
Evaluate models and compare their scores
Does the identity in a system prompt change performance?