Fork of LM Evaluation Harness Suite for evaluating benchmarks in paper titled "PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference"
Updated Feb 25, 2025 - Python
Task definitions for evaluating the Korean professional/technical certification exam benchmarks KMMLU-Pro and KMMLU-Redux with lm-evaluation-harness [LG Aimers 8th cohort deliverable]
Evaluate models and compare their scores
Does the identity in a system prompt change performance?