Pull requests: EleutherAI/lm-evaluation-harness

added OGPTX and pubsector tasks
#3671 opened Apr 2, 2026 by Anirbanbhk88
Added Toksuite Benchmark
#3669 opened Apr 1, 2026 by gsaltintas
Add SimpleQA (OpenAI factuality benchmark)
#3667 opened Mar 31, 2026 by sarthakkgupta
6 tasks done
Allow Task objects to defer dataset download
#3663 opened Mar 29, 2026 by siddhant-rajhans
MMLU Task Name
#3660 opened Mar 29, 2026 by Teddygat0r
[BUGFIX] Consistent handling of None answers and cache
#3656 opened Mar 26, 2026 by RawthiL
[Task] NEREL-bench
#3650 opened Mar 23, 2026 by bond005
[Task] IFBench
#3642 opened Mar 20, 2026 by RawthiL
feat: add TRT-LLM backend.
#3628 opened Mar 11, 2026 by Tracin