EleutherAI / lm-evaluation-harness Public

Pull requests: EleutherAI/lm-evaluation-harness

214 Open 1,798 Closed

Pull requests list

Add --include_package support for external model modules

#3672 opened Apr 3, 2026 by morgenyu

added OGPTX and pubsector tasks

#3671 opened Apr 2, 2026 by Anirbanbhk88

fix: local directory with task name no longer shadows registered task

#3670 opened Apr 1, 2026 by nloughl

Added Toksuite Benchmark

#3669 opened Apr 1, 2026 by gsaltintas

fix: use np.median for correct median aggregation

#3668 opened Apr 1, 2026 by JuhongPark

Add SimpleQA (OpenAI factuality benchmark)

#3667 opened Mar 31, 2026 by sarthakkgupta

6 tasks done

AIFQA-393 BLK-007: [vLLM/lm_eval] ChatGLM2/3 tokenizer — empty stop string crash in lm_eval

#3665 opened Mar 30, 2026 by jklawikowski

Allow Task objects to defer dataset download

#3663 opened Mar 29, 2026 by siddhant-rajhans

Add InfiniteBench: long-context evaluation beyond 100K tokens

#3662 opened Mar 29, 2026 by siddhant-rajhans

MMLU Task Name

#3660 opened Mar 29, 2026 by Teddygat0r

fix: fall back to tokenizer.eos_token when decode returns empty string

#3657 opened Mar 27, 2026 by ganeshr10

[BUGFIX] Consistent handling of None answers and cache

#3656 opened Mar 26, 2026 by RawthiL

feat: add optional SymPy equivalence and math_verify to hendrycks_math

#3655 opened Mar 26, 2026 by NezLheimeur

fix: Reset batch_sizes cache before each _loglikelihood_tokens call

#3654 opened Mar 26, 2026 by nevertmr

[Hendrycks] Fix false negatives and add flexible_match to Hendrycks

#3653 opened Mar 25, 2026 by fxmarty-amd

fix: add Phi-3.5-vision support for vllm-vlm model type

#3651 opened Mar 24, 2026 by ganeshr10

[Task] NEREL-bench

#3650 opened Mar 23, 2026 by bond005

fix: migrate pubmedqa to script-less qiaojin/pubmed_qa dataset #3645

#3649 opened Mar 23, 2026 by Ishitha-P

Fix chat_template_args handling when enable_thinking is None

#3648 opened Mar 21, 2026 by ranjita-naik

fix: zeno_visualize nested output directory discovery

#3647 opened Mar 21, 2026 by komaksym

fix: extract \boxed{} from model response in hendrycks_math

#3644 opened Mar 20, 2026 by NezLheimeur

[Task] IFBench

#3642 opened Mar 20, 2026 by RawthiL

Fix ruff lint failures in models/__init__.py and huggingface.py

#3641 opened Mar 20, 2026 by dzautner

1 task

fix: preserve chat_template_args when enable_thinking is None

#3640 opened Mar 19, 2026 by NezLheimeur

feat: add TRT-LLM backend.

#3628 opened Mar 11, 2026 by Tracin
