lm-evaluation

Here are 8 public repositories matching this topic...

IAAR-Shanghai / xFinder

[ICLR 2025] xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation

Updated Nov 14, 2025
Python

bethgelab / CiteME

CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts.

lm-evaluation citation-attribution citation-dataset

Updated Nov 3, 2025
Python

hitz-zentroa / latxa

Star

Latxa: An Open Language Model and Evaluation Suite for Basque

evaluation language-model basque huggingface gpt-neox llm lm-evaluation latxa

Updated Dec 15, 2025
Shell

RulinShao / RAG-evaluation-harnesses

Star

An evaluation suite for Retrieval-Augmented Generation (RAG).

evaluation rag retrieval-augmented-generation lm-evaluation

Updated Apr 26, 2025
Python

vincentzed / decon

Star

`decon`, but with python API binding.

nlp data-science benchmark evaluation data-processing deduplication data-pipeline synthetic-data pretraining llm llms instruction-tuning llm-eval llm-evaluation datacomp lm-evaluation decontaminate

Updated Jan 9, 2026
Rust

sangstar / comparator

Star

Evaluate models and compare their scores

benchmarking machine-learning ai evaluation evaluation-metrics lm-evaluation-harness lm-evaluation

Updated Nov 22, 2025
C++

Blue-No1 / evaluation-metrics-v2

Star

Practical eval: accuracy, perplexity, simple task probes; harness cmds.

metrics evaluation benchmarks llm lm-evaluation

Updated Sep 6, 2025
Python

SYusupov / LogicGPT

Star

LLM Model: Fine-tuning, Evaluation, Containerization, Deployment, CI/CD Pipeline

deployment ci-cd containerization llm llm-finetuning mistral-7b lm-evaluation

Updated Sep 26, 2024
Jupyter Notebook

Improve this page

Add a description, image, and links to the lm-evaluation topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the lm-evaluation topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

lm-evaluation

Here are 8 public repositories matching this topic...

IAAR-Shanghai / xFinder

bethgelab / CiteME

hitz-zentroa / latxa

RulinShao / RAG-evaluation-harnesses

vincentzed / decon

sangstar / comparator

Blue-No1 / evaluation-metrics-v2

SYusupov / LogicGPT

Improve this page

Add this topic to your repo