Open-LLM-Leaderboard: Open-Style Question Evaluation. Paper at https://arxiv.org/abs/2406.07545 (Python; updated Jun 27, 2024)
A technical guide and live-tracking repository for the world's top AI models, organized by coding, reasoning, and multimodal performance.
119 AI models × 55 benchmarks with per-score freshness dates, auto-updated pricing, and task routing. Every score carries a date and a source URL. Updated by daily CI.
WordleBench — Deterministic AI Wordle benchmark. Compare 34+ LLMs (GPT-5, Claude 4.5, Gemini, Grok, Llama) head-to-head on accuracy, speed, and cost across 50 standardized words.
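The repository does not publish its scoring internals here, but a "deterministic Wordle benchmark" implies a fixed feedback function each model's guesses are graded against. A minimal sketch of the standard Wordle feedback rule (greens resolved first, then yellows consumed against remaining letter counts) could look like this; the function name and letter codes are illustrative, not taken from WordleBench:

```python
from collections import Counter

def wordle_feedback(guess: str, answer: str) -> str:
    """Return per-letter feedback: G = right letter, right spot;
    Y = right letter, wrong spot; B = letter not available.
    Greens are assigned first so duplicate letters are not
    double-counted as yellows."""
    feedback = ["B"] * len(guess)
    # Count answer letters not already matched as green.
    remaining = Counter(
        a for g, a in zip(guess, answer) if g != a
    )
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "G"
    for i, g in enumerate(guess):
        if feedback[i] == "B" and remaining[g] > 0:
            feedback[i] = "Y"
            remaining[g] -= 1
    return "".join(feedback)
```

Because the feedback is a pure function of (guess, answer), every model sees identical grading across the 50 standardized words, which is what makes head-to-head accuracy comparisons meaningful.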
E/RP benchmark leaderboard for LLMs