A benchmarking framework for measuring the energy consumption and performance of generative AI models such as Large Language Models (LLMs), Multimodal LLMs (MLLMs), and diffusion models.
You can browse The ML.ENERGY Leaderboard for the latest benchmarking results.
- Overview: Tasks, datasets, runtime
- Data Preparation: Downloading necessary datasets and processing them
- Running Benchmarks: Job generation and manual execution
- Analyzing Results: Analyzing and understanding benchmarking results
@inproceedings{mlenergy-neuripsdb25,
  title={The {ML.ENERGY Benchmark}: Toward Automated Inference Energy Measurement and Optimization},
  author={Jae-Won Chung and Jeff J. Ma and Ruofan Wu and Jiachen Liu and Oh Jun Kweon and Yuxuan Xia and Zhiyu Wu and Mosharaf Chowdhury},
  year={2025},
  booktitle={NeurIPS Datasets and Benchmarks},
}