π€π‘ LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context
"It's not like finding a needle in a haystack, it is like creating new needles."
π Leaderboard: http://liveideabench.com π‘
We are excited to announce that the latest dataset, including supplementary tests for models like deepseek-R1, deepseek-V3, minimax-01, phi-4, and Opus, has been uploaded to Hugging Face! π
Check it out here: https://huggingface.co/datasets/6cf/liveideabench-DLC-250127
@article{ruan2024liveideabench,
title={LiveIdeaBench: Evaluating LLMs' Scientific Creativity and Idea Generation with Minimal Context},
author={Kai Ruan and Xuan Wang and Jixiang Hong and Peng Wang and Yang Liu and Hao Sun},
journal={arXiv preprint arXiv:2412.17596},
year={2024}
}