Commit e11d83f

Update README.md
1 parent b6e8842 commit e11d83f

1 file changed: +3 -0 lines changed

README.md

Lines changed: 3 additions & 0 deletions
@@ -284,6 +284,7 @@ QA is used in many vertical domains, see Vertical section bellow
 ### Multi-Modal
 - LVLM-EHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models, Nov 2024, [IEEE](https://ieeexplore.ieee.org/abstract/document/10769058)
 - ScImage: How Good Are Multimodal Large Language Models at Scientific Text-to-Image Generation?, Dec 2024, [arxiv](https://arxiv.org/abs/2412.02368)
+- RealWorldQA, Apr 2024, [HuggingFace](https://huggingface.co/blog/KennyUTC/realworldqa)
 - Image2Struct: Benchmarking Structure Extraction for Vision-Language Models, Oct 2024, [arxiv](https://arxiv.org/abs/2410.22456)
 - MMBench: Is Your Multi-modal Model an All-Around Player?, Oct 2024 [springer ECCV 2024](https://link.springer.com/chapter/10.1007/978-3-031-72658-3_13)
 - MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models, Oct 2024, [arxiv](https://arxiv.org/abs/2410.10139)
@@ -433,6 +434,8 @@ QA is used in many vertical domains, see Vertical section bellow
 ---
 ### Various unclassified tasks
 (TODO as there are more than three papers per class, make a class a separate chapter in this Compendium)
+- Holmes ⌕ A Benchmark to Assess the Linguistic Competence of Language Models , Dec 2024, [MIT Press Transactions of ACL, 2024](https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00718/125534)
+- EscapeBench: Pushing Language Models to Think Outside the Box, Dec 2024, [arxiv](https://arxiv.org/abs/2412.13549)
 - Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making, Oct 2024, [arxiv](https://arxiv.org/abs/2410.07166)
 - Benchmarking Vision, Language, & Action Models on Robotic Learning Tasks, Nov 2024, [arxiv](https://arxiv.org/abs/2411.05821)
 - Evaluating Superhuman Models with Consistency Checks, Apr 2024, [IEEE](https://ieeexplore.ieee.org/abstract/document/10516635)
