A Collection of Papers and Codes for CVPR2025/CVPR2024/ECCV2024 AIGC
-
Updated: Jan 23, 2025
This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension"
[AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding
🔎Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".
📖Curated list on the reasoning ability of MLLMs, including OpenAI o1, OpenAI o3-mini, and Slow-Thinking.