The Paper List of Large Multi-Modality Model (Perception, Generation, Unification), Parameter-Efficient Finetuning, Vision-Language Pretraining, Conventional Image-Text Matching for Preliminary Insight.
Updated Sep 25, 2025
✨ A curated list of papers on uncertainty in multi-modal large language models (MLLMs).
The official repository for "Harnessing Vision Models for Time Series Analysis: A Survey".
Project page for the paper "Neural Brain: A Neuroscience-inspired Framework for Embodied Agents".
SafeSora is a human-preference dataset designed to support safety-alignment research in text-to-video generation, aiming to enhance the helpfulness and harmlessness of Large Vision Models (LVMs).
The official implementation of "Diversity-Guided MLP Reduction for Efficient Large Vision Transformers".
Notes and resources compiled from the crash course "Prompt Engineering for Vision Models" offered by DeepLearning.AI.