[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型
-
Updated
Jul 16, 2025 - Python
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型
Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks
[NeurIPS 2024] An official implementation of ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
[CVPR'25 highlight] RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness
A most Frontend Collection and survey of vision-language model papers, and models GitHub repository. Continuous updates.
[CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
Try openai assistant api apps on Google Colab for free. Awesome assistant API Demos!
🚀 gpt_pdf_md: Convert PDF to Markdown with GPT-4V & more. Extract images, upload to Google Cloud, & generate Markdown with images. Python, GPT-4V Vision, Scala. Ideal for developers, researchers. PDF to Markdown, GPT-4V, image extraction, Python package
[NeurIPS'24] SpatialEval: a benchmark to evaluate spatial reasoning abilities of MLLMs and LLMs
Code for ICLR'24 workshop ME-FoMo-How Well Does GPT-4V(ision) Adapt to Distribution Shifts? A Preliminary Investigation
GPT-4V(ision) module for use with Autodistill.
WordPress plugin that leverages OpenAI's Vision API to automatically generate descriptive alt text for images, enhancing accessibility and SEO.
ShareGPT4Omni: Towards Building Omni Large Multi-modal Models with Comprehensive Multi-modal Annotations
How a Picture of Car Damage Can File Your Insurance Claim
Your own personal Ruskin.
Chatbot that comprehends uploaded images and engages in detailed conversations about their content.
How well do the GPT-4V, Gemini Pro Vision, and Claude 3 Opus models perform zero-shot vision tasks on data structures?
Add a description, image, and links to the gpt-4v topic page so that developers can more easily learn about it.
To associate your repository with the gpt-4v topic, visit your repo's landing page and select "manage topics."