# qwen2-vl

Here are 33 public repositories matching this topic...

modelscope / ms-swift

Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen2.5, Llama4, InternLM3, GLM4, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, DeepSeek-VL2, Phi4, GOT-OCR2, ...).

Updated Apr 15, 2025
Python

roboflow / maestro

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL.

transformers vqa objectdetection captioning fine-tuning multimodal vision-and-language phi-3-vision paligemma florence-2 qwen2-vl

Updated Apr 14, 2025
Python

2U1 / Qwen2-VL-Finetune

An open-source implementation for fine-tuning the Qwen2-VL and Qwen2.5-VL series by Alibaba Cloud.

chatbot multimodal vision-language vision-language-model qwen2-vl qwen2-5

Updated Apr 15, 2025
Python

PaddlePaddle / PaddleMIX

Paddle Multimodal Integration and eXploration, supporting mainstream multi-modal tasks, including end-to-end large-scale multi-modal pretrain models and diffusion model toolbox. Equipped with high performance and flexibility.

Updated Apr 15, 2025
Python

NetEase-Media / grps_trtllm

Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GRPS+TensorRT-LLM+Tokenizers.cpp, supporting chat and function call, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.

openai multi-modal phi function-call qwq ai-agent llm llama-index chatglm internvideo tensorrt-llm qwen2 llama3 minicpm-v internvl2 qwen2-vl deepseek-r1 janus-pro olmocr

Updated Apr 15, 2025
Python

lucasjinreal / Crane

A pure Rust LLM inference engine (supporting any LLM-based MLLM, such as Spark-TTS), powered by the Candle framework.

rust mllm llama-cpp qwen2-vl spark-tts qwen3

Updated Mar 26, 2025
Rust

drive-bench / toolkit

Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives

autonomous-driving chatgpt vision-language-models phi-3 internvl qwen2-vl driving-with-language

Updated Feb 22, 2025
Python

arcstep / illufly

✨🦋 illufly is a self-evolving Agent framework: create value quickly through self-evolution

agent ai growth openai multiagent gpt rag llm longtext qwen qwen2 dashscope glm-4 zhipu qwen2-vl illufly

Updated Apr 13, 2025
Python

soulteary / dify-with-qwen-vl

Video understanding: Qwen multimodal video model & Dify

dify qwen2 qwen2-vl

Updated Sep 2, 2024
Python

fireicewolf / wd-llm-caption-cli

A Python-based CLI tool for captioning images with WD series, Joy-caption-pre-alpha, Meta Llama 3.2 Vision Instruct, and Qwen2 VL Instruct models.

image-caption wd14 llama3-vision florence-2 qwen2-vl joy-caption

Updated Mar 18, 2025
Python

see2023 / autoXHS

An intelligent search assistant based on multimodal large models, enabling smart information retrieval and knowledge integration on the Xiaohongshu platform.

spider selenium-webdriver xiaohongshu llm qwen2-vl

Updated Nov 6, 2024
Python

shaadclt / Qwen2-VL-OCR-VQA

This project demonstrates how to use the Qwen2-VL model from Hugging Face for Optical Character Recognition (OCR) and Visual Question Answering (VQA). The model combines vision and language capabilities, enabling users to analyze images and generate context-based responses.

optical-character-recognition visual-question-answering qwen2-vl

Updated Oct 18, 2024
Jupyter Notebook

BUAADreamer / Qwen2-VL-History

A LLaMA-Factory case study of fine-tuning Qwen2-VL for the culture and tourism domain (historical literature and museums)

beauty museum history supervised-finetuning mllm multimodal-large-language-models llama-factory qwen2-vl

Updated Sep 17, 2024

ZachcZhang / Qwen2-VL-inference

An open-source server implementation for running inference with Qwen2-VL series models using FastAPI.

inference fastapi huggingface mllm qwen2-vl

Updated Nov 20, 2024
Python

Valdanitooooo / chat_with_qwen2_vl_test

qwen2-vl

Updated Dec 27, 2024
Python

Kazuhito00 / Qwen2-VL-Colaboratory-Sample

A sample for trying out QwenLM/Qwen2-VL on Colaboratory

python vlm colaboratory qwen2-vl

Updated Sep 4, 2024
Jupyter Notebook

aws-samples / multi-modal-examples-for-amazon-sagemaker

A workshop collecting multi-modal LLM examples, samples, reference architectures, and demos on Amazon SageMaker.

sagemaker multi-modality sagemaker-example sagemaker-studio llm vllm video-llava internvl2 qwen2-vl

Updated Mar 16, 2025
Jupyter Notebook

Younis-Ahmed / qwen-ai-provider

Community-built Qwen AI Provider for Vercel AI SDK - Integrate Alibaba Cloud's Qwen models with Vercel's AI application framework

ai artificial-intelligence language-model alibaba-cloud vercel generative-ai vercel-ai vercel-ai-sdk qwen qwen-api llm-integration qwen2-vl qwen2-5 ai-provider

Updated Mar 10, 2025
TypeScript

Pavansomisetty21 / Qwen2-Vision-Finetuning-Unsloth---Maths-OCR-Formulae-Extraction-

We fine-tune an Unsloth Llama model to extract mathematical formulas from images with optical character recognition (OCR).

ocr llama maths optical-character-recognition vlm ocr-recognition llm vision-language-model qwen2 unsloth qwen2-vl

Updated Jan 8, 2025
Jupyter Notebook

aws-samples / sample-for-multi-modal-document-to-json-with-sagemaker-ai

This open-source project delivers a complete pipeline for converting multi-page documents (PDFs/images) into structured JSON using Vision LLMs on Amazon SageMaker. The solution leverages the SWIFT Framework to fine-tune models specifically for document understanding tasks.

swift aws llama idp document-processing fine-tuning multimodal sagemaker sft huggingface qwen2-vl

Updated Mar 22, 2025
Jupyter Notebook
