mutimodal
Here are 9 public repositories matching this topic...
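A multimodal large-model system built on the Qwen Agent framework, integrating a JAKA robotic arm, visual detection, speech recognition and synthesis, and an MCP database.
Updated May 26, 2025 - Python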
Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification
Updated Apr 15, 2025 - Python
A private, local OCR solution using Meta's Llama 3.2 Vision model with a Streamlit interface. It processes images entirely offline and supports formats such as JPEG, PNG, and BMP.
Updated Nov 21, 2024 - Python
Gemini 2 Pro app for image, audio, and document understanding, plus code execution.
Updated Feb 9, 2025 - Python
A multimodal RAG application using Qwen 2.5 VL, ColPali, and QdrantDB for text and image-based retrieval.
Updated Mar 20, 2025 - Jupyter Notebook
明康慧医 (MKTY, MINH KHỎE TUỆ Y): source code of Design and Implementation of MINH KHỎE TUỆ Y, a health management and assisted diagnosis system based on LLMs and multimodal artificial intelligence (Minh Khoe Tue Y Smart Healthcare System). The project was used as a 2025 graduation project in the Faculty of Computer Science at Qilu University of Technology (Shandong Academy of Sciences). Author: Du Yu (@duyu09), email: qluduyu09@163.com.
Updated Jul 12, 2025 - Vue
QD-RetNet: Efficient Retinal Disease Classification via Quantized Knowledge Distillation [MIUA-2025]
Updated Jul 20, 2025 - Python