Show Lab

92 repositories

Awesome-Video-Diffusion
Public
A curated list of recent diffusion models for video generation, editing, and various other applications.
awesome video-editing video-generation diffusion-models motion-customization video-generation-evaluation
259 • 4.4k • 1 • 0 • Updated May 17, 2025
livecc
Public
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025)
Python • 28 • 195 • 5 • 0 • Updated May 16, 2025
DoraCycle
Public
[CVPR 2025] DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
1 • 22 • 2 • 0 • Updated May 13, 2025
Awesome-Robotics-Diffusion
Public
A curated list of recent robot learning papers incorporating diffusion models for robotics tasks.
5 • 161 • 0 • 0 • Updated May 1, 2025
Exo2Ego-V
Public
Python • Apache License 2.0 • 0 • 42 • 2 • 0 • Updated Apr 28, 2025
Show-o
Public
[ICLR 2025] Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
multimodal diffusion-models large-language-models
Python • Apache License 2.0 • 60 • 1.4k • 43 • 2 • Updated Apr 28, 2025
omg
Public
Open Multimodal Gathering workshop @ NUS
JavaScript • 0 • 0 • 0 • 0 • Updated Apr 28, 2025
PhotoDoodle
Public
Code Implementation of "PhotoDoodle: Learning Artistic Image Editing from Few-Shot Pairwise Data"
Python • MIT License • 26 • 386 • 8 • 1 • Updated Apr 23, 2025
videollm-online
Public
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Python • Apache License 2.0 • 47 • 463 • 27 • 0 • Updated Apr 23, 2025
FAR
Public
Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"
Python • MIT License • 8 • 203 • 0 • 0 • Updated Apr 23, 2025
ROICtrl
Public
Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation
Python • 0 • 108 • 2 • 0 • Updated Apr 16, 2025
computer_use_ootb
Public
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
Python • Apache License 2.0 • 156 • 1.6k • 30 • 6 • Updated Apr 15, 2025
GUI-Thinker
Public
Enable AI to control your PC. This repo includes the WorldGUI Benchmark and GUI-Thinker Agent Framework.
gui-application agents large-multimodal-models gui-agent
Python • 6 • 67 • 1 • 0 • Updated Apr 11, 2025
Awesome-MLLM-Hallucination
Public
📖 A curated list of resources on hallucination in multimodal large language models (MLLMs).
25 • 686 • 1 • 0 • Updated Apr 9, 2025
Awesome-Unified-Multimodal-Models
Public
📖 A repository for organizing papers, code, and other resources related to unified multimodal models.
26 • 546 • 1 • 0 • Updated Apr 9, 2025
GUI-Narrator
Public
Repository of GUI Action Narrator
JavaScript • 0 • 10 • 0 • 0 • Updated Apr 8, 2025
VideoGUI
Public
[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
gui video-language llm-agent
JavaScript • 2 • 35 • 0 • 0 • Updated Apr 7, 2025
SMS
Public
Balanced Image Stylization with Style Matching Score
1 • 29 • 0 • 0 • Updated Apr 2, 2025
LayerTracer
Public
Official code of "LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer"
Python • MIT License • 3 • 49 • 4 • 0 • Updated Apr 1, 2025
MakeAnything
Public
Official code of "MakeAnything: Harnessing Diffusion Transformers for Multi-Domain Procedural Sequence Generation"
Python • MIT License • 9 • 181 • 3 • 0 • Updated Apr 1, 2025
FQGAN
Public
FQGAN: Factorized Visual Tokenization and Generation
Python • Other • 2 • 50 • 0 • 0 • Updated Mar 29, 2025
MovieAgent
Public
MovieAgent: Automated Movie Generation via Multi-Agent CoT Planning
Python • 23 • 194 • 7 • 0 • Updated Mar 26, 2025
SAM-I2V
Public
Apache License 2.0 • 0 • 2 • 1 • 0 • Updated Mar 22, 2025
LOVA3
Public
(NeurIPS 2024) Official PyTorch implementation of LOVA3
benchmark visual-question-answering visual-question-generation multimodal-large-language-models large-multimodal-models
Python • 2 • 84 • 0 • 0 • Updated Mar 21, 2025
Impossible-Videos
Public
Python • 6 • 67 • 1 • 0 • Updated Mar 20, 2025
MovieBench
Public
[CVPR 2025] A Hierarchical Movie Level Dataset for Long Video Generation
Python • 2 • 57 • 0 • 0 • Updated Mar 16, 2025
ShowUI
Public
[CVPR 2025] Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
agent vision-language-model vision-language-action computer-use gui-agent
Python • Apache License 2.0 • 83 • 1.2k • 8 • 0 • Updated Mar 13, 2025
TPDiff
Public
TPDiff: Temporal Pyramid Video Diffusion Model
2 • 19 • 1 • 0 • Updated Mar 13, 2025
VLog
Public
[CVPR 2025] Video Narration as Vocabulary & Video as Long Document
vocabulary whisper video-language chatgpt langchain large-language-model
Python • 28 • 567 • 8 • 0 • Updated Mar 13, 2025
MovieSeq
Public
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
Jupyter Notebook • 1 • 39 • 1 • 0 • Updated Mar 11, 2025