Transform Video as a Document with ChatGPT, CLIP, BLIP2, GRIT, Whisper, LangChain (a minimal sketch of this idea appears after the list below).
An official implementation for "UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation"
[CVPR2022] Official Implementation of ReferFormer
[ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training
[NeurIPS2022] Egocentric Video-Language Pretraining
This repository collects a wide range of multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also gathers useful tutorials and tools in these related domains.
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
PyTorch code for "Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners"
A new multi-shot video understanding benchmark, Shot2Story, with comprehensive video summaries and detailed shot-level captions.
[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)
A survey on video and language understanding.
The PyTorch implementation for "Video-Text Pre-training with Learned Regions"
PyTorch code for "Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention" (WACV 2023)
A curated list of video-text datasets in a variety of languages. These datasets can be used for video captioning (video description) or video retrieval.
Official implementation for paper Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
[EMNLP 2024] A Video Chat Agent with Temporal Prior
[NeurIPS2024] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Code for CVPR 2023 paper "SViTT: Temporal Learning of Sparse Video-Text Transformers"
[ICCV 2023] The official PyTorch implementation of the paper: "Localizing Moments in Long Video Via Multimodal Guidance"
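The first entry above describes a pipeline that turns a video into a text document an LLM can read. The sketch below illustrates only the transcription step with Whisper; it is an assumption about how such a pipeline might be wired, not code from that repository. The file path, model size, and function name are illustrative, and the BLIP-2/GRIT captioning and LangChain layers are left as comments.

```python
# Minimal sketch (assumed pipeline, not the repo's actual code):
# transcribe a video's audio track with Whisper and emit a timestamped
# text document that an LLM could later search, answer questions over,
# or summarize. Requires the openai-whisper package and ffmpeg.
import whisper


def video_to_document(video_path: str, model_size: str = "base") -> str:
    """Turn a video's speech into a timestamped plain-text document."""
    model = whisper.load_model(model_size)   # load a Whisper checkpoint
    result = model.transcribe(video_path)    # ffmpeg extracts the audio internally

    lines = []
    for seg in result["segments"]:           # each segment carries start/end times and text
        start, end = seg["start"], seg["end"]
        lines.append(f"[{start:7.2f}s - {end:7.2f}s] {seg['text'].strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    doc = video_to_document("video.mp4")     # hypothetical input file
    print(doc)
    # In a fuller pipeline, per-frame captions (e.g. from BLIP-2 or GRIT) would be
    # interleaved with these transcript lines, and the resulting document fed to
    # an LLM through LangChain for retrieval, QA, or summarization.
```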