Open-Vocabulary Segmentation with Semantic-Assisted Calibration (CVPR 2024) [Paper]
VideoGrounding-DINO: Towards Open-Vocabulary Spatio-Temporal Video Grounding (CVPR 2024) [Paper]
Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships (CVPR 2024) [Paper]
Exploring Region-Word Alignment in Built-in Detector for Open-Vocabulary Object Detection (CVPR 2024) [Paper]
Open Vocabulary Semantic Scene Sketch Understanding (CVPR 2024) [Paper]
USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation (CVPR 2024) [Paper]
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models (CVPR 2024) [Paper]
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection (CVPR 2024) [Paper]
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation (CVPR 2024) [Paper]
Open-Vocabulary 3D Semantic Segmentation with Foundation Models (CVPR 2024) [Paper]
OVER-NAV: Elevating Iterative Vision-and-Language Navigation with Open-Vocabulary Detection and StructurEd Representation (CVPR 2024) [Paper]
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection (CVPR 2024) [Paper]
MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation (CVPR 2024) [Paper]
Open-Vocabulary Video Anomaly Detection (CVPR 2024) [Paper]
CLIP-Driven Open-Vocabulary 3D Scene Graph Generation via Cross-Modality Contrastive Learning (CVPR 2024) [Paper]
OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers (CVPR 2024) [Paper]
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies (CVPR 2024) [Paper]
OVFoodSeg: Elevating Open-Vocabulary Food Image Segmentation via Image-Informed Textual Representation (CVPR 2024) [Paper]
GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields (CVPR 2024) [Paper]
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models (CVPR 2024) [Paper]
The Devil is in the Fine-Grained Details: Evaluating Open-Vocabulary Object Detectors for Fine-Grained Understanding (CVPR 2024) [Paper]
Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations (CVPR 2024) [Paper]
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding (CVPR 2024) [Paper]
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation (CVPR 2024) [Paper]
Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation (CVPR 2024) [Paper]
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance (CVPR 2024) [Paper]
Open-Vocabulary Semantic Segmentation with Image Embedding Balancing (CVPR 2024) [Paper]
Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection (CVPR 2024) [Paper]
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents (CVPR 2024) [Paper]
Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection (CVPR 2024) [Paper]
Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation (CVPR 2024) [Paper]
YOLO-World: Real-Time Open-Vocabulary Object Detection (CVPR 2024) [Paper]
Open-Vocabulary Object 6D Pose Estimation (CVPR 2024) [Paper]
Taming Self-Training for Open-Vocabulary Object Detection (CVPR 2024) [Paper]
OVMR: Open-Vocabulary Recognition with Multi-Modal References (CVPR 2024) [Paper]
Retrieval-Augmented Open-Vocabulary Object Detection (CVPR 2024) [Paper]
Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding (CVPR 2024) [Paper]
Transferable and Principled Efficiency for Open-Vocabulary Segmentation (CVPR 2024) [Paper]
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models (CVPR 2024) [Paper]
Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection (CVPR 2024) [Paper]
OpenScene: 3D Scene Understanding with Open Vocabularies (CVPR 2023) [Paper]
Open-Vocabulary Point-Cloud Object Detection without 3D Annotation (CVPR 2023) [Paper]
Learning to Generate Language-supervised and Open-vocabulary Scene Graph using Pre-trained Visual-Semantic Space (CVPR 2023) [Paper]
Side Adapter Network for Open-Vocabulary Semantic Segmentation (CVPR 2023) [Paper]
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models (CVPR 2023) [Paper]
Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations (CVPR 2023) [Paper]
Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP (CVPR 2023) [Paper]
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers (CVPR 2023) [Paper]
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection (CVPR 2023) [Paper]
Aligning Bag of Regions for Open-Vocabulary Object Detection (CVPR 2023) [Paper]
Open-set Fine-grained Retrieval via Prompting Vision-Language Evaluator (CVPR 2023) [Paper]
Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning (CVPR 2023) [Paper]
FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation (CVPR 2023) [Paper]
GLIGEN: Open-Set Grounded Text-to-Image Generation (CVPR 2023) [Paper]
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment (CVPR 2023) [Paper]
OvarNet: Towards Open-vocabulary Object Attribute Recognition (CVPR 2023) [Paper]
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding (CVPR 2023) [Paper]
Open-vocabulary Attribute Detection (CVPR 2023) [Paper]
Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision (CVPR 2023) [Paper]
Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs (CVPR 2023) [Paper]
CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching (CVPR 2023) [Paper]
OVTrack: Open-Vocabulary Multiple Object Tracking (CVPR 2023) [Paper]
Learning to Detect and Segment for Open Vocabulary Object Detection (CVPR 2023) [Paper]
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation (ICLR 2023)
Datasets: LVIS, PASCAL VOC, COCO, Objects365
Task: Object Detection
Global Knowledge Calibration for Fast Open-Vocabulary Segmentation (ICCV 2023) [Paper]
Open-vocabulary Panoptic Segmentation with Embedding Modulation (ICCV 2023) [Paper]
SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation (ICML 2023) [Paper]
Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization (ICML 2023) [Paper]
Open-Vocabulary Universal Image Segmentation with MaskCLIP (ICML 2023) [Paper]
Multi-Modal Classifiers for Open-Vocabulary Object Detection (ICML 2023) [Paper]
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models (CVPRw 2023) [Paper]
A Language-Guided Benchmark for Weakly Supervised Open Vocabulary Semantic Segmentation (Arxiv 2023) [Paper]
Aligning Bag of Regions for Open-Vocabulary Object Detection (Arxiv 2023) [Paper]
From Occlusion to Insight: Object Search in Semantic Shelves using Large Language Models (Arxiv 2023) [Paper]
Side Adapter Network for Open-Vocabulary Semantic Segmentation (Arxiv 2023) [Paper]
CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets (Arxiv 2023) [Paper]
Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation (CVPR 2022) [Paper] [Code]
Datasets: MS COCO
Task: Object Detection
Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling (CVPR 2022) [Paper] [Code]
Datasets: MS-COCO, Open Images, Conceptual Captions
Task: Instance Segmentation
Learning to Prompt for Open-Vocabulary Object Detection with Vision-Language Model (CVPR 2022) [Paper] [Code]
Datasets: LVIS v1, Pascal VOC Dataset, COCO, Objects365 Dataset
Task: Object Detection and Instance Segmentation
NOC-REK: Novel Object Captioning With Retrieved Vocabulary From External Knowledge (CVPR 2022) [Paper]
Datasets: COCO, Nocaps
Task: Novel Object Captioning
Patching open-vocabulary models by interpolating weights (NeurIPS 2022) [Paper] [Code]
Datasets: Cars, DTD, EuroSAT, GTSRB, KITTI, MNIST, RESISC45, SUN397, SVHN; supported tasks: CIFAR10, CIFAR100, Food101, ImageNet, STL10
Task: Model Patching
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection (NeurIPS 2022) [Paper] [Code]
Datasets: COCO, LVIS v1.0, OpenImages, Objects365
Task: Object Detection
Paraphrasing Is All You Need for Novel Object Captioning (NeurIPS 2022) [Paper]
Datasets: Open Images V4, COCO Captions 2017
Task: Image Captioning
PromptDet: Towards Open-vocabulary Detection using Uncurated Images (ECCV 2022) [Paper] [Code]
Datasets: LVIS, LAION-400M and LAION-Novel, COCO
Task: Object Detection
Scaling Open-vocabulary Image Segmentation with Image-level Labels (ECCV 2022) [Paper]
Datasets: COCO, Localized Narratives (Loc. Narr.); test: PASCAL Context, PASCAL VOC, ADE20k
Task: Instance Segmentation
Towards Open-vocabulary Scene Graph Generation with Prompt-based Finetuning (ECCV 2022) [Paper]
Datasets: Visual Genome (VG), GQA, Open-Image
Task: Scene Graph Generation
Simple Open-Vocabulary Object Detection with Vision Transformers (ECCV 2022) [Paper] [Code]
Datasets: OpenImages V4 (OI), Objects365 (O365), and/or Visual Genome (VG); evaluation: COCO, LVIS, and O365
Task: Object Detection
Open Vocabulary Object Detection with Pseudo Bounding-Box Labels (ECCV 2022) [Paper] [Code]
Datasets: COCO Caption, Visual-Genome, and SBU Caption (Object names: COCO, PASCAL VOC, Objects365 and LVIS)
Task: Object Detection
Open-Vocabulary DETR with Conditional Matching (ECCV 2022 Oral) [Paper] [Code]
Datasets: LVIS, COCO
Task: Object Detection
Improving Closed and Open-Vocabulary Attribute Prediction using Transformers (ECCV 2022) [Paper] [Code]
Datasets: VAW (closed-set); LSA common, LSA common→rare, HICO
Task: Attribute Prediction
A Simple Baseline for Open Vocabulary Semantic Segmentation with Pre-trained Vision-language Model (ECCV 2022) [Paper] [Code]
Datasets: COCO Stuff; Pascal VOC 2012; Cityscapes; Pascal Context; ADE20K
Task: Semantic Segmentation
A Dataset for Interactive Vision-Language Navigation with Unknown Command Feasibility (ECCV 2022) [Paper] [Code]
Datasets: MoTIF
Task: Vision-Language Navigation (Apps)
Acknowledging the Unknown for Multi-label Learning with Single Positive Labels (ECCV 2022) [Paper] [Code]
Datasets: PASCAL VOC 2012 (VOC), MS-COCO 2014 (COCO), NUS-WIDE (NUS), and CUB-200-2011 (CUB)
Task: Single Positive Multi-label Learning
OVIS: Open-Vocabulary Visual Instance Search via Visual-Semantic Aligned Representation Learning (AAAI 2022) [Paper]
Datasets: OVIS40; OVIS1600
Task: Visual Instance Search
Open Vocabulary Electroencephalography-to-Text Decoding and Zero-Shot Sentiment Classification (AAAI 2022) [Paper] [Code]
Datasets: ZuCo
Task: Brain Signals Language Decoding
From Node To Graph: Joint Reasoning on Visual-Semantic Relational Graph for Zero-Shot Detection (WACV 2022) [Paper] [Code]
Datasets: MSCOCO
Task: Object Detection
Trading-Off Information Modalities in Zero-Shot Classification (WACV 2022) [Paper] [Code]
Datasets: Caltech UCSD Birds 200-2011 (CUB), Animals with Attributes 1 and 2 (AWA1 & AWA2), attribute Pascal & Yahoo (APY), SUN attributes (SUN) and Oxford flowers (FLO)
Task: Image Classification
Partially-Supervised Novel Object Captioning Using Context from Paired Data (BMVC 2022) [Paper]
Datasets: MS COCO
Task: Object Captioning
Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models (BMVC 2022) [Paper] [Code]
Datasets: PASCAL-5i, COCO-20i, FSS-1000, Mosaic-4
Task: Semantic Segmentation
Describing Sets of Images with Textual-PCA (EMNLP 2022)
Datasets: CelebA; Stanford Cars; COCO-Horses; LSUN-Church
Task: Text Generation for Sets of Images
Open-Vocabulary Object Detection Using Captions (CVPR 2021)
Datasets: COCO Objects, COCO Captions
Task: Object Detection
A Latent Morphology Model for Open-Vocabulary Neural Machine Translation (ICLR 2020 Spotlight) [Paper] [Code]
Datasets: Arabic (AR), Czech (CS) and Turkish (TR)
Task: Neural Machine Translation
Open Vocabulary Learning on Source Code with a Graph-Structured Cache (ICML 2019) [Paper]
Datasets: Java source code
Task: Java Source Code Learning
Visual Question Generation for Class Acquisition of Unknown Objects (ECCV 2018) [Paper] [Code]
Datasets: Visual Genome, ILSVRC2012, ILSVRC2010, WordNet
Task: Visual Question Generation, Object Detection
Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input (ECCV 2018) [Paper] [Code]
Datasets: Places Audio Caption, ADE20k, MSCOCO
Task: Audio-Visual Associative Localizations
Image Captioning with Unseen Objects (BMVC 2019) [Paper]
Datasets: COCO
Task: Image Captioning
nocaps: novel object captioning at scale (ICCV 2019) [Paper] [Code]
Datasets: nocaps, COCO Captions
Task: Image Captioning
Pointing Novel Objects in Image Captioning (CVPR 2019) [Paper]
Datasets: held-out COCO, ImageNet
Task: Image Captioning
Learning User Representations for Open Vocabulary Image Hashtag Prediction (CVPR 2020) [Paper]
Datasets: YFCC100M
Task: Image Hashtag Prediction
Open-Edit: Open-Domain Image Manipulation with Open-Vocabulary Instructions (ECCV 2020) [Paper] [Code]
Datasets: BSDS500, Conceptual Captions
Task: Image Manipulation
Open-Category Human-Object Interaction Pre-training via Language Modeling Framework (CVPR 2023) [Paper]
Being Comes from Not-being: Open-vocabulary Text-to-Motion Generation with Wordless Training (CVPR 2023) [Paper]
OVTrack: Open-Vocabulary Multiple Object Tracking (CVPR 2023) [Paper]
The Devil is in the Wrongly-classified Samples: Towards Unified Open-set Recognition (ICLR 2023)
Datasets: CIFAR100, LSUN, MiTv2, UCF101, HMDB51
Task: Image and Video Classification
Open-VCLIP: Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization (ICML 2023) [Paper]
Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization (Arxiv 2023) [Paper]
TagCLIP: Improving Discrimination Ability of Open-Vocabulary Semantic Segmentation (Arxiv 2023) [Paper]
MVP-SEG: Multi-View Prompt Learning for Open-Vocabulary Semantic Segmentation (Arxiv 2023) [Paper]
Segment Everything Everywhere All at Once (Arxiv 2023) [Paper]
Towards Open-Vocabulary Video Instance Segmentation (Arxiv 2023) [Paper]
CLIP Surgery for Better Explainability with Enhancement in Open-Vocabulary Tasks (Arxiv 2023) [Paper]
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition (Arxiv 2023) [Paper]
V3Det: Vast Vocabulary Visual Detection Dataset (Arxiv 2023) [Paper]
Token Merging for Fast Stable Diffusion (Arxiv 2023) [Paper]
Going Beyond Nouns With Vision & Language Models Using Synthetic Data (Arxiv 2023) [Paper]
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks (Arxiv 2023) [Paper]
ZBS: Zero-shot Background Subtraction via Instance-level Background Modeling and Foreground Selection (CVPR 2023) [Paper]
Prompt-Guided Transformers for End-to-End Open-Vocabulary Object Detection (Arxiv 2023) [Paper]
Three ways to improve feature alignment for open vocabulary detection (Arxiv 2023) [Paper]
Zero-guidance Segmentation Using Zero Segment Labels (Arxiv 2023) [Paper]
Open-Vocabulary Object Detection using Pseudo Caption Labels (Arxiv 2023) [Paper]
Uni-Fusion: Universal Continuous Mapping (Arxiv 2023) [Paper]
Open-Vocabulary Temporal Action Detection with Off-the-Shelf Image-Text Features (Arxiv 2022) [Paper]