A curated list of image captioning and related areas. :-)
Please feel free to send me pull requests or email (zhjohnchan@gmail.com) to add links. Markdown format:
- [Paper Name](link) - Author 1 et al, `Conference Year`. [[code]](link)
- Nov.13 NeurIPS'18 and AAAI'19 papers updated!
- Dec.04 More implementations updated!
- Mar.04 Image captioning challenge updated!
- Mar.13 CVPR'19 paper updated!
- Apr.28 More CVPR'19 papers updated!
- A Comprehensive Survey of Deep Learning for Image Captioning - Hossain M et al, `arXiv preprint 2018`.
- I2T: Image Parsing to Text Description - Yao B Z et al, `Proc. IEEE 2011`.
- Im2Text: Describing Images Using 1 Million Captioned Photographs - Ordonez V et al, `NIPS 2011`. [project web]
- Deep Captioning with Multimodal Recurrent Neural Networks - Mao J et al, `arXiv preprint 2014`.
- Show and Tell: A Neural Image Caption Generator - Vinyals O et al, `CVPR 2015`. [code] [code]
- Deep Visual-Semantic Alignments for Generating Image Descriptions - Karpathy A et al, `CVPR 2015`. [project web] [code]
- Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation - Chen X et al, `CVPR 2015`.
- Long-term Recurrent Convolutional Networks for Visual Recognition and Description - Donahue J et al, `CVPR 2015`. [code] [project web]
- Guiding the Long-Short Term Memory Model for Image Caption Generation - Jia X et al, `ICCV 2015`.
- Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images - Mao J et al, `ICCV 2015`. [code]
- Expressing an Image Stream with a Sequence of Natural Sentences - Park C C et al, `NIPS 2015`. [code]
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention - Xu K et al, `ICML 2015`. [project] [code] [code]
- Order-Embeddings of Images and Language - Vendrov I et al, `arXiv preprint 2015`. [code]
- Generating Images from Captions with Attention - Mansimov E et al, `arXiv preprint 2015`. [code]
- Learning FRAME Models Using CNN Filters for Knowledge Visualization - Lu Y et al, `arXiv preprint 2015`. [code]
- Aligning where to see and what to tell: image caption with region-based attention and scene factorization - Jin J et al, `arXiv preprint 2015`.
- Image captioning with semantic attention - You Q et al, `CVPR 2016`. [code]
- DenseCap: Fully Convolutional Localization Networks for Dense Captioning - Johnson J et al, `CVPR 2016`. [code]
- What value do explicit high level concepts have in vision to language problems? - Wu Q et al, `CVPR 2016`.
- SPICE: Semantic Propositional Image Caption Evaluation - Anderson P et al, `ECCV 2016`. [code]
- Image Captioning with Deep Bidirectional LSTMs - Wang C et al, `ACMMM 2016`. [code]
- Multimodal Pivots for Image Caption Translation - Hitschler J et al, `ACL 2016`.
- Image Caption Generation with Text-Conditional Semantic Attention - Zhou L et al, `arXiv preprint 2016`. [code]
- DeepDiary: Automatic Caption Generation for Lifelogging Image Streams - Fan C et al, `arXiv preprint 2016`.
- Learning to generalize to new compositions in image understanding - Atzmon Y et al, `arXiv preprint 2016`.
- Generating captions without looking beyond objects - Heuer H et al, `arXiv preprint 2016`.
- Bootstrap, Review, Decode: Using Out-of-Domain Textual Data to Improve Image Captioning - Chen W et al, `arXiv preprint 2016`. [code]
- Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering - Liu H et al, `arXiv preprint 2016`.
- Recurrent Highway Networks with Language CNN for Image Captioning - Gu J et al, `arXiv preprint 2016`.
- Captioning Images with Diverse Objects - Venugopalan S et al, `CVPR 2017`. [code]
- Top-down Visual Saliency Guided by Captions - Ramanishka V et al, `CVPR 2017`. [code]
- Self-Critical Sequence Training for Image Captioning - Rennie S J et al, `CVPR 2017`. [code]
- Dense Captioning with Joint Inference and Visual Context - Yang L et al, `CVPR 2017`. [code]
- Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition - Wang Y et al, `CVPR 2017`. [code]
- A Hierarchical Approach for Generating Descriptive Image Paragraphs - Krause J et al, `CVPR 2017`. [code]
- Deep Reinforcement Learning-based Image Captioning with Embedding Reward - Ren Z et al, `CVPR 2017`.
- Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects - Yao T et al, `CVPR 2017`.
- Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning - Lu J et al, `CVPR 2017`. [code]
- Attend to You: Personalized Image Captioning with Context Sequence Memory Networks - Park C C et al, `CVPR 2017`. [code]
- SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning - Chen L et al, `CVPR 2017`. [code]
- Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-In-The-Blank Image Captioning - Sun Q et al, `CVPR 2017`.
- Areas of Attention for Image Captioning - Pedersoli M et al, `ICCV 2017`.
- Boosting Image Captioning with Attributes - Yao T et al, `ICCV 2017`.
- An Empirical Study of Language CNN for Image Captioning - Gu J et al, `ICCV 2017`.
- Improved Image Captioning via Policy Gradient Optimization of SPIDEr - Liu S et al, `ICCV 2017`.
- Towards Diverse and Natural Image Descriptions via a Conditional GAN - Dai B et al, `ICCV 2017`. [code]
- Paying Attention to Descriptions Generated by Image Captioning Models - Tavakoli H R et al, `ICCV 2017`.
- Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner - Chen T H et al, `ICCV 2017`. [code]
- Image Caption with Global-Local Attention - Li L et al, `AAAI 2017`.
- Reference Based LSTM for Image Captioning - Chen M et al, `AAAI 2017`.
- Attention Correctness in Neural Image Captioning - Liu C et al, `AAAI 2017`.
- Text-guided Attention Model for Image Captioning - Mun J et al, `AAAI 2017`. [code]
- Contrastive Learning for Image Captioning - Dai B et al, `NIPS 2017`. [code]
- Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge - Vinyals O et al, `TPAMI 2017`. [code]
- MAT: A Multimodal Attentive Translator for Image Captioning - Liu C et al, `arXiv preprint 2017`.
- Actor-Critic Sequence Training for Image Captioning - Zhang L et al, `arXiv preprint 2017`.
- What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator? - Tanti M et al, `arXiv preprint 2017`. [code]
- Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning - Xian Y et al, `arXiv preprint 2017`.
- Phrase-based Image Captioning with Hierarchical LSTM Model - Tan Y H et al, `arXiv preprint 2017`.
- Show-and-Fool: Crafting Adversarial Examples for Neural Image Captioning - Chen H et al, `arXiv preprint 2017`.
- Neural Baby Talk - Lu J et al, `CVPR 2018`. [code]
- Convolutional Image Captioning - Aneja J et al, `CVPR 2018`.
- Learning to Evaluate Image Captioning - Cui Y et al, `CVPR 2018`. [code]
- Discriminability Objective for Training Descriptive Captions - Luo R et al, `CVPR 2018`. [code]
- SemStyle: Learning to Generate Stylised Image Captions using Unaligned Text - Mathews A et al, `CVPR 2018`.
- Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering - Anderson P et al, `CVPR 2018`. [code]
- GroupCap: Group-Based Image Captioning With Structured Relevance and Diversity Constraints - Chen F et al, `CVPR 2018`.
- Unpaired Image Captioning by Language Pivoting - Gu J et al, `ECCV 2018`.
- Recurrent Fusion Network for Image Captioning - Jiang W et al, `ECCV 2018`.
- Rethinking the Form of Latent States in Image Captioning - Dai B et al, `ECCV 2018`. [code]
- Learning to Guide Decoding for Image Captioning - Jiang W et al, `AAAI 2018`.
- Stack-Captioning: Coarse-to-Fine Learning for Image Captioning - Gu J et al, `AAAI 2018`. [code]
- Temporal-difference Learning with Sampling Baseline for Image Captioning - Chen H et al, `AAAI 2018`.
- Partially-Supervised Image Captioning - Anderson P et al, `NeurIPS 2018`.
- A Neural Compositional Paradigm for Image Captioning - Dai B et al, `NeurIPS 2018`.
- Defoiling Foiled Image Captions - Wang J et al, `NAACL 2018`.
- Punny Captions: Witty Wordplay in Image Descriptions - Chandrasekaran A et al, `NAACL 2018`. [code]
- Object Counts! Bringing Explicit Detections Back into Image Captioning - Aneja J et al, `NAACL 2018`.
- Conceptual Captions: A Cleaned, Hypernymed, Image Alt-text Dataset For Automatic Image Captioning - Sharma P et al, `ACL 2018`. [code]
- Attacking visual language grounding with adversarial examples: A case study on neural image captioning - Chen H et al, `ACL 2018`. [code]
- Improved Image Captioning with Adversarial Semantic Alignment - Melnyk I et al, `arXiv preprint 2018`.
- Improving Image Captioning with Conditional Generative Adversarial Nets - Chen C et al, `arXiv preprint 2018`.
- CNN+CNN: Convolutional Decoders for Image Captioning - Wang Q et al, `arXiv preprint 2018`.
- Diverse and Controllable Image Captioning with Part-of-Speech Guidance - Deshpande A et al, `arXiv preprint 2018`.
- Unsupervised Image Captioning - Feng Y et al, `CVPR 2019`. [code]
- Engaging Image Captioning Via Personality - Shuster K et al, `CVPR 2019`.
- Pointing Novel Objects in Image Captioning - Li Y et al, `CVPR 2019`.
- Context and Attribute Grounded Dense Captioning - Yin G et al, `CVPR 2019`.
- Auto-Encoding Scene Graphs for Image Captioning - Yang X et al, `CVPR 2019`.
- Self-critical n-step Training for Image Captioning - Gao J et al, `CVPR 2019`.
- Intention Oriented Image Captions with Guiding Objects - Zheng Y et al, `CVPR 2019`.
- Describing like humans: on diversity in image captioning - Wang Q et al, `CVPR 2019`.
- CapSal: Leveraging Captioning to Boost Semantics for Salient Object Detection - Zhang L et al, `CVPR 2019`. [code]
- Fast, Diverse and Accurate Image Captioning Guided By Part-of-Speech - Deshpande A et al, `CVPR 2019`.
- Good News, Everyone! Context driven entity-aware captioning for news images - Biten A F et al, `CVPR 2019`.
- Dense Relational Captioning: Triple-Stream Networks for Relationship-Based Captioning - Kim D et al, `CVPR 2019`.
- Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions - Cornia M et al, `CVPR 2019`. [code]
- Meta Learning for Image Captioning - Li N et al, `AAAI 2019`.
- Learning Object Context for Dense Captioning - Li X et al, `AAAI 2019`.
- Hierarchical Attention Network for Image Captioning - Wang W et al, `AAAI 2019`.
- Deliberate Residual based Attention Network for Image Captioning - Gao L et al, `AAAI 2019`.
- Improving Image Captioning with Conditional Generative Adversarial Nets - Chen C et al, `AAAI 2019`.
- Connecting Language to Images: A Progressive Attention-Guided Network for Simultaneous Image Captioning and Language Grounding - Song L et al, `AAAI 2019`.
- nocaps, LANG: `English`.
- MS COCO, LANG: `English`.
- Flickr 8k, LANG: `English`.
- Flickr 30k, LANG: `English`.
- AI Challenger, LANG: `Chinese`.
- Visual Genome, LANG: `English`.
- SBUCaptionedPhotoDataset, LANG: `English`.
- IAPR TC-12, LANG: `English, German and Spanish`.
To the extent possible under law, Zhihong Chen has waived all copyright and related or neighboring rights to this work.