From Recognition to Cognition: Visual Commonsense Reasoning (R2C) https://visualcommonsense.com https://github.com/rowanz/r2c
CVPR 2018 - Regularizing RNNs for Caption Generation by Reconstructing the Past with the Present https://github.com/chenxinpeng/ARNet
Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training https://github.com/bei21/img2poem
Dataset and starter code for the SNLI-VE visual entailment task https://arxiv.org/abs/1811.10582 https://github.com/necla-ml/SNLI-VE
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction https://github.com/shikorab/SceneGraph
TensorFlow implementation of "A Structured Self-Attentive Sentence Embedding" https://github.com/flrngel/Self-Attentive-tensorflow
This repository contains the reference code for the paper Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions (CVPR 2019). https://github.com/aimagelab/show-control-and-tell
Code for the ICME 2019 Grand Challenge on Short Video Understanding (single model ranked 6th) https://github.com/guoday/ICME2019-CTR
Transformer-based image captioning as a PyTorch/Fairseq extension https://github.com/krasserm/fairseq-image-captioning
Code for Neural Inverse Knitting: From Images to Manufacturing Instructions https://github.com/xionluhnis/neural_inverse_knitting
Code for paper "Attention on Attention for Image Captioning". ICCV 2019 https://arxiv.org/abs/1908.06954 https://github.com/husthuaan/AoANet
Learning to Evaluate Image Captioning. CVPR 2018 https://github.com/richardaecn/cvpr18-caption-eval
Vision-Language Pre-training for Image Captioning and Question Answering https://github.com/LuoweiZhou/VLP
Official TensorFlow implementation of the CVPR 2018 paper "Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning", with code, models, and prediction results. https://github.com/JaywongWang/DenseVideoCaptioning
A PyTorch implementation of Transformer in "Attention is All You Need" https://arxiv.org/abs/1706.03762 https://github.com/dreamgonfly/Transformer-pytorch
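Several entries above (the two "Attention Is All You Need" implementations, AoANet, Meshed-Memory Transformer) build on the same core primitive: scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal dependency-free sketch of that equation on plain Python lists (function name and toy inputs are illustrative, not taken from any of the listed repos):

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, on nested lists.

    Q: queries, shape (n_q, d_k); K: keys (n_k, d_k); V: values (n_k, d_v).
    Returns one d_v-dimensional output vector per query.
    """
    d_k = len(K[0])
    # Similarity scores: dot product of each query with each key, scaled by sqrt(d_k).
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d_k) for kj in K]
              for qi in Q]
    # Row-wise softmax turns scores into attention weights that sum to 1.
    weights = []
    for row in scores:
        m = max(row)                      # subtract max for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    # Output: attention-weighted sum of the value vectors.
    return [[sum(w * vj[d] for w, vj in zip(wrow, V)) for d in range(len(V[0]))]
            for wrow in weights]
```

With a query aligned to the first key, the first value vector dominates the output; real implementations vectorize this with matrix multiplies and add multi-head projections on top.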
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data https://www.arxiv-vanity.com/papers/2001.07966/
Image captioning combined with BERT - 'Image Captioning System - BERT + Image Captioning' https://github.com/ajamjoom/Image-Captions
M^2: Meshed-Memory Transformer for Image Captioning https://github.com/aimagelab/meshed-memory-transformer
Grounded Video Description https://github.com/facebookresearch/grounded-video-description
Reformer: the efficient Transformer https://github.com/google/trax/tree/master/trax/models/reformer
VATEX Chinese-English video captioning challenge at the ICCV workshop http://vatex.org/main/index.html
Cooperative Vision-and-Dialog Navigation https://github.com/mmurray/cvdn
Auto-Encoding Scene Graphs for Image Captioning, CVPR 2019 https://github.com/yangxuntu/SGAE
A PyTorch implementation of the Transformer model from "Attention Is All You Need". https://github.com/phohenecker/pytorch-transformer