From Recognition to Cognition: Visual Commonsense Reasoning (R2C) https://visualcommonsense.com https://github.com/rowanz/r2c
CVPR 2018 - Regularizing RNNs for Caption Generation by Reconstructing the Past with the Present https://github.com/chenxinpeng/ARNet
Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training https://github.com/bei21/img2poem
Dataset and starter code for the SNLI-VE visual entailment task https://arxiv.org/abs/1811.10582 https://github.com/necla-ml/SNLI-VE
Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction https://github.com/shikorab/SceneGraph
TensorFlow implementation of "A Structured Self-Attentive Sentence Embedding" https://github.com/flrngel/Self-Attentive-tensorflow
This repository contains the reference code for the paper Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions (CVPR 2019). https://github.com/aimagelab/show-control-and-tell
Code for the ICME 2019 Grand Challenge on Short Video Understanding (single model ranked 6th) https://github.com/guoday/ICME2019-CTR
Transformer-based image captioning as a PyTorch/Fairseq extension https://github.com/krasserm/fairseq-image-captioning
Code for Neural Inverse Knitting: From Images to Manufacturing Instructions https://github.com/xionluhnis/neural_inverse_knitting
Code for paper "Attention on Attention for Image Captioning". ICCV 2019 https://arxiv.org/abs/1908.06954 https://github.com/husthuaan/AoANet
Learning to Evaluate Image Captioning. CVPR 2018 https://github.com/richardaecn/cvpr18-caption-eval
Vision-Language Pre-training for Image Captioning and Question Answering https://github.com/LuoweiZhou/VLP
Official TensorFlow implementation of the CVPR 2018 paper "Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning", with code, models, and prediction results. https://github.com/JaywongWang/DenseVideoCaptioning
A PyTorch implementation of Transformer in "Attention is All You Need" https://arxiv.org/abs/1706.03762 https://github.com/dreamgonfly/Transformer-pytorch
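Several entries above (the two "Attention Is All You Need" implementations, AoANet, Meshed-Memory Transformer) build on the same core primitive: scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal dependency-free sketch of that equation on plain Python lists (function name and toy inputs are illustrative, not taken from any of the listed repos):

```python
import math

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, on nested lists.

    Q: queries, shape (n_q, d_k); K: keys (n_k, d_k); V: values (n_k, d_v).
    Returns one d_v-dimensional output vector per query.
    """
    d_k = len(K[0])
    # Similarity scores: dot product of each query with each key, scaled by sqrt(d_k).
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d_k) for kj in K]
              for qi in Q]
    # Row-wise softmax turns scores into attention weights that sum to 1.
    weights = []
    for row in scores:
        m = max(row)                      # subtract max for numerical stability
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        weights.append([e / z for e in exps])
    # Output: attention-weighted sum of the value vectors.
    return [[sum(w * vj[d] for w, vj in zip(wrow, V)) for d in range(len(V[0]))]
            for wrow in weights]
```

With a query aligned to the first key, the first value vector dominates the output; real implementations vectorize this with matrix multiplies and add multi-head projections on top.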
ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data https://www.arxiv-vanity.com/papers/2001.07966/
Image captioning combined with BERT - 'Image Captioning System - BERT + Image Captioning' https://github.com/ajamjoom/Image-Captions
M^2: Meshed-Memory Transformer for Image Captioning https://github.com/aimagelab/meshed-memory-transformer
Grounded Video Description https://github.com/facebookresearch/grounded-video-description
Reformer: the efficient Transformer https://github.com/google/trax/tree/master/trax/models/reformer
VATEX Chinese-English video captioning challenge at the ICCV workshop http://vatex.org/main/index.html
Cooperative Vision-and-Dialog Navigation https://github.com/mmurray/cvdn
Auto-Encoding Scene Graphs for Image Captioning, CVPR 2019 https://github.com/yangxuntu/SGAE
A PyTorch implementation of the Transformer model from "Attention Is All You Need". https://github.com/phohenecker/pytorch-transformer