Multimodal

Comic-Guided Speech Synthesis
[project]

Word2vec to behavior: morphology facilitates the grounding of language in machines (IROS2019)
David Matthews, Sam Kriegman, Collin Cappelle, Josh Bongard
[paper]

Embodied Language Grounding with Implicit 3D Visual Feature Representations
Mihir Prabhudesai, Hsiao-Yu Fish Tung, Syed Ashar Javed, Maximilian Sieb, Adam W. Harley, Katerina Fragkiadaki
[paper]

Expressing Visual Relationships via Language (ACL2019)
Hao Tan, Franck Dernoncourt, Zhe Lin, Trung Bui, Mohit Bansal
[paper]

Toward Self-Supervised Object Detection in Unlabeled Videos
Elad Amrani, Rami Ben-Ari, Tal Hakim, Alex Bronstein
[paper]

Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding (CVPR2019)
Hassan Akbari, Svebor Karaman, Surabhi Bhargava, Brian Chen, Carl Vondrick, Shih-Fu Chang
[paper]

FAN: Focused Attention Networks
Chu Wang, Babak Samari, Vladimir Kim, Siddhartha Chaudhuri, Kaleem Siddiqi
[paper]

Learning to Explain with Complemental Examples (CVPR2019)
Atsushi Kanehira, Tatsuya Harada
[paper]

Sketchforme: Composing Sketched Scenes from Text Descriptions for Interactive Applications
Forrest Huang, John F. Canny
[paper]

End-to-End Learning Using Cycle Consistency for Image-to-Caption Transformations
Keisuke Hagiwara, Yusuke Mukuta, Tatsuya Harada
[paper]

Text-to-Image-to-Text Translation using Cycle Consistent Adversarial Networks
Satya Krishna Gorti, Jeremy Ma
[paper]
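
The two entries above both build on a cycle-consistency objective between modalities. As a minimal sketch of the general idea (not either paper's exact formulation): map an input to the other modality and back, then penalize the reconstruction error. Here `g` and `f` are hypothetical differentiable mappings such as a captioner and a text-to-image generator:

```python
import torch.nn.functional as F

def cycle_consistency_loss(x, g, f):
    # Map x into the other modality with g, map back with f, and
    # penalize how far the round trip drifts from the input.
    # g and f are placeholder differentiable modules; real systems need
    # relaxations (e.g., Gumbel-softmax) when g emits discrete text.
    x_cycled = f(g(x))
    return F.l1_loss(x_cycled, x)
```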

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI2019 oral)
Hang Zhou, Yu Liu, Ziwei Liu, Ping Luo, Xiaogang Wang
[paper]

Unified Visual-Semantic Embeddings: Bridging Vision and Language with Structured Meaning Representations (CVPR2019)
Hao Wu, Jiayuan Mao, Yufeng Zhang, Yuning Jiang, Lei Li, Weiwei Sun, Wei-Ying Ma
[paper]

Prospection: Interpretable Plans From Language By Predicting the Future (ICRA2019)
Chris Paxton, Yonatan Bisk, Jesse Thomason, Arunkumar Byravan, Dieter Fox
[paper]

A Knowledge-Grounded Multimodal Search-Based Conversational Agent
Shubham Agarwal, Ondrej Dusek, Ioannis Konstas, Verena Rieser
[paper]

Hierarchical Scene Parsing by Weakly Supervised Learning with Image Descriptions
Ruimao Zhang, Liang Lin, Guangrun Wang, Meng Wang, Wangmeng Zuo
[paper]

Learning Robust Visual-Semantic Embeddings (ICCV2017)
[paper]

Deep Visual-Semantic Quantization for Efficient Image Retrieval (CVPR2017)
[paper]

Transductive Visual-Semantic Embedding for Zero-shot Learning (ICMR2017)
[paper]

Hierarchical Multimodal LSTM for Dense Visual-Semantic Embedding (ICCV2017)
[paper]

Multiple Instance Visual-Semantic Embedding (BMVC2017)
[paper]

Finding beans in burgers: Deep semantic-visual embedding with localization (CVPR2018)
[paper]

Fine-grained Image Classification by Visual-Semantic Embedding (IJCAI2018)
[paper]

VSE-ens: Visual-Semantic Embeddings with Efficient Negative Sampling (AAAI2018)
Guibing Guo, Songlin Zhai, Fajie Yuan, Yuan Liu, Xingwei Wang
[paper]

VSE++: Improving Visual-Semantic Embeddings with Hard Negatives (BMVC2018 spotlight)
Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, Sanja Fidler
[paper]
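
VSE++'s key change over a standard triplet loss is to penalize only the hardest in-batch negative. A minimal PyTorch sketch of that max-hinge loss, assuming L2-normalized image and caption embeddings (variable names are mine, not the paper's):

```python
import torch

def vse_max_hinge_loss(im, cap, margin=0.2):
    # im, cap: (batch, dim) L2-normalized embeddings; matching pairs share an index.
    scores = im @ cap.t()                    # cosine similarity matrix
    pos = scores.diag().view(-1, 1)          # positive-pair scores on the diagonal
    cost_c = (margin + scores - pos).clamp(min=0)      # captions as negatives per image
    cost_i = (margin + scores - pos.t()).clamp(min=0)  # images as negatives per caption
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_c = cost_c.masked_fill(mask, 0)
    cost_i = cost_i.masked_fill(mask, 0)
    # keep only the hardest violating negative per row/column (the "++" part)
    return cost_c.max(dim=1).values.sum() + cost_i.max(dim=0).values.sum()
```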

Actor and Action Video Segmentation from a Sentence (CVPR2018 oral)
Kirill Gavrilyuk, Amir Ghodrati, Zhenyang Li, Cees G.M. Snoek
[paper]

Guide Me: Interacting with Deep Networks
Christian Rupprecht, Iro Laina, Nassir Navab, Gregory D. Hager, Federico Tombari
[paper]

Multimodal Explanations: Justifying Decisions and Pointing to the Evidence
Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Anna Rohrbach, Bernt Schiele, Trevor Darrell, Marcus Rohrbach
[paper]

From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood (ACL2017)
[paper]

Gated-Attention Architectures for Task-Oriented Language Grounding
Devendra Singh Chaplot, Kanthashree Mysore Sathyendra, Rama Kumar Pasumarthi, Dheeraj Rajagopal, Ruslan Salakhutdinov
[paper]
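
The gated-attention unit in this paper is simple enough to sketch: the instruction embedding is projected to one sigmoid gate per image-feature channel and applied as a Hadamard product. A hedged PyTorch sketch, with shapes and names chosen for illustration:

```python
import torch
import torch.nn as nn

class GatedAttention(nn.Module):
    # Gates convolutional feature channels with a language embedding,
    # following the Gated-Attention idea of Chaplot et al.
    def __init__(self, text_dim, channels):
        super().__init__()
        self.gate = nn.Linear(text_dim, channels)

    def forward(self, img_feats, text_emb):
        # img_feats: (B, C, H, W); text_emb: (B, text_dim)
        g = torch.sigmoid(self.gate(text_emb))   # (B, C) channel gates in [0, 1]
        return img_feats * g[:, :, None, None]   # broadcast Hadamard product
```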

Netizen-Style Commenting on Fashion Photos: Dataset and Diversity Measures (WWW2018)
Wen Hua Lin, Kuan-Ting Chen, Hung Yueh Chiang, Winston Hsu
[paper]

Natural Language Communication with Robots (NAACL2016)
[paper]

Source-Target Inference Models for Spatial Instruction Understanding (AAAI2018)
Hao Tan, Mohit Bansal
[paper]

Learning Interpretable Spatial Operations in a Rich 3D Blocks World (AAAI2018)
Yonatan Bisk, Kevin J. Shih, Yejin Choi, Daniel Marcu
[paper]

Grounded Language Learning in a Simulated 3D World
Karl Moritz Hermann, Felix Hill, Simon Green, Fumin Wang, Ryan Faulkner, Hubert Soyer, David Szepesvari, Wojciech Marian Czarnecki, Max Jaderberg, Denis Teplyashin, Marcus Wainwright, Chris Apps, Demis Hassabis, Phil Blunsom
[paper]

Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions
Jun Hatori, Yuta Kikuchi, Sosuke Kobayashi, Kuniyuki Takahashi, Yuta Tsuboi, Yuya Unno, Wilson Ko, Jethro Tan
[paper]

Unsupervised Visual Sense Disambiguation for Verbs using Multimodal Embeddings (NAACL2016)
[paper]

Resolving Language and Vision Ambiguities Together: Joint Segmentation & Prepositional Attachment Resolution in Captioned Scenes (EMNLP2016)
[paper]

Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba
[paper]

Learning Modality-Invariant Representations for Speech and Images
Kenneth Leidal, David Harwath, James Glass
[paper]

Visual to Sound: Generating Natural Sound for Videos in the Wild
[project]

Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
Peter Anderson, Qi Wu, Damien Teney, Jake Bruce, Mark Johnson, Niko Sünderhauf, Ian Reid, Stephen Gould, Anton van den Hengel
[paper]

Person Search with Natural Language Description
[paper]

From Red Wine to Red Tomato: Composition with Context
[paper]

Self-supervised learning of visual features through embedding images into text topic spaces (CVPR2017)
[paper]
[code]

SCAN: Learning Abstract Hierarchical Compositional Visual Concepts
Irina Higgins, Nicolas Sonnerat, Loic Matthey, Arka Pal, Christopher P Burgess, Matthew Botvinick, Demis Hassabis, Alexander Lerchner
[paper]

Conditional generation of multi-modal data using constrained embedding space mapping
Subhajit Chaudhury, Sakyasingha Dasgupta, Asim Munawar, Md. A. Salam Khan, Ryuki Tachibana
[paper]

Look, Listen and Learn
Relja Arandjelović, Andrew Zisserman
[paper]
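
This paper's self-supervision signal is audio-visual correspondence: predict whether a video frame and an audio snippet come from the same clip. A minimal sketch of that two-stream setup, with toy backbones standing in for the paper's actual architecture:

```python
import torch
import torch.nn as nn

class AVCNet(nn.Module):
    # Two-stream network for the audio-visual correspondence (AVC) task.
    # The tiny conv stacks below are placeholders, not the paper's networks.
    def __init__(self, dim=128):
        super().__init__()
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, dim))
        self.audio = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, dim))
        self.head = nn.Linear(2 * dim, 2)  # corresponds vs. does not

    def forward(self, frame, spectrogram):
        # frame: (B, 3, H, W); spectrogram: (B, 1, F, T)
        v, a = self.vision(frame), self.audio(spectrogram)
        return self.head(torch.cat([v, a], dim=1))  # logits for the AVC label
```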

Recurrent Multimodal Interaction for Referring Image Segmentation
Chenxi Liu, Zhe Lin, Xiaohui Shen, Jimei Yang, Xin Lu, Alan Yuille
[paper]

Visually grounded learning of keyword prediction from untranscribed speech
Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu
[paper]

Cross-modal Deep Metric Learning with Multi-task Regularization
Xin Huang, Yuxin Peng
[paper]

Fusion of EEG and Musical Features in Continuous Music-emotion Recognition (AAAI2017)
Nattapong Thammasan, Ken-ichi Fukui, Masayuki Numao
[paper]