This repository contains material and code related to the COLING 2020 paper *Probing Multimodal Embeddings for Linguistic Properties: The Visual-Semantic Case*.
All questions regarding this repository or the related paper should be directed to dali@cs.umu.se.
The main requirements are Python 3 with PyTorch 1.4, TensorFlow 2.1, and CUDA 10. The code is GPU-based and has not been tested in a CPU-only setting. All requirements are listed in `requirements.txt` and can be installed with
pip install -r requirements.txt
The code runs on PyTorch and is partially based on VSE++.
Each of the investigated visual-semantic models provides pretrained models via its respective repository. To probe a model, its repository must be added to the root of this project. The pretrained models themselves can be stored elsewhere, since the model path is passed as a separate argument.
You also need the MSCOCO annotation data, together with the original images and captions. The experiments use the 2014 training and validation sets. pycocotools is used to parse the MSCOCO annotation data.
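As a minimal sketch of how the annotations are parsed with pycocotools (the file paths are assumptions based on the directory layout shown further down, and the variable names are illustrative only):

```python
from pycocotools.coco import COCO

# Load the 2014 training instances and captions (assumed paths,
# matching the annotations/ directory in the tree below).
instances = COCO("annotations/instances_train2014.json")
captions = COCO("annotations/captions_train2014.json")

# Example: the object categories annotated for one image.
img_id = instances.getImgIds()[0]
ann_ids = instances.getAnnIds(imgIds=img_id)
cats = {instances.loadCats(ann["category_id"])[0]["name"]
        for ann in instances.loadAnns(ann_ids)}
print(img_id, cats)
```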
Before running the semantic congruence task, the alternative captions must be generated and placed in the same directory as the original caption files.
python3 semantic_transform.py --datadir annotations/ --datatype train2014
Extracting embeddings, training, and evaluating the probe are all done by `evaluate.py`. An example call is
python3 evaluate.py --annotation_path annotations/ --data_path vsepp/data/ --vse_model_path vsepp/runs/coco_vse++/model_best.pth.tar --vsemodel vsepp --task objcat --probe linear --result_file results/vse_linear_objcat
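A corresponding call for a unimodal baseline would presumably use `--unimodel` instead of `--vsemodel` (see the options listed further down); the result file name here is just an example:

python3 evaluate.py --annotation_path annotations/ --data_path vsepp/data/ --unimodel bert --task objcat --probe linear --result_file results/bert_linear_objcat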
- The symbolic link `data` is used by the VSE models to load the vocabulary.
- The `vsepp` data directory contains the alternative caption files generated by `semantic_transform.py`. These generated files need to be either linked or copied to the HAL and VSE_C data directories (see the sketch below).
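A sketch of the copying step, assuming HAL and VSE_C expect the same `data/coco_precomp` layout as vsepp (adjust the target paths to wherever those repositories actually look for their data):

```python
import shutil
from pathlib import Path

# Copy the generated *_alts.txt files from the vsepp data directory into
# the HAL and VSE_C data directories. The target layout (data/coco_precomp)
# is an assumption mirroring vsepp; adjust it to match those repositories.
src = Path("vsepp/data/coco_precomp")
for repo in ("hal", "VSE_C"):
    dst = Path(repo) / "data" / "coco_precomp"
    dst.mkdir(parents=True, exist_ok=True)
    for alt_file in src.glob("*_alts.txt"):
        shutil.copy(alt_file, dst / alt_file.name)
        print(f"copied {alt_file} -> {dst / alt_file.name}")
```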
The expected directory layout is as follows:

.
├── LICENSE
├── README.md
├── embeddings.py
├── evaluate.py
├── log_collector.py
├── probing_model.py
├── probing_tasks.py
├── progress.py
├── requirements.txt
├── semantic_transform.py
├── annotations
│ ├── alt_captions_train2014.json
│ ├── alt_captions_val2014.json
│ ├── alts_train2014.json
│ ├── captions_train2014.json
│ ├── captions_val2014.json
│ ├── instances_train2014.json
│ ├── instances_val2014.json
│ └── train_ids_categories.npy
├── data -> vsepp/data/
├── results/
├── hal/
├── VSE_C/
└── vsepp
├── LICENSE
├── README.md
├── data
│ ├── coco_precomp
│ │ ├── dev.txt
│ │ ├── dev_alts.txt
│ │ ├── dev_caps.txt
│ │ ├── dev_ims.npy
│ │ ├── dev_tags.txt
│ │ ├── test.txt
│ │ ├── test_alts.txt
│ │ ├── test_caps.txt
│ │ ├── test_ims.npy
│ │ ├── test_tags.txt
│ │ ├── train.txt
│ │ ├── train_alts.txt
│ │ ├── train_caps.txt
│ │ ├── train_ims.npy
│ │ └── train_tags.txt
│ └── vocab
│ └── coco_precomp_vocab.pkl
├── model.py
├── runs
│ └── coco_vse++
│ └── model_best.pth.tar
└── vocab.py
10 directories, 27 files
The full set of options for `evaluate.py`:

usage: evaluate.py [-h] [--vsemodel {vsepp,vsec,hal}] [--unimodel {bert,gpt2}]
[--annotation_path ANNOTATION_PATH] [--data_path DATA_PATH]
[--split {train2014,val2014}]
[--vse_model_path VSE_MODEL_PATH]
[--result_file RESULT_FILE]
[--task {objcat,numobj,semcong}] [--seed SEED]
[--probe {mlp,linear}]
optional arguments:
-h, --help show this help message and exit
--vsemodel {vsepp,vsec,hal}
The visual-semantic embedding model to probe.
--unimodel {bert,gpt2}
The unimodal embedding model to probe.
--annotation_path ANNOTATION_PATH
Path to MSCOCO annotations.
--data_path DATA_PATH
Path to the raw MSCOCO data.
--split {train2014,val2014}
Which MSCOCO datasplit to use.
--vse_model_path VSE_MODEL_PATH
Path to pretrained visual-semantic embedding model.
--result_file RESULT_FILE
File to store probing results.
--task {objcat,numobj,semcong}
The probing task to execute.
--seed SEED The seed used for the Numpy RNG.
--probe {mlp,linear} Which probing model to use.
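To run a full sweep over tasks and probe types, the script can simply be invoked repeatedly. A sketch for the VSE++ model, using only the options documented above (the model and result paths are assumptions matching the example call earlier):

```python
import subprocess
from pathlib import Path

# Run every probing task with both probe types for the VSE++ model.
# Paths follow the example call above; adjust them to your setup.
Path("results").mkdir(exist_ok=True)
for task in ("objcat", "numobj", "semcong"):
    for probe in ("linear", "mlp"):
        subprocess.run([
            "python3", "evaluate.py",
            "--annotation_path", "annotations/",
            "--data_path", "vsepp/data/",
            "--vse_model_path", "vsepp/runs/coco_vse++/model_best.pth.tar",
            "--vsemodel", "vsepp",
            "--task", task,
            "--probe", probe,
            "--result_file", f"results/vse_{probe}_{task}",
        ], check=True)
```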
Please cite our paper if you use this code (or derivatives) in your own work:
@inproceedings{dahlgren-lindstrom-etal-2020-probing,
title = "Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case",
author = {Dahlgren Lindstr{\"o}m, Adam and
Bj{\"o}rklund, Johanna and
Bensch, Suna and
Drewes, Frank},
booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
month = dec,
year = "2020",
address = "Barcelona, Spain (Online)",
publisher = "International Committee on Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.coling-main.64",
pages = "730--744",
}