
Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case

This repository contains material and code related to the COLING 2020 paper Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case.

Contact

All inquiries regarding this repository or the related paper should be directed to dali@cs.umu.se.

Requirements

The main requirements of this code are Python 3 together with PyTorch 1.4, TensorFlow 2.1, and CUDA 10. The code is GPU-based and has not been tested on CPU-only setups. All requirements can be found in requirements.txt and can be installed using

pip install -r requirements.txt
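
A quick way to check that the expected GPU setup is available before running anything (a minimal sketch, not part of the repository):

import torch

print("PyTorch version:", torch.__version__)         # expected: 1.4.x
print("CUDA available:", torch.cuda.is_available())  # the code has only been tested on GPU
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))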

Instructions - under construction

The code runs on PyTorch and is partially based on VSE++.

External dependencies

Each of the visual-semantic models investigated gives access to pretrained models via its respective repository. These repositories must be added to the root of this project in order to test the corresponding model. The pretrained models themselves can be located elsewhere, since the model path is given separately.

  • VSE++ (see Download data which also contains the vocab and model data)
  • VSE-C
  • HAL

You also need the MSCOCO annotation data, together with the original images and captions. The experiments use the 2014 training and validation sets. pycocotools is used to parse the MSCOCO annotation data.
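
As a quick sanity check that the annotation data is in place, the files can be loaded with pycocotools, for example (a minimal sketch; file names follow the directory listing further down):

from pycocotools.coco import COCO

# Load object-instance and caption annotations for the 2014 training split.
coco_inst = COCO('annotations/instances_train2014.json')
coco_caps = COCO('annotations/captions_train2014.json')

# Inspect one arbitrary image: its object categories and its first caption.
img_id = coco_inst.getImgIds()[0]
cats = [a['category_id'] for a in coco_inst.loadAnns(coco_inst.getAnnIds(imgIds=img_id))]
caps = [a['caption'] for a in coco_caps.loadAnns(coco_caps.getAnnIds(imgIds=img_id))]
print(cats, caps[:1])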

Generating alternative captions for semantic incongruencies

Before running the semantic congruence task, the alternative captions must be generated and placed in the same directory as the original caption files.

python3 semantic_transform.py --datadir annotations/ --datatype train2014
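
The validation split can presumably be processed the same way (the directory listing further down includes alt_captions_val2014.json):

python3 semantic_transform.py --datadir annotations/ --datatype val2014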

Training and evaluation

Extracting embeddings, training and evaluating the probe are all done by evaluate.py. An example call is

python3 evaluate.py --annotation_path annotations/ --data_path vsepp/data/ --vse_model_path vsepp/runs/coco_vse++/model_best.pth.tar --vsemodel vsepp --task objcat --probe linear --result_file results/vse_linear_objcat
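
To run the full grid of tasks and probes for one model, the call above can be scripted, for example (a minimal sketch; the result-file naming is arbitrary):

import subprocess

# Run evaluate.py for every task/probe combination with the VSE++ model.
for task in ['objcat', 'numobj', 'semcong']:
    for probe in ['linear', 'mlp']:
        subprocess.run([
            'python3', 'evaluate.py',
            '--annotation_path', 'annotations/',
            '--data_path', 'vsepp/data/',
            '--vse_model_path', 'vsepp/runs/coco_vse++/model_best.pth.tar',
            '--vsemodel', 'vsepp',
            '--task', task,
            '--probe', probe,
            '--result_file', f'results/vse_{probe}_{task}',
        ], check=True)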

Directory structure

  1. The symbolic link data is used by the VSE models to load the vocabulary.
  2. The vsepp data directory contains the alternative caption files generated by semantic_transform.py. These generated files need to be either linked or copied to the HAL and VSE_C data directories (see the sketch after the directory listing below).
.
├── LICENSE
├── README.md
├── embeddings.py
├── evaluate.py
├── log_collector.py
├── probing_model.py
├── probing_tasks.py
├── progress.py
├── requirements.txt
├── semantic_transform.py
├── annotations
│   ├── alt_captions_train2014.json
│   ├── alt_captions_val2014.json
│   ├── alts_train2014.json
│   ├── captions_train2014.json
│   ├── captions_val2014.json
│   ├── instances_train2014.json
│   ├── instances_val2014.json
│   └── train_ids_categories.npy
├── data -> vsepp/data/
├── results/
├── hal/
├── VSE_C/
└── vsepp
    ├── LICENSE
    ├── README.md
    ├── data
    │   ├── coco_precomp
    │   │   ├── dev.txt
    │   │   ├── dev_alts.txt
    │   │   ├── dev_caps.txt
    │   │   ├── dev_ims.npy
    │   │   ├── dev_tags.txt
    │   │   ├── test.txt
    │   │   ├── test_alts.txt
    │   │   ├── test_caps.txt
    │   │   ├── test_ims.npy
    │   │   ├── test_tags.txt
    │   │   ├── train.txt
    │   │   ├── train_alts.txt
    │   │   ├── train_caps.txt
    │   │   ├── train_ims.npy
    │   │   └── train_tags.txt
    │   └── vocab
    │       └── coco_precomp_vocab.pkl
    ├── model.py
    ├── runs
    │   └── coco_vse++
    │       └── model_best.pth.tar
    └── vocab.py
10 directories, 27 files
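
The linking and copying described in the notes above can be done as follows (a minimal sketch; the data-directory layout inside hal/ and VSE_C/ is assumed here and should be adjusted to match those repositories):

import os
import shutil
from pathlib import Path

# 1. Symbolic link used by the VSE models to load the vocabulary.
if not Path('data').exists():
    os.symlink('vsepp/data', 'data')

# 2. Copy the generated alternative-caption files to the HAL and VSE_C data directories.
src = Path('vsepp/data/coco_precomp')
for target in [Path('hal/data/coco_precomp'), Path('VSE_C/data/coco_precomp')]:  # assumed paths
    target.mkdir(parents=True, exist_ok=True)
    for f in src.glob('*_alts.txt'):
        shutil.copy(f, target / f.name)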

Detailed usage

usage: evaluate.py  [-h] [--vsemodel {vsepp,vsec,hal}] [--unimodel {bert,gpt2}]
                    [--annotation_path ANNOTATION_PATH] [--data_path DATA_PATH]
                    [--split {train2014,val2014}]
                    [--vse_model_path VSE_MODEL_PATH]
                    [--result_file RESULT_FILE]
                    [--task {objcat,numobj,semcong}] [--seed SEED]
                    [--probe {mlp,linear}]

optional arguments:
  -h, --help            show this help message and exit
  --vsemodel {vsepp,vsec,hal}
                        The visual-semantic embedding model to probe.
  --unimodel {bert,gpt2}
                        The unimodal embedding model to probe.
  --annotation_path ANNOTATION_PATH
                        Path to MSCOCO annotations.
  --data_path DATA_PATH
                        Path to the raw MSCOCO data.
  --split {train2014,val2014}
                        Which MSCOCO datasplit to use.
  --vse_model_path VSE_MODEL_PATH
                        Path to pretrained visual-semantic embedding model.
  --result_file RESULT_FILE
                        File to store probing results.
  --task {objcat,numobj,semcong}
                        The probing task to execute.
  --seed SEED           The seed used for the Numpy RNG.
  --probe {mlp,linear}  Which probing model to use.
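
A unimodal baseline can be probed with the same script, for example (flag values taken from the listing above; the result-file name is only a suggestion):

python3 evaluate.py --unimodel bert --annotation_path annotations/ --data_path vsepp/data/ --split val2014 --task numobj --probe mlp --result_file results/bert_mlp_numobj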

Cite

Please cite our paper if you use this code (or derivatives) in your own work:

@inproceedings{dahlgren-lindstrom-etal-2020-probing,
    title = "Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case",
    author = {Dahlgren Lindstr{\"o}m, Adam  and
      Bj{\"o}rklund, Johanna  and
      Bensch, Suna  and
      Drewes, Frank},
    booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
    month = dec,
    year = "2020",
    address = "Barcelona, Spain (Online)",
    publisher = "International Committee on Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.coling-main.64",
    pages = "730--744",
}