SS-VideoCaptioning

This repository contains the TensorFlow implementation of our model "Semantically Sensible Video Captioning (SSVC)".
[Code] [Paper] [ArXiv]

Authors

Md. Mushfiqur Rahman, Thasin Abedin, Khondokar S. S. Prottoy, Ayana Moshruba, Fazlul Hasan Siddiqui

Requirements

Install the following dependencies before running the model:

TensorFlow 2.0 (see the official installation instructions)
tqdm: pip install tqdm
sklearn (scikit-learn): pip install -U scikit-learn
nltk: pip install nltk
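
To confirm the dependencies are visible to the interpreter that will run the notebook, a small check such as the following may help. This is only a sketch and not part of the repository; the helper name `find_missing` is hypothetical, and the module names are the usual import names for the packages listed above.

```python
import importlib.util

# Import names (not pip names) of the dependencies listed above.
# scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["tensorflow", "tqdm", "sklearn", "nltk"]


def find_missing(modules):
    """Return the names in `modules` that this interpreter cannot locate."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

The check only verifies that each module can be found; it does not verify the installed versions.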

Directory structure

-root
  -glove.6B.100d.txt
  -MSVD_captions.csv
  -models_and_utils
    -models.py
    -utils.py
  -data_pickle
    -train
      -filename1.pkl
      -filename2.pkl
      ...
    -test
      -filename1.pkl
      -filename2.pkl
      ...
    -validation
      -filename1.pkl
      -filename2.pkl
      ...
    -train.csv
    -test.csv
    -validation.csv
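
Before training, it can be useful to check that the layout above is in place. The following is a minimal sketch, not code from the repository: the function `missing_paths` is hypothetical, and the name of the pickle directory is passed in as an argument so it can match whatever the directory is called on disk.

```python
from pathlib import Path


def missing_paths(root, pickle_dir):
    """Return the expected files/directories that are absent under `root`.

    `pickle_dir` is the name of the directory holding the pickled splits,
    exactly as it appears in the tree above.
    """
    root = Path(root)
    expected = [
        root / "glove.6B.100d.txt",
        root / "MSVD_captions.csv",
        root / "models_and_utils" / "models.py",
        root / "models_and_utils" / "utils.py",
    ]
    for split in ("train", "test", "validation"):
        expected.append(root / pickle_dir / split)           # directory of .pkl files
        expected.append(root / pickle_dir / f"{split}.csv")  # CSV for this split
    return [p.relative_to(root).as_posix() for p in expected if not p.exists()]
```

An empty list means every expected entry was found; otherwise the list names the entries that still need to be downloaded or generated.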

Train and Evaluate

Download and extract 'glove.6B.100d.txt' into the root directory.
Download the MSVD dataset and create the corresponding pickle files using vid2frames.ipynb. Split the data into train, test, and validation sets.

Alternate step: Download and extract 'data_pickle.zip'. This archive already contains the pickle files for the MSVD dataset.
Run the train.ipynb notebook.

The notebook has a detailed list of options. Change them to adjust the model to your requirements.
The training and evaluation code is inside the notebook.
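
For readers unfamiliar with the GloVe file, each line holds a word followed by its space-separated vector components (100 of them for glove.6B.100d.txt). The sketch below shows one way such a file could be parsed. It is an illustration under that assumption, not the loader used in the notebook, and the function name `load_glove` is hypothetical.

```python
def load_glove(path, dim=100):
    """Parse a GloVe text file into a {word: [float, ...]} dictionary."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) != dim + 1:
                continue  # skip lines that do not hold a word plus `dim` values
            word, values = parts[0], parts[1:]
            embeddings[word] = [float(v) for v in values]
    return embeddings
```

Calling `load_glove("glove.6B.100d.txt")` from the root directory would then give a dictionary that maps each vocabulary word to a list of 100 floats.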

Sample Outputs

SSVC: "A woman is cutting a piece of meat"
GT: "a woman is cutting into the fatty areas of a pork chop"
SS score: 1.0, BLEU1: 1.0, BLEU2: 1.0, BLEU3: 1.0, BLEU4: 1.0

SSVC: "A person is slicing tomato"
GT: "Someone wearing blue rubber gloves is slicing a tomato with a large knife"
SS score: 0.825, BLEU1: 1.0, BLEU2: 1.0, BLEU3: 1.0, BLEU4: 1.0

SSVC: "A woman is cutting a piece of meat"
GT: "a woman is cutting into the fatty areas of a pork chop"
SS score: 0.94, BLEU1: 1.0, BLEU2: 0.84, BLEU3: 0.61, BLEU4: 0.0
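
To make the BLEU1 to BLEU4 columns easier to read, here is a sketch of unsmoothed cumulative BLEU for a single caption scored against one or more reference captions. It is an assumption-laden illustration rather than the scoring code from the notebook: the function names are hypothetical, and lowercasing, whitespace tokenization, and the absence of smoothing are choices made for this example. The SS score is not covered here.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def cumulative_bleu(candidate, references, max_n):
    """Unsmoothed cumulative BLEU-max_n of one candidate against several references."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        if not cand_counts:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # no n-gram of this order matches any reference
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    # Brevity penalty uses the reference length closest to the candidate length.
    closest = min((len(r) for r in refs), key=lambda length: (abs(length - len(cand)), length))
    bp = 1.0 if len(cand) > closest else math.exp(1 - closest / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

In this unsmoothed form, the score drops to 0.0 as soon as no n-gram of some order matches a reference, which is one way a caption can have a high BLEU1 and a BLEU4 of 0.0.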

Please cite the following:

@article{rahman2021video,
  title={Video captioning with stacked attention and semantic hard pull},
  author={Rahman, Md Mushfiqur and Abedin, Thasin and Prottoy, Khondokar SS and Moshruba, Ayana and Siddiqui, Fazlul Hasan},
  journal={PeerJ Computer Science},
  volume={7},
  pages={e664},
  year={2021},
  publisher={PeerJ Inc.}
}
