SS-VideoCaptioning

This repository contains the TensorFlow implementation of our model "Semantically Sensible Video Captioning (SSVC)".
[Code] [Paper] [ArXiv]

Main Model

Authors

Md. Mushfiqur Rahman, Thasin Abedin, Khondokar S. S. Prottoy, Ayana Moshruba, Fazlul Hasan Siddiqui

Requirements

Install the following dependencies before running the model:

  • TensorFlow 2.0
  • tqdm: pip install tqdm
  • scikit-learn (imported as sklearn): pip install -U scikit-learn
  • nltk: pip install nltk
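
As a quick sanity check before opening the notebooks, the following minimal sketch reports which of the listed packages cannot be found in the current environment. The helper name `missing_packages` is hypothetical and not part of this repository; the module names are the usual import names for the packages above.

```python
import importlib.util

# Import names for the packages listed above
# (scikit-learn is imported as "sklearn").
REQUIRED = ["tensorflow", "tqdm", "sklearn", "nltk"]


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

This only checks that the packages are installed; it does not verify that the installed TensorFlow version is 2.0.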

Directory structure

-root
  -glove.6B.100d.txt
  -MSVD_captions.csv
  -models_and_utils
    -models.py
    -utils.py
  -data_picle
    -train
      -filename1.pkl
      -filename2.pkl
      ...
    -test
      -filename1.pkl
      -filename2.pkl
      ...
    -validation
      -filename1.pkl
      -filename2.pkl
      ...
    -train.csv
    -test.csv
    -validation.csv
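
The layout above can be checked with a short script before training. This is only a sketch: the helper names `missing_entries` and `list_pickles` are hypothetical, and the paths are copied from the listing above (including the folder name `data_picle` exactly as it is spelled there).

```python
from pathlib import Path

# Entries copied from the directory listing above.
# "data_picle" is spelled as in that listing.
EXPECTED = [
    "glove.6B.100d.txt",
    "MSVD_captions.csv",
    "models_and_utils/models.py",
    "models_and_utils/utils.py",
    "data_picle/train",
    "data_picle/test",
    "data_picle/validation",
    "data_picle/train.csv",
    "data_picle/test.csv",
    "data_picle/validation.csv",
]


def missing_entries(root):
    """Return the expected files and folders that do not exist under root."""
    root = Path(root)
    return [entry for entry in EXPECTED if not (root / entry).exists()]


def list_pickles(root, split):
    """Return the sorted .pkl files of one split: train, test, or validation."""
    return sorted((Path(root) / "data_picle" / split).glob("*.pkl"))


if __name__ == "__main__":
    for entry in missing_entries("."):
        print("missing:", entry)
```

Run it from the repository root; no output means every expected entry was found.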

Train and Evaluate

  • Download and extract 'glove.6B.100d.txt'.
  • Download the MSVD dataset and create the corresponding pickle files using vid2frames.ipynb. Split the data into train, test, and validation sets.

    Alternate step: Download and extract 'data_pickle.zip'. This compressed file already contains the pickle files of the MSVD dataset.

  • Run the train.ipynb file.

    This file has a detailed list of options. Change the options to adjust the model to your requirements.

  • The training and evaluation code is inside the Python notebook.
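
For orientation, a GloVe text file stores one token per line followed by its vector components, separated by single spaces. The sketch below parses such a file into a dictionary. It is an illustration under that format assumption, not the loading code used in train.ipynb; the function name `load_glove` is hypothetical.

```python
def load_glove(path, dim=100):
    """Parse a GloVe text file into {word: [float, ...]}.

    Each line is expected to hold a token followed by `dim`
    space-separated numbers; other lines are skipped.
    """
    embeddings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip("\n").split(" ")
            if len(parts) != dim + 1:
                continue  # skip blank or malformed lines
            embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings
```

For the file named above, a call such as `load_glove("glove.6B.100d.txt")` would return 100-dimensional vectors, which can then be converted to arrays or an embedding matrix as needed.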

Sample Outputs


SSVC: "A woman is cutting a piece of meat"
GT: "a woman is cutting into the fatty areas of a pork chop"
SS score: 1.0, BLEU1: 1.0, BLEU2: 1.0, BLEU3: 1.0, BLEU4: 1.0


SSVC: "A person is slicing tomato"
GT: "Someone wearing blue rubber gloves is slicing a tomato with a large knife"
SS score: 0.825, BLEU1: 1.0, BLEU2: 1.0, BLEU3: 1.0, BLEU4: 1.0


SSVC: "A woman is cutting a piece of meat"
GT: "a woman is cutting into the fatty areas of a pork chop"
SS score: 0.94, BLEU1: 1.0, BLEU2: 0.84, BLEU3: 0.61, BLEU4: 0.0
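
To make the BLEU1 to BLEU4 columns easier to read, here is a dependency-free sketch of the standard cumulative BLEU-n definition: the geometric mean of clipped n-gram precisions, multiplied by a brevity penalty. It is not the authors' evaluation code (the notebooks may use a library implementation such as nltk's), it applies no smoothing, and it does not cover the SS score. The function names are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the maximum count observed in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n):
    """Cumulative BLEU-max_n with uniform weights and no smoothing."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # any zero precision makes the geometric mean zero
    c = len(candidate)
    # Reference length closest to the candidate length (shorter wins ties).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)


if __name__ == "__main__":
    candidate = "a person is slicing a tomato".split()
    references = ["someone is slicing a tomato with a large knife".split()]
    for n in range(1, 5):
        print(f"BLEU{n}: {bleu(candidate, references, n):.3f}")
```

Because no smoothing is applied, a caption with no matching 4-gram gets a BLEU4 of exactly 0, and scores depend on how many reference captions are supplied for each video.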

Please cite the following:

@article{rahman2021video,
  title={Video captioning with stacked attention and semantic hard pull},
  author={Rahman, Md Mushfiqur and Abedin, Thasin and Prottoy, Khondokar SS and Moshruba, Ayana and Siddiqui, Fazlul Hasan},
  journal={PeerJ Computer Science},
  volume={7},
  pages={e664},
  year={2021},
  publisher={PeerJ Inc.}
}