This repository contains the PyTorch implementation of the paper Self-Supervised Video Similarity Learning. It includes code for training a video similarity learning network with self-supervision. To facilitate reproduction of the paper's results, the evaluation code, the extracted features for the employed video datasets, and pre-trained models are also provided.
- Python 3
- PyTorch
- Torchvision
- FFMpeg
- Clone this repo
$ git clone git@github.com:gkordo/s2vs.git
$ cd s2vs
- Install the required packages
$ pip install -r requirements.txt
- Extract the frames from the videos in the dataset used for training (a batch-extraction sketch is given below).
$ ffmpeg -nostdin -y -i <path_to_video> -vf fps=1 -start_number 0 -q 0 ${video_id}/%05d.jpg
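If the videos are stored one per directory, as in the evaluation layout shown later, the extraction can be scripted. The following is a minimal sketch; the dataset layout (<dataset_dir>/<video_id>/video.*) and the output location are assumptions to adapt to your setup.

import subprocess
from pathlib import Path

# Minimal batch-extraction sketch; the layout <dataset_dir>/<video_id>/video.* and
# writing the frames next to the video file are assumptions, not repository requirements.
dataset_dir = Path('<dataset_dir>')
for video_file in dataset_dir.glob('*/video.*'):
    video_id_dir = video_file.parent
    subprocess.run([
        'ffmpeg', '-nostdin', '-y', '-i', str(video_file),
        '-vf', 'fps=1', '-start_number', '0', '-q', '0',
        str(video_id_dir / '%05d.jpg'),
    ], check=True)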
- Edit scripts/train_ssl.sh to configure the training session.
- Choose the augmentation types you want to include during training by providing the appropriate values to the --augmentations argument. Provide a string that contains GT for Global Transformations, FT for Frame Transformations, TT for Temporal Transformations, and ViV for Video-in-Video.
- Run the script as follows
$ bash scripts/train_ssl.sh
- Once the training is over, a model.pth file will have been created in a path based on the provided experiment_path argument (a loading sketch follows this list).
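A minimal sketch for inspecting the resulting checkpoint; the exact contents of model.pth (a full serialized model or a state_dict, and its key names) depend on the training setup, so this only loads the file and reports what is inside.

import torch

# Load the trained checkpoint; the path depends on the experiment_path argument.
checkpoint = torch.load('<experiment_path>/model.pth', map_location='cpu')
if isinstance(checkpoint, dict):
    # Likely a state_dict (or a dict wrapping one); print a few keys to inspect it.
    print(list(checkpoint.keys())[:10])
else:
    # Otherwise it is a serialized module that can be used directly.
    print(type(checkpoint))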
- Download the datasets from the original sources.
- Determine the pattern with which the video files are stored, based on their video ids, e.g. {id}/video.* if the dataset directory follows this structure:
Dataset_dir
├── video_id1
│ └── video.mp4
├── video_id2
│ └── video.flv
│ ⋮
└── video_idN
└── video.webm
- Run the evaluation.py script to evaluate a trained model
$ python evaluation.py --dataset FIVR-200K --dataset_path <path_to_dataset> --pattern '{id}/video.*' --model_path <path_to_model>
or run the script with the provided features (a sketch for reading these HDF5 feature files is given after this list)
$ python evaluation.py --dataset FIVR-200K --dataset_hdf5 <path_to_hdf5> --model_path <path_to_model>
- If no value is given to the --model_path argument, then the pretrained s2vs_dns model is used.
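For the feature-based evaluation, the provided HDF5 files can also be inspected directly. The sketch below assumes one dataset per video id containing its frame-level features; inspect the file to confirm the actual layout.

import h5py
import numpy as np

# Assumed layout: one HDF5 dataset per video id holding its frame-level features.
with h5py.File('<path_to_hdf5>', 'r') as f:
    video_ids = list(f.keys())
    print(len(video_ids), 'videos in the file')
    feats = np.asarray(f[video_ids[0]])
    print(video_ids[0], feats.shape)

The pretrained models can also be loaded directly via torch.hub: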
import torch
feat_extractor = torch.hub.load('gkordo/s2vs:main', 'resnet50_LiMAC')
s2vs_dns = torch.hub.load('gkordo/s2vs:main', 's2vs_dns')
s2vs_vcdb = torch.hub.load('gkordo/s2vs:main', 's2vs_vcdb')
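A sketch of how the loaded models might be used to compare two videos. The input shapes, the stand-in frame tensors, and the calculate_video_similarity call are assumptions based on the ViSiL-style interface; see the Colab notebook referenced below for the exact usage.

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load the feature extractor and a pretrained similarity model (as above).
feat_extractor = torch.hub.load('gkordo/s2vs:main', 'resnet50_LiMAC').to(device).eval()
s2vs_dns = torch.hub.load('gkordo/s2vs:main', 's2vs_dns').to(device).eval()

# Stand-in tensors of shape (frames, channels, height, width); replace with real decoded videos.
query_video = torch.rand(32, 3, 224, 224, device=device)
target_video = torch.rand(48, 3, 224, 224, device=device)

with torch.no_grad():
    query_feats = feat_extractor(query_video)
    target_feats = feat_extractor(target_video)
    # Assumed ViSiL-style similarity call; confirm the method name in the repository.
    similarity = s2vs_dns.calculate_video_similarity(query_feats, target_feats)
print(similarity)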
If you use this code for your research, please consider citing our papers:
@inproceedings{kordopatis2023s2vs,
title={Self-Supervised Video Similarity Learning},
author={Kordopatis-Zilos, Giorgos and Tolias, Giorgos and Tzelepis, Christos and Kompatsiaris, Ioannis and Patras, Ioannis and Papadopoulos, Symeon},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year={2023}
}
@inproceedings{kordopatis2019visil,
title={{ViSiL}: Fine-grained Spatio-Temporal Video Similarity Learning},
author={Kordopatis-Zilos, Giorgos and Papadopoulos, Symeon and Patras, Ioannis and Kompatsiaris, Ioannis},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
year={2019}
}
For visualization examples of augmentation and similarity matrices, as well as model usage in code, have a look at this Colab notebook.
- DnS - improved computational efficiency with a selector network
- ViSiL - the original ViSiL approach
- FIVR-200K - download our FIVR-200K dataset
This project is licensed under the MIT License - see the LICENSE file for details
Giorgos Kordopatis-Zilos (kordogeo@fel.cvut.cz)