The aim of this project is to improve the pretext task of object detection with the help of coherence metrics.
This implementation achieves this by training a Siamese autoencoder (models/model.py) on a dataset with the help of scripts/dataloader.py. The model is built around an attention mechanism and learns spatial features with each reconstruction.
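For intuition, the sketch below shows the general shape of a Siamese autoencoder: one shared encoder/decoder applied to two frames, producing both embeddings (for the coherence term) and reconstructions (for the MSE term). This is only an assumption of the structure and omits the attention blocks; the layer sizes and names are hypothetical and are not taken from models/model.py.

```python
import torch
import torch.nn as nn

class SiameseAutoencoder(nn.Module):
    """Illustrative sketch: one shared encoder/decoder applied to two frames."""

    def __init__(self, in_channels=3, latent_dim=128):
        super().__init__()
        # Shared convolutional encoder (the same weights are reused for both frames).
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, latent_dim, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Shared decoder that reconstructs the input from the latent feature map.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, in_channels, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame_a, frame_b):
        # Encode both frames with the same weights (the two Siamese branches).
        z_a, z_b = self.encoder(frame_a), self.encoder(frame_b)
        # Reconstruct each frame from its own latent code.
        rec_a, rec_b = self.decoder(z_a), self.decoder(z_b)
        return z_a, z_b, rec_a, rec_b
```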
The network uses two loss functions (in scripts/losses.py): ContrastiveLoss and MSELoss.
- ContrastiveLoss measures the similarity between two frames from the dataloader (the coherence metric); a sketch of how the two losses combine follows this list.
- MSELoss drives the reconstruction of the inputs, so that the encoder learns useful representations.
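As a rough illustration of how the two terms could be combined into one objective, the sketch below computes a margin-based contrastive loss on the embeddings and an MSE reconstruction loss, weighted by a factor mu (mirroring the --mu flag in the training command). The function names, the label convention, and the way mu is applied are assumptions for illustration, not the exact code in scripts/losses.py or train.py.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z_a, z_b, label, margin=1.0):
    """Illustrative contrastive loss over flattened embeddings.

    label == 1 for temporally coherent (adjacent) frame pairs,
    label == 0 for unrelated pairs.
    """
    dist = F.pairwise_distance(z_a.flatten(1), z_b.flatten(1))
    similar_term = label * dist.pow(2)
    dissimilar_term = (1 - label) * F.relu(margin - dist).pow(2)
    return (similar_term + dissimilar_term).mean()

def total_loss(model, frame_a, frame_b, label, mu=1e-5):
    """Coherence (contrastive) term plus reconstruction (MSE) term."""
    z_a, z_b, rec_a, rec_b = model(frame_a, frame_b)
    recon = F.mse_loss(rec_a, frame_a) + F.mse_loss(rec_b, frame_b)
    coherence = contrastive_loss(z_a, z_b, label)
    # Assumption: mu weights the reconstruction term against the coherence term.
    return coherence + mu * recon
```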
(Work in progress)
Enable GPU 0
$export CUDA_VISIBLE_DEVICES=0
Train network
$CUDA_VISIBLE_DEVICES=0 python train.py --lr_sim 5e-4 --lr_recon 1e-4 --epochs 25 --batch_size 4 --mu 1e-5 --training_dir 'Add dataset path' --training_csv 'Add corresponding csv path' --num_workers 2
Test network
$CUDA_VISIBLE_DEVICES=0 python test.py --batch_size 2 --test_dir 'Add-path-to-testset' --test_csv 'Add-path-to-csv.csv'
References:
@misc{goroshin2015unsupervised,
  title={Unsupervised Learning of Spatiotemporally Coherent Metrics},
  author={Ross Goroshin and Joan Bruna and Jonathan Tompson and David Eigen and Yann LeCun},
  year={2015},
  eprint={1412.6056},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}