Here we provide code and some basic instructions to reproduce some of the baselines on CATER.
Generate the data, or copy over the pre-generated data, to a directory with fast I/O, and unzip the file. For the non-local code, we need to generate LMDBs. Do so by:
```bash
$ cd baselines/video-nonlocal-net
$ vim process_data/cater/gen_lmdbs.py  # <-- change the path to the lists folder for which to generate LMDBs
$ python process_data/cater/gen_lmdbs.py  # <-- this will generate the LMDBs in the same folder as the lists
```
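For reference, `gen_lmdbs.py` essentially serializes each entry from the list files into an LMDB database. Below is a minimal sketch of that pattern with the Python `lmdb` package, assuming a hypothetical `video_path label` list format and pickled records (the actual script writes records in the serialized format the non-local dataloader expects):

```python
import lmdb
import pickle

def write_lmdb(list_file, lmdb_path, map_size=1 << 40):
    """Pack (video_path, label) records from a list file into an LMDB.

    Simplified sketch: the real gen_lmdbs.py stores records in whatever
    serialized format the non-local dataloader expects.
    """
    env = lmdb.open(lmdb_path, map_size=map_size)
    with env.begin(write=True) as txn:
        with open(list_file) as f:
            for idx, line in enumerate(f):
                video_path, label = line.strip().split()  # assumed list format
                key = '{:08d}'.format(idx).encode('ascii')
                txn.put(key, pickle.dumps((video_path, int(label))))
    env.close()

# Hypothetical paths; point these at your own lists folder.
write_lmdb('lists/localize/train.txt', 'lists/localize/train_lmdb')
```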
Training follows the same paradigm as the non-local codebase. We provide sample .yaml config files in the configs_cater folder. To launch training, we provide a nifty little launch.py script; you will likely need to edit it to point to the right paths. The pretrained model used for initialization can be downloaded from the non-local codebase [direct link]. Then, launch the training as follows:
```bash
$ vim configs_cater/001_I3D_NL_localize_imagenetPretrained_32f_8SR.yaml  # <-- change the first 2 lines to point to the data as processed above
$ python launch.py -c configs_cater/001_I3D_NL_localize_imagenetPretrained_32f_8SR.yaml
```
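If you would rather script the config edit than do it in vim, the same change can be made with PyYAML; note that the key names below (`TRAIN`/`TEST`, `DATA_DIR`) are hypothetical stand-ins for whatever the first two lines of the config actually contain:

```python
import yaml  # pip install pyyaml

cfg_path = 'configs_cater/001_I3D_NL_localize_imagenetPretrained_32f_8SR.yaml'
with open(cfg_path) as f:
    cfg = yaml.safe_load(f)

# Hypothetical keys: substitute whatever the config actually uses to point
# at the LMDBs generated above. Note that safe_dump will not preserve the
# original comments or key ordering.
cfg['TRAIN']['DATA_DIR'] = '/fast/lists/localize/train_lmdb'
cfg['TEST']['DATA_DIR'] = '/fast/lists/localize/val_lmdb'

with open(cfg_path, 'w') as f:
    yaml.safe_dump(cfg, f)
```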
This will run both training and testing. The final trained model for this run is available here. Configs for other experiments can be created based on the provided config and run the same way.
The above experiment uses the standard non-local testing paradigm: a 30-crop average. However, for a task like CATER, training a temporal model on top of the clip features can make more sense than averaging them. Hence, we also try an LSTM-based aggregation, which can be run as follows (after training the non-local model):
```bash
$ python launch.py -c configs_cater/001_I3D_NL_localize_imagenetPretrained_32f_8SR.yaml -t test  # <-- test the model and store the features; these are used by the LSTM step below
$ python launch.py -c configs_cater/001_I3D_NL_localize_imagenetPretrained_32f_8SR.yaml -t lstm  # <-- train/test the LSTM. We saw some random variation in LSTM training, so this script trains/tests the model 3 times and averages the numbers for a more stable estimate of performance
```
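Conceptually, the LSTM step replaces the 30-crop average with a small recurrent model run over the per-clip features dumped by the test step. Below is a minimal PyTorch sketch of this kind of aggregation; the feature and hidden dimensions are hypothetical, the 3-run averaging is omitted, and the 36 classes correspond to the cells of the 6x6 localization grid:

```python
import torch
import torch.nn as nn

class ClipLSTM(nn.Module):
    """Aggregate a sequence of per-clip features into one prediction."""
    def __init__(self, feat_dim=2048, hidden_dim=512, num_classes=36):
        super(ClipLSTM, self).__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, clip_feats):  # (batch, num_clips, feat_dim)
        out, _ = self.lstm(clip_feats)
        return self.fc(out[:, -1])  # classify from the last clip's state

# Hypothetical shapes: 4 videos, 10 clips each, 2048-D clip features.
feats = torch.randn(4, 10, 2048)
logits = ClipLSTM()(feats)
print(logits.shape)  # torch.Size([4, 36])
```

Classifying from the hidden state after the last clip lets the model weigh late clips more heavily, which suits localization since the target is the snitch's final position.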
The expected performance for these models, as reported in the paper, is given in the table below (please expect some random variation across training runs):
Expt | Config | Localize (top-1 accuracy) |
---|---|---|
R3D + NL, 32 frame | configs_cater/001_I3D_NL_localize_imagenetPretrained_32f_8SR.yaml | 28.8 |
R3D + NL, 32 frame, LSTM | configs_cater/001_I3D_NL_localize_imagenetPretrained_32f_8SR.yaml | 45.5 |
This code was tested with PyTorch 0.4. Set the paths in main.py and run it to run the tracking baseline. It uses the DaSiamRPN codebase with a pretrained model.
```bash
$ cd tracking
$ source activate pytorch0.4  # <-- needs PyTorch 0.4 and OpenCV (with ffmpeg) installed
$ python main.py  # <-- run the tracking; change parameters in the script as needed
```
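At a high level, the tracking baseline follows the snitch through the video and reads the answer off the tracker's final position. A rough sketch of that last step, assuming a hypothetical `track_video` helper that runs DaSiamRPN and returns the final (x, y, w, h) box on a 320x240 frame, and ignoring any camera projection main.py may apply to map image coordinates onto the ground-plane grid:

```python
def box_to_grid_class(box, frame_w=320, frame_h=240, grid=6):
    """Quantize a tracked box's center into a 6x6 grid cell index (0-35)."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    col = min(int(cx / frame_w * grid), grid - 1)
    row = min(int(cy / frame_h * grid), grid - 1)
    return row * grid + col

# Hypothetical usage: track_video would run DaSiamRPN over the frames and
# return the last predicted (x, y, w, h) box.
# final_box = track_video('videos/CATER_new_000001.avi')
# print(box_to_grid_class(final_box))
```

Expected performance: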
Expt | Config | Localize (top-1 accuracy) |
---|---|---|
Tracking | - | 33.9 |