This repository uses PyTorch Lightning to implement the training and models, Hydra to define the configurations, and Weights & Biases (wandb) to visualize the training.
The experiments are defined as YAML files in the `configs/experiments` folder. For more detailed information on the structure of the config files, and how to create them, read `configs/README.md`. Here we show the basic information.
To run an experiment, call the `run.py` file with the name of the experiment:

```bash
python run.py +experiments=experiment_name  # without the .yaml extension
```
If the experiment file is inside a subfolder of the `experiments` directory, simply add the subfolder:

```bash
python run.py +experiments=folder_experiment/experiment_name
```
A specific command to run an actual experiment in this project would be:

```bash
python run.py +experiments=train/train_finegym/finegym_bbox_triplet_asym
```

which trains our model with bounding box representations, for the short-sequences case in FineGym. All the other training configuration files are under the same directory.
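For example, to see the other FineGym training configurations (assuming the default repository layout, with experiments under `configs/experiments`):

```bash
# List the remaining FineGym training configs (path follows the experiment name above)
ls configs/experiments/train/train_finegym/
```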
To debug, we have to add a debug configuration from the `configs/debug` folder, like `debug/debug.yaml`. It is recommended to add it after the experiment:

```bash
python run.py +experiments=experiment_name +debug=debug
```
For small changes that do not require creating a new experiment YAML file, we can just add the parameters in the same command:

```bash
python run.py +experiments=experiment_name \
    wandb.name=new_name \
    dataset.dataloader_params.batch_size=8 \
    ++trainer.max_epochs=100 \
    +trainer.new_param_trainer=0.001 \
    ~trainer.profiler
```
where `~` removes a parameter from the configuration, `+` adds a parameter that does not exist yet, and no prefix or `++` changes an existing parameter.
There are different options under `resume`:

- `load_all`. Resume the same training. In this case we want to load the weights, the training state, and the wandb run, and keep the same config for the dataset and everything else. We explicitly make sure the whole config is the same, including dataset, dataloader, etc. The `id` has to be defined.
- `load_state`. Pre-train from a previous checkpoint, loading both the training state and the model.
- `load_model`. Pre-train from a previous checkpoint, loading only the model.

The priority is from top to bottom, so if `load_all` is true and `load_state` is false, `load_all` prevails.
If any of these three options is set, either the `id` of the experiment we are loading from or the `path` of the checkpoint we are loading from has to be set.
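As a hypothetical example (the exact override keys under `resume` may differ; check the resume options in the configuration), loading only the model weights from a checkpoint could look like:

```bash
# Hypothetical override names -- they follow the resume options described above
python run.py +experiments=experiment_name \
    resume.load_model=true \
    resume.path=/path/to/checkpoint.ckpt
```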
For `load_state` and `load_model` we do not check the configuration. If parameters like `lr` are set, they will be overwritten, but other configuration changes, like a different optimizer or a different network size, will break because the load will not work.
Resuming training with different parameters (like the learning rate), but under the same wandb run and model folder, is not supported, because it is confusing and bad for reproducibility. wandb logs each run separately in the filesystem (not on the web application) even when the runs have the same id, which is good and clear, but still not enough for this feature to be supported.
The checkpoints and wandb logs are stored in the `wandb.save_dir` directory (under the `{wandb.project}` and `wandb` folders, respectively). The checkpoints are stored under the experiment id (e.g. 1234abcd), and the logs under the run ID, which has the format `run-{date}_{time}-{id}`. The run name is not necessary; it is loaded from the experiment id.
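Schematically (a sketch of the layout described above, not an exhaustive listing):

```
{wandb.save_dir}/
├── {wandb.project}/              # checkpoints, stored under the experiment id (e.g. 1234abcd)
└── wandb/
    └── run-{date}_{time}-{id}/   # filesystem logs for each run
```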
The logs are also stored online, and can be accessed at https://wandb.ai (you will need to create an account). The experiment ID can be found by accessing a specific run, going to the overview ("info" sign, top left), and checking the "run path". The wandb configuration can be changed in `configs/wandb/wandb.yaml`.
We use the FineGym (downloadable from this link), Diving48 (in this link), and FisV (this link) datasets.
The process to obtain the keypoints that the model uses is described next.
There are three steps to obtain keypoints from data:
- Extract keypoints from either videos or images using OpenPose. The code in `extract_keypoints_images.py` and `extract_keypoints_videos.py` under `data/data_utils` does that. We used OpenPose in a Docker installation.
- For videos that may contain multiple shots, extract the divisions between shots using `shot_detection.py`.
- Post-process the keypoints to group them into trajectories (they are initially extracted per frame). This is done automatically during dataset creation when running experiments.
The configuration of the CUDA environment is in `requirements.yml`. To create an environment with the same packages, run:

```bash
conda env create --file requirements.yml
```
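Afterwards, activate the environment (the environment name is defined in the `name:` field of `requirements.yml`; `<env_name>` below is a placeholder):

```bash
# Replace <env_name> with the name defined in requirements.yml
conda activate <env_name>
```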
The code is divided in different files and folders (all Python):

- `run.py`. Main file to be executed. Loads the configuration, creates the model, trainer, and dataloader, and runs them.
- `losses.py`. File with loss functions and other evaluation functions.
- `distances.py`. File with distance functions.
- `data`. Dataset and dataloader code. Relies on a LightningDataModule, defined in `main_data_module.py`, that manages the dataset. The datasets are defined under `data/datasets`, and all inherit from the `BaseDataset` defined in `base_dataset.py`. There is also a `data_utils` folder with general dataset utils.
- `models`. Under this folder we define the Python modules (`nn.Module`), under `networks`, as well as the trainer, which is implemented using a LightningModule. The lightning modules encapsulate the whole training procedure, as well as the model definition. `trajectory_dict.py` is an auxiliary file that defines the state of all input- and latent-space trajectories.
- `utils`. General utils for the project.
Most of the files and methods are described in the code. For more specific comments about how they work and what they do, go directly to the files.