Details of Testing & Training scripts

Junyong Lee edited this page Mar 15, 2022 · 13 revisions

Testing

CUDA_VISIBLE_DEVICES=0 python run.py --mode [mode] --config [config] --data RealMCVSR --data_offset [data_offset] --output_offset [output_offset]
# e.g., CUDA_VISIBLE_DEVICES=0 python run.py --mode RefVSR_MFID --config config_RefVSR_MFID --data RealMCVSR --data_offset /data --output_offset ./result

Options

  • --mode: The name of a model to test.
  • --config: The name of a config file located at ./config/[config].py. If it is not specified, the config file used for training the model is loaded automatically. Default: None.
  • --data: The name of a dataset for evaluation. Default: RealMCVSR
    • The data structure can be modified by the function set_data_path(..) in ./configs/config.py.
  • -ckpt_name: Loads the checkpoint with the given file name under [LOG_ROOT]/RefVSR_CVPR2022/[mode]/checkpoint/train/epoch/ckpt/ (e.g., python run.py --mode RefVSR --data RealMCVSR --ckpt_name RefVSR_00100.pytorch).
  • -ckpt_abs_name: Loads the checkpoint at the given absolute path (e.g., python run.py --mode RefVSR --data RealMCVSR --ckpt_abs_name ./ckpt/RefVSR.pytorch).
  • -ckpt_epoch: Loads the checkpoint of the specified epoch (e.g., python run.py --mode RefVSR --data RealMCVSR --ckpt_epoch 100).
  • -ckpt_sc: Loads the checkpoint with the best validation score (e.g., python run.py --mode RefVSR --data RealMCVSR -ckpt_sc).
  • -vid_name: Evaluates only the specified videos (e.g., python run.py --mode RefVSR --data RealMCVSR -ckpt_sc -vid_name 0024 0074 0121).
  • -eval_mode: The evaluation mode, one of quan_qual | FOV | conf (e.g., python run.py --mode RefVSR --data RealMCVSR -ckpt_sc --eval_mode quan_qual). Default: quan_qual.
  • -quantitative_only: Computes quantitative measures (PSNR and SSIM) only. Valid only if -eval_mode is quan_qual (e.g., python run.py --mode RefVSR --data RealMCVSR -ckpt_sc -quantitative_only). Default: False.
  • -qualitative_only: Saves qualitative results only. Valid only if -eval_mode is quan_qual or FOV (e.g., python run.py --mode RefVSR --data RealMCVSR -ckpt_sc -is_quan -qualitative_only). Default: False.
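The checkpoint and evaluation options above can be summarized in a small sketch. This is not the code in run.py. The function names `resolve_checkpoint` and `check_eval_flags` are hypothetical, the precedence among the checkpoint options is an assumption, and the `[mode]_[5-digit epoch].pytorch` file name is only inferred from the `RefVSR_00100.pytorch` example. `-ckpt_sc` is left out because selecting the best checkpoint depends on validation scores that are not described here.

```python
from pathlib import Path


def resolve_checkpoint(log_root, mode, ckpt_name=None, ckpt_abs_name=None, ckpt_epoch=None):
    """Return the checkpoint path implied by the options, or None if none is given.

    Hypothetical helper: the precedence (abs name, then name, then epoch) is assumed.
    """
    # Directory layout taken from the -ckpt_name description.
    ckpt_dir = Path(log_root) / "RefVSR_CVPR2022" / mode / "checkpoint" / "train" / "epoch" / "ckpt"
    if ckpt_abs_name is not None:
        return Path(ckpt_abs_name)
    if ckpt_name is not None:
        return ckpt_dir / ckpt_name
    if ckpt_epoch is not None:
        # Assumed naming scheme, inferred from the example RefVSR_00100.pytorch.
        return ckpt_dir / f"{mode}_{ckpt_epoch:05d}.pytorch"
    return None


def check_eval_flags(eval_mode="quan_qual", quantitative_only=False, qualitative_only=False):
    """Raise ValueError if a flag is combined with an eval mode it is not valid for."""
    if eval_mode not in ("quan_qual", "FOV", "conf"):
        raise ValueError(f"unknown eval_mode: {eval_mode}")
    if quantitative_only and eval_mode != "quan_qual":
        raise ValueError("-quantitative_only is valid only with -eval_mode quan_qual")
    if qualitative_only and eval_mode not in ("quan_qual", "FOV"):
        raise ValueError("-qualitative_only is valid only with -eval_mode quan_qual or FOV")
```

For example, `resolve_checkpoint("/logs", "RefVSR", ckpt_epoch=100)` points at a file named `RefVSR_00100.pytorch` under the checkpoint directory, and `check_eval_flags("conf", quantitative_only=True)` raises an error.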

Training

# multi GPU (with DistributedDataParallel) example
CUDA_VISIBLE_DEVICES=0,1,2,3 python -B -m torch.distributed.launch --nproc_per_node=4 --master_port=9000 run.py \
            --is_train \
            --mode RefVSR_MFID \
            --config config_RefVSR_MFID \
            --data RealMCVSR \
            -b 1 \
            -th 8 \
            -dl \
            -ss \
            -dist

# resuming example 1 (the trainer loads the checkpoint and training state (e.g., learning rate, optimizer parameters) saved after epoch 100; training resumes from epoch 101)
CUDA_VISIBLE_DEVICES=0,1,2,3 python -B -m torch.distributed.launch --nproc_per_node=4 --master_port=9000 run.py \
            ... \
            -th 8 \
            -r 100 \
            -ss \
            -dist

# resuming example 2 (the trainer loads only the checkpoint at the given absolute path; needed when fine-tuning a model for the adaptation stage)
CUDA_VISIBLE_DEVICES=0,1,2,3 python -B -m torch.distributed.launch --nproc_per_node=4 --master_port=9000 run.py \
            ... \
            -th 8 \
            -ra ./ckpt/RefVSR_MFID.pytorch \
            -ss \
            -dist

# single GPU (with DataParallel) example
CUDA_VISIBLE_DEVICES=0 python -B run.py \
            ... \
            -ss

# For PyTorch >= 1.10.x (especially when running the small model with PyTorch AMP)
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=9000 run.py \
            ...
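
The two launchers used above hand the local rank to each process differently: by default, python -m torch.distributed.launch passes a --local_rank argument, while torchrun sets the LOCAL_RANK environment variable instead. The sketch below shows one way a script could accept both. The helper name `get_local_rank` is hypothetical, and whether run.py handles the rank this way is not stated on this page.

```python
import argparse
import os


def get_local_rank(argv=None, environ=None):
    """Return the local rank from --local_rank if present, else from LOCAL_RANK, else 0.

    Hypothetical helper for illustration; it is not taken from run.py.
    """
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(add_help=False)
    # torch.distributed.launch appends --local_rank=<n> to the script arguments.
    parser.add_argument("--local_rank", type=int, default=None)
    # parse_known_args ignores all other options (e.g., --mode, -b, -dist).
    args, _ = parser.parse_known_args(argv)
    if args.local_rank is not None:
        return args.local_rank
    # torchrun exports LOCAL_RANK for every worker process.
    return int(environ.get("LOCAL_RANK", 0))
```

With no arguments, `get_local_rank()` reads `sys.argv` and `os.environ`, so the same call works under either launcher and falls back to 0 for a single-process run.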

Options

  • --is_train: If it is specified, run.py will train the network. Default: False
  • --mode: The name of a model to train. A logging folder named after [mode] will be created at [LOG_ROOT]/RefVSR_CVPR2022/[mode]/. Default: RefVSR
  • --config: The name of a config file located at ./config/[config].py. Default: None, and the default should not be changed.
  • --trainer: The name of a trainer file located at ./models/trainers/[trainer].py. Default: '' (empty string)
  • --network: The name of a network file located at ./models/archs/[network].py. Default: '' (empty string)
  • -b, --batch_size: The batch size per process. With multiple GPUs (DistributedDataParallel), the total batch size is nproc_per_node * b. Default: 8
  • -th, --thread_num: The number of threads (num_workers) for the data loader. Default: 8
  • -dl, --delete_log: Whether to delete the logs under [mode] (i.e., [LOG_ROOT]/RefVSR_CVPR2022/[mode]/*). This option takes effect only when --is_train is specified. Default: False
  • -r, --resume: Resumes training from the checkpoint saved at the specified epoch (e.g., -r 100). Do not combine this option with -dl. Default: None
  • -ra, --resume_ab: Resumes training from the checkpoint at the given absolute path (e.g., -ra ./ckpt/RefVSR_MFID.pytorch). Do not combine this option with -dl. Default: None
  • -ss, --save_sample: Save sample images for both training and testing. Images will be saved in [LOG_ROOT]/RefVSR_CVPR2022/[mode]/sample/. Default: False
  • -dist: Enables multi-processing with DistributedDataParallel. Default: False
  • --is_crop_valid: Crops frames of the validation set during the training phase, mainly to avoid out-of-memory errors. Default: False
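
As a quick check of the batch-size rule stated for -b (total batch size = nproc_per_node * b under DistributedDataParallel), the arithmetic can be sketched as below. Both function names are hypothetical and are not part of run.py.

```python
def total_batch_size(nproc_per_node, batch_size):
    """Total batch size across processes: nproc_per_node * b."""
    return nproc_per_node * batch_size


def batch_size_per_process(target_total, nproc_per_node):
    """Value to pass to -b so that the total batch size equals target_total."""
    if target_total % nproc_per_node != 0:
        raise ValueError("target_total must be divisible by nproc_per_node")
    return target_total // nproc_per_node
```

For the multi-GPU example above (--nproc_per_node=4 with -b 1), `total_batch_size(4, 1)` gives a total batch size of 4. To reach a total of 8 on four GPUs, `batch_size_per_process(8, 4)` suggests -b 2.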