A script for training ConvNextV2 on the CIFAR10 dataset using FSDP (Fully Sharded Data Parallel) for distributed training.
You can run the script with torchrun and the run.py file, e.g., on a single node with two processes: torchrun --nnodes 1 --nproc_per_node 2 run.py
The run.py script arguments include:
--batch-size
--epochs
--lr
--gamma
--no-cuda
--seed
--run_validation
--save-model
Additional information on the arguments can be seen using the --help argument.
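
For orientation, the arguments listed above could be declared with argparse roughly as follows. This is only a sketch and not the actual contents of run.py: the function name build_parser, the argument types, the default values, and the help strings are all assumptions made for illustration. Run the script with --help to see the real definitions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in this README."""
    parser = argparse.ArgumentParser(
        description="Train ConvNextV2 on CIFAR10 with FSDP (illustrative sketch)"
    )
    # NOTE: every type, default, and help string below is an assumption;
    # only the flag names come from the README.
    parser.add_argument("--batch-size", type=int, default=64,
                        help="training batch size per process (assumed default)")
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of training epochs (assumed default)")
    parser.add_argument("--lr", type=float, default=1e-3,
                        help="learning rate (assumed default)")
    parser.add_argument("--gamma", type=float, default=0.7,
                        help="learning-rate decay factor (assumed meaning and default)")
    parser.add_argument("--no-cuda", action="store_true",
                        help="disable CUDA (assumed to be a boolean switch)")
    parser.add_argument("--seed", type=int, default=1,
                        help="random seed (assumed default)")
    parser.add_argument("--run_validation", action="store_true",
                        help="run validation (assumed to be a boolean switch)")
    parser.add_argument("--save-model", action="store_true",
                        help="save the trained model (assumed to be a boolean switch)")
    return parser


if __name__ == "__main__":
    # argparse maps dashes to underscores, e.g. --batch-size -> args.batch_size
    args = build_parser().parse_args()
    print(args)
```

With a parser like this, the flags are appended after the script name in the torchrun command, for example: torchrun --nnodes 1 --nproc_per_node 2 run.py --epochs 5 --save-model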