naga-karthik/ddp-resnet-cifar

Training Memory-Intensive Deep Learning Models with PyTorch’s Distributed Data Parallel

This is a mini-repository for training a ResNet101 model on the CIFAR10 dataset using distributed training. A link to the main article can be found here.

Getting Started

Prerequisites

  1. Linux (only tested on Linux)
  2. PyTorch
  3. NVIDIA GPU and cuDNN
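
A quick way to confirm the prerequisites above is a small check script. This is a minimal sketch that is not part of the repository; the function name `check_environment` is hypothetical, and it only reports what PyTorch itself can see on the current machine.

```python
import importlib.util
import platform


def check_environment():
    """Return a dict summarising whether the listed prerequisites appear to be met."""
    report = {
        "linux": platform.system() == "Linux",
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
        "cudnn_available": False,
        "gpu_count": 0,
    }
    if report["torch_installed"]:
        # Import lazily so the script still runs when PyTorch is missing.
        import torch

        report["cuda_available"] = torch.cuda.is_available()
        report["cudnn_available"] = torch.backends.cudnn.is_available()
        report["gpu_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

Distributed training is mainly useful when `gpu_count` is greater than 1.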

Installation

  1. Clone this repository:

    git clone https://github.com/naga-karthik/ddp-resnet-cifar
    cd ddp-resnet-cifar
  2. Install the necessary packages:

    pip install -r requirements.txt
  3. If you will be running it on a remote server, it is usually better to pre-download the dataset than to download it on the fly during training.

    • CIFAR10 Dataset

    • Create a folder named "data" and move the downloaded dataset into the folder.
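
The pre-download step can also be scripted. The sketch below is not part of the repository and rests on two assumptions: the archive URL is the commonly used "python version" of CIFAR10 hosted by the University of Toronto, and the training script expects the extracted `data/cifar-10-batches-py` layout (the one `torchvision.datasets.CIFAR10(root="data")` looks for). The helper names `extract_archive` and `download_cifar10` are hypothetical.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumed location of the CIFAR10 "python version" archive; verify before use.
CIFAR10_URL = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"


def extract_archive(archive_path, data_dir="data"):
    """Extract a .tar.gz archive into data_dir, creating the folder if needed."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support safe extraction filters.
            tar.extractall(data_dir, filter="data")
        else:
            tar.extractall(data_dir)
    return data_dir


def download_cifar10(data_dir="data", url=CIFAR10_URL):
    """Download the archive into data_dir (skipped if present) and extract it."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    archive_path = data_dir / url.rsplit("/", 1)[-1]
    if not archive_path.exists():
        urllib.request.urlretrieve(url, archive_path)
    return extract_archive(archive_path, data_dir)


if __name__ == "__main__":
    print("Dataset extracted under", download_cifar10())
```

Run it once on a machine with internet access, then copy the resulting `data` folder to the remote server if the server itself is offline.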

Running the model

From the terminal, use the following commands to run the model.

  1. With default settings:
    python mainCIFAR10.py
  2. With other options:
    python mainCIFAR10.py --n_epochs=100 --lr=0.001 --batch_size=32
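
For readers unfamiliar with how such flags are typically handled, here is a hedged sketch of an `argparse` setup that would accept the options shown above. It is an illustration only, not the code in `mainCIFAR10.py`; the function name `build_parser` is hypothetical and the default values are placeholders, not the repository's actual defaults.

```python
import argparse


def build_parser():
    """Build a parser accepting the three flags used in the example command."""
    parser = argparse.ArgumentParser(description="Hypothetical CLI sketch")
    # Placeholder defaults: the real defaults are defined in mainCIFAR10.py.
    parser.add_argument("--n_epochs", type=int, default=10, help="number of training epochs")
    parser.add_argument("--lr", type=float, default=0.01, help="learning rate")
    parser.add_argument("--batch_size", type=int, default=64, help="mini-batch size")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"epochs={args.n_epochs} lr={args.lr} batch_size={args.batch_size}")
```

Both `--lr=0.001` and `--lr 0.001` are accepted by `argparse`, so either form works on the command line.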
