Led by Huihuo Zheng, Corey Adams, and Zhen Xie from ALCF
This section of the workshop will introduce you to the methods we use to run distributed deep learning training on ALCF resources such as Theta and ThetaGPU.
We show distributed training using three frameworks:
- Horovod (for TensorFlow and PyTorch),
- DistributedDataParallel (DDP) (for PyTorch only), and
- DeepSpeed.
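As a taste of what the hands-on material covers, here is a minimal sketch of the DDP approach: PyTorch's `torch.distributed` process group is initialized, the model is wrapped in `DistributedDataParallel`, and gradients are averaged across ranks automatically during `backward()`. This is a single-process illustration using the `gloo` backend and a toy `nn.Linear` model (both our choices for the sketch, not workshop requirements); on Theta/ThetaGPU the launch details (backend, rank, world size, master address) come from the job launcher instead of being hard-coded.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Rendezvous info; normally provided by the job launcher (e.g. mpirun/srun).
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

# One-process "cluster" for illustration; real runs use world_size = #ranks.
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = torch.nn.Linear(8, 2)          # toy model for the sketch
ddp_model = DDP(model)                 # gradients sync across ranks in backward()
optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

x = torch.randn(4, 8)                  # fake batch; each rank gets its own shard
loss = ddp_model(x).sum()
loss.backward()                        # all-reduce of gradients happens here
optimizer.step()

dist.destroy_process_group()
```

Horovod and DeepSpeed follow the same overall pattern (initialize, wrap the model/optimizer, train as usual) with their own APIs, which the later sections walk through in detail.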