[cifar ds training]: Set cuda device during initialization of distributed backend. (#931)

* Set cuda device during initialization of distributed backend.

This change is needed so that GPU 0 is not selected as the compute device on
every rank; without it, torch.cuda.current_stream() during initialization
resolves to GPU 0's stream across all GPUs.

Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>

* Use device-agnostic accelerator API.

Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>

---------

Signed-off-by: Jagadish Krishnamoorthy <jagadish.krishnamoorthy@amd.com>
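
For context, a minimal sketch of the pattern this commit applies, assuming the
standard DeepSpeed import path for the accelerator API and that LOCAL_RANK is
exported per process by the launcher:

import os

import deepspeed
from deepspeed.accelerator import get_accelerator

# Initialize the distributed backend first.
deepspeed.init_distributed()

# Bind this process to its own device. Without this call, every rank's
# default device is 0, so torch.cuda.current_stream() would hand back
# GPU 0's stream on all ranks.
local_rank = int(os.environ["LOCAL_RANK"])
get_accelerator().set_device(local_rank)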
jagadish-amd authored Oct 29, 2024
1 parent f73a6ed commit 130fb58
Showing 1 changed file with 3 additions and 0 deletions: training/cifar/cifar10_deepspeed.py
@@ -1,4 +1,5 @@
 import argparse
+import os
 
 import deepspeed
 import torch
@@ -279,6 +280,8 @@ def test(model_engine, testset, local_device, target_dtype, test_batch_size=4):
 def main(args):
     # Initialize DeepSpeed distributed backend.
     deepspeed.init_distributed()
+    _local_rank = int(os.environ.get("LOCAL_RANK"))
+    get_accelerator().set_device(_local_rank)
 
     ########################################################################
     # Step1. Data Preparation.
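One note on the new line, as a sketch (the defensive default below is
illustrative and not part of the commit): os.environ.get("LOCAL_RANK") returns
None when the variable is unset, and int(None) raises TypeError, so the script
as committed assumes it runs under a launcher (e.g. deepspeed or torchrun)
that exports LOCAL_RANK for each process.

import os

# Hypothetical defensive variant: fall back to rank 0 when launched
# as a single process without a distributed launcher.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))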
