You are right that the forward and backward passes are called in separate context managers.
It seems to me that both context managers control the same flag, self.require_backward_grad_sync, where self refers to the DDP module (I found this in the code of the no_sync manager).
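For context, here is roughly what the body of no_sync looks like (paraphrased from memory; the actual code in torch/nn/parallel/distributed.py is authoritative):

```python
from contextlib import contextmanager

# Paraphrase of DistributedDataParallel.no_sync(); see the linked
# distributed.py for the exact implementation.
@contextmanager
def no_sync(self):
    # Save the current value of the flag, force it to False inside
    # the context, and restore the saved value on exit.
    old_require_backward_grad_sync = self.require_backward_grad_sync
    self.require_backward_grad_sync = False
    try:
        yield
    finally:
        self.require_backward_grad_sync = old_require_backward_grad_sync
```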
Thus, I wonder whether calling the forward and backward passes in separate managers might be okay. Please correct me if I missed something. Thanks!
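To make the pattern concrete, here is a minimal sketch of what I mean; the model and data below are hypothetical stand-ins for the actual training code, not the code under discussion:

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Hypothetical stand-ins: a tiny model and random data.
# Assumes torch.distributed.init_process_group() has already been called.
model = nn.Linear(10, 10)
ddp_model = DDP(model)
criterion = nn.MSELoss()
inputs = torch.randn(4, 10)
targets = torch.randn(4, 10)

with ddp_model.no_sync():             # sets require_backward_grad_sync = False
    outputs = ddp_model(inputs)       # forward pass inside the first context
    loss = criterion(outputs, targets)
# flag restored on exit

with ddp_model.no_sync():             # sets the same flag to False again
    loss.backward()                   # backward pass inside a second context
# flag restored on exit
```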
Thanks for your quick response! I don't have the exact root cause, but other users have also reported that calling fwd and bwd separately in a no_sync context still triggers gradient synchronization: https://discuss.pytorch.org/t/whats-no-sync-exactly-do-in-ddp/170259. I am not sure whether this is still an issue today, so I would like to confirm here.
According to the no_sync function description in https://github.com/pytorch/pytorch/blob/main/torch/nn/parallel/distributed.py#L1424, the forward pass should be included inside the context manager, or else gradients will still be synchronized. The current code runs the forward and backward passes in separate no_sync contexts, and will therefore still trigger gradient synchronization.
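For comparison, the usage example in that docstring keeps both the forward and backward of the accumulation steps inside a single no_sync block, roughly like this (paraphrased, with placeholder model and data):

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Paraphrase of the no_sync() docstring example with placeholder data;
# assumes torch.distributed.init_process_group() has already been called.
ddp = DDP(nn.Linear(10, 10))
inputs = [torch.randn(4, 10) for _ in range(3)]
another_input = torch.randn(4, 10)

with ddp.no_sync():
    for inp in inputs:
        ddp(inp).sum().backward()    # no synchronization, gradients accumulate

ddp(another_input).sum().backward()  # gradients are synchronized here
```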