Gradient handling in Workflows #7368
Replies: 4 comments
-
I would consider this quite high priority: the current setup of Workflows gives the user no way to do gradient normalization, and gradient clipping is forced through backward-pass hooks.
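For context, the hook-based workaround referred to here usually looks something like the sketch below. This is generic PyTorch, not MONAI code; the model and the clip value are purely illustrative:

```python
import torch

# Placeholder model purely for illustration.
model = torch.nn.Linear(10, 2)

# Clip each parameter's gradient by value as soon as it is computed
# during the backward pass.
for param in model.parameters():
    param.register_hook(lambda grad: grad.clamp(-1.0, 1.0))

loss = model(torch.randn(4, 10)).sum()
loss.backward()  # gradients arrive already clipped to [-1, 1]
```

Per-parameter value clipping can be expressed this way, but a global gradient-norm operation cannot, since it needs all gradients at once, which is the limitation raised above.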
-
Hi @danieltudosiu, could you write a handler that attaches to this event and performs the logic there? Thanks.
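A handler along these lines could look like the minimal sketch below. It assumes MONAI exposes `IterationEvents.BACKWARD_COMPLETED` and that the engine keeps a reference to its model as `engine.network`; `attach_grad_clipping` is a hypothetical helper name and the `max_norm` value is illustrative:

```python
import torch
from monai.engines.utils import IterationEvents


def attach_grad_clipping(trainer, max_norm: float = 1.0) -> None:
    """Attach a gradient-clipping handler to a MONAI trainer (illustrative helper)."""

    def _clip(engine) -> None:
        # Runs after the backward pass and before optimizer.step(),
        # clipping the global gradient norm in place.
        torch.nn.utils.clip_grad_norm_(engine.network.parameters(), max_norm)

    trainer.add_event_handler(IterationEvents.BACKWARD_COMPLETED, _clip)
```

Usage would be something like `attach_grad_clipping(trainer, max_norm=1.0)` on an already-constructed `SupervisedTrainer`.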
-
Hi @Nic-Ma, this solution would require one handler for each engine MONAI has. Given its specialized nature, wouldn't it fit better as a method on each Trainer subclass? I already submitted draft pull request #1967; it is a work in progress and currently covers only the SupervisedTrainer logic.
-
Hi @wyli, could you please share some comments here? This may be a research-specific feature request; I don't fully understand why writing a handler wouldn't work. Thanks in advance.
-
Is your feature request related to a problem? Please describe.
In some cases, gradient clipping or normalization is needed to stabilize the training of networks.
Describe the solution you'd like
Allow the option to do gradient clipping or normalization via an argument at the construction of Workflows.
Describe alternatives you've considered
Registering a hook on each model parameter to handle the gradient clipping. This is messier and not the main way PyTorch handles it. It would also rule out gradient normalization, since the PyTorch implementation is an in-place transformation over all gradients, and the non-in-place gradient clipping will be deprecated. Furthermore, with AMP we need to unscale the gradients before clipping or normalizing them, as per https://pytorch.org/docs/stable/notes/amp_examples.html#gradient-clipping.
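For reference, the AMP-aware pattern from the linked PyTorch docs looks roughly like the sketch below (a generic example assuming a CUDA device; the model, optimizer, data, and `max_norm` value are placeholders):

```python
import torch

model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

inputs = torch.randn(4, 10, device="cuda")
targets = torch.randn(4, 2, device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():
    loss = torch.nn.functional.mse_loss(model(inputs), targets)

scaler.scale(loss).backward()
# Unscale the gradients held by the optimizer before clipping, so the
# max_norm threshold applies to the true (unscaled) gradient values.
scaler.unscale_(optimizer)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
scaler.step(optimizer)  # skips the step if inf/NaN gradients were found
scaler.update()
```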