`requires_grad` of intermediate TensorProxies is ignored in our automatic differentiation code because we haven't done the work of properly threading this property through all computations.
We should completely remove the ability to query `.requires_grad` from intermediate TensorProxies to avoid similar bugs in the future. This can be achieved by introducing a separate "InputTensorProxy" which has this attribute and removing it from the regular TensorProxy.
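Here is a minimal sketch of what such a split could look like. This is illustrative only: the names and constructor signatures (`InputTensorProxy`, `_requires_grad`) are assumptions, not Thunder's actual API, and the real TensorProxy carries far more state.

```python
class TensorProxy:
    """Symbolic tensor inside a trace; deliberately exposes no requires_grad."""

    def __init__(self, name: str, shape: tuple[int, ...]):
        self.name = name
        self.shape = shape


class InputTensorProxy(TensorProxy):
    """Proxy for trace inputs, the only place where requires_grad is meaningful."""

    def __init__(self, name: str, shape: tuple[int, ...], requires_grad: bool = False):
        super().__init__(name, shape)
        self._requires_grad = requires_grad

    @property
    def requires_grad(self) -> bool:
        return self._requires_grad
```

With this split, querying `.requires_grad` on an intermediate proxy raises `AttributeError` instead of silently returning a value the autodiff code never maintained.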
Do you want to remove it and then restore it when it's correct, or just remove it?
Being able to understand whether an intermediate tensor requires a gradient computation sounds interesting. If we remove this, would we rely exclusively on transformations inferring whether intermediates require gradients or not? That seems workable.
Follow-up question: how do we (or would we) support stopping gradient propagation? I guess we'd create an operator that acts as a gradient "sink" or "cut point"?
Remove it and restore it when it's correct. Today, `requires_grad` is checked and used only for the inputs of computation traces and is ignored otherwise. The currently implemented logic treats every intermediate as requiring a gradient. Intermediate gradient computations may be removed by DCE if their results are unused and are not outputs of the backward trace. The backward trace's output is set to `None` for every input tensor that doesn't require a gradient.
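For concreteness, a rough sketch of that output convention, assuming hypothetical names (`inputs` and `computed_grads` stand in for whatever the backward trace actually carries; this is not Thunder's API):

```python
def backward_trace_outputs(inputs, computed_grads):
    """Pair each trace input with its gradient, substituting None for
    inputs that don't require a gradient, as described above."""
    return tuple(
        grad if inp.requires_grad else None
        for inp, grad in zip(inputs, computed_grads)
    )
```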
PyTorch has at least two ways of stopping gradient propagation:

- setting `requires_grad=False` on an intermediate tensor
- using `.detach()` on an intermediate tensor
We should support tracing both, and `detach` could be the special operator that acts as a gradient sink. It should be possible to coerce a `tensor.requires_grad = False` assignment into a `detach` call. I don't think setting `tensor.requires_grad = True` does anything useful except undoing a previous `False` assignment.
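For reference, a minimal eager-mode PyTorch example of `.detach()` acting as a gradient cut point (plain PyTorch behavior, independent of tracing):

```python
import torch

x = torch.ones(3, requires_grad=True)
y = (x * 2).detach()   # gradient propagation stops here; y is detached from the graph
z = (x + y).sum()      # only the direct x path contributes to x.grad
z.backward()
print(x.grad)          # tensor([1., 1., 1.]) -- the x*2 branch was cut off
```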
Originally posted by @IvanYashchuk in #1563 (comment)