
"requires_grad" attribute on intermediate TensorProxies is unused and misleading #1570

Open
IvanYashchuk opened this issue Dec 18, 2024 · 3 comments


IvanYashchuk (Collaborator) commented Dec 18, 2024

Originally posted by @IvanYashchuk in #1563 (comment)

requires_grad of intermediate TensorProxies is ignored in our automatic differentiation code because we haven't done the work of properly threading this property through all computations.
We should remove the ability to query .requires_grad from intermediate TensorProxies entirely to avoid similar bugs in the future. This can be achieved by introducing a separate "InputTensorProxy" that has this attribute and removing it from the regular TensorProxy.
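A minimal sketch of what that split might look like (purely illustrative; the class layout and constructor signatures here are hypothetical, not Thunder's actual proxy implementation):

```python
# Purely illustrative sketch of the proposed split; names and
# signatures are hypothetical, not Thunder's actual proxy classes.

class TensorProxy:
    """Intermediate tensors carry no requires_grad attribute at all,
    so querying it fails loudly instead of returning a stale value."""

    def __init__(self, name: str, shape: tuple, dtype):
        self.name = name
        self.shape = shape
        self.dtype = dtype


class InputTensorProxy(TensorProxy):
    """Only trace inputs carry requires_grad, matching how the
    attribute is actually consumed today."""

    def __init__(self, name: str, shape: tuple, dtype, requires_grad: bool = False):
        super().__init__(name, shape, dtype)
        self.requires_grad = requires_grad
```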

cc @Borda @apaz-cli

mruberry (Collaborator) commented

Do you want to remove it and then restore it when it's correct, or just remove it?

Being able to understand whether an intermediate tensor requires a gradient computation sounds interesting. If we remove this, would we rely exclusively on transformations inferring whether intermediates require a gradient or not? That seems workable.

Follow-up question: how do we (or would we) support stopping gradient propagation? I guess we'd create an operator that acts as a gradient "sink" or "cut point"?

IvanYashchuk (Collaborator, Author) commented

Remove it and restore it when it's correct. Today, requires_grad is checked and used only for the inputs of computation traces and ignored otherwise. The currently implemented logic is that every intermediate requires a gradient. The result of an intermediate gradient calculation may be DCE'd if it remains unused and is not an output of the backward trace. The output of the backward trace is set to None for every input tensor that doesn't require a gradient.
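For reference, this matches the eager PyTorch convention where an input that doesn't require a gradient simply gets none (standalone PyTorch example, not Thunder code):

```python
import torch

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=False)

loss = (a * b).sum()
loss.backward()

print(a.grad)  # a gradient tensor (equal to b here)
print(b.grad)  # None, because b doesn't require a gradient
```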

PyTorch has at least two ways of stopping gradient propagation:

  1. setting requires_grad=False on an intermediate tensor
  2. using .detach() on an intermediate tensor

We should support tracing both, and "detach" could be the special operator that acts as a gradient sink. It should be possible to coerce the tensor.requires_grad = False assignment into a detach. I don't think setting tensor.requires_grad = True does anything useful other than undoing a previous False assignment.
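A small eager PyTorch example of detach acting as such a sink (illustrative only):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2       # intermediate tracked by autograd
z = y.detach()  # gradient "sink": z shares data with y but is cut from the graph

loss = (z * y).sum()
loss.backward()

# Gradients flow only through the y factor; z is treated as a constant,
# so x.grad == 2 * z here instead of the 8 * x we'd get without detach.
print(x.grad)
```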

t-vi (Collaborator) commented Dec 18, 2024

We're not there yet, but there may be cases where we want to create new autograd leaves when we look beyond tracing models.
