
"requires_grad" attribute on intermediate TensorProxies is unused and misleading #1570

Open
IvanYashchuk opened this issue Dec 18, 2024 · 3 comments


IvanYashchuk (Collaborator) commented Dec 18, 2024

Originally posted by @IvanYashchuk in #1563 (comment)

requires_grad of intermediate TensorProxies is ignored in our automatic differentiation code because we haven't done the work of properly threading this property through all computations.
We should remove the ability to query .requires_grad from intermediate TensorProxies entirely to avoid similar bugs in the future. This can be achieved by introducing a separate "InputTensorProxy" that has this attribute and removing it from the regular TensorProxy.
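A minimal sketch of what that split might look like (purely illustrative; the class layout and constructor signatures here are hypothetical, not Thunder's actual proxy implementation):

```python
# Purely illustrative sketch of the proposed split; names and
# signatures are hypothetical, not Thunder's actual proxy classes.

class TensorProxy:
    """Intermediate tensors carry no requires_grad attribute at all,
    so querying it fails loudly instead of returning a stale value."""

    def __init__(self, name: str, shape: tuple, dtype):
        self.name = name
        self.shape = shape
        self.dtype = dtype


class InputTensorProxy(TensorProxy):
    """Only trace inputs carry requires_grad, matching how the
    attribute is actually consumed today."""

    def __init__(self, name: str, shape: tuple, dtype, requires_grad: bool = False):
        super().__init__(name, shape, dtype)
        self.requires_grad = requires_grad
```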

cc @Borda @apaz-cli

mruberry (Collaborator) commented

Do you want to remove it and then restore it when it's correct, or just remove it?

Being able to understand whether an intermediate tensor requires a gradient computation sounds interesting. If we remove this, would we rely exclusively on transformations inferring whether intermediates require a gradient or not? That seems workable.

Follow-up question: how do we (or would we) support stopping gradient propagation? I guess we'd create an operator that acts as a gradient "sink" or "cut point"?

IvanYashchuk (Collaborator, Author) commented

Remove it and restore it when it's correct. Today, requires_grad is checked and used only for the inputs of computation traces and ignored otherwise. The currently implemented logic is that every intermediate requires a gradient. The result of an intermediate gradient calculation may be DCE'd if it remains unused and is not an output of the backward trace. The output of the backward trace is set to None for every input tensor that doesn't require a gradient.
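For reference, this matches the eager PyTorch convention where an input that doesn't require a gradient simply gets none (standalone PyTorch example, not Thunder code):

```python
import torch

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=False)

loss = (a * b).sum()
loss.backward()

print(a.grad)  # a gradient tensor (equal to b here)
print(b.grad)  # None, because b doesn't require a gradient
```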

PyTorch has at least two ways of stopping gradient propagation:

  1. setting requires_grad=False on an intermediate tensor
  2. using .detach() on an intermediate tensor

We should support tracing both, and "detach" could be the special operator that acts as a gradient sink. It should be possible to coerce the tensor.requires_grad = False assignment into a detach. I don't think setting tensor.requires_grad = True does anything useful other than undoing a previous False assignment.
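A small eager PyTorch example of detach acting as such a sink (illustrative only):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2       # intermediate tracked by autograd
z = y.detach()  # gradient "sink": z shares data with y but is cut from the graph

loss = (z * y).sum()
loss.backward()

# Gradients flow only through the y factor; z is treated as a constant,
# so x.grad == 2 * z here instead of the 8 * x we'd get without detach.
print(x.grad)
```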

t-vi (Collaborator) commented Dec 18, 2024

We're not there yet, but there may be cases where we want to create new autograd leaves when we look beyond tracing models.
