Patching tensor proxy shape in trace #1260
Conversation
Note for myself: the failing cases look real, since I don't see them in other PRs. But strangely I'm not seeing the repro in my local non-CUDA environment... 😕, even though the failing test runs on CPU as well. Let me try to grab a node with an actual GPU on it.
Errr, I don't know what's going on with that failing test, but CI on main is green, so I must have done something else here...
Still don't see what's wrong with this failing test, and I can't repro it locally: https://dev.azure.com/Lightning-AI/lightning/_build/results?buildId=216738&view=logs&j=2840892e-91ab-5245-da62-77ec9923516a&t=444f4171-6797-5730-4229-41427ed3bdc9&l=15058. Checking my luck again with CI.
🤯 Nope. CI failed.
OK, looks like there is indeed a real issue here. Strangely it doesn't repro locally with that test, which is a bit concerning.
Now that I know where the failure is coming from, I'll try to get a repro and verify whether it's actually introduced by this PR (likely, maybe?!). Though I'm not sure why others didn't hit this one.
Note for myself: FAILED thunder/tests/test_ops.py::test_core_vs_torch_consistency_pow_torch_cuda_thunder.dtypes.int8 - RuntimeError: "reciprocal_cuda" not implemented for 'Char'

The opinfo test for pow runs through some random inputs; specifically, I think it's this one (lightning-thunder/thunder/tests/opinfos.py, lines 1841 to 1843 in fceb64e):

    # Tests the inputs are a CPU scalar tensor and a CUDA tensor
    a = make_tensor((4, 4), device=device, dtype=dtype, requires_grad=requires_grad, **kwargs)
    b = make_tensor((), device="cpu", dtype=dtype, requires_grad=requires_grad, **kwargs)

For int8, the PyTorch implementation cannot handle the case where b == torch.tensor(-1) -> https://github.com/pytorch/pytorch/blob/fe44b6a67f32b562c88701b630e65b62ce1b63ba/aten/src/ATen/native/cuda/PowKernel.cu#L178. Somehow this reliably fails on CI for my PR, but not in my local runs. Anyway, I'm trying to avoid this by …
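A hypothetical minimal repro of that failing sample, assuming the exponent happens to draw the value -1 (an untested sketch, not the actual opinfo code):

```python
# Untested sketch of the failure described above (assumption: b draws -1).
# Integer pow with exponent -1 is described as hitting the reciprocal path in
# PyTorch's CUDA PowKernel, which is not implemented for int8 ('Char').
import torch

a = torch.randint(-3, 4, (4, 4), dtype=torch.int8, device="cuda")
b = torch.tensor(-1, dtype=torch.int8, device="cpu")  # CPU scalar tensor exponent
torch.pow(a, b)  # expected: RuntimeError: "reciprocal_cuda" not implemented for 'Char'
```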
@t-vi This PR is ready for review now. 🙇 cc'ing @tfogal @kevinstephano
I'd recommend that TomV (or Mike? both?) take a look at this before it goes in, but my naive view is that this seems good.
@@ -1116,7 +1116,7 @@ def forward(self, x):
     ("cpu", "cuda"),
 )
 def test_cache_symbolic_values_reshape(device):
-    if not torch.cuda.is_available():
+    if device == "cuda" and not torch.cuda.is_available():
Do we get anything out of running this test on multiple devices? I am wondering if it makes more sense to just not parameterize the test and run it once on the CPU.
That's a good call!
At this point, since nvFuser isn't taking shape operations at all, the GPU test doesn't add anything. Let me clean it up.
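A sketch of what the de-parameterized, CPU-only test could look like; only the test name comes from the diff above, the body and the cache option string are assumptions:

```python
# Hypothetical sketch: drop the ("cpu", "cuda") parameterization and run on CPU only.
# The function body here is illustrative, not the actual test from the PR.
import torch
import thunder

def test_cache_symbolic_values_reshape():
    def foo(a):
        return a.reshape(a.numel())  # shape/numel queries exercise symbolic values

    jfoo = thunder.jit(foo, cache="symbolic values")
    a = torch.randn(4, 4, device="cpu")
    assert torch.equal(jfoo(a), foo(a))
```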
@jjsjann123 This is still pending?
Looks great, thank you @jjsjann123 @tfogal
Working on #1253
- Updating `_numel`'s deferred computation so we can handle calls without an argument. (This is existing technical debt; see "Make `numel` a method of TensorProxy, not a property attribute" #925 for more context.) This change fixes the error reported in "symbolic values: error in infer_tensor_properties, through reshape or numel" #1257, but let's keep that issue open since more errors will likely come from that repro script.
- Updating `TensorProxy.shape` so that a shape query within a trace context is recorded with `prims.shape`. This doesn't contradict "Modeling of shape queries" #1133; the explicit trace ensures correctness of the trace, and I'll follow up with a transformation to optimize shape queries. (See the sketch after this list.)
- There are occasions where we don't want to leave a `prims.shape` in the trace while querying the shape of a TensorProxy, e.g. when printing a trace. I'm working around those cases by explicitly accessing `TensorProxy._shape`.
- Adding `prims.shape` support in the python executor, since `prims.shape` could show up in the prologue trace, where the torch executor isn't available.
- Updating the DCE pass to choose the first producer instead of the last one when multiple producers exist for a single proxy.
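A minimal sketch of how the recorded shape query could be observed, assuming thunder's `thunder.jit` and `thunder.last_traces` entry points; the function body is illustrative, not from this PR:

```python
# Illustrative sketch (not thunder internals): with symbolic values enabled,
# a shape query inside the traced function is expected to be recorded via
# prims.shape rather than baked into the trace as constants.
import torch
import thunder

def foo(a):
    return a.reshape(a.shape[0] * a.shape[1])  # shape query happens inside the trace

jfoo = thunder.jit(foo, cache="symbolic values")
out = jfoo(torch.randn(2, 3))

# Inspect the final computation trace; the shape query should appear as a
# prims.shape entry (TensorProxy._shape is used where no record is wanted,
# e.g. when printing the trace itself).
print(thunder.last_traces(jfoo)[-1])
```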
TODO:
The DCE pass still needs to be updated to remove duplicated producers in a given scope; I intend to handle that in a separate PR (see comment for details). A small illustration of the first-producer behavior follows below.
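A toy illustration, not thunder's actual DCE code, of the "choose the first producer" idea from the last bullet above:

```python
# Toy illustration of "first producer wins": when building a proxy -> producer
# map, keep the first symbol that produces a proxy and ignore later duplicates,
# rather than letting the last one overwrite it.
def producers(symbols):
    prod = {}
    for sym in symbols:
        for out in sym["outputs"]:
            prod.setdefault(out, sym)  # first producer wins
    return prod

trace = [
    {"name": "prims.shape", "outputs": ["s0"]},
    {"name": "prims.shape", "outputs": ["s0"]},  # duplicated producer of s0 (see TODO)
]
assert producers(trace)["s0"] is trace[0]
```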