Demonstrates autograd integration with NVFuser multidevice #3787

syed-ahmed · 2025-01-29T08:03:48Z

This PR demonstrates how to wrap a forward and a backward fusion definition in a torch.autograd.Function that takes PyTorch DTensors as input and outputs PyTorch DTensors.

wujingyue · 2025-01-29T18:01:50Z

Cool -- add me to reviewers when it's ready!

syed-ahmed · 2025-01-29T18:18:59Z

@wujingyue To review.

syed-ahmed · 2025-01-29T18:19:39Z

Oops I can't add to the reviewers list.

wujingyue

LGTM otherwise

wujingyue · 2025-01-29T22:54:11Z

tests/python/test_dtensor.py

+    class FusionDefintionArguments:
+        def __init__(self, num_devices: int, batch: int, sequence: int, hidden: int):
+            self.d = num_devices
+            self.b = batch
+            self.s = sequence
+            self.e = hidden


from dataclasses import dataclass

@dataclass
class LinearConfig:
    d: int
    b: int
    s: int
    e: int
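For illustration, a minimal sketch of how such a config could be built and used. The field meanings in the comments are inferred from the constructor in the diff above, and `frozen=True` is an assumption added here (it makes the config immutable and hashable); it is not part of the suggestion itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen=True is an assumption, not part of the review suggestion
class LinearConfig:
    d: int  # number of devices
    b: int  # batch size
    s: int  # sequence length
    e: int  # hidden size


# The generated __init__, __repr__ and __eq__ replace the hand-written constructor.
config = LinearConfig(d=2, b=4, s=8, e=16)
print(config)
```

Compared with the hand-written `__init__`, the dataclass also compares by value, so two configs with the same sizes are equal.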

wujingyue · 2025-01-29T22:58:07Z

tests/python/test_dtensor.py

+            self.s = sequence
+            self.e = hidden
+
+    class LinearForwardDefinition(FusionDefintionArguments):


I feel using classes and inheritance is overkill. Functions and partials should be good enough.

def define_linear_forward(config: LinearConfig, fd: FusionDefinition) -> None:

and later

partial(define_linear_forward, config)
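A rough sketch of how the function-plus-partial approach might look end to end. `StubFusionDefinition` and its `add_input` method are hypothetical stand-ins so the example runs without nvFuser; they are not nvFuser's API, and the shapes are illustrative rather than taken from the PR.

```python
from dataclasses import dataclass
from functools import partial


@dataclass(frozen=True)
class LinearConfig:
    d: int  # number of devices
    b: int  # batch size
    s: int  # sequence length
    e: int  # hidden size


class StubFusionDefinition:
    """Hypothetical stand-in for FusionDefinition; it only records declared inputs."""

    def __init__(self) -> None:
        self.inputs: list[tuple[str, tuple[int, ...]]] = []

    def add_input(self, name: str, shape: tuple[int, ...]) -> None:
        self.inputs.append((name, shape))


def define_linear_forward(config: LinearConfig, fd: StubFusionDefinition) -> None:
    # Illustrative shapes only; a real definition would call nvFuser ops here.
    fd.add_input("input", (config.b, config.s, config.e))
    fd.add_input("weight", (config.e, config.e))


config = LinearConfig(d=2, b=4, s=8, e=16)

# partial binds the config, leaving a callable that takes only the fusion definition.
define_fn = partial(define_linear_forward, config)

fd = StubFusionDefinition()
define_fn(fd)
print(fd.inputs)
```

The bound callable carries the sizes with it, which is what the subclass was doing through `self`, without needing a class hierarchy.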

wujingyue · 2025-01-29T22:59:44Z

tests/python/test_dtensor.py

+        ):
+            b, s, e = input._local_tensor.shape
+            d = weight.device_mesh.size()
+            op = FusionDefinitionWrapper(LinearForwardDefinition(d, b, s, e))


Can you try to construct the op in __init__? Example: https://github.com/canqin001/PointDAN/blob/5001b38cb5506b1c6b40ad1329c1d6f4fbbdd26d/Model.py#L29. I'm worried about the overhead of constructing FusionDefinitionWrapper for each forward and backward call.
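To make the concern concrete, here is a small sketch of the construct-once pattern. `StubWrapper` is a hypothetical stand-in for `FusionDefinitionWrapper` that just counts how often it is constructed, and `LinearModule` is an illustrative name, not a class from the PR.

```python
class StubWrapper:
    """Hypothetical stand-in for FusionDefinitionWrapper; counts constructions."""

    constructed = 0

    def __init__(self, name: str) -> None:
        StubWrapper.constructed += 1
        self.name = name

    def __call__(self, inputs: list) -> list:
        # A real wrapper would execute the fusion; this one echoes its inputs.
        return list(inputs)


class LinearModule:
    def __init__(self) -> None:
        # Built once here, then reused by every forward call.
        self.forward_op = StubWrapper("linear_forward")
        self.backward_op = StubWrapper("linear_backward")

    def forward(self, inputs: list) -> list:
        return self.forward_op(inputs)


module = LinearModule()
for _ in range(3):
    result = module.forward([1, 2])

print(StubWrapper.constructed)
```

In the diff the sizes are read from the inputs at call time, so applying this would presumably require either knowing the shapes at construction time or caching the wrapper per shape.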

wujingyue · 2025-01-29T23:00:18Z

tests/python/test_dtensor.py

+            outputs = op([input, weight, grad_output])
+            return outputs[0], outputs[1], outputs[2]
+
+    world_size = dist.get_world_size()


Suggested change

-    world_size = dist.get_world_size()
+    d = dist.get_world_size()

wujingyue · 2025-01-29T23:03:27Z

tests/python/test_dtensor.py

+            op = FusionDefinitionWrapper(LinearBackwardDefinition(d, b, s, e))
+            input, weight = ctx.saved_tensors
+            outputs = op([input, weight, grad_output])
+            return outputs[0], outputs[1], outputs[2]


Suggested change

-            return outputs[0], outputs[1], outputs[2]
+            assert len(outputs) == 3
+            return outputs

wujingyue · 2025-01-29T23:06:38Z

tests/python/test_dtensor.py

+    )
+
+    torch.testing.assert_close(
+        out_dtensor.to_local(), expected_out_tensor, rtol=1.3e-6, atol=1e-3


Suggested change

-        out_dtensor.to_local(), expected_out_tensor, rtol=1.3e-6, atol=1e-3
+        expected_out_tensor, out_dtensor.to_local(), rtol=1.3e-6, atol=1e-3

to make the order consistent with other assert_closes.

wujingyue · 2025-01-29T23:08:09Z

tests/python/test_dtensor.py

+    )
+
+    torch.testing.assert_close(
+        out_dtensor.to_local(), expected_out_tensor, rtol=1.3e-6, atol=1e-3


Consider

def assert_close(expected_tensor, dtensor):
    torch.testing.assert_close(expected_tensor, dtensor.to_local(), rtol=1.3e-6, atol=1e-3)

assert_close(...)
assert_close(...)
assert_close(...)
assert_close(...)

syed-ahmed added 2 commits January 29, 2025 07:59

Initial commit

56c2719

lint

8935063

wujingyue self-requested a review January 29, 2025 22:52

wujingyue reviewed Jan 29, 2025
