A minimal demo of PyTorch distributed extension functionality for collectives.
This repository contains minimal implementations of two different workflows for extending `torch.distributed`
collectives in C++.
- Custom Backend Implementation (`custom_backend`) (new workflow, recommended)
- Custom Process Group Implementation (`custom_process_group`) (old workflow, deprecated)
See the READMEs in each folder for more details.
Why are there two different workflows?
- With the introduction of dispatchable collectives in PyTorch 2.0, `torch.distributed` collectives can be routed to different backends based on the device type of the tensor arguments. `custom_process_group` was the old extension method and `custom_backend` is the new extension method.
What are the differences?
- `custom_backend` is the more flexible alternative, as it allows users to route collectives to a different backend for each device type. For example, `init_process_group(backend="cpu:gloo,cuda:dummy", ...)` will dispatch collectives with CPU tensor arguments to `gloo` and collectives with CUDA tensor arguments to `dummy`. On the other hand, `custom_process_group` is more limited, as it only allows users to route to a single backend.
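
To make the routing rule concrete, here is a minimal pure-Python sketch of how a backend string such as `"cpu:gloo,cuda:dummy"` can be read as a device-to-backend mapping. It is an illustration only, not PyTorch's actual implementation: the helper names `parse_backend_spec` and `backend_for` are hypothetical and do not exist in `torch.distributed`.

```python
from typing import Dict


def parse_backend_spec(spec: str) -> Dict[str, str]:
    """Split a 'device:backend,device:backend' string into a mapping.

    Hypothetical helper for illustration; not part of torch.distributed.
    """
    mapping: Dict[str, str] = {}
    for entry in spec.split(","):
        device, sep, backend = entry.strip().partition(":")
        if not sep or not device or not backend:
            raise ValueError(f"expected 'device:backend', got {entry!r}")
        mapping[device] = backend
    return mapping


def backend_for(device_type: str, mapping: Dict[str, str]) -> str:
    """Return the backend name that would handle tensors on this device type."""
    if device_type not in mapping:
        raise ValueError(f"no backend registered for device type {device_type!r}")
    return mapping[device_type]


# The same string used in the init_process_group example above.
routes = parse_backend_spec("cpu:gloo,cuda:dummy")
print(backend_for("cpu", routes))   # collectives on CPU tensors
print(backend_for("cuda", routes))  # collectives on CUDA tensors
```

With a single-backend setup, as in `custom_process_group`, every device type would effectively map to the same backend, which is the limitation described above.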
Which one should I use?
- We recommend using the `custom_backend` implementation.