@mori360 mori360 commented Feb 6, 2026

The torchcomms CI fails: we cannot close the comms because self.parallel_dims.comms is empty. This is caused by the change in #1660, which removed:

    def build_mesh(self) -> DeviceMesh:
        if self.ep > 1:
            return self._build_mesh_with_ep()
        else:
            return self._build_mesh_without_ep()

As a result, self._build_mesh_with_ep and self._build_mesh_without_ep in TorchCommsParallelDims are no longer called.
This PR combines the two and calls them from TorchCommsParallelDims.build_mesh, so that TorchCommsParallelDims is built successfully and the comms can be closed at the end.
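
To illustrate the failure mode, here is a minimal, torch-free sketch. Every class and function in it (FakeComm, BaseParallelDims, BrokenCommsParallelDims, FixedCommsParallelDims, close_comms) is a hypothetical stand-in and not the real torchtitan or torchcomms code. It only assumes that the subclass registers its comms inside the two private builders, and that the base build_mesh stopped dispatching to them.

```python
# Hypothetical stand-ins for illustration only; not the real classes.


class FakeComm:
    """Placeholder for a communicator that must be closed at shutdown."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


class BaseParallelDims:
    """Base class after the dispatching build_mesh was removed."""

    def __init__(self, ep):
        self.ep = ep

    def build_mesh(self):
        # No longer calls _build_mesh_with_ep / _build_mesh_without_ep.
        return {"mesh": "base"}


class BrokenCommsParallelDims(BaseParallelDims):
    """Overrides only the old hooks, which nothing calls any more."""

    def __init__(self, ep):
        super().__init__(ep)
        self.comms = []

    def _build_mesh_with_ep(self):
        self.comms.append(FakeComm("ep"))
        return {"mesh": "with_ep"}

    def _build_mesh_without_ep(self):
        self.comms.append(FakeComm("no_ep"))
        return {"mesh": "without_ep"}


class FixedCommsParallelDims(BaseParallelDims):
    """Builds the mesh and registers comms directly in build_mesh."""

    def __init__(self, ep):
        super().__init__(ep)
        self.comms = []

    def build_mesh(self):
        # Both former code paths are folded into the overridden entry point.
        name = "ep" if self.ep > 1 else "no_ep"
        self.comms.append(FakeComm(name))
        return {"mesh": name}


def close_comms(dims):
    """Close every registered comm and return how many were closed."""
    for comm in dims.comms:
        comm.close()
    return len(dims.comms)


broken = BrokenCommsParallelDims(ep=2)
broken.build_mesh()
fixed = FixedCommsParallelDims(ep=2)
fixed.build_mesh()
print(close_comms(broken), close_comms(fixed))
```

In the broken variant the comms list stays empty because the only code that fills it is never reached, so there is nothing to close. Overriding build_mesh itself puts the registration back on the path that is actually called.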

@meta-cla meta-cla bot added the CLA Signed label Feb 6, 2026
@mori360 mori360 marked this pull request as ready for review February 6, 2026 02:33
@mori360 mori360 requested review from fduwjj, fegin and tianyu-l February 6, 2026 02:33

Labels

ciflow/8gpu, CLA Signed