wraps DDP models with DSD #857

Open
LucasLLC wants to merge 1 commit into master from export-D59234083

Conversation

@LucasLLC LucasLLC commented Jul 2, 2024

Summary:
Distributed State Dict (DSD) is PyTorch's currently recommended way to ensure that parallelized models' state dicts remain compatible with saves/loads in single-process or re-sharding scenarios.

This diff updates dcp_saver to use DSD for DDP models. A good follow-up would be to wrap all models in TNT with DSD, as this could replace some of the wrapper logic for FSDP and would guarantee future compatibility.

N5551629 also contains a workaround for existing DDP checkpoints saved before this diff: manually removing the "module." prefix from keys in the checkpoint.
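The workaround for pre-existing checkpoints can be sketched as a plain key rename over the model state dict (the function name and checkpoint layout below are illustrative, not taken from N5551629):

```python
def strip_ddp_prefix(state_dict, prefix="module."):
    """Remove the DDP wrapper prefix so old checkpoints match DSD-style keys."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

# Hypothetical old checkpoint saved from a DDP-wrapped model.
old_ckpt = {"module.weight": [1.0], "module.bias": [0.0], "step": 10}
print(strip_ddp_prefix(old_ckpt))
# {'weight': [1.0], 'bias': [0.0], 'step': 10}
```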

Differential Revision: D59234083

@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D59234083

@LucasLLC LucasLLC force-pushed the export-D59234083 branch from 5818bb8 to 435b1cb Compare July 8, 2024 15:32
@facebook-github-bot

This pull request was exported from Phabricator. Differential Revision: D59234083
