feat: online load/save huggingface model weights for Megatron-FSDP#80

Open
conver334 wants to merge 1 commit intoISEEKYAN:mainfrom
conver334:hf_mfsdp
Conversation

@conver334 conver334 commented Feb 12, 2026

What does this PR do?

Adds support for online loading and saving of Hugging Face model weights for Megatron-FSDP with expert parallelism (EP).

Tensor parallelism (TP) is not supported at this time.

API

See detailed usage in example/5.mfsdp_load_and_export_multiple_gpus.py.

# Init random Megatron-FSDP model
bridge = AutoBridge.from_pretrained(hf_model_path)
ddp_config = {
    "use_distributed_optimizer": True,
    "check_for_nan_in_grad": True,
    "use_megatron_fsdp": True,
    "data_parallel_sharding_strategy": "optim_grads_params",
}
model = bridge.get_model(
    wrap_with_ddp=True,
    use_megatron_fsdp=True,
    ddp_config=ddp_config,
    data_parallel_random_init=False,
    post_model_creation_callbacks=[],
)
# load HF weights
bridge.load_weights(model, hf_model_path, memory_efficient=True)

# export HF weights
for k, v in bridge.export_weights(model): 
    pass
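The export loop above consumes weights one `(name, tensor)` pair at a time rather than materializing a full state dict, which is what keeps memory usage bounded. A minimal sketch of that streaming pattern (plain Python stand-ins for illustration, not the actual mbridge API):

```python
# Sketch of the memory-efficient streaming export pattern.
# The real bridge.export_weights yields (name, tensor) pairs lazily,
# so the consumer only ever holds one full-size weight at a time.
# Names and values here are hypothetical stand-ins.

def export_weights(shards):
    """Yield (name, weight) pairs lazily instead of building a full dict."""
    for name, weight in shards:
        # In the real implementation, the Megatron-FSDP shard would be
        # gathered into an HF-layout tensor here before being yielded.
        yield name, weight

def consume_streaming(weight_iter):
    """Consume the generator one entry at a time."""
    out = {}
    for name, weight in weight_iter:
        out[name] = weight  # e.g. write each entry to a safetensors shard
    return out

shards = [
    ("model.embed_tokens.weight", [0.1, 0.2]),
    ("lm_head.weight", [0.3, 0.4]),
]
state = consume_streaming(export_weights(shards))
```

The same one-pair-at-a-time contract is presumably why `load_weights(..., memory_efficient=True)` can initialize a sharded model without staging the full HF checkpoint in host memory.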

Test

Successfully tested with Qwen2 and DeepSeekV3 architectures.

torchrun --nproc_per_node=8 5.mfsdp_load_and_export_multiple_gpus.py --model_path moonshotai/Moonlight-16B-A3B --ep 4 --trust_remote_code 

@ISEEKYAN

Signed-off-by: conver334 <conver334@gmail.com>
