[Qwen3] StateDictAdapter support for MoE model #1766
Conversation
…n Qwen3 hf models
Thanks for working on this again! Can you attach a screenshot of your local run after loading HF weights?
… general protocol, and format files
@shuhuayu nice PR! The loss and grad norms look a bit high though -- any idea why? Another good way to validate the implementation is to run inference and check whether you get the same output tokens as the HF implementation.
@vwxyzjn Thanks for the suggestion! I suspect the large losses and gradient norms are due to the suboptimal training configs we set for debugging. We have verified the KL divergence between a Hugging Face model and a converted torchtitan model on Qwen3 30B-A3B.
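The KL-divergence check mentioned above compares the next-token distributions produced by the two models on the same input. The exact script used in this PR isn't shown; the following is a minimal sketch of the idea in plain Python (the `softmax`/`kl_divergence` helpers and the logit values are illustrative, not from the PR):

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p_logits, q_logits):
    # KL(P || Q) between the next-token distributions implied by two
    # sets of logits (e.g. HF model vs. converted torchtitan model).
    p = softmax(p_logits)
    q = softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Identical logits imply a KL divergence of (numerically) zero,
# which is what a correct weight conversion should produce.
print(kl_divergence([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
```

In practice one would run both models over a batch of prompts and average the per-position KL; a value near zero (up to floating-point and dtype differences) indicates the converted weights match.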
LGTM
In the future, we should consider adding unit tests for MoEStateDictAdapter.
Reused the StateDictAdapter support for the DeepSeek V3 model to implement the Qwen3 StateDictAdapter. Updated a checkpoint loading API to support distributed Hugging Face checkpoint loading when unpicklable objects exist.
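The core job of a StateDictAdapter is renaming parameters between the torchtitan layout and the Hugging Face layout; for MoE models this typically also means splitting a stacked per-expert tensor into one HF key per expert (or merging in the other direction). The actual key names and adapter interface live in the PR's code; the sketch below uses hypothetical key patterns purely to show the shape of the mapping:

```python
def split_stacked_experts(prefix, stacked, num_experts):
    """Split a stacked expert weight into per-expert entries.

    `stacked` is indexable along its first (expert) dimension, e.g. a
    tensor of shape [num_experts, ...]. The key pattern "{prefix}.{i}"
    is a hypothetical stand-in for the real HF naming convention.
    """
    return {f"{prefix}.{i}": stacked[i] for i in range(num_experts)}

def to_hf_keys(state_dict, key_map):
    # Rename torchtitan-style parameter names to HF-style names,
    # leaving unmapped keys untouched.
    return {key_map.get(k, k): v for k, v in state_dict.items()}
```

The reverse direction (HF -> torchtitan) would stack the per-expert tensors back into one parameter and invert the key map.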