shuhuayu
Contributor

Reused the StateDictAdapter support from the DeepSeek V3 model to implement a Qwen3 StateDictAdapter. Updated a checkpoint-loading API to support distributed Hugging Face checkpoint loading when unpicklable objects exist.
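For readers unfamiliar with the idea, a state-dict adapter is essentially a key-renaming layer between two checkpoint naming schemes. The following is a minimal sketch of that idea only, not the code in this PR: the `HF_TO_NATIVE` table, the key names in it, and the helpers `abstract_key` and `from_hf` are all hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical key templates, NOT taken from this PR.
# "{}" stands for the layer index.
HF_TO_NATIVE = {
    "model.embed_tokens.weight": "tok_embeddings.weight",
    "model.layers.{}.self_attn.q_proj.weight": "layers.{}.attention.wq.weight",
    "model.layers.{}.mlp.gate.weight": "layers.{}.moe.router.gate.weight",
}


def abstract_key(key):
    """Replace the first run of digits in `key` with "{}".

    Returns (template, index), where index is None if the key has no digits.
    """
    match = re.search(r"\d+", key)
    if match is None:
        return key, None
    template = key[: match.start()] + "{}" + key[match.end():]
    return template, match.group(0)


def from_hf(hf_state_dict):
    """Rename HF-style keys to the (hypothetical) native names.

    Keys without an entry in HF_TO_NATIVE are skipped in this sketch.
    """
    native = {}
    for key, value in hf_state_dict.items():
        template, layer = abstract_key(key)
        if template not in HF_TO_NATIVE:
            continue
        new_key = HF_TO_NATIVE[template]
        if layer is not None:
            new_key = new_key.format(layer)
        native[new_key] = value
    return native
```

A real adapter for an MoE model would also need to handle tensor layout differences (for example, grouping per-expert weights), which this sketch leaves out.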

@meta-cla meta-cla bot added the CLA Signed label Sep 27, 2025
@shuhuayu shuhuayu marked this pull request as draft September 27, 2025 06:02
@wwwjn
Contributor

wwwjn commented Sep 27, 2025

Thanks for working on this again! Can you attach a screenshot of your local run after loading HF weights?

@shuhuayu
Contributor Author

> Thanks for working on this again! Can you attach a screenshot of your local run after loading HF weights?

Sure, here is a local run after loading weights from the Hugging Face Qwen/Qwen3-30B-A3B-Instruct-2507 model:

[Screenshot of the local run]

@shuhuayu shuhuayu marked this pull request as ready for review September 28, 2025 07:37
@shuhuayu shuhuayu marked this pull request as draft September 28, 2025 22:58
@vwxyzjn

vwxyzjn commented Oct 1, 2025

@shuhuayu nice PR! The loss and grad norms look a bit high though -- any idea why?

Another good way to validate the implementation is to run inference and check whether you get the same output tokens as the HF implementation.

@shuhuayu shuhuayu marked this pull request as ready for review October 1, 2025 23:23
@shuhuayu
Contributor Author

shuhuayu commented Oct 1, 2025

> @shuhuayu nice PR! The loss and grad norms look a bit high though -- any idea why?
>
> Another good way to validate the implementation is to run inference and check whether you get the same output tokens as the HF implementation.

@vwxyzjn Thanks for the suggestion! I suspect the large losses and gradient norms may be due to the suboptimal training configs we set for debugging. We have verified the KL divergence between a Hugging Face model and the converted torchtitan model on Qwen3 30B-A3B.
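As a rough illustration of this kind of check, here is a minimal, dependency-free sketch that compares next-token distributions from two models given their raw logits. It is not the script used for this PR; the function names `log_softmax`, `kl_divergence`, `mean_kl`, and `greedy_match_rate` are hypothetical, and the inputs are assumed to be plain lists of per-position logits over the same vocabulary.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(ref_logits, test_logits):
    """KL(P_ref || P_test) for one position, computed from raw logits."""
    ref_logp = log_softmax(ref_logits)
    test_logp = log_softmax(test_logits)
    return sum(math.exp(p) * (p - q) for p, q in zip(ref_logp, test_logp))


def mean_kl(ref_batch, test_batch):
    """Average per-position KL over two equally long lists of logit vectors."""
    kls = [kl_divergence(r, t) for r, t in zip(ref_batch, test_batch)]
    return sum(kls) / len(kls)


def greedy_match_rate(ref_batch, test_batch):
    """Fraction of positions where both models pick the same argmax token."""
    def argmax(v):
        return max(range(len(v)), key=v.__getitem__)

    hits = sum(argmax(r) == argmax(t) for r, t in zip(ref_batch, test_batch))
    return hits / len(ref_batch)
```

A mean KL close to zero, together with a greedy match rate close to 1, would suggest the converted weights reproduce the reference model's behavior. With PyTorch tensors, `torch.nn.functional.kl_div` on log-probabilities would be the more usual tool.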


Contributor

@tianyu-l tianyu-l left a comment

LGTM
In the future, we should consider adding unit tests for MoEStateDictAdapter.

@shuhuayu shuhuayu merged commit 4409c13 into pytorch:main Oct 3, 2025
8 checks passed