
LiuXTao commented Feb 10, 2026

Originally, the _save_weight_fast function saved each small weight as an individual file. When a model has many weights, this causes a burst of file creations and deletions in a short period. That can put pressure on the distributed file system and is also relatively inefficient. This PR therefore adds logic to save the weights in batched files, so that many small weights share one file.
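A minimal sketch of the batching idea, assuming each weight has already been serialized to bytes. The names `save_weights_batched`, `load_weights_batched`, and `max_batch_bytes`, the pickle container, and the `batch_*.pkl` file naming are all hypothetical illustrations, not the actual implementation in this PR's `_save_weight_fast`. Small weights are accumulated until a size budget is reached, then written as one file, and an index records which file holds each weight.

```python
import os
import pickle
import tempfile
from typing import Dict


def save_weights_batched(
    weights: Dict[str, bytes], out_dir: str, max_batch_bytes: int = 64 << 20
) -> Dict[str, str]:
    """Pack many small serialized weights into a few batch files (hypothetical helper).

    Returns an index mapping each weight name to the batch file that holds it.
    """
    os.makedirs(out_dir, exist_ok=True)
    index: Dict[str, str] = {}
    batch: Dict[str, bytes] = {}
    batch_bytes = 0
    batch_id = 0

    def flush() -> None:
        nonlocal batch, batch_bytes, batch_id
        if not batch:
            return
        path = os.path.join(out_dir, f"batch_{batch_id:05d}.pkl")
        with open(path, "wb") as f:
            pickle.dump(batch, f)  # one file creation per batch, not per weight
        for name in batch:
            index[name] = path
        batch, batch_bytes, batch_id = {}, 0, batch_id + 1

    for name, blob in weights.items():
        # Start a new file if adding this weight would exceed the size budget.
        if batch and batch_bytes + len(blob) > max_batch_bytes:
            flush()
        batch[name] = blob
        batch_bytes += len(blob)
    flush()  # write whatever is left in the last partial batch
    return index


def load_weights_batched(index: Dict[str, str]) -> Dict[str, bytes]:
    """Read each batch file once and return all weights keyed by name."""
    out: Dict[str, bytes] = {}
    for path in sorted(set(index.values())):
        with open(path, "rb") as f:
            out.update(pickle.load(f))
    return out


if __name__ == "__main__":
    # 100 small 1 KB "weights" packed into 10 KB batches.
    demo = {f"layer{i}.bias": bytes([i % 256]) * 1000 for i in range(100)}
    with tempfile.TemporaryDirectory() as d:
        idx = save_weights_batched(demo, d, max_batch_bytes=10_000)
        print(len(os.listdir(d)), "files instead of", len(demo))
        assert load_weights_batched(idx) == demo
```

The size budget is the main knob: a larger `max_batch_bytes` means fewer files (less metadata load on the file system) but more data buffered in memory before each write.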

I have verified correctness. In before-and-after tests, the save_weights time for an 80B MoE model on 16 GPUs dropped from 250 s to 190 s, a 24% reduction.