Is TP GPU memory usage under the Hybrid Parallel Plugin higher than DeepSpeed's under the same configuration? #6161
Comments
Hello, may I ask why TP (tensor parallelism) GPU memory usage under the Hybrid Parallel Plugin is higher than DeepSpeed's under the same configuration?
It seems to be caused by GPU memory fragmentation, and it is quite severe. Are there any corresponding optimizations or improvements?
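One rough way to check the fragmentation suspicion is to compare the memory PyTorch has actually allocated to tensors with the memory its caching allocator has reserved. The sketch below assumes a PyTorch CUDA setup; the helper names `fragmentation_ratio` and `report_cuda_fragmentation` are hypothetical and not part of any library. A large gap is only a hint, since reserved-but-unused memory can also be ordinary cache.

```python
def fragmentation_ratio(allocated_bytes, reserved_bytes):
    """Fraction of reserved memory that is not backing live tensors."""
    if reserved_bytes == 0:
        return 0.0
    return 1.0 - allocated_bytes / reserved_bytes


def report_cuda_fragmentation(device=None):
    """Hypothetical helper: read the allocator counters for one CUDA device."""
    import torch  # imported lazily so the pure function above works without it

    allocated = torch.cuda.memory_allocated(device)  # bytes held by live tensors
    reserved = torch.cuda.memory_reserved(device)    # bytes held by the caching allocator
    return fragmentation_ratio(allocated, reserved)
```

Calling `report_cuda_fragmentation()` at the same training step under both setups would give comparable numbers to attach to the issue.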
DeepSpeed ZeRO-3 fully shards the weights, whereas TP does not shard everything (for example, layers other than Linear/Embedding are not sharded). This can happen when activations are small. Please provide more detailed information.
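The weight-sharding difference described above can be illustrated with a back-of-the-envelope estimate. This is a minimal sketch, not a measurement: the function name `per_gpu_param_bytes` and the parameter counts are hypothetical, and it only counts weights (no gradients, optimizer states, or activations).

```python
def per_gpu_param_bytes(shardable, unshardable, world_size, bytes_per_param=2):
    """Estimate per-GPU weight memory under ZeRO-3 and under TP.

    shardable:   parameters in layers TP can split (assumed: Linear/Embedding)
    unshardable: parameters in layers TP replicates on every GPU
    """
    # ZeRO-3 splits every parameter across all GPUs.
    zero3 = (shardable + unshardable) * bytes_per_param / world_size
    # TP splits only the shardable layers and keeps a full copy of the rest.
    tp = (shardable / world_size + unshardable) * bytes_per_param
    return zero3, tp


# Hypothetical 7B-parameter model in 16-bit precision on 8 GPUs.
zero3, tp = per_gpu_param_bytes(6_900_000_000, 100_000_000, world_size=8)
print(f"ZeRO-3: {zero3 / 1e9:.3f} GB, TP: {tp / 1e9:.3f} GB")
```

With these assumed numbers, the replicated layers alone make the TP weight footprint larger than the ZeRO-3 one; when activations are small, that difference is not offset.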