
TP memory usage under Hybrid Parallel Plugin is higher than DeepSpeed with the same configuration? #6161

Open
duomicoding opened this issue Dec 17, 2024 · 5 comments

@duomicoding

Hello, why is the GPU memory usage of TP under the Hybrid Parallel Plugin higher than that of DeepSpeed under the same configuration?


@duomicoding
Author

It seems to be caused by GPU memory fragmentation, and it is quite severe. Are there any corresponding optimization or mitigation measures?

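For reference, a minimal diagnostic sketch (an assumption about the setup, not ColossalAI-specific code): with PyTorch's caching allocator, a large gap between allocated and reserved memory is a common sign of fragmentation, and `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` (PyTorch ≥ 2.0) is a common mitigation.

```python
import os
import torch

# Optional mitigation: let the caching allocator grow segments instead of
# reserving fixed-size blocks, which often reduces fragmentation.
# Must be set before the first CUDA allocation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

def report_fragmentation(device: int = 0) -> None:
    """Print allocated vs. reserved memory; a large gap suggests fragmentation."""
    allocated = torch.cuda.memory_allocated(device)
    reserved = torch.cuda.memory_reserved(device)
    print(f"allocated: {allocated / 2**30:.2f} GiB")
    print(f"reserved:  {reserved / 2**30:.2f} GiB")
    print(f"gap (cached/fragmented): {(reserved - allocated) / 2**30:.2f} GiB")

# Call this inside the training loop, e.g. every N steps:
# report_fragmentation()
```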

@ver217
Member

ver217 commented Feb 20, 2025

DeepSpeed ZeRO-3 fully shards the weights, whereas TP does not shard every layer (e.g., non-Linear/Embedding layers remain replicated on each GPU). When activations are small, this can make TP's memory usage exceed ZeRO-3's. Please provide more detailed information.

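A back-of-the-envelope sketch of this point (a toy model with hypothetical numbers, not ColossalAI's or DeepSpeed's actual memory accounting): TP shards Linear/Embedding weights by the TP degree but replicates the rest (e.g., LayerNorm), while ZeRO-3 shards everything by the data-parallel degree, so the replicated part is where TP can come out ahead in per-GPU memory.

```python
# Per-GPU parameter counts under TP degree `tp` vs. ZeRO-3 degree `dp`.
def per_gpu_params(linear_params: int, replicated_params: int,
                   tp: int, dp: int) -> tuple[float, float]:
    tp_mem = linear_params / tp + replicated_params        # TP: norms/biases replicated
    zero3_mem = (linear_params + replicated_params) / dp   # ZeRO-3: everything sharded
    return tp_mem, zero3_mem

# Hypothetical example: 100M shardable params, 1M replicated params, 8 GPUs.
tp_mem, zero3_mem = per_gpu_params(100_000_000, 1_000_000, tp=8, dp=8)
print(f"TP per GPU:     {tp_mem:,.0f} params")   # 13,500,000
print(f"ZeRO-3 per GPU: {zero3_mem:,.0f} params")  # 12,625,000
```

The gap widens with more replicated parameters and shrinks as activations dominate, which matches the observation above that the effect shows up when activations are small.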
