Support for LLaMA-2 70B with Grouped-Query Attention #91
Comments
I got the same error for LLaMA-2 70B.
@kaiwang13 Could you please share how you resolved the issue?
Just uninstall the old version and install the latest one from source.
I just cloned the repo and installed it from the main branch, but I'm still facing the error. Do I need to install it from a specific branch?
This resolved the shape error, but now it seems to offload to CPU memory and the process gets killed because CPU memory runs out. I have 450 GB of CPU memory and 4x A100 80 GB GPUs. I'm using the LOMO optimizer. Is this expected for LOMO?
What I did does not actually solve the problem. The pretrained state dict cannot be loaded for training without that code.
Okay, so is there any suggestion for solving the problem?
Do you mean that the error occurs when loading the pretrained state dict? Could you please show the error log?
I was testing this with the main branch. While loading the state dict, CPU memory usage was around 550 GB. Yesterday I tried it on a larger instance with 900 GB of CPU memory, and it hit a shape error at the start of training, as @kaiwang13 mentioned. I don't have access to that machine right now to share the log. However, I also tested the dev branch yesterday; there, CPU usage was only around 150 GB, but I got OOM while saving the checkpoint after the first epoch. You can check issue #98 about this. Let me know if this is enough information for you to proceed.
Thanks for the information! The latest LLaMA-2 support has not been merged into the main branch yet, so errors on the main branch are expected.
Yes, I was using ZeRO-3 and LOMO, and I was getting GPU OOM while saving.
Thanks a lot! We will try to fix it.
@dittops @x54-729 Additionally, I tried training LLaMA-1 33B with a sequence length of 2048 and a batch size of 1 using AdamW with ZeRO-3 on 8x A100 80G. The training process went fine, but I encountered OOM when attempting to save the model.
We've found that the OOM problem is due to the parameter gathering process in DeepSpeed's API, and we plan to fix it by gathering parameters one by one.
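As an illustration only (not the actual fix that will land in collie), gathering ZeRO-3 partitioned parameters one at a time with DeepSpeed's `deepspeed.zero.GatheredParameters` context manager could look roughly like this; the helper name and the plain `torch.save` call are assumptions for the sketch:

```python
import torch
import deepspeed


def save_zero3_state_dict_param_by_param(model, path, rank):
    """Sketch: gather ZeRO-3 partitioned parameters one at a time.

    Gathering the whole model at once materializes every full tensor
    simultaneously; gathering one parameter at a time keeps peak GPU
    memory close to the size of the largest single tensor.
    """
    state_dict = {}
    for name, param in model.named_parameters():
        # GatheredParameters temporarily reconstructs the full tensor
        # from its ZeRO-3 partitions on every rank; this is a collective
        # call, so all ranks must enter the context manager.
        with deepspeed.zero.GatheredParameters([param]):
            if rank == 0:
                state_dict[name] = param.detach().cpu().clone()
    if rank == 0:
        torch.save(state_dict, path)
```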
@x54-729 Please let me know if you have pushed any updates on this to the dev branch; I can try them out.
I have tested the code and was able to train and save the model. I was testing by training on a small dataset that contains the model's identity (in English). But at inference time, the model started generating Chinese instead of English when producing identity-related text. I was using 70B + LOMO + ZeRO stage 3 + transformers 4.32.1. I have tried encoding and decoding the training data with the tokenizer, and that looks fine. Any thoughts on what could be the issue here?
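For reference, a tokenizer round-trip check like the one described can be sketched as follows; the model path and sample sentence are placeholders, not the actual data or setup:

```python
from transformers import AutoTokenizer

# Placeholder path to the local LLaMA-2 70B checkpoint directory.
tokenizer = AutoTokenizer.from_pretrained("/path/to/llama-2-70b")

# Placeholder identity-style sentence standing in for the training data.
sample = "My name is Assistant and I answer in English."
ids = tokenizer(sample, add_special_tokens=False)["input_ids"]
decoded = tokenizer.decode(ids)

# If the tokenizer were mangling the data, the decoded text would already
# look wrong here; a clean round trip points the problem elsewhere
# (e.g. label masking during training or generation settings at inference).
print(ids)
print(decoded)
```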
Due to the Grouped-Query Attention introduced in LLaMA-2 70B (see the referenced llama issue), the checkpoint cannot be loaded into the collie implementation of LLaMA. I hope LLaMA-2 70B can be supported in collie. Thanks.
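For illustration, a minimal sketch of the shape mismatch, using the publicly documented LLaMA-2 70B dimensions; this is not collie's actual module code, and the `repeat_kv` helper only mirrors the expansion that GQA-aware attention implementations perform:

```python
import torch
import torch.nn as nn

# Publicly documented LLaMA-2 70B dimensions (illustrative only).
hidden_size = 8192
num_attention_heads = 64
num_key_value_heads = 8                        # grouped-query attention
head_dim = hidden_size // num_attention_heads  # 128

# A multi-head-attention implementation expects square k/v projections:
k_proj_mha = nn.Linear(hidden_size, num_attention_heads * head_dim, bias=False)  # 8192 -> 8192

# The 70B checkpoint instead stores GQA-shaped k/v projections:
k_proj_gqa = nn.Linear(hidden_size, num_key_value_heads * head_dim, bias=False)  # 8192 -> 1024

# Loading the GQA checkpoint into the MHA-shaped module therefore fails
# with a size mismatch on the k_proj / v_proj weights.


def repeat_kv(kv: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Expand [batch, kv_heads, seq, head_dim] so each KV head is shared
    by n_rep query heads, as GQA-aware attention code does."""
    b, kv_heads, s, d = kv.shape
    return kv[:, :, None, :, :].expand(b, kv_heads, n_rep, s, d).reshape(b, kv_heads * n_rep, s, d)
```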