When loading the fine-tuned smaller model, an error occurs: Trying to set a tensor of shape torch.Size([311164928]) in "weight" (which has shape torch.Size([151936, 2048]))
#1
Hello! I am a student at Harbin Institute of Technology, and I am currently trying to replicate your work on CoGenesis. I have run into a tricky problem and hope you can help.

Here is the issue: under the "sketch-based method," loading the fine-tuned smaller model fails with: ValueError: Trying to set a tensor of shape torch.Size([311164928]) in "weight" (which has shape torch.Size([151936, 2048])), this looks incorrect. The problem appears to be related to how the model is saved. I have tried the following without success: deleting and retraining the model; saving with trainer.save_model from the transformers library instead of your trainer_save_model_safe function (although the two work similarly); and saving the model from the main process only (which makes the run so slow that it triggers an NCCL timeout).

This has been bothering me for a while; I would be very grateful if you could spare some time to look into it.
Here is my environment setup:
GPU: A100-PCIE-40GB × 2
SLM: Qwen1.5-1.8B-Chat
LLM: Qwen1.5-72B
Package           Version
transformers      4.45.1
vllm              0.6.2
tqdm              4.66.5
colorama          0.4.4
srsly             2.4.8
fire              0.6.0
langchain         0.3.1
langchain-openai  0.2.1
orjson            3.10.7
uvicorn           0.30.6
fastapi           0.115.0
torch             2.1.0+cu121
numpy             1.26.4
requests          2.32.3
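One clue I noticed: 151936 × 2048 = 311,164,928, so the flattened tensor holds exactly the number of elements the expected embedding matrix [151936, 2048] should contain. That pattern is typical of a checkpoint written while the weights were still flattened by the distributed-training wrapper (for example, an FSDP flat parameter saved without first gathering the full state dict). Here is a minimal sketch I used to check whether the saved checkpoint contains flattened tensors (the checkpoint path is a placeholder):

```python
# Hypothetical check: scan the saved checkpoint and flag any 1-D tensor whose
# element count matches the expected embedding shape 151936 x 2048.
from safetensors import safe_open

CKPT = "output/qwen1.5-1.8b-sft/model.safetensors"  # placeholder path

with safe_open(CKPT, framework="pt", device="cpu") as f:
    for name in f.keys():
        t = f.get_tensor(name)
        if t.dim() == 1 and t.numel() == 151936 * 2048:
            # A hit means the weight was saved flattened, not as [vocab, hidden].
            print(f"{name}: flattened, shape={tuple(t.shape)}")
```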
Hello! This may stem from environment or library-version incompatibilities. For this project, we used the full training code from FastChat. However, once you have created sketch-based data from the LLMs, you can proceed with any supervised fine-tuning code, such as trl or other utilities available in the Transformers library.
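For example, a minimal sketch of that route with trl's SFTTrainer (the data file name and format below are placeholders, not our actual pipeline):

```python
# Minimal SFT sketch with trl: fine-tune the SLM on sketch-based data, then
# save full (unflattened) weights. Data file and column layout are assumed.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_ds = load_dataset("json", data_files="sketch_sft.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen1.5-1.8B-Chat",  # the SLM from this issue
    train_dataset=train_ds,
    args=SFTConfig(output_dir="qwen1.5-1.8b-sketch-sft"),
)
trainer.train()
trainer.save_model()  # writes safetensors with proper 2-D weight shapes
```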
Would it be possible to provide detailed version information for every package you used? Thanks a lot!