
RuntimeError: shape '[16, 2048, 32, 128]' is invalid for input of size 33554432 #26

Bleking opened this issue May 29, 2024 · 0 comments

Bleking commented May 29, 2024

Thanks to your help, the dataset problem is resolved. A few more missing-data errors popped up afterwards, but since the data was in the files I downloaded from the data hub, they were easy to fix.

Now, however, training is blocked again, this time by "RuntimeError: shape '[16, 2048, 32, 128]' is invalid for input of size 33554432".
I know that the shape is determined by .view(bsz, q_len, self.num_heads, self.head_dim), and that 'bsz' and 'q_len' are set by per_device_train_batch_size and model_max_length in "finetune_lora.sh". So I tried changing 'bsz' from 16 to 4, which only produced "RuntimeError: shape '[4, 2048, 32, 128]' is invalid for input of size 8388608". Changing 'q_len' made no difference either.

I also checked whether the global batch size (128) of the KoLLaVA-v1.5-Synatra-7B fine-tuning recipe contributes to the problem, but that does not seem to be it either. (For reference, I set gradient_accumulation_steps to 4 to keep that 128, since my server has only 2 GPUs.)

Likewise, the error does not appear to be related to 'self.num_heads' or 'self.head_dim' at all, so which value should I modify?
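For what it's worth, I checked the arithmetic behind the error message myself. A minimal sketch, using only the numbers taken straight from the RuntimeError (the interpretation at the end is just my guess, not confirmed):

```python
# Numbers copied directly from the RuntimeError message.
bsz, q_len, num_heads, head_dim = 16, 2048, 32, 128
actual_elems = 33554432  # size torch reports for the tensor being .view()-ed

expected_elems = bsz * q_len * num_heads * head_dim
print(expected_elems)                  # 134217728 -- what .view() asks for
print(actual_elems // (bsz * q_len))   # 1024 -- per-token width of the real tensor
print(expected_elems // actual_elems)  # 4 -- the mismatch factor
```

The per-token width of 1024 is exactly a quarter of num_heads × head_dim = 4096, which would explain why changing bsz or q_len never helps: the factor-of-4 mismatch survives any batch or sequence change. My unconfirmed guess is that one of the projections in this model outputs fewer heads than self.num_heads (1024 = 8 × 128, i.e. 8 heads, as in grouped-query attention), which the xformers monkey patch may not account for.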

Thank you once again. I'll share just the error portion of the terminal output below.

wandb: 🚀 View run at https://wandb.ai/jiwon_ha/huggingface/runs/d5rk2eng
0%| | 0/4543 [00:00<?, ?it/s]
/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(
Traceback (most recent call last):
File "/home/work/testdataset1/KoLLaVA/llava/train/train_xformers.py", line 13, in <module>
train()
File "/home/work/testdataset1/KoLLaVA/llava/train/train.py", line 933, in train
trainer.train()
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/transformers/trainer.py", line 1539, in train
return inner_training_loop(
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/transformers/trainer.py", line 1809, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/transformers/trainer.py", line 2654, in training_step
loss = self.compute_loss(model, inputs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/transformers/trainer.py", line 2679, in compute_loss
outputs = model(**inputs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1735, in forward
loss = self.module(*inputs, **kwargs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1582, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/peft/peft_model.py", line 922, in forward
return self.base_model(
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1582, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/work/testdataset1/KoLLaVA/llava/model/language_model/llava_llama.py", line 88, in forward
return super().forward(
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 806, in forward
outputs = self.model(
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1582, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 685, in forward
layer_outputs = torch.utils.checkpoint.checkpoint(
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
return fn(*args, **kwargs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 36, in inner
return fn(*args, **kwargs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 487, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 262, in forward
outputs = run_function(*args)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 681, in custom_forward
return module(*inputs, output_attentions, None)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1582, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 408, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/work/anaconda3/envs/kollava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1582, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/work/testdataset1/KoLLaVA/llava/train/llama_xformers_attn_monkey_patch.py", line 42, in xformers_forward
.view(bsz, q_len, self.num_heads, self.head_dim)
RuntimeError: shape '[16, 2048, 32, 128]' is invalid for input of size 33554432
