Llama2 70B training error #104

Open
xiaopqr opened this issue Aug 14, 2023 · 3 comments

Comments

xiaopqr commented Aug 14, 2023

Training llama2 70B with the latest dev branch code, I run into the following problem:
collie/collie/models/llama/model.py:203 in _forward

   200                             .permute(0, 2, 1, 4, 3) \
   201                             .reshape(batch_size, self.num_key_value_heads,
   202                                      seq_len + start_pos, -1)
 ❱ 203             new_layer_past = torch.stack((present_key, value.permute([0, 2, 1, 3])), dim
   204         attention_mask = attention_mask if attention_mask is not None else torch.ones((q
   205         if self.config.use_flash:
   206             output = flash_attention(query, key, value, attention_mask)

RuntimeError: stack expects each tensor to be equal size, but got [1, 8, 2048, 1024] at entry 0 and [1, 64, 2048, 128] at entry 1
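For reference, the two shapes in the error line up with Llama2 70B's grouped-query attention (64 query heads, 8 key/value heads, head_dim 128): the cached key has been folded down to 8 KV heads of 8 × 128 = 1024 features, while the value is still laid out as 64 heads of 128, and torch.stack requires identical shapes. A minimal sketch of the mismatch (shapes copied from the error message; the fold order used for the value below is an assumption for illustration, not collie's actual layout):

```python
import torch

batch_size, seq_len = 1, 2048
num_kv_heads, num_groups, head_dim = 8, 8, 128  # 8 * 8 = 64 query heads

# Shapes taken from the error: [1, 8, 2048, 1024] vs [1, 64, 2048, 128]
present_key = torch.empty(batch_size, num_kv_heads, seq_len, num_groups * head_dim)
value = torch.empty(batch_size, num_kv_heads * num_groups, seq_len, head_dim)

try:
    torch.stack((present_key, value), dim=1)  # reproduces the RuntimeError above
except RuntimeError as e:
    print(e)

# Stacking only works once key and value share one layout, e.g. folding the
# value down to the 8 KV heads as well (group ordering here is illustrative):
present_value = (value.reshape(batch_size, num_kv_heads, num_groups, seq_len, head_dim)
                      .permute(0, 1, 3, 2, 4)
                      .reshape(batch_size, num_kv_heads, seq_len, num_groups * head_dim))
new_layer_past = torch.stack((present_key, present_value), dim=1)
print(new_layer_past.shape)  # torch.Size([1, 2, 8, 2048, 1024])
```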

That is one problem. Another is that with the dev branch code from a few days ago, trainer.save_model on llama2 70B (8 V100s, on which training itself runs fine) hits a GPU OOM. Since training fits in memory, saving should not run out of it. The latest dev code may still have this issue; it just errors out earlier, before reaching that point.
/opt/conda/lib/python3.9/site-packages/deepspeed/runtime/zero/partition_parameters.py:1553 in _allgather_params_coalesced

   1550         allgather_params = []
   1551         for psize in partition_sizes:
   1552             tensor_size = psize * self.num_partitions
 ❱ 1553             flat_tensor = torch.empty(tensor_size, dtype=param_list[0].dtype, device=sel
   1554             flat_tensor.requires_grad = False
   1555             allgather_params.append(flat_tensor)
   1556

OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB (GPU 7; 31.75 GiB total capacity; 29.60 GiB already allocated; 312.75 MiB free; 29.63 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
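For context, _allgather_params_coalesced is ZeRO-3 gathering the partitioned parameters back together for the save, which is what pushes GPU 7 past its 31.75 GiB. The allocator hint at the end of the error points at one fragmentation workaround; a minimal sketch (the 128 MiB split size is an arbitrary example value, and this only helps fragmentation, not a gather that is genuinely too large for one card):

```python
import os

# PYTORCH_CUDA_ALLOC_CONF is read when the caching allocator initializes, so it
# must be set before the first CUDA allocation (or exported in the shell before
# launching training). 128 MiB here is an example value, not a recommendation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # noqa: E402  imported after setting the env var on purpose
```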

@00INDEX Could you take a look?

xiaopqr (Author) commented Aug 16, 2023

@KaiLv69 @QipengGuo Could you take a look?

KaiLv69 (Collaborator) commented Aug 21, 2023

Hi, the bug with saving the model under ZeRO-3 is currently being worked on.

KaiLv69 (Collaborator) commented Aug 23, 2023

Hi, please update to the latest dev branch and give it a try.

FYI: 82869ee ac6eed4
