PPO training CUDA out of memory #52

Open
14H034160212 opened this issue Aug 31, 2023 · 0 comments
14H034160212 commented Aug 31, 2023

Hi,
I'm hitting a CUDA out of memory error during PPO training. I'm using 8 A100 GPUs, each with 80 GB of memory; the command I run is below. The SFT model is llama2-13B, full-parameter fine-tuned on the 8 A100s with the stanford-alpaca code, and the reward model was trained from llama2-13B with the reward-training code provided by the LLM-Tuning project. The memory blows up only when I run the PPO step below. Is there any way to reduce the GPU memory usage? Thanks.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python rl_training.py \
    --base_model_name /data/qbao775/Explanation-Generation/llama-2/llama-2-13B \
    --merged_sft_model_path /data/qbao775/Explanation-Generation/llama_2_13B_merged_all_generator_avg_3_lenexp_10 \
    --sft_model_lora_path /data/qbao775/Explanation-Generation/llama_2_13B_merged_all_generator_avg_3_lenexp_10 \
    --reward_model_lora_path ../weights/llama-2-13B_beyond_reward_chinese_5000_peft_last_checkpoint \
    --adafactor False \
    --save_freq 10 \
    --output_max_length 64 \
    --batch_size 1 \
    --gradient_accumulation_steps 1 \
    --batched_gen True \
    --ppo_epochs 4 \
    --seed 0 \
    --learning_rate 1e-5 \
    --early_stopping True \
    --output_dir weights/llama-13_rlhf_beyond_test_6 \
    --log_with wandb
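For reference, a minimal sketch of the usual TRL-side memory-saving levers for a setup like this: 8-bit quantization of the frozen base weights, training only a LoRA adapter, gradient checkpointing, and reusing the PEFT base model as the frozen reference instead of loading a second 13B copy. This is not the project's rl_training.py; the path and LoRA hyper-parameters are placeholders, and whether rl_training.py exposes equivalent options is an assumption.

# Sketch only -- placeholders for paths/LoRA settings; the TRL/peft calls
# shown are the standard ones from the TRL documentation (2023-era versions).
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_path = "/path/to/llama-2-13B-sft"  # placeholder

# LoRA keeps only the adapter weights trainable, so optimizer state stays small.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    bias="none", task_type="CAUSAL_LM",
)

# load_in_8bit quantizes the frozen base weights (~13 GB instead of ~26 GB in
# fp16 for a 13B model); device_map pins the model to one GPU per process.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_path,
    load_in_8bit=True,
    device_map={"": 0},
    peft_config=lora_config,
)
# Gradient checkpointing trades extra compute for activation memory.
model.pretrained_model.gradient_checkpointing_enable()

tokenizer = AutoTokenizer.from_pretrained(model_path)

config = PPOConfig(
    model_name=model_path,
    learning_rate=1e-5,
    batch_size=1,
    mini_batch_size=1,              # keep the PPO mini-batch as small as possible
    gradient_accumulation_steps=1,
    ppo_epochs=4,
)

# ref_model=None: with a PEFT model, TRL reuses the base weights with the
# adapter disabled as the frozen reference, avoiding a second full 13B copy.
ppo_trainer = PPOTrainer(config, model, ref_model=None, tokenizer=tokenizer)

Shortening --output_max_length (already 64 here) and the prompt length also reduces activation memory, since PPO keeps the full generated sequences for the policy, reference, and value passes.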