PPO training CUDA out of memory #52

Open
14H034160212 opened this issue Aug 31, 2023 · 0 comments
14H034160212 commented Aug 31, 2023

Hi,
I'm hitting a CUDA out of memory error during PPO training. I'm using 8 A100 GPUs, each with 80 GB of memory; the command I run is below. The SFT model is llama2-13B, full-parameter fine-tuned on the 8 A100s with the stanford-alpaca code, and the reward model was trained from llama2-13B with the reward-training code provided by the LLM-Tuning project. The memory blows up only when I run the PPO step below. Is there any way to reduce the GPU memory usage? Thanks.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python rl_training.py \
    --base_model_name /data/qbao775/Explanation-Generation/llama-2/llama-2-13B \
    --merged_sft_model_path /data/qbao775/Explanation-Generation/llama_2_13B_merged_all_generator_avg_3_lenexp_10 \
    --sft_model_lora_path /data/qbao775/Explanation-Generation/llama_2_13B_merged_all_generator_avg_3_lenexp_10 \
    --reward_model_lora_path ../weights/llama-2-13B_beyond_reward_chinese_5000_peft_last_checkpoint \
    --adafactor False \
    --save_freq 10 \
    --output_max_length 64 \
    --batch_size 1 \
    --gradient_accumulation_steps 1 \
    --batched_gen True \
    --ppo_epochs 4 \
    --seed 0 \
    --learning_rate 1e-5 \
    --early_stopping True \
    --output_dir weights/llama-13_rlhf_beyond_test_6 \
    --log_with wandb
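For reference, a minimal sketch of the usual TRL-side memory-saving levers for a setup like this: 8-bit quantization of the frozen base weights, training only a LoRA adapter, gradient checkpointing, and reusing the PEFT base model as the frozen reference instead of loading a second 13B copy. This is not the project's rl_training.py; the path and LoRA hyper-parameters are placeholders, and whether rl_training.py exposes equivalent options is an assumption.

# Sketch only -- placeholders for paths/LoRA settings; the TRL/peft calls
# shown are the standard ones from the TRL documentation (2023-era versions).
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_path = "/path/to/llama-2-13B-sft"  # placeholder

# LoRA keeps only the adapter weights trainable, so optimizer state stays small.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    bias="none", task_type="CAUSAL_LM",
)

# load_in_8bit quantizes the frozen base weights (~13 GB instead of ~26 GB in
# fp16 for a 13B model); device_map pins the model to one GPU per process.
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_path,
    load_in_8bit=True,
    device_map={"": 0},
    peft_config=lora_config,
)
# Gradient checkpointing trades extra compute for activation memory.
model.pretrained_model.gradient_checkpointing_enable()

tokenizer = AutoTokenizer.from_pretrained(model_path)

config = PPOConfig(
    model_name=model_path,
    learning_rate=1e-5,
    batch_size=1,
    mini_batch_size=1,              # keep the PPO mini-batch as small as possible
    gradient_accumulation_steps=1,
    ppo_epochs=4,
)

# ref_model=None: with a PEFT model, TRL reuses the base weights with the
# adapter disabled as the frozen reference, avoiding a second full 13B copy.
ppo_trainer = PPOTrainer(config, model, ref_model=None, tokenizer=tokenizer)

Shortening --output_max_length (already 64 here) and the prompt length also reduces activation memory, since PPO keeps the full generated sequences for the policy, reference, and value passes.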