Hi @EasternJournalist ,
I have a question regarding the batch size setting.
When you mention a batch size of 128, does this refer to the total effective batch size, i.e.,

`batch_size_total = batch_size_forward × gradient_accumulation_steps × accelerator.num_processes`

or does it mean that `batch_size_forward` itself is set to 128?
I’m asking because in my experiments, using `batch_size_forward = 4` already consumes around 50 GB of GPU memory, so I want to make sure I’m interpreting the batch size correctly.
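For reference, this is how I am currently computing the effective batch size under interpretation (1). It is just a minimal sketch assuming Hugging Face Accelerate; the values of `batch_size_forward` and `gradient_accumulation_steps` are placeholders from my own config, not from your code:

```python
from accelerate import Accelerator

# Placeholder values from my setup; batch_size_forward is the per-device
# micro-batch passed through each forward call.
batch_size_forward = 4
gradient_accumulation_steps = 8

accelerator = Accelerator(gradient_accumulation_steps=gradient_accumulation_steps)

# Effective batch size per optimizer step under interpretation (1):
# per-device micro-batch × accumulation steps × number of processes.
batch_size_total = (
    batch_size_forward
    * gradient_accumulation_steps
    * accelerator.num_processes
)
print(f"Effective batch size: {batch_size_total}")
```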
Looking forward to your clarification.
Thanks in advance!