Hi @EasternJournalist ,
I have a question regarding the batch size setting.
When you mention a batch size of 128, does this refer to the total effective batch size, i.e.,

`batch_size_total = batch_size_forward × gradient_accumulation_steps × accelerator.num_processes`

or does it mean that `batch_size_forward` itself is set to 128?
I’m asking because in my experiments, using `batch_size_forward = 4` already consumes around 50 GB of GPU memory, so I want to make sure I’m interpreting the batch size correctly.
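For reference, this is how I am currently computing the effective batch size under interpretation (1). It is just a minimal sketch assuming Hugging Face Accelerate; the values of `batch_size_forward` and `gradient_accumulation_steps` are placeholders from my own config, not from your code:

```python
from accelerate import Accelerator

# Placeholder values from my setup; batch_size_forward is the per-device
# micro-batch passed through each forward call.
batch_size_forward = 4
gradient_accumulation_steps = 8

accelerator = Accelerator(gradient_accumulation_steps=gradient_accumulation_steps)

# Effective batch size per optimizer step under interpretation (1):
# per-device micro-batch × accumulation steps × number of processes.
batch_size_total = (
    batch_size_forward
    * gradient_accumulation_steps
    * accelerator.num_processes
)
print(f"Effective batch size: {batch_size_total}")
```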
Looking forward to your clarification.
Thanks in advance!