
About batch_size_forward and gradient_accumulation_steps param in papers #144

@ZhouJankin

Description


Hi @EasternJournalist ,

I have a question regarding the batch size setting.
When you mention a batch size of 128, does this refer to the total effective batch size, i.e.

batch_size_total = batch_size_forward × gradient_accumulation_steps × accelerator.num_processes

or does it mean that batch_size_forward itself is set to 128?

I’m asking because, in my experiments, even batch_size_forward = 4 already consumes around 50 GB of GPU memory, so I want to make sure I’m interpreting the batch size correctly.

Looking forward to your clarification.
Thanks in advance!
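For context, the relationship I’m assuming can be sketched as a quick calculation (the concrete values below are hypothetical, just to show how batch_size_forward = 4 could still yield an effective batch size of 128 under the formula above):

```python
# Hypothetical training configuration, purely for illustration.
batch_size_forward = 4           # micro-batch per GPU per forward pass
gradient_accumulation_steps = 4  # micro-batches accumulated per optimizer step
num_processes = 8                # accelerator.num_processes (number of GPUs)

# Effective (total) batch size per optimizer update:
batch_size_total = (
    batch_size_forward * gradient_accumulation_steps * num_processes
)
print(batch_size_total)  # -> 128
```

So if 128 refers to batch_size_total, a small batch_size_forward combined with gradient accumulation and multiple GPUs would be consistent with the paper’s setting.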
