Megatron Inference support for GRPO (Dynamic batching) #1079

@nathan-az

Description

GRPO appears to accept Megatron as the generation backend, but the script breaks later when it tries to call prepare_refit_info on a NoneType.
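A minimal sketch of the failure mode described above, assuming the Megatron backend simply never populates refit info (class and function names here are hypothetical illustrations, not the library's actual API):

```python
class MegatronGeneration:
    # Assumption: refit info is only populated when a separate inference
    # engine (e.g. vLLM) needs its weights refitted from the training model;
    # with Megatron doing generation in-framework, it stays None.
    refit_info = None


def prepare_refit(backend):
    # Calling backend.refit_info.prepare_refit_info() directly would raise
    # AttributeError on None. A defensive guard skips refit preparation
    # when there is nothing to refit.
    if getattr(backend, "refit_info", None) is None:
        return None
    return backend.refit_info.prepare_refit_info()


result = prepare_refit(MegatronGeneration())
print(result)  # None instead of an AttributeError crash
```

The guard is only a sketch of where the script could tolerate a missing refit object; whether that is the right fix depends on whether Megatron generation is meant to be supported at all.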

I am curious whether it is possible to use a TensorRT Model Optimizer quantized model, with its fake-quant layers, during training, i.e. QAT GRPO. I assumed this would be simpler with the Megatron backend, since no refit would be required; I imagine it would be more complex with vLLM, which would not support those layers.

So my main question is: what is the state of using Megatron as the generation backend? Is it intended to be supported? (If so, this issue doubles as a bug report.)

Additional questions are:

  • Is there any rough guidance or comparison on generation speed for Megatron vs. vLLM?
  • Does my reasoning that QAT would be easier with Megatron generation hold water, or am I misunderstanding something?
