Megatron Inference support for GRPO (Dynamic batching) #1079

@nathan-az

Description

GRPO appears to accept Megatron as the generation backend, but the script breaks later when it tries to call prepare_refit_info on a NoneType.
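A minimal sketch of the failure mode described above, assuming the Megatron backend simply never populates refit info (class and function names here are hypothetical illustrations, not the library's actual API):

```python
class MegatronGeneration:
    # Assumption: refit info is only populated when a separate inference
    # engine (e.g. vLLM) needs its weights refitted from the training model;
    # with Megatron doing generation in-framework, it stays None.
    refit_info = None


def prepare_refit(backend):
    # Calling backend.refit_info.prepare_refit_info() directly would raise
    # AttributeError on None. A defensive guard skips refit preparation
    # when there is nothing to refit.
    if getattr(backend, "refit_info", None) is None:
        return None
    return backend.refit_info.prepare_refit_info()


result = prepare_refit(MegatronGeneration())
print(result)  # None instead of an AttributeError crash
```

The guard is only a sketch of where the script could tolerate a missing refit object; whether that is the right fix depends on whether Megatron generation is meant to be supported at all.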

I am curious whether it is possible to use a TensorRT Model Optimizer quantized model, with its fake-quant layers, during training, i.e. QAT GRPO. I assumed this would be simpler with the Megatron backend, since no refit would be required; I imagine it would be more complex with vLLM, which would not support those layers.

So my main question is: what is the state of using Megatron as the generation backend? Is it intended to be supported? (If so, this issue doubles as a bug report.)

Additional questions are:

  • Is there any rough guidance or comparison on generation speed for Megatron vs. vLLM?
  • Does my reasoning that QAT would be easier with Megatron generation hold water, or am I misunderstanding something?
