Labels: community-request, external, question, x-atlassian
Description
GRPO appears to support Megatron as the generation backend, but the script then breaks later when it tries to call `prepare_refit_info` on a `NoneType`.
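For context, here is a minimal sketch of how I read the failure mode. Everything in it is an assumption on my part except `prepare_refit_info`, which is the call named in the error; I'm guessing the refit path assumes a standalone (vLLM-style) generation engine that doesn't exist for the Megatron backend:

```python
# Illustrative sketch of the failure mode (all names besides
# prepare_refit_info are placeholders): with backend == "megatron"
# no standalone generation engine is built, so the later refit step
# dereferences None.
from typing import Optional


class VllmGeneration:
    """Stand-in for a standalone vLLM generation engine."""

    def prepare_refit_info(self) -> dict:
        # Real code would gather the weight metadata needed to refit
        # the inference engine with updated policy weights.
        return {}


def build_generation(backend: str) -> Optional[VllmGeneration]:
    # Megatron generates in-process with the training weights,
    # so no separate engine is constructed.
    return VllmGeneration() if backend == "vllm" else None


generation = build_generation("megatron")

# This is roughly where the script breaks; presumably a backend-aware
# guard like the one below is what's missing:
if generation is not None:
    generation.prepare_refit_info()
else:
    print("megatron backend: no standalone engine, refit should be skipped")
```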
I am curious whether it is possible to train a TRT Model Optimizer-quantized model with its fake-quant layers in place, i.e. QAT GRPO. I assumed this would be simpler with the Megatron backend, since no refit would be required (whereas I imagine it would be more complex with vLLM, which wouldn't support those layers).
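To make the QAT intent concrete, this is the kind of flow I have in mind. It's only a sketch: the `mtq.quantize` usage follows TRT Model Optimizer's documented pattern, but the config choice and the toy model are placeholders:

```python
# Sketch of the intended QAT setup (assumptions noted above): after
# mtq.quantize, the model's linear layers carry fake-quant
# (quantize/dequantize) wrappers. Since those are differentiable, an
# ordinary training step performs QAT on the very weights Megatron
# would generate with; no refit into a separate engine is needed.
import torch
import modelopt.torch.quantization as mtq


def calibrate(model: torch.nn.Module) -> None:
    # Run a few representative batches so the quantizers can collect
    # activation ranges (placeholder data here).
    model(torch.randn(8, 16))


model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU())
model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop=calibrate)

# Fake-quant layers pass gradients, so a normal optimizer step trains
# through them (quantization-aware training).
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss = model(torch.randn(4, 16)).sum()
loss.backward()
opt.step()
```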
So my main question is: what is the state of using Megatron as the generation backend? Should this be supported? (If so, this issue doubles as a bug report.)
Additional questions are:
- Is there any rough guidance or comparison on generation speed for Megatron vs. vLLM?
- Does my reasoning that QAT would be easier with Megatron as the generation backend hold water, or am I misunderstanding something?