Conversation

Lucaskabela (Contributor) commented Feb 10, 2026

Summary

We make a handful of quality-of-life changes to get simple_rl_multiprocess.py working with recent vLLM. These are:

  1. Update the .gitignore to ignore the converted/ and models/ directories created by these scripts

  2. Add pyrefly suppressions so the pre-commit signal passes

  3. Update README.md to add the Monarch requirement

  4. Forward AttentionBackendEnum in the locations that need it (similar to [Experimental][rl][vllm compat] Update simple_rl example to work with vLLM nightly #2219)

  5. Wire enable_gqa support through the vllm_compat attention; this requires reshaping K/V when q.shape[1] != k.shape[1] and explicitly passing an output tensor with the proper shape (see the sketch below)

With these changes, we are able to run the simple_rl_multiprocess script again.
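For item 5, here is a minimal sketch of the GQA handling, assuming SDPA-style (batch, num_heads, seq_len, head_dim) tensors; it is illustrative only, not the PR's exact vllm_compat code:

```python
import torch
import torch.nn.functional as F

def sdpa_with_gqa(q, k, v):
    # When query and key/value head counts differ (GQA), expand K/V so every
    # query head has a matching key/value head before calling the kernel.
    if q.shape[1] != k.shape[1]:
        repeat = q.shape[1] // k.shape[1]
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
    # Allocate the output explicitly with the query-head shape so the later
    # view back to (batch, seq_len, num_heads, head_dim) lines up.
    out = torch.empty_like(q)
    out.copy_(F.scaled_dot_product_attention(q, k, v))
    return out

q = torch.randn(2, 8, 16, 64)   # 8 query heads
k = torch.randn(2, 2, 16, 64)   # 2 key/value heads (GQA)
v = torch.randn(2, 2, 16, 64)
print(sdpa_with_gqa(q, k, v).shape)  # torch.Size([2, 8, 16, 64])
```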

Test Plan

VLLM_BATCH_INVARIANT=1 VLLM_ATTENTION_BACKEND=FLASH_ATTN python3 torchtitan/experiments/rl/unified/simple_rl_multiprocess.py

Before (Main)

  File "/home/lucaskabela/.conda/envs/vllm/lib/python3.12/asyncio/base_events.py", line 691, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/lucaskabela/torchtitan/torchtitan/experiments/rl/unified/simple_rl_multiprocess.py", line 66, in main
    init_batch_invariance()
TypeError: init_batch_invariance() missing 1 required positional argument: 'attention_backend'

And after patching with the enum:

  File "/home/lucaskabela/pytorch/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lucaskabela/pytorch/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: VLLMCompatibleFlashAttention.forward() got an unexpected keyword argument 'enable_gqa'
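The fix is for the compat attention's forward() to accept the new keyword. A hypothetical sketch (the class name is illustrative and the body uses SDPA, not the PR's actual VLLMCompatibleFlashAttention):

```python
import torch
import torch.nn.functional as F

class GQAAwareAttention(torch.nn.Module):
    def forward(self, q, k, v, enable_gqa: bool = False):
        # Accepting enable_gqa avoids the TypeError above; the flag gates the
        # K/V head expansion shown in the earlier sketch.
        if enable_gqa and q.shape[1] != k.shape[1]:
            repeat = q.shape[1] // k.shape[1]
            k = k.repeat_interleave(repeat, dim=1)
            v = v.repeat_interleave(repeat, dim=1)
        return F.scaled_dot_product_attention(q, k, v)
```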

After

Adding requests: 100%|█████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 2935.95it/s]
Processed prompts: 100%|███████████████| 40/40 [00:01<00:00, 27.61it/s, est. speed input: 193.29 toks/s, output: 552.25 toks/s]
[2026-02-09 16:11:59] INFO generator.py:446: [actor=<root>.<torchtitan.experiments.rl.unified.actors.generator.Generator generator{'gpus': 0/1}>] os.getpid()=300001 Generating finish generate (policy v0)...
[2026-02-09 16:11:59] INFO trainer.py:101: [actor=<root>.<torchtitan.experiments.rl.unified.actors.trainer.Trainer trainer{'gpus': 1/2}>] os.getpid()=300426 Trainer starts to train 0 on traj:
[2026-02-09 16:11:59] INFO trainer.py:101: [actor=<root>.<torchtitan.experiments.rl.unified.actors.trainer.Trainer trainer{'gpus': 0/2}>] os.getpid()=299553 Trainer starts to train 0 on traj:
NCCL version 2.28.9+cuda12.9
  ✓ vLLM-TorchTitan bitwise determinism verified: 20 tokens match exactly
  ✓ vLLM-TorchTitan bitwise determinism verified: 20 tokens match exactly

Review comments

```diff
 # Output is (batch * seq_len, num_heads * head_dim), reshape to (batch, seq_len, num_heads, head_dim)
-output = output_flat.view(batch_size, seq_len, num_heads, head_dim)
+# Use self.num_heads and self.head_dim since vLLM Attention outputs based on its configured dimensions
+output = output_flat.view(batch_size, seq_len, self.num_heads, self.head_dim)
```
Reviewer comment (Contributor):

are you suggesting self.num_heads might be different from num_heads here?

1. Install PyTorch nightly & Monarch for torchtitan:
```
pip3 install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu126 --force-reinstall
pip3 install torchmonarch
```

Reviewer comment (Contributor):

nit: can we update to uv pip install as the rest of this doc uses uv?
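
For reference, a hedged sketch of the uv form of those two commands (uv's `--reinstall` is its analogue of pip's `--force-reinstall`; exact flag spellings may vary by uv version):

```
uv pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu126 --reinstall
uv pip install torchmonarch
```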

```python
out_t = out_batch.transpose(1, 2)
grad_out_t = grad_out_batch.transpose(1, 2)

# For GQA, we need to expand K/V to match Q's num_heads
```
Reviewer comment (Contributor):

Won't vllm handle GQA internally?
