[Bugfix] Fix bitwise determinism after vLLM SiluAndMul change #2358
Conversation
```diff
 # Since these are parameter free we instantiate default config
 with set_current_vllm_config(VllmConfig()):
-    vllm_silu_and_mul = VLLMSiluAndMul()
+    vllm_silu_and_mul = VLLMSiluAndMul(compile_native=False)
```
Does `compile_native=False` mean we don't use `torch.compile` inside the custom op? Can you add a line of comment to explain why this field is needed? Thanks!
Updated :)
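For readers following along, a minimal sketch of the assumed semantics of `compile_native`: it toggles whether the op's native PyTorch definition is wrapped in `torch.compile` inside the custom op. This is an illustration, not vLLM's actual `SiluAndMul` implementation:

```python
import torch
import torch.nn.functional as F

class SiluAndMulSketch(torch.nn.Module):
    # Hedged sketch; the class name and structure are illustrative.
    def __init__(self, compile_native: bool = True):
        super().__init__()

        def native(x: torch.Tensor) -> torch.Tensor:
            # The op's reference math: silu(x[..., :d]) * x[..., d:]
            d = x.shape[-1] // 2
            return F.silu(x[..., :d]) * x[..., d:]

        # compile_native=False keeps the eager definition, so outputs stay
        # bitwise-identical to an uncompiled reference implementation.
        self.native = torch.compile(native) if compile_native else native

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.native(x)
```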
n00b q: Is the vllm_compat path using vLLM's compile mechanism, or does it apply compile manually? Or is compile not yet enabled?

> NOTE: if we enable compile in vLLM in the future, we need to compile this op as well.

I'm a little confused here, and want to understand at a high level how we should enable compile properly. IIUC there should be two ways of applying compile (sketched below):

- apply compile ourselves, e.g. with an `apply_compile()` function, and send the compiled model to vLLM (set `vllm.compilation_config.level = 0`);
- let the vLLM engine enable compile (`vllm.compilation_config.level = 3`) and decorate our model with the `@support_torch_compile` decorator.
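A minimal sketch of the two paths, under the assumption that this is how the pieces fit together; `apply_compile` and `MyModel` are hypothetical names, while `support_torch_compile` and `compilation_config.level` come from vLLM:

```python
import torch
from vllm.compilation.decorators import support_torch_compile

# Path 1 (hypothetical helper): run torch.compile ourselves and hand the
# compiled model to vLLM, with compilation_config.level = 0 so the engine
# does not compile it again.
def apply_compile(model: torch.nn.Module) -> torch.nn.Module:
    return torch.compile(model)

# Path 2: let the vLLM engine drive compilation (compilation_config.level
# = 3) and mark the model as compilable with vLLM's decorator.
@support_torch_compile
class MyModel(torch.nn.Module):  # hypothetical model
    def __init__(self, *, vllm_config, prefix: str = ""):
        super().__init__()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x)
```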
Summary
vllm-project/vllm#32806 changed the behavior of `SiluAndMul` to use `torch.compile` inside the custom op. This causes a divergence between the vLLM definition and the torchtitan definition, causing the RL script to fail. We fix this by changing the torchtitan implementation to call through to the kernel used in vLLM, so the two are numerically equivalent.
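A minimal sketch of what "call through to the kernel used in vLLM" can look like, assuming the fused kernel is exposed as `vllm._custom_ops.silu_and_mul(out, x)`; the wrapper function here is illustrative, not the PR's actual code:

```python
import torch
from vllm import _custom_ops as ops

def silu_and_mul(x: torch.Tensor) -> torch.Tensor:
    # x has shape (..., 2 * d); the op computes silu(x[..., :d]) * x[..., d:].
    d = x.shape[-1] // 2
    out = torch.empty(x.shape[:-1] + (d,), dtype=x.dtype, device=x.device)
    # Invoking the identical kernel on both sides keeps vLLM and torchtitan
    # bitwise-identical, regardless of what torch.compile does to the native path.
    ops.silu_and_mul(out, x)
    return out
```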
Test Plan
Before (Main)
```
⚠ vLLM-TorchTitan logprobs differ: 59/100 tokens
Max delta: 3.650573e-01, Avg delta: 1.829216e-02
vLLM logprobs:       ['-0.0642131940', '-0.1617958397', '-0.0011243457', '-0.0051660384', '-0.2546415329']
TorchTitan logprobs: ['-0.0609663166', '-0.1456209123', '-0.0010027625', '-0.0048254938', '-0.2543271184']
```
After