[fsdp, vllm] feat: add NPU GRPO training scripts for Qwen3-VL-30B (FSDP/VLLM backends) #5260

Open

alwaysyiyu wants to merge 2 commits into verl-project:main from alwaysyiyu:qwen3_vl_script

Contributor

alwaysyiyu commented Feb 10, 2026 (edited)

What does this PR do?

Add NPU GRPO training scripts for Qwen3-VL-30B (FSDP/VLLM backends). The reward curves of this scenario are also shown.

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: ...
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward
- If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
- {type} is in feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][fsdp, megatron] feat: dynamic batching
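
The title rule above can be checked mechanically. The following is a minimal bash sketch, not the project's actual CI check: the function name `check_pr_title` and the regular expression are assumptions made for illustration, while the module and type lists are copied from the checklist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks a PR title against the format described above.
# This is an illustration, not the check that the project's CI runs.

MODULES="fsdp megatron veomni sglang vllm rollout trainer ci training_utils recipe hardware deployment ray worker single_controller misc perf model algo env tool ckpt doc data cfg reward"
TYPES="feat fix refactor chore test"

check_pr_title() {
  local title="$1"
  # Optional [BREAKING], then [modules], then "type: description".
  local re='^(\[BREAKING\])?\[([^]]+)\] ([a-z]+): (.+)$'
  [[ "$title" =~ $re ]] || return 1
  local modules="${BASH_REMATCH[2]}" kind="${BASH_REMATCH[3]}"

  # The type must be one of the allowed values.
  [[ " $TYPES " == *" $kind "* ]] || return 1

  # Every comma-separated module must be in the allowed list.
  local m
  local IFS=','
  for m in $modules; do
    m="${m# }"   # drop the single space that may follow a comma
    [[ " $MODULES " == *" $m "* ]] || return 1
  done
  return 0
}

check_pr_title "[BREAKING][fsdp, megatron] feat: dynamic batching" && echo "title ok"
```

The function returns a non-zero status for an unknown module or type, so it can gate a local pre-push hook in the same way a CI job would.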

Test

Test results on GPU and NPU:

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
If your PR is related to the recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.


add qwen3_vl_30 script

ee1543b

gemini-code-assist bot reviewed

Contributor

gemini-code-assist bot left a comment

Code Review

This pull request introduces a new shell script for GRPO training of the Qwen3-VL-30B model on NPUs, using the FSDP and vLLM backends to launch the verl.trainer.main_ppo Python module. The script passes several unquoted shell variables as arguments to the python command. Unquoted expansions undergo word splitting and pathname expansion, so a value containing whitespace or glob characters can be split into extra arguments or unintended overrides; quoting each expansion avoids this. One command-line argument also carries a leading + prefix, which should be checked: it is valid only if the key is meant to be appended to the config rather than to override an existing entry.

examples/grpo_trainer/run_qwen3_vl_30b_vllm_fsdp_npu.sh

+                  actor_rollout_ref.rollout.max_num_batched_tokens=20000 \
+                  actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=2 \
+                  actor_rollout_ref.rollout.tensor_model_parallel_size=${gen_tp} \
+                  +actor_rollout_ref.rollout.engine_kwargs.vllm.disable_mm_preprocessor_cache=True \

Contributor

gemini-code-assist bot Feb 10, 2026

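This argument starts with a + character in addition to the diff marker. In Hydra's override syntax a leading + appends a key that is not yet present in the config, so the prefix is valid if engine_kwargs.vllm.disable_mm_preprocessor_cache is not defined in the base config. If the key is already defined, Hydra rejects the append form and the script fails at startup; in that case, remove the leading +.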

Suggested change

      
-                  +actor_rollout_ref.rollout.engine_kwargs.vllm.disable_mm_preprocessor_cache=True \
+                  actor_rollout_ref.rollout.engine_kwargs.vllm.disable_mm_preprocessor_cache=True \
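
For context on the prefix discussed in this comment: Hydra's command-line override grammar gives meaning to a leading `+` (append a new key), `++` (add or override), and `~` (delete). The sketch below only labels an argument by its prefix; it does not call Hydra, and `describe_override` is a hypothetical helper written for illustration. Whether the `+` form is right for this line depends on whether the key already exists in the base config, which this thread does not show.

```shell
#!/usr/bin/env bash
# Illustrative only: labels a Hydra-style override by its prefix.
# describe_override is a hypothetical helper; it does not invoke Hydra.
describe_override() {
  case "$1" in
    ++*)  echo "add or override" ;;
    +*)   echo "append new key" ;;
    "~"*) echo "delete key" ;;
    *)    echo "override existing key" ;;
  esac
}

describe_override "+actor_rollout_ref.rollout.engine_kwargs.vllm.disable_mm_preprocessor_cache=True"
describe_override "actor_rollout_ref.rollout.max_num_batched_tokens=20000"
```

The first call prints `append new key` and the second prints `override existing key`.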

examples/grpo_trainer/run_qwen3_vl_30b_vllm_fsdp_npu.sh

+                  data.filter_overlong_prompts=True \
+                  data.truncation='error' \
+                  data.image_key=images \
+                  actor_rollout_ref.model.path=${MODEL_PATH} \

Contributor

gemini-code-assist bot Feb 10, 2026

The variable ${MODEL_PATH} is used without double quotes. If MODEL_PATH contains whitespace or glob characters (e.g., a space, *, ?), the shell splits or expands the value into several words, and the extra words reach the python command as separate arguments. Shell metacharacters such as ;, &, and | in the value are not re-parsed as commands; they are passed through as literal text. Wrap shell variables in double quotes when they are used as command arguments to prevent word splitting and pathname expansion.

Suggested change

      
-                  actor_rollout_ref.model.path=${MODEL_PATH} \
+                  actor_rollout_ref.model.path="${MODEL_PATH}" \

examples/grpo_trainer/run_qwen3_vl_30b_vllm_fsdp_npu.sh

+                  actor_rollout_ref.ref.fsdp_config.param_offload=True \
+                  actor_rollout_ref.ref.entropy_from_logits_with_chunking=True \
+                  actor_rollout_ref.ref.ulysses_sequence_parallel_size=$sp_size \
+                  actor_rollout_ref.rollout.name=$ENGINE \

Contributor

gemini-code-assist bot Feb 10, 2026

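The variable $ENGINE is derived from the first command-line argument ($1) and used unquoted, so the caller's input is subject to word splitting. For example, passing "vllm; id" as the first argument does not execute id, because the shell does not re-parse expanded text, but it does produce two words, actor_rollout_ref.rollout.name=vllm; and id, and the second reaches the Python process as a separate argument. Wrap the variable in double quotes to ensure it is treated as a single string argument.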

Suggested change

      
-                  actor_rollout_ref.rollout.name=$ENGINE \
+                  actor_rollout_ref.rollout.name="$ENGINE" \
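
The effect of the quotes can be observed directly in bash. In the minimal sketch below, `count_args` is a hypothetical stand-in for the python command that only reports how many arguments it received. With the value `vllm; id`, the unquoted expansion is split on whitespace into two arguments, while the `;` is passed through as data and `id` is never run.

```shell
#!/usr/bin/env bash
# count_args is a hypothetical stand-in for the python command:
# it only reports how many arguments it received.
count_args() { echo "$#"; }

ENGINE='vllm; id'

# Unquoted: the value is split on whitespace into "...name=vllm;" and "id".
# The ";" is plain data here; the id command is not executed.
unquoted_count=$(count_args actor_rollout_ref.rollout.name=$ENGINE)

# Quoted: the whole value stays in one argument.
quoted_count=$(count_args actor_rollout_ref.rollout.name="$ENGINE")

echo "unquoted: $unquoted_count argument(s)"   # unquoted: 2 argument(s)
echo "quoted:   $quoted_count argument(s)"     # quoted:   1 argument(s)
```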

examples/grpo_trainer/run_qwen3_vl_30b_vllm_fsdp_npu.sh

+                  trainer.experiment_name="${exp_name}" \
+                  trainer.n_gpus_per_node=16 \
+                  trainer.nnodes=2 \
+                  trainer.default_local_dir=${CKPTS_DIR} \

Contributor

gemini-code-assist bot Feb 10, 2026

The variable ${CKPTS_DIR} is used unquoted in the command line. As with MODEL_PATH, a value that contains whitespace or glob characters is split or expanded into extra arguments. Use double quotes to pass the variable to the python process as a single argument.

Suggested change

      
-                  trainer.default_local_dir=${CKPTS_DIR} \
+                  trainer.default_local_dir="${CKPTS_DIR}" \
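
An alternative to quoting each expansion one by one is to collect the overrides in a bash array and expand it with `"${overrides[@]}"`, which keeps every element as exactly one argument. The sketch below is an illustration only: the paths are made-up values, and `printf` stands in for the launch command, which is assumed to look like `python3 -m verl.trainer.main_ppo "${overrides[@]}"`. It addresses the shell layer only; how the receiving program parses each value is a separate matter.

```shell
#!/usr/bin/env bash
# Hypothetical values for illustration; the real script defines its own.
MODEL_PATH="/data/models/Qwen3 VL 30B"   # contains spaces on purpose
CKPTS_DIR="/data/ckpts/run 1"

# Each array element is one argument, regardless of the spaces inside it.
overrides=(
  "actor_rollout_ref.model.path=${MODEL_PATH}"
  "trainer.default_local_dir=${CKPTS_DIR}"
)

# printf stands in for the assumed launch command:
#   python3 -m verl.trainer.main_ppo "${overrides[@]}"
printf '<%s>\n' "${overrides[@]}"
```

The script prints one bracketed line per override, which shows that neither path was split at its spaces.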

alwaysyiyu marked this pull request as ready for review

February 10, 2026 13:28

alwaysyiyu requested review from FightingZhen, PeterSH6, ji-huazhong, tardis-key and vermouth1992 as code owners

February 10, 2026 13:28

ji-huazhong reviewed

examples/grpo_trainer/run_qwen3_vl_30b_vllm_fsdp_npu.sh Outdated

+                  algorithm.rollout_correction.rollout_rs=${rollout_rs} \
+                  algorithm.rollout_correction.rollout_rs_threshold=${rollout_rs_threshold} \
+                  actor_rollout_ref.rollout.calculate_log_probs=True \
+                  trainer.device=npu \

Collaborator

ji-huazhong Feb 10, 2026

Suggested change

trainer.device=npu \

Contributor Author

alwaysyiyu Feb 11, 2026

modified according to the suggestion.


bug fix

1c2b7b4

alwaysyiyu requested a review from ji-huazhong

February 11, 2026 03:29

Reviewers

Awaiting requested review from code owners: vermouth1992, PeterSH6, tardis-key, FightingZhen, ji-huazhong
gemini-code-assist[bot] left review comments

At least 1 approving review is required to merge this pull request.