Skip to content

Conversation

@alwaysyiyu
Copy link
Contributor

@alwaysyiyu alwaysyiyu commented Feb 10, 2026

What does this PR do?

Add NPU GRPO training scripts for Qwen3-VL-30B (FSDP/VLLM backends). The reward curves of this scenario are also shown.

Checklist Before Starting

  • Search for similar PRs. Paste at least one query link here: ...
  • Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward
    • If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

the test results in gpu and npu:
image

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new shell script for GRPO training of the Qwen3-VL-30B model on NPUs, utilizing FSDP and VLLM backends to launch the verl.trainer.main_ppo Python module. A critical security vulnerability has been identified: the script uses unquoted shell variables as arguments to the python command, which could lead to command injection if malicious characters are present in environment variables or command-line inputs. Furthermore, a critical typo exists where an extra + character prefixes a command-line argument, which will cause the script to fail during execution.

actor_rollout_ref.rollout.max_num_batched_tokens=20000 \
actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=2 \
actor_rollout_ref.rollout.tensor_model_parallel_size=${gen_tp} \
+actor_rollout_ref.rollout.engine_kwargs.vllm.disable_mm_preprocessor_cache=True \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There is a + character at the beginning of this line which appears to be a typo, likely from a copy-paste operation from a diff. This will cause the argument to be passed incorrectly to the python script and will likely cause a failure. Please remove the leading +.

Suggested change
+actor_rollout_ref.rollout.engine_kwargs.vllm.disable_mm_preprocessor_cache=True \
actor_rollout_ref.rollout.engine_kwargs.vllm.disable_mm_preprocessor_cache=True \

data.filter_overlong_prompts=True \
data.truncation='error' \
data.image_key=images \
actor_rollout_ref.model.path=${MODEL_PATH} \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The variable ${MODEL_PATH} is used without double quotes. If the MODEL_PATH environment variable contains shell metacharacters (e.g., ;, &, |), it could lead to arbitrary command execution when the script is run. Always wrap shell variables in double quotes when they are used as command arguments to prevent word splitting and command injection.

Suggested change
actor_rollout_ref.model.path=${MODEL_PATH} \
actor_rollout_ref.model.path="${MODEL_PATH}" \

actor_rollout_ref.ref.fsdp_config.param_offload=True \
actor_rollout_ref.ref.entropy_from_logits_with_chunking=True \
actor_rollout_ref.ref.ulysses_sequence_parallel_size=$sp_size \
actor_rollout_ref.rollout.name=$ENGINE \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The variable $ENGINE is derived from the first command-line argument ($1) and used unquoted. This is a direct vector for command injection. For example, passing "vllm; id" as the first argument would cause the shell to execute the id command. Wrap the variable in double quotes to ensure it is treated as a single string argument.

Suggested change
actor_rollout_ref.rollout.name=$ENGINE \
actor_rollout_ref.rollout.name="$ENGINE" \

trainer.experiment_name="${exp_name}" \
trainer.n_gpus_per_node=16 \
trainer.nnodes=2 \
trainer.default_local_dir=${CKPTS_DIR} \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The variable ${CKPTS_DIR} is used unquoted in the command line. Similar to MODEL_PATH, if this environment variable is controlled by an attacker or contains unexpected characters, it can lead to command injection. Use double quotes to safely pass the variable to the python process.

Suggested change
trainer.default_local_dir=${CKPTS_DIR} \
trainer.default_local_dir="${CKPTS_DIR}" \

@alwaysyiyu alwaysyiyu marked this pull request as ready for review February 10, 2026 13:28
algorithm.rollout_correction.rollout_rs=${rollout_rs} \
algorithm.rollout_correction.rollout_rs_threshold=${rollout_rs_threshold} \
actor_rollout_ref.rollout.calculate_log_probs=True \
trainer.device=npu \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
trainer.device=npu \

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modified according to the suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants