[fsdp, vllm] feat: add NPU GRPO training scripts for Qwen3-VL-30B (FSDP/VLLM backends) #5260
Code Review
This pull request introduces a new shell script for GRPO training of the Qwen3-VL-30B model on NPUs, using the FSDP and VLLM backends and launching the verl.trainer.main_ppo Python module. The script passes several unquoted shell variables as arguments to the python command. Unquoted expansions undergo word splitting and pathname expansion, so a value containing whitespace or wildcard characters, whether it comes from an environment variable or a command-line input, can alter or inject arguments. In addition, one command-line argument is prefixed with a + character. If this is copy-paste residue from a diff rather than an intentional Hydra append override, the argument will be passed incorrectly and the script is likely to fail during execution.
    actor_rollout_ref.rollout.max_num_batched_tokens=20000 \
    actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=2 \
    actor_rollout_ref.rollout.tensor_model_parallel_size=${gen_tp} \
    +actor_rollout_ref.rollout.engine_kwargs.vllm.disable_mm_preprocessor_cache=True \
This line begins with a + character, which may be a leftover from copying a diff. Note that in Hydra override syntax a leading + appends a key that is not already present in the config, so keep it if that is the intent. If the key already exists in the config, the + will likely cause a failure and should be removed.
Suggested change:
    - +actor_rollout_ref.rollout.engine_kwargs.vllm.disable_mm_preprocessor_cache=True \
    + actor_rollout_ref.rollout.engine_kwargs.vllm.disable_mm_preprocessor_cache=True \
    data.filter_overlong_prompts=True \
    data.truncation='error' \
    data.image_key=images \
    actor_rollout_ref.model.path=${MODEL_PATH} \
The variable ${MODEL_PATH} is used without double quotes. If the MODEL_PATH environment variable contains whitespace or glob characters (e.g., spaces, *, ?), the shell applies word splitting and pathname expansion to it, so the python command can receive extra or altered arguments. Always wrap shell variables in double quotes when they are used as command arguments so that each value is passed as a single string.
Suggested change:
    - actor_rollout_ref.model.path=${MODEL_PATH} \
    + actor_rollout_ref.model.path="${MODEL_PATH}" \
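The effect of the missing quotes can be reproduced in isolation. The following is a minimal sketch, not part of the PR: `count_args` is a hypothetical helper that stands in for the python command, and the path value is an illustrative assumption chosen because it contains a space.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() { echo "$#"; }

# Illustrative value containing a space; not taken from the PR.
MODEL_PATH="/data/my models/Qwen3-VL-30B"

# Unquoted: the expanded value is split on whitespace into two words.
unquoted=$(count_args actor_rollout_ref.model.path=${MODEL_PATH})

# Quoted: the whole value stays a single argument.
quoted=$(count_args actor_rollout_ref.model.path="${MODEL_PATH}")

echo "unquoted=${unquoted} quoted=${quoted}"
```

With the unquoted form the command receives two arguments instead of one, so the second half of the path would reach the trainer as a stray override.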
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    actor_rollout_ref.ref.entropy_from_logits_with_chunking=True \
    actor_rollout_ref.ref.ulysses_sequence_parallel_size=$sp_size \
    actor_rollout_ref.rollout.name=$ENGINE \
The variable $ENGINE is derived from the first command-line argument ($1) and used unquoted, which is a direct vector for argument injection. For example, passing "vllm trainer.nnodes=1" as the first argument would split into two arguments and silently add an extra override to the python command. Wrap the variable in double quotes to ensure it is treated as a single string argument.
Suggested change:
    - actor_rollout_ref.rollout.name=$ENGINE \
    + actor_rollout_ref.rollout.name="$ENGINE" \
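To see exactly what the command receives, the expansion can be inspected with a small helper. This is a sketch under stated assumptions: `show_args` and `validate_engine` are hypothetical names, and the allowlist of engine names is an assumption rather than something the PR defines. In POSIX shells the result of a parameter expansion is subject to word splitting and pathname expansion but is not re-parsed as shell syntax, so an operator character such as `;` inside the value remains literal text.

```shell
# Hypothetical helper: prints each received argument on its own line.
show_args() { for a in "$@"; do printf '<%s>\n' "$a"; done; }

# Illustrative value containing an operator character and a space.
ENGINE="vllm; id"

# Unquoted: split into two arguments; the ';' stays literal and nothing runs.
show_args actor_rollout_ref.rollout.name=$ENGINE

# Quoted: passed through as one argument.
show_args actor_rollout_ref.rollout.name="$ENGINE"

# Hypothetical allowlist check; the accepted engine names are assumptions.
validate_engine() {
  case "$1" in
    vllm|sglang) return 0 ;;
    *) echo "unsupported engine: $1" >&2; return 1 ;;
  esac
}
```

Calling something like `validate_engine "$1" || exit 1` near the top of a script would reject unexpected values before they reach the trainer, which complements quoting rather than replacing it.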
    trainer.experiment_name="${exp_name}" \
    trainer.n_gpus_per_node=16 \
    trainer.nnodes=2 \
    trainer.default_local_dir=${CKPTS_DIR} \
The variable ${CKPTS_DIR} is used unquoted in the command line. Similar to MODEL_PATH, if this environment variable is controlled by an attacker or contains unexpected characters such as whitespace, it can inject or alter arguments. Use double quotes to safely pass the variable to the python process.
Suggested change:
    - trainer.default_local_dir=${CKPTS_DIR} \
    + trainer.default_local_dir="${CKPTS_DIR}" \
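One way to apply the quoting suggestions consistently is to build the override list once and pass it as "$@". The sketch below is illustrative only: `launch` is a hypothetical stand-in for the real `python3 -m verl.trainer.main_ppo` invocation, and the variable values are assumptions rather than values from the PR.

```shell
# Hypothetical stand-in for the real trainer launch; reports the argument count.
launch() { echo "received $# overrides"; }

# Illustrative values; not taken from the PR.
MODEL_PATH="/data/my models/Qwen3-VL-30B"
CKPTS_DIR="/ckpts/run 1"
ENGINE="vllm"

# Fail early if a required variable is unset or empty.
: "${MODEL_PATH:?MODEL_PATH must be set}"
: "${CKPTS_DIR:?CKPTS_DIR must be set}"

# Build the override list once, quoting every expansion.
# Note: `set --` replaces the positional parameters, so read $1 before this point.
set -- \
  "actor_rollout_ref.model.path=${MODEL_PATH}" \
  "actor_rollout_ref.rollout.name=${ENGINE}" \
  "trainer.default_local_dir=${CKPTS_DIR}"

launch "$@"
```

Each override stays a single argument even though two of the values contain spaces.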
    algorithm.rollout_correction.rollout_rs=${rollout_rs} \
    algorithm.rollout_correction.rollout_rs_threshold=${rollout_rs_threshold} \
    actor_rollout_ref.rollout.calculate_log_probs=True \
    trainer.device=npu \
Modified according to the suggestion.
What does this PR do?
Add NPU GRPO training scripts for Qwen3-VL-30B (FSDP/VLLM backends). The reward curves of this scenario are also shown.
Checklist Before Starting
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward
- Separate multiple modules with commas, like [megatron, fsdp, doc]
- {type} is in feat, fix, refactor, chore, test
- For a breaking change, add [BREAKING] to the beginning of the title.
- Example: [BREAKING][fsdp, megatron] feat: dynamic batching
Test
Test results on GPU and NPU:

API and Usage Example
# Add code snippet or script demonstrating how to use this
Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- Run pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
- Request CI in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- If the PR involves the recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.
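The title convention from "Checklist Before Starting" can also be checked locally before pushing. The function below is a rough sketch: `check_title` is a hypothetical helper, and its regular expression is an approximation of the stated format, not the rule the CI actually applies (for instance, it does not verify module names against the allowed list).

```shell
# Hypothetical local check; the pattern approximates the documented title format.
check_title() {
  printf '%s\n' "$1" | grep -Eq \
    '^(\[BREAKING\])?\[[a-z_]+(, ?[a-z_]+)*\] (feat|fix|refactor|chore|test): .+'
}

# The title of this PR and the template's own example both pass.
check_title "[fsdp, vllm] feat: add NPU GRPO training scripts for Qwen3-VL-30B (FSDP/VLLM backends)" && echo "ok"
check_title "[BREAKING][fsdp, megatron] feat: dynamic batching" && echo "ok"

# A title without the module and type prefix is rejected.
check_title "add NPU scripts" || echo "rejected"
```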