[recipe] feat: add QWen 30b moe dapo script that can run on a single 80GB node by vermouth1992 · Pull Request #2645 · verl-project/verl

vermouth1992 · 2025-07-20T15:25:52Z

What does this PR do?

As the title says.
Achieves around 0.28 on AIME'24 after 100 steps, which takes around 1 day on a single H800 node.
Note that training starts from the base model.

Checklist Before Starting

Search for similar PRs. Paste at least one query link here: ...
Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
- {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
- If this PR involves multiple modules, separate them with , like [megatron, fsdp, doc]
- {type} is in feat, fix, refactor, chore, test
- If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
- Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that can not be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

Read the Contribute Guide.
Apply pre-commit checks: pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
Add / Update the documentation.
Add unit or end-to-end test(s) to the CI workflow to cover all the code. If not feasible, explain why: ...
Once your PR is ready for CI, send a message in the ci-request channel in the verl Slack workspace.

gemini-code-assist

Code Review

This pull request introduces a new recipe script for running DAPO on a Qwen-3 30B model and updates the reward scoring logic to support a new MATH dataset. There's a critical configuration error in the script that will prevent it from running, as well as a high-severity issue with hardcoded paths that impacts reusability.

gemini-code-assist · 2025-07-20T15:27:15Z

recipe/dapo/test_dapo_qwen3_30b_math_single_node.sh

+infer_ppo_max_token_len=$(((max_prompt_length + max_response_length) * 3))
+offload=True
+gen_tp=4
+fsdp_size=32


The fsdp_size is hardcoded to 32, but the script is configured to run on a single node with 8 GPUs (NNODES=1, NGPUS_PER_NODE=8). The FSDP size must match the total number of GPUs allocated for the task, which is 8 in this case. With the current configuration, the training job will likely hang while waiting for 32 GPU resources that are not available.

Suggested change

-fsdp_size=32
+fsdp_size=$((NNODES * NGPUS_PER_NODE))
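
A minimal sketch of the derivation the reviewer suggests, with an illustrative guard added. The helper name `check_fsdp_size` is hypothetical, and the premise that `fsdp_size` may not exceed the GPU count is the reviewer's claim rather than something verified here.

```shell
#!/usr/bin/env bash
# Values stated in the review: a single node with 8 GPUs.
NNODES=1
NGPUS_PER_NODE=8

# Total number of GPUs available to the job.
world_size=$((NNODES * NGPUS_PER_NODE))

# Derive fsdp_size from the node layout instead of hardcoding it.
fsdp_size=$((NNODES * NGPUS_PER_NODE))

# Hypothetical guard: fail if the requested size exceeds the GPU count.
# $1 = requested fsdp_size, $2 = world size.
check_fsdp_size() {
    if [ "$1" -gt "$2" ]; then
        echo "fsdp_size=$1 exceeds available GPUs ($2)" >&2
        return 1
    fi
    return 0
}

check_fsdp_size "$fsdp_size" "$world_size"
echo "world_size=${world_size} fsdp_size=${fsdp_size}"
```

With the guard in place, a hardcoded value such as 32 is rejected up front on an 8-GPU node instead of surfacing later as a stalled job.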

gemini-code-assist · 2025-07-20T15:27:16Z

recipe/dapo/test_dapo_qwen3_30b_math_single_node.sh

+MODEL_PATH=/mnt/hdfs/zhangchi.usc1992_lf_lq/models/Qwen3-30B-A3B-Base
+CKPTS_DIR=${CKPTS_DIR:-"${RAY_DATA_HOME}/ckpts/${project_name}/${exp_name}"}
+TRAIN_FILE=/mnt/hdfs/zhangchi.usc1992_ssd_hldy/dataset/dapo-math-17k.parquet
+TEST_FILE=/mnt/hdfs/zhangchi.usc1992_ssd_hldy/dataset/aime-2024.parquet


The script contains hardcoded, user-specific absolute paths for the model and datasets. This limits the script's portability and reusability. These paths should be parameterized by using environment variables and failing with an informative message if they are not set.

Suggested change

-MODEL_PATH=/mnt/hdfs/zhangchi.usc1992_lf_lq/models/Qwen3-30B-A3B-Base
-CKPTS_DIR=${CKPTS_DIR:-"${RAY_DATA_HOME}/ckpts/${project_name}/${exp_name}"}
-TRAIN_FILE=/mnt/hdfs/zhangchi.usc1992_ssd_hldy/dataset/dapo-math-17k.parquet
-TEST_FILE=/mnt/hdfs/zhangchi.usc1992_ssd_hldy/dataset/aime-2024.parquet
+MODEL_PATH=${MODEL_PATH:?"MODEL_PATH must be set to the path of the base model."}
+CKPTS_DIR=${CKPTS_DIR:-${RAY_DATA_HOME}/ckpts/${project_name}/${exp_name}}
+TRAIN_FILE=${TRAIN_FILE:?"TRAIN_FILE must be set to the path of the training data."}
+TEST_FILE=${TEST_FILE:?"TEST_FILE must be set to the path of the test data."}
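
The suggestion relies on two parameter expansions that behave differently: `${VAR:?message}` aborts when the variable is unset or empty, while `${VAR:-default}` substitutes a fallback. The sketch below shows both. The names `DEMO_MODEL_PATH`, `DEMO_CKPTS_DIR`, and `require_path`, and the placeholder paths, are hypothetical and do not come from the PR.

```shell
#!/usr/bin/env bash
# require_path is a hypothetical helper. It evaluates the ':?' expansion in a
# subshell so a missing value is detected without terminating the caller.
require_path() {
    ( DEMO_MODEL_PATH=$1; : "${DEMO_MODEL_PATH:?DEMO_MODEL_PATH must be set}" ) 2>/dev/null
}

# An empty value makes ':?' fail; a non-empty value passes.
if require_path ""; then empty_status=ok; else empty_status=failed; fi
if require_path "/placeholder/model"; then set_status=ok; else set_status=failed; fi

# ':-' falls back to a default when the variable is unset or empty.
unset DEMO_CKPTS_DIR
ckpts_dir=${DEMO_CKPTS_DIR:-/placeholder/ckpts}

echo "empty: ${empty_status}, set: ${set_status}, ckpts_dir: ${ckpts_dir}"
```

In a real launch script the `:?` form would be used directly at top level, so that a missing path stops the script before any resources are requested.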

vermouth1992 · 2025-07-20T15:29:09Z

/gemini review

gemini-code-assist

Code Review

This pull request adds a new script for DAPO training on a single node and updates the reward scoring logic. The script changes seem mostly correct for adapting the training to a smaller-scale environment. However, I've identified a critical issue in the distributed training configuration within the new script that could lead to runtime failures or incorrect behavior. The Python change to support a new dataset appears correct.

gemini-code-assist · 2025-07-20T15:31:28Z

recipe/dapo/test_dapo_qwen3_30b_math_single_node.sh

+infer_ppo_max_token_len=$(((max_prompt_length + max_response_length) * 3))
+offload=True
+gen_tp=4
+fsdp_size=8


This parallelism configuration appears to be incorrect and could lead to runtime errors or silent performance issues. With a total of 8 GPUs for the node (NNODES=1, NGPUS_PER_NODE=8), and a sequence parallelism size (sp_size) of 4 (defined on line 49), the data parallelism degree (FSDP size) should be 2, such that dp_size * sp_size = world_size (i.e., 2 * 4 = 8).

Currently, fsdp_size is set to 8. If fsdp_size represents the data parallelism degree, the total required GPUs would be 8 * 4 = 32, which far exceeds the 8 available GPUs. This is a critical misconfiguration.

Assuming sp_size=4 is desired, fsdp_size should be changed to 2.

Suggested change

-fsdp_size=8
+fsdp_size=2

This is incorrect
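
The disagreement comes down to what `fsdp_size` is multiplied by. The sketch below only reproduces the arithmetic using the values quoted in the review (8 GPUs, `sp_size=4`, `fsdp_size=8`). Which reading matches verl's actual semantics is not settled in this thread; the second reading is an assumption, and every variable name other than `fsdp_size`, `sp_size`, `NNODES`, and `NGPUS_PER_NODE` is illustrative.

```shell
#!/usr/bin/env bash
# Values quoted in the review comment.
NNODES=1
NGPUS_PER_NODE=8
sp_size=4
fsdp_size=8
world_size=$((NNODES * NGPUS_PER_NODE))

# Reviewer's reading: fsdp_size is the data-parallel degree, so the job
# would need fsdp_size * sp_size GPUs.
gpus_needed_reviewer=$((fsdp_size * sp_size))

# Alternative reading (an assumption, not confirmed in the thread): fsdp_size
# counts the ranks that parameters are sharded over, and sequence parallelism
# partitions those same ranks into groups of sp_size.
dp_groups=$((world_size / sp_size))

echo "reviewer reading: needs ${gpus_needed_reviewer} GPUs, have ${world_size}"
echo "alternative reading: ${dp_groups} data-parallel groups of ${sp_size} ranks"
```

Under the first reading the configuration over-subscribes the node; under the second, `fsdp_size=8` and `sp_size=4` fit together on 8 GPUs.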

vermouth1992 added 4 commits July 19, 2025 16:24

update

f807a8c

update

d76f124

reduce train tokens

4cb75c5

modify script

282e970

vermouth1992 requested review from PeterSH6 and tongyx361 as code owners July 20, 2025 15:25

gemini-code-assist bot reviewed Jul 20, 2025

fix

816de8a

gemini-code-assist bot reviewed Jul 20, 2025

vermouth1992 enabled auto-merge (squash) July 21, 2025 00:49

eric-haibin-lin approved these changes Jul 21, 2025

vermouth1992 merged commit ac414d9 into main Jul 21, 2025
34 of 60 checks passed

vermouth1992 deleted the chi/dev/single_node_dapo branch July 21, 2025 01:49
