feat: Automodel init for DTensorPolicyV2 #1509

adil-a · 2025-11-12T05:24:38Z

What does this PR do ?

Uses Automodel's FSDP2 manager for initializing the v2 worker.

Sharding on current main:

2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213) ================================================================================
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213) [PARALLELISM CONFIG]
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)   world_size = 16
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)   tensor_parallel_size (TP) = 1
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)   context_parallel_size (CP) = 1
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)   data_parallel_size (DP/FSDP) = 16
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)   data_parallel_replicate_size = 1
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)   sequence_parallel = False
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)   FSDP shards model across 16 workers
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)   Each worker has ~1/16 of model parameters
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213) ================================================================================
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213) ================================================================================
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213) [MODEL SHARDING DIAGNOSTICS - Rank 0]
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)   Total parameters: 1,498,482,688
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)   DTensor parameters: 1,498,482,688 (100.0%)
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)   Regular parameters: 0 (0.0%)
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)   Local storage (this worker): 0.37 GB
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)   Global storage (full model): 5.99 GB
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)   Shard ratio: 1/16.0 (this worker has 1/16 of model)
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)   Sample DTensor placements:
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)     model.embed_tokens.weight:
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)       Global shape: torch.Size([128256, 2048])
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)       Local shape: torch.Size([8016, 2048])
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)       Placements: (Replicate(), Shard(dim=0))
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)       Device mesh: DeviceMesh('cuda', [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]], mesh_dim_names=('dp_replicate', 'dp_shard_cp'))
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)       Device mesh shape: (1, 16)
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)     model.layers.0.self_attn.q_proj.weight:
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)       Global shape: torch.Size([2048, 2048])
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)       Local shape: torch.Size([128, 2048])
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)       Placements: (Replicate(), Shard(dim=0))
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)       Device mesh: DeviceMesh('cuda', [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]], mesh_dim_names=('dp_replicate', 'dp_shard_cp'))
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)       Device mesh shape: (1, 16)
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)     model.layers.0.self_attn.k_proj.weight:
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)       Global shape: torch.Size([512, 2048])
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)       Local shape: torch.Size([32, 2048])
2025-11-12 16:03:44
(DTensorPolicyWorkerV2 pid=1247213)       Placements: (Replicate(), Shard(dim=0))

Sharding on this branch:

2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604) ================================================================================
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604) [PARALLELISM CONFIG]
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)   world_size = 16
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)   tensor_parallel_size (TP) = 1
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)   context_parallel_size (CP) = 1
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)   data_parallel_size (DP/FSDP) = None
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)   data_parallel_replicate_size = 2
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)   sequence_parallel = False
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)   FSDP shards model across None workers
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)   Each worker has ~1/None of model parameters
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604) ================================================================================
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604) ================================================================================
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604) [MODEL SHARDING DIAGNOSTICS - Rank 0]
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)   Total parameters: 1,498,482,688
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)   DTensor parameters: 1,498,482,688 (100.0%)
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)   Regular parameters: 0 (0.0%)
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)   Local storage (this worker): 0.75 GB
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)   Global storage (full model): 5.99 GB
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)   Shard ratio: 1/8.0 (this worker has 1/8 of model)
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)       Global shape: torch.Size([128256, 2048])
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)       Local shape: torch.Size([16032, 2048])
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)       Placements: (Replicate(), Shard(dim=0))
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)       Device mesh: DeviceMesh('cuda', [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15]], mesh_dim_names=('dp_replicate', 'dp_shard_cp'))
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)       Device mesh shape: (2, 8)
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)     model.layers.0.self_attn.q_proj.weight:
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)       Global shape: torch.Size([2048, 2048])
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)       Local shape: torch.Size([256, 2048])
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)       Placements: (Replicate(), Shard(dim=0))
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)       Device mesh: DeviceMesh('cuda', [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15]], mesh_dim_names=('dp_replicate', 'dp_shard_cp'))
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)       Device mesh shape: (2, 8)
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)     model.layers.0.self_attn.k_proj.weight:
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)       Global shape: torch.Size([512, 2048])
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)       Local shape: torch.Size([64, 2048])
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)       Placements: (Replicate(), Shard(dim=0))
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)       Device mesh: DeviceMesh('cuda', [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15]], mesh_dim_names=('dp_replicate', 'dp_shard_cp'))
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604)       Device mesh shape: (2, 8)
2025-11-12 16:32:43
(DTensorPolicyWorkerV2 pid=1921604) ================================================================================

Summary by CodeRabbit

Refactor
- Improved internal training infrastructure with updated distributed model parallelization setup.
- Enhanced CPU offload support for distributed training scenarios.
- Optimized attention mechanism selection during distributed training based on configuration parameters.

Signed-off-by: adil-a <adil.asif2000@hotmail.com>

Signed-off-by: root <root@pool0-01523.cm.cluster>

Signed-off-by: adil-a <adil.asif2000@hotmail.com>

Signed-off-by: root <root@pool0-01749.cm.cluster>

github-actions · 2025-11-12T05:25:01Z

⚠️ File Consistency Check

Check based on commit: 128fb6f (PR #1509 from adil/fsdp-manager)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/dtensor_policy_worker.py
Update nemo_rl/models/policy/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/dtensor_policy_worker.py

_{This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.}

coderabbitai · 2025-11-12T05:30:38Z

📝 Walkthrough

Walkthrough

Refactors FSDP2 initialization in the policy worker to use FSDP2Manager instead of manual device-mesh setup, adds cpu_offload handling, exposes manager-derived mesh attributes, implements dynamic attention implementation selection based on configuration, and updates gradient norm computation paths.

Changes

Cohort / File(s)	Summary
FSDP2 Manager Integration & Initialization `nemo_rl/models/policy/dtensor_policy_worker_v2.py`	Replaces manual FSDP2 device-mesh and parallelization with FSDP2Manager-based approach; exposes manager-derived mesh attributes (dp_mesh, dp_shard_cp_mesh, tp_mesh, cp_mesh) and size attributes; integrates cpu_offload handling with CUDA+CPU backend for init_process_group
Attention Implementation & Configuration `nemo_rl/models/policy/dtensor_policy_worker_v2.py`	Adds dynamic attn_implementation selection logic to choose between flash_attention_2 and sdpa based on seq_packing and context_parallel_size; injects computed value into model_config
Training & Gradient Computation `nemo_rl/models/policy/dtensor_policy_worker_v2.py`	Updates grad norm computation to reference dp_shard_cp_mesh instead of dp_cp_mesh; removes legacy OffloadPolicy import; restructures model wiring to use manager.parallelize

Sequence Diagram(s)

sequenceDiagram
    participant Worker as Policy Worker
    participant Manager as FSDP2Manager
    participant Model
    participant Training

    Worker->>Manager: Initialize with cpu_offload config
    activate Manager
    Manager->>Manager: Create device mesh<br/>(CUDA+CPU backend if offloading)
    Manager->>Manager: Configure offload policy
    Manager-->>Worker: Return mesh attributes<br/>(dp_mesh, tp_mesh, cp_mesh)
    deactivate Manager

    Worker->>Worker: Select attn_implementation<br/>(flash_attention_2 vs sdpa)<br/>based on seq_packing & context_parallel

    Worker->>Model: Inject attn_implementation<br/>into model_config

    Worker->>Manager: parallelize(model)
    activate Manager
    Manager->>Model: Apply FSDP2 sharding
    Manager-->>Worker: Model parallelized
    deactivate Manager

    Worker->>Training: Store mesh references<br/>(dp_shard_cp_mesh for grad norm)
    Training->>Training: Use dp_shard_cp_mesh<br/>for gradient computation

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–75 minutes

FSDP2Manager integration: Verify correct mesh creation, device placement, and offload policy configuration with cpu_offload handling
Attention implementation selection logic: Validate conditional logic for choosing between flash_attention_2 and sdpa; ensure proper config injection
Gradient norm computation updates: Confirm dp_shard_cp_mesh correctly replaces previous dp_cp_mesh references in training paths
Mesh attribute exposure and downstream references: Check all usages of newly exposed mesh attributes (dp_shard_cp_mesh, tp_mesh, etc.) are consistent and correctly applied

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Test Results For Major Changes	⚠️ Warning	PR introduces major changes to DTensorPolicyV2 initialization including FSDP2Manager switch, gradient norm computation modifications, and attention implementation changes that directly affect training numerics, but no test results or convergence verification is documented.	Add test results demonstrating convergence behavior is unchanged, gradient norms are correct across DP/CP configurations, performance metrics, and verification that the unresolved gradient norm issue is resolved.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: introducing Automodel initialization for DTensorPolicyV2 through FSDP2 manager integration.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch adil/fsdp-manager

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a40247 and 128fb6f.

📒 Files selected for processing (1)

nemo_rl/models/policy/dtensor_policy_worker_v2.py (8 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

nemo_rl/models/policy/dtensor_policy_worker_v2.py

nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

nemo_rl/models/policy/dtensor_policy_worker_v2.py

🧠 Learnings (1)

📚 Learning: 2025-10-30T20:50:44.126Z

Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.

Applied to files:

nemo_rl/models/policy/dtensor_policy_worker_v2.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Lint check
GitHub Check: Post submodule check comment / Comment on PR

nemo_rl/models/policy/dtensor_policy_worker_v2.py

joyang-nv

Thanks for refactoring!

terrykong · 2025-11-12T06:31:10Z

@adil-a do you mind adding the total step time before and after as well to the pr description?

github-actions · 2025-11-12T15:31:46Z

⚠️ File Consistency Check

Check based on commit: d5cc915 (PR #1509 from adil/fsdp-manager)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/dtensor_policy_worker.py
Update nemo_rl/models/policy/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/dtensor_policy_worker.py

_{This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.}

github-actions · 2025-11-12T15:32:12Z

❌ Submodule Fast-Forward Check Failed

Check based on commit: d5cc915 (PR #1509 from adil/fsdp-manager)

❌ Submodules that need attention:

Automodel: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/
CURRENT (PR #1509 from adil/fsdp-manager): https://github.com/NVIDIA-NeMo/Automodel/commits/8134b0c039802fb3f6161571400ef7085dd1e9cb/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

github-actions · 2025-11-12T15:38:57Z

⚠️ File Consistency Check

Check based on commit: 0577c10 (PR #1509 from adil/fsdp-manager)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/dtensor_policy_worker.py
Update nemo_rl/models/policy/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/dtensor_policy_worker.py

_{This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.}

github-actions · 2025-11-12T15:39:33Z

❌ Submodule Fast-Forward Check Failed

Check based on commit: 0577c10 (PR #1509 from adil/fsdp-manager)

❌ Submodules that need attention:

Automodel: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/
CURRENT (PR #1509 from adil/fsdp-manager): https://github.com/NVIDIA-NeMo/Automodel/commits/8134b0c039802fb3f6161571400ef7085dd1e9cb/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

github-actions · 2025-11-12T15:41:29Z

⚠️ File Consistency Check

Check based on commit: 733529b (PR #1509 from adil/fsdp-manager)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/dtensor_policy_worker.py
Update nemo_rl/models/policy/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/dtensor_policy_worker.py

_{This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.}

github-actions · 2025-11-12T15:41:54Z

❌ Submodule Fast-Forward Check Failed

Check based on commit: 733529b (PR #1509 from adil/fsdp-manager)

❌ Submodules that need attention:

Automodel: ❌ Commits have DIVERGED from a common ancestor
TARGET (main branch): https://github.com/NVIDIA-NeMo/Automodel/commits/a2db048383cd54b3fafc928df4c30bf7bbf7c430/
CURRENT (PR #1509 from adil/fsdp-manager): https://github.com/NVIDIA-NeMo/Automodel/commits/8134b0c039802fb3f6161571400ef7085dd1e9cb/

Please ensure all submodule commits are fast-forwards of the main branch before merging.

github-actions · 2025-11-12T16:22:17Z

⚠️ File Consistency Check

Check based on commit: 128fb6f (PR #1509 from adil/fsdp-manager)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/dtensor_policy_worker.py
Update nemo_rl/models/policy/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/dtensor_policy_worker.py

_{This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.}

Signed-off-by: adil-a <adil.asif2000@hotmail.com>

github-actions · 2025-11-12T16:42:04Z

⚠️ File Consistency Check

Check based on commit: 20d8bbc (PR #1509 from adil/fsdp-manager)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

Please review if the changes in nemo_rl/models/policy/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/dtensor_policy_worker.py
Update nemo_rl/models/policy/dtensor_policy_worker.py if necessary to maintain consistency
If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

Modified: nemo_rl/models/policy/dtensor_policy_worker_v2.py
Not modified: nemo_rl/models/policy/dtensor_policy_worker.py

_{This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.}

adil-a and others added 6 commits November 11, 2025 21:04

adding FSDP2 manager

72f08fe

Signed-off-by: adil-a <adil.asif2000@hotmail.com>

lint

7bd6b85

Signed-off-by: adil-a <adil.asif2000@hotmail.com>

adding configurable dp replicate axis

e8af993

Signed-off-by: adil-a <adil.asif2000@hotmail.com>

temporary debug utils

cb0622f

Signed-off-by: root <root@pool0-01523.cm.cluster>

more diagnostics

5faeca9

Signed-off-by: adil-a <adil.asif2000@hotmail.com>

removing diagnostics code

128fb6f

Signed-off-by: root <root@pool0-01749.cm.cluster>

adil-a requested a review from a team as a code owner November 12, 2025 05:24

adil-a added the CI:L2 Run doctests, unit tests, functional tests, and convergence tests label Nov 12, 2025

adil-a temporarily deployed to nemo-ci November 12, 2025 05:28 — with GitHub Actions Inactive

coderabbitai bot reviewed Nov 12, 2025

View reviewed changes

nemo_rl/models/policy/dtensor_policy_worker_v2.py Show resolved Hide resolved

joyang-nv previously approved these changes Nov 12, 2025

View reviewed changes

adil-a dismissed joyang-nv’s stale review via d5cc915 November 12, 2025 15:31

adil-a requested a review from a team as a code owner November 12, 2025 15:41

adil-a force-pushed the adil/fsdp-manager branch from 733529b to 128fb6f Compare November 12, 2025 16:22

removing test temporarily

20d8bbc

Signed-off-by: adil-a <adil.asif2000@hotmail.com>

adil-a requested a review from a team as a code owner November 12, 2025 16:41

adil-a added CI:L2 Run doctests, unit tests, functional tests, and convergence tests and removed CI:L2 Run doctests, unit tests, functional tests, and convergence tests labels Nov 12, 2025

adil-a temporarily deployed to nemo-ci November 12, 2025 17:02 — with GitHub Actions Inactive

adil-a temporarily deployed to nemo-ci November 12, 2025 17:03 — with GitHub Actions Inactive

feat: Automodel init for DTensorPolicyV2 #1509

Are you sure you want to change the base?

feat: Automodel init for DTensorPolicyV2 #1509

Conversation

adil-a commented Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do ?

Summary by CodeRabbit

Uh oh!

github-actions bot commented Nov 12, 2025

⚠️ File Consistency Check

⚠️ DTensor Policy Worker Synchronization Warning

Uh oh!

coderabbitai bot commented Nov 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

joyang-nv left a comment

Choose a reason for hiding this comment

Uh oh!

terrykong commented Nov 12, 2025

Uh oh!

github-actions bot commented Nov 12, 2025

⚠️ File Consistency Check

⚠️ DTensor Policy Worker Synchronization Warning

Uh oh!

github-actions bot commented Nov 12, 2025

❌ Submodule Fast-Forward Check Failed

❌ Submodules that need attention:

Uh oh!

github-actions bot commented Nov 12, 2025

⚠️ File Consistency Check

⚠️ DTensor Policy Worker Synchronization Warning

Uh oh!

github-actions bot commented Nov 12, 2025

❌ Submodule Fast-Forward Check Failed

❌ Submodules that need attention:

Uh oh!

github-actions bot commented Nov 12, 2025

⚠️ File Consistency Check

⚠️ DTensor Policy Worker Synchronization Warning

Uh oh!

github-actions bot commented Nov 12, 2025

❌ Submodule Fast-Forward Check Failed

❌ Submodules that need attention:

Uh oh!

github-actions bot commented Nov 12, 2025

⚠️ File Consistency Check

⚠️ DTensor Policy Worker Synchronization Warning

Uh oh!

github-actions bot commented Nov 12, 2025

⚠️ File Consistency Check

⚠️ DTensor Policy Worker Synchronization Warning

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

adil-a commented Nov 12, 2025 •

edited

Loading

coderabbitai bot commented Nov 12, 2025 •

edited

Loading