Conversation


@samodi-nv samodi-nv commented Nov 21, 2025

Issues

Addresses #833

Tested with the Thinking Machines config

[image]

Description

This PR is a work in progress to add LoRA support for the DTensor path.

Current status

Notes

  1. Supports SFT + LoRA. The results align with the Thinking Machines blog.
  2. The earlier grad spike was caused by a bug in the Automodel initialization method. The fix has been merged into Automodel's main branch, but because our submodule currently pins a version significantly behind main, a patch has been applied here.
  3. The Automodel submodule currently pins a somewhat outdated commit. @RayenTian will later create a dedicated nemorl branch in the Automodel repository and land the changes there. Once that is complete, this patch can be deleted.
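
The initialization fix in note 2 follows the standard LoRA convention: a nonzero (Xavier or Kaiming) init for the A matrix and zeros for B, so the adapter starts as a no-op. A minimal sketch of such a patch, using hypothetical `lora_A`/`lora_B` attribute names rather than the actual Automodel API:

```python
import math

import torch.nn as nn


def patch_lora_init(model: nn.Module, init: str = "xavier") -> None:
    """Re-initialize LoRA adapter weights in place.

    Standard LoRA convention: lora_A gets a nonzero (Xavier or Kaiming)
    init while lora_B starts at zero, so the adapter contributes nothing
    before any training steps.
    """
    for _, module in model.named_modules():
        if hasattr(module, "lora_A") and hasattr(module, "lora_B"):
            if init == "xavier":
                nn.init.xavier_uniform_(module.lora_A.weight)
            else:  # "kaiming"
                nn.init.kaiming_uniform_(module.lora_A.weight, a=math.sqrt(5))
            nn.init.zeros_(module.lora_B.weight)
```

A bad init here (e.g. both matrices nonzero, or a poorly scaled A) perturbs the base model from step 0, which is consistent with the grad spike described above.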

Summary by CodeRabbit

  • New Features

    • Added LoRA (Low-Rank Adaptation) configuration support for parameter-efficient fine-tuning in supervised fine-tuning workflows, including customizable settings for module targeting, dimensionality, and dropout.
    • LoRA weights are now properly handled during checkpoint saving and loading.
  • Tests

    • Added functional and unit tests for LoRA-enabled training and checkpoint management.
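
The checkpoint handling described above reduces to persisting only the adapter tensors, since the base weights are recoverable from the pretrained model. Assuming adapter parameter names contain a `lora_` substring (an assumption; the actual key convention depends on Automodel), the split can be sketched as:

```python
def split_checkpoint(state_dict: dict) -> tuple[dict, dict]:
    """Split a model state dict into (base_weights, lora_weights).

    Only the second dict needs to be written when saving a LoRA
    checkpoint; on load, the adapters are merged back over the
    freshly-loaded pretrained base weights.
    """
    lora = {k: v for k, v in state_dict.items() if "lora_" in k}
    base = {k: v for k, v in state_dict.items() if "lora_" not in k}
    return base, lora
```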


@samodi-nv samodi-nv self-assigned this Nov 21, 2025
@NVIDIA-NeMo NVIDIA-NeMo deleted a comment from github-actions bot Nov 21, 2025
@RayenTian RayenTian force-pushed the samodi/automodel-lora branch from 45bb8b8 to fedecbc Compare November 25, 2025 04:53
@github-actions

⚠️ File Consistency Check

Check based on commit: fedecbc (PR #1556 from samodi/automodel-lora)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@RayenTian RayenTian force-pushed the samodi/automodel-lora branch from fedecbc to 3356fc4 Compare November 26, 2025 03:43
@github-actions bot posted the same File Consistency Check for commit 3356fc4.

@github-actions bot posted the same File Consistency Check for commit b7c0c10.

@github-actions bot posted the same File Consistency Check for commit 7272936.

samodi-nv and others added 5 commits November 30, 2025 00:30
…bug logging in llm_message_utils.py; adjust lora_dtype in dtensor_policy_worker_v2.py

Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: Jonas Yang <joyang@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
@RayenTian RayenTian force-pushed the samodi/automodel-lora branch from 7272936 to bac01be Compare November 30, 2025 08:30
@github-actions bot posted the same File Consistency Check for commit bac01be.

…ks for llm and vlm recipes; remove unused sft-llama3.1-8b-1n8g-dtensor-lora configuration and related test scripts; fix tokenizer model path in unit tests

Signed-off-by: ruit <ruit@nvidia.com>
@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Nov 30, 2025
@github-actions bot posted the same File Consistency Check for commit 641b985.

Signed-off-by: ruit <ruit@nvidia.com>
@github-actions bot posted the same File Consistency Check for commit b1a0fb6.

@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 30, 2025
…2; adjust return value for refit_info to only include weights

Signed-off-by: ruit <ruit@nvidia.com>
@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Dec 1, 2025
@github-actions bot posted the same File Consistency Check for commit 20c357c (Dec 1, 2025).

@RayenTian
Contributor

Hi, @samodi-nv. I made a few updates on top of your original PR:

  1. The convergence issue was caused by the initialization method for the LoRA weights in Automodel. The fix has already been merged into Automodel, but since we haven't bumped to the latest commit yet, I temporarily added a patch in the DTensor worker. With this change, the results now line up correctly.
  2. I removed some debug code and added a few unit tests.
  3. I removed the Tulu 3 dataset from this PR, because #1506 (refactor env and data processor & add nemotron super 49b recipes) also introduces Tulu 3 and refactors the dataset. It seems cleaner to wait for that PR to merge and then rebase.

After discussing with @joyang-nv , we’d like to first merge the SFT LoRA, and then add LoRA support for GRPO. Could you please review this PR again?

@RayenTian RayenTian changed the title feat: LoRA support for DTensorV2 path feat: LoRA SFT support for DTensorV2 path Dec 1, 2025
@RayenTian RayenTian marked this pull request as ready for review December 1, 2025 07:57
@RayenTian RayenTian requested review from a team as code owners December 1, 2025 07:57
@RayenTian RayenTian requested a review from ffrujeri December 1, 2025 08:01
@coderabbitai
Contributor

coderabbitai bot commented Dec 1, 2025

📝 Walkthrough

Introduces LoRA (Low-Rank Adaptation) configuration and integration support to the DTensor policy worker V2 for distributed training. Includes new configuration types, selective weight streaming for LoRA-only modes, checkpoint metadata handling, and test infrastructure for LoRA-based training scenarios.

Changes

Cohort / File(s): Summary

  • Configuration Examples (examples/configs/recipes/llm/sft-llama3.2-1b-1n8g-fsdp2tp1.v3.yaml, examples/configs/sft.yaml): Adds LoRA configuration blocks to the DTensor config with parameters for LoRA dimensions, dropout, initialization, dtype, and Triton usage.
  • LoRA Configuration Types (nemo_rl/models/policy/__init__.py): Introduces a LoRAConfig TypedDict with LoRA-specific settings (enabled, target_modules, exclude_modules, dim, alpha, dropout, initialization, use_triton) and adds an optional lora field to DTensorConfig.
  • DTensor Policy Worker Integration (nemo_rl/models/policy/dtensor_policy_worker_v2.py): Adds LoRA application hooks, custom weight initialization patching, selective state dict filtering for LoRA-only weight streaming/broadcasting, and checkpoint metadata injection when LoRA is enabled.
  • DTensor V1 Compatibility (nemo_rl/models/policy/lm_policy.py): Adds an assertion to prevent LoRA usage with the DTensor V1 policy worker, enforcing the V2 requirement.
  • Functional Testing (tests/functional/test_automodel_lora_sft.sh): Adds an end-to-end LoRA SFT workflow test script with environment setup, experiment execution, and metrics validation.
  • Unit Testing Infrastructure (tests/unit/models/policy/test_dtensor_worker.py): Extends test scaffolding with LoRA configuration helpers, test batch creation, token logprob calculation, and updated parametrization for LoRA-enabled training paths.
  • Checkpoint Testing (tests/unit/utils/test_automodel_checkpoint.py): Adds a LoRA application method to TestModel, mocking fixtures for distributed environments, and test coverage for LoRA weight save/load via safetensors with peft_config.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Primary attention area: nemo_rl/models/policy/dtensor_policy_worker_v2.py — Contains dense logic changes including custom weight initialization patching, state dict filtering across multiple methods (prepare_refit_info, save_checkpoint, stream/broadcast paths), and conditional LoRA application. Verify correctness of filtering logic and ensure streaming/broadcasting behavior is correct for both LoRA-enabled and LoRA-disabled paths.
  • Secondary attention: Test fixtures and parametrization updates in tests/unit/models/policy/test_dtensor_worker.py to ensure LoRA configuration paths are properly integrated with existing test flows.
  • Validation: Confirm compatibility check in lm_policy.py correctly prevents V1 + LoRA misconfiguration.
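
The V1 compatibility check mentioned above amounts to a one-line guard run before workers spawn. A sketch of the idea, assuming the config is a plain dict with a `_v2` flag and a nested `lora` block (key names are assumptions based on the test helpers in this PR, not the exact lm_policy.py code):

```python
def check_lora_compat(dtensor_cfg: dict) -> None:
    """Reject LoRA + DTensor V1 combinations early, before any workers spawn."""
    lora_cfg = dtensor_cfg.get("lora")
    if lora_cfg and lora_cfg.get("enabled", False):
        assert dtensor_cfg.get("_v2", False), (
            "LoRA is only supported by the DTensor V2 policy worker; "
            "set dtensor_cfg._v2=True to use it"
        )
```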

Suggested labels

CI

Suggested reviewers

  • parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 54.76%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (3 passed)

  • Title check ✅ Passed: The title 'feat: LoRA SFT support for DTensorV2 path' clearly and specifically describes the main feature being added: LoRA support for the SFT (Supervised Fine-Tuning) workflow on the DTensorV2 path. It is concise, accurate to the changeset, and directly corresponds to the primary objective of the PR.
  • Test Results For Major Changes ✅ Passed: PR includes functional and unit tests validating the LoRA feature with numerical metrics checks; accuracy verified against an external reference, with convergence issues resolved and documented.
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (4)
examples/configs/sft.yaml (1)

48-59: LGTM - LoRA configuration block is well-structured.

The LoRA configuration provides sensible defaults and aligns with the LoRAConfig TypedDict. Minor note: Line 50 has a typo "precendence" → "precedence".

-      target_modules: [] # match all linear modules takes precendence
+      target_modules: [] # match all linear modules takes precedence
tests/unit/models/policy/test_dtensor_worker.py (2)

122-134: Mutable default arguments should be avoided.

Using mutable default arguments ([]) can lead to unexpected behavior if the lists are mutated. Although they aren't mutated here, it's best practice to use None and initialize within the function.

 def update_lora_config(
     config: PolicyConfig,
     enabled: bool = True,
-    target_modules: list[str] = [],
-    exclude_modules: list[str] = [],
+    target_modules: list[str] | None = None,
+    exclude_modules: list[str] | None = None,
     match_all_linear: bool = True,
     dim: int = 32,
     alpha: int = 32,
     dropout: float = 0.0,
     dropout_position: str = "post",
     lora_A_init: str = "xavier",
     use_triton: bool = True,
 ):
     if enabled:
         config["dtensor_cfg"]["_v2"] = True

     config["dtensor_cfg"]["lora"].update(
         {
             "enabled": enabled,
-            "target_modules": target_modules,
-            "exclude_modules": exclude_modules,
+            "target_modules": target_modules if target_modules is not None else [],
+            "exclude_modules": exclude_modules if exclude_modules is not None else [],
             "match_all_linear": match_all_linear,
             ...
         }
     )

292-294: Broad exception handling masks root cause errors.

Catching bare Exception and calling pytest.skip hides the actual failure reason, making debugging difficult. Consider catching specific exceptions or at least logging the full traceback.

     except Exception as e:
-        print(f"Error during setup: {e}")
+        import traceback
+        print(f"Error during setup: {e}\n{traceback.format_exc()}")
         pytest.skip(f"Setup failed: {e}")
nemo_rl/models/policy/dtensor_policy_worker_v2.py (1)

1755-1764: Potential AttributeError when peft_config is None.

On line 1758, self.peft_config.to_dict() is called when self.lora_enabled and self.peft_config is truthy. However, the conditional structure means if lora_enabled is True but peft_config is somehow None, this would fail. The current code should be safe due to the initialization logic, but consider simplifying:

         refit_info = {
             "weights": state_dict_info,
             "lora_enabled": self.lora_enabled,
-            "lora_config": self.peft_config.to_dict()
-            if self.lora_enabled and self.peft_config
-            else None,
+            "lora_config": self.peft_config.to_dict() if self.peft_config else None,
             "lora_weights": lora_weight_names if self.lora_enabled else None,
         }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25ff3f6 and 20c357c.

📒 Files selected for processing (8)
  • examples/configs/recipes/llm/sft-llama3.2-1b-1n8g-fsdp2tp1.v3.yaml (1 hunks)
  • examples/configs/sft.yaml (2 hunks)
  • nemo_rl/models/policy/__init__.py (2 hunks)
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py (11 hunks)
  • nemo_rl/models/policy/lm_policy.py (1 hunks)
  • tests/functional/test_automodel_lora_sft.sh (1 hunks)
  • tests/unit/models/policy/test_dtensor_worker.py (8 hunks)
  • tests/unit/utils/test_automodel_checkpoint.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)

Files:

  • examples/configs/recipes/llm/sft-llama3.2-1b-1n8g-fsdp2tp1.v3.yaml
examples/configs/recipes/llm/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Recipe YAML files should follow the naming pattern: <algo>-<model>-<N>n<G>g-<parallelism>[-modifiers][-long][.vN].yaml for LLM recipes

Files:

  • examples/configs/recipes/llm/sft-llama3.2-1b-1n8g-fsdp2tp1.v3.yaml
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • examples/configs/recipes/llm/sft-llama3.2-1b-1n8g-fsdp2tp1.v3.yaml
  • nemo_rl/models/policy/lm_policy.py
  • tests/functional/test_automodel_lora_sft.sh
  • nemo_rl/models/policy/__init__.py
  • examples/configs/sft.yaml
  • tests/unit/utils/test_automodel_checkpoint.py
  • tests/unit/models/policy/test_dtensor_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code

Files:

  • nemo_rl/models/policy/lm_policy.py
  • nemo_rl/models/policy/__init__.py
  • tests/unit/utils/test_automodel_checkpoint.py
  • tests/unit/models/policy/test_dtensor_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes

Files:

  • nemo_rl/models/policy/lm_policy.py
  • nemo_rl/models/policy/__init__.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py
**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

  • nemo_rl/models/policy/lm_policy.py
  • tests/functional/test_automodel_lora_sft.sh
  • nemo_rl/models/policy/__init__.py
  • tests/unit/utils/test_automodel_checkpoint.py
  • tests/unit/models/policy/test_dtensor_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Use uv run instead of python to execute scripts
Follow the Google Shell Style Guide for shell scripts

Files:

  • tests/functional/test_automodel_lora_sft.sh
🧠 Learnings (5)
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.

Applied to files:

  • tests/functional/test_automodel_lora_sft.sh
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

Applied to files:

  • tests/functional/test_automodel_lora_sft.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Applied to files:

  • tests/functional/test_automodel_lora_sft.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to **/*.py : Use typing.NotRequired to mark optional attributes in TypedDict for configuration

Applied to files:

  • nemo_rl/models/policy/__init__.py
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.

Applied to files:

  • examples/configs/sft.yaml
🧬 Code graph analysis (3)
tests/unit/utils/test_automodel_checkpoint.py (1)
nemo_rl/utils/automodel_checkpoint.py (2)
  • save_checkpoint (94-181)
  • load_checkpoint (184-240)
tests/unit/models/policy/test_dtensor_worker.py (4)
nemo_rl/models/policy/lm_policy.py (1)
  • shutdown (752-759)
nemo_rl/models/policy/dtensor_policy_worker_v2.py (1)
  • shutdown (2020-2025)
nemo_rl/distributed/virtual_cluster.py (1)
  • shutdown (477-496)
nemo_rl/models/policy/dtensor_policy_worker.py (1)
  • shutdown (1942-1947)
nemo_rl/models/policy/dtensor_policy_worker_v2.py (2)
nemo_rl/models/policy/dtensor_policy_worker.py (1)
  • prepare_refit_info (1729-1736)
nemo_rl/models/policy/interfaces.py (1)
  • prepare_refit_info (157-158)
🪛 Ruff (0.14.6)
tests/unit/utils/test_automodel_checkpoint.py

478-478: Unused method argument: init_distributed

(ARG002)

tests/unit/models/policy/test_dtensor_worker.py

125-125: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


126-126: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


292-292: Do not catch blind exception: Exception

(BLE001)


672-672: Unused function argument: use_v2

(ARG001)


702-702: Unused function argument: use_v2

(ARG001)

🪛 Shellcheck (0.11.0)
tests/functional/test_automodel_lora_sft.sh

[error] 39-39: Double quote array expansions to avoid re-splitting elements.

(SC2068)

🔇 Additional comments (9)
examples/configs/recipes/llm/sft-llama3.2-1b-1n8g-fsdp2tp1.v3.yaml (1)

8-10: LoRA is not explicitly enabled in this recipe.

The recipe sets lora.dim: 32 but doesn't set lora.enabled: true. Since the base sft.yaml defaults lora.enabled to false, this recipe won't actually enable LoRA training unless enabled: true is added or overridden at runtime.

If this is a LoRA recipe (as suggested by the v3 suffix), consider adding:

   dtensor_cfg:
     lora:
+      enabled: true
       dim: 32
tests/functional/test_automodel_lora_sft.sh (1)

1-4: Missing NVIDIA copyright header.

Per coding guidelines, shell scripts (excluding tests under tests/) should include the NVIDIA copyright header. However, I see this is under tests/functional/, so this may be intentional. Please verify if functional tests should have the header.

tests/unit/utils/test_automodel_checkpoint.py (1)

477-479: The init_distributed fixture argument is intentionally used for its side effect.

The static analysis warning about unused init_distributed is a false positive. The fixture is used for its setup/teardown side effects (initializing and destroying the process group). This pattern is common in pytest.

tests/unit/models/policy/test_dtensor_worker.py (1)

672-674: The use_v2 parameter is used indirectly via _get_use_v2(request).

The static analysis warning about unused use_v2 is a false positive. The parameter is accessed through request.node.callspec.params inside _get_use_v2(), which reads it from the pytest parametrization context.

nemo_rl/models/policy/dtensor_policy_worker_v2.py (5)

1763-1764: Refit info currently only returns weights, ignoring other LoRA metadata.

The prepare_refit_info method builds a comprehensive refit_info dict with lora_enabled, lora_config, and lora_weights, but then only returns refit_info["weights"]. The comment on line 1763 explains this is intentional ("Lora have not fully supported..."), but this may cause issues when GRPO LoRA support is added.

Consider tracking this as a follow-up item when GRPO LoRA support is implemented.


103-109: LGTM - Temporary patch for LoRA weight initialization.

The TODO comment clearly indicates this is a temporary patch pending an Automodel version bump. The Xavier/Kaiming initialization logic is correct.
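For context, the standard LoRA initialization zeroes the B matrix and draws A from a Xavier or Kaiming distribution, so the adapter delta B(A(x)) is exactly zero at step 0. A minimal sketch using plain `nn.Linear` adapters (not Automodel's actual `LinearLoRA` class):

```python
import torch
import torch.nn as nn

def init_lora_weights(lora_A: nn.Linear, lora_B: nn.Linear, init_method: str = "xavier") -> None:
    # B starts at zero so the LoRA delta B(A(x)) contributes nothing initially.
    nn.init.zeros_(lora_B.weight)
    if init_method == "xavier":
        nn.init.xavier_uniform_(lora_A.weight)
    elif init_method == "kaiming":
        nn.init.kaiming_uniform_(lora_A.weight, a=5**0.5)
    else:
        raise ValueError(f"unknown init_method: {init_method}")

lora_A = nn.Linear(64, 8, bias=False)  # down-projection to rank 8
lora_B = nn.Linear(8, 64, bias=False)  # up-projection back to hidden size
init_lora_weights(lora_A, lora_B, "xavier")
delta = lora_B(lora_A(torch.randn(2, 64)))  # zero at initialization
```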


265-266: LGTM - LoRA applied after base model load.

Correctly applies LoRA to linear modules after loading the base model on rank 0, before broadcasting state dict.


290-291: LGTM - LoRA applied to FSDP-wrapped model.

LoRA is correctly applied to the empty model before FSDP parallelization, ensuring consistent architecture across ranks.


1843-1857: LGTM - Filtered state dict iterator for LoRA-only broadcasting.

Clean implementation that filters state dict to only include LoRA weights when LoRA is enabled, reducing data transfer overhead during weight synchronization.

Comment on lines +24 to +35
class LoRAConfig(TypedDict):
enabled: bool
target_modules: NotRequired[list[str]]
exclude_modules: NotRequired[list[str]]
match_all_linear: NotRequired[bool]
dim: NotRequired[int]
alpha: NotRequired[int]
dropout: NotRequired[float]
dropout_position: NotRequired[Literal["pre", "post"]]
lora_A_init: NotRequired[str]
use_triton: NotRequired[bool]

Contributor
@coderabbitai coderabbitai bot Dec 1, 2025

⚠️ Potential issue | 🟡 Minor

Missing lora_dtype field in LoRAConfig TypedDict.

The sft.yaml configuration includes lora_dtype: ${policy.precision}, but this field is not defined in the LoRAConfig TypedDict. This will cause type-checking inconsistencies.

Apply this diff to add the missing field:

 class LoRAConfig(TypedDict):
     enabled: bool
     target_modules: NotRequired[list[str]]
     exclude_modules: NotRequired[list[str]]
     match_all_linear: NotRequired[bool]
     dim: NotRequired[int]
     alpha: NotRequired[int]
     dropout: NotRequired[float]
     dropout_position: NotRequired[Literal["pre", "post"]]
     lora_A_init: NotRequired[str]
+    lora_dtype: NotRequired[str]
     use_triton: NotRequired[bool]
🤖 Prompt for AI Agents
In nemo_rl/models/policy/__init__.py around lines 24 to 35, the LoRAConfig
TypedDict is missing the lora_dtype field referenced in sft.yaml; add a
NotRequired[literal union or type alias] entry named lora_dtype (e.g.
NotRequired[str] or NotRequired[torch.dtype/str matching project conventions])
to the TypedDict so type-checkers recognize the configuration key, keeping
optionality consistent with other NotRequired fields.

Member

I think we need to fix this.

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find and examine sft.yaml to verify lora_dtype usage
fd -t f "sft.yaml" --exec cat {}

Length of output: 7258


@joyang-nv Confirmed! I verified that lora_dtype: ${policy.precision} is indeed used in examples/configs/sft.yaml (line 60) but is missing from the LoRAConfig TypedDict.

Here's the fix to add the missing field:

class LoRAConfig(TypedDict):
    enabled: bool
    target_modules: NotRequired[list[str]]
    exclude_modules: NotRequired[list[str]]
    match_all_linear: NotRequired[bool]
    dim: NotRequired[int]
    alpha: NotRequired[int]
    dropout: NotRequired[float]
    dropout_position: NotRequired[Literal["pre", "post"]]
    lora_A_init: NotRequired[str]
    lora_dtype: NotRequired[str]
    use_triton: NotRequired[bool]

This will ensure type consistency with the configuration files.


🧬 Code Graph Analysis Results
  • File: nemo_rl/models/generation/interfaces.py, lines 118-131
class GenerationConfig(TypedDict):
    """Configuration for generation."""

    backend: str
    max_new_tokens: int
    temperature: float
    top_p: float
    top_k: int | None
    model_name: NotRequired[str]  # Not Required b/c GRPO writes this
    stop_token_ids: list[int] | None
    stop_strings: list[str] | None
    colocated: NotRequired[ColocationConfig]
    # This isn't meant to be passed by the user, but is populated by nemo_rl.models.generation.__init__.configure_generation_config
    _pad_token_id: NotRequired[int]
  • File: nemo_rl/models/policy/megatron_policy_worker.py, lines 260-272
def freeze_moe_router(megatron_model):
            if not isinstance(megatron_model, list):
                megatron_model = [megatron_model]
            for model_module in megatron_model:
                # Handle both wrapped (Float16Module) and unwrapped models
                if isinstance(model_module, Float16Module):
                    model_module = model_module.module
                # Handle VLM models
                if hasattr(model_module, "language_model"):
                    model_module = model_module.language_model
                for layer in model_module.decoder.layers:
                    if hasattr(layer, "mlp") and hasattr(layer.mlp, "router"):
                        layer.mlp.router.weight.requires_grad = False

Contributor

could you make all of these required if they are required when enabled=True? you can get the type checker to respect notrequired if false and required if true by doing something like this:

https://github.com/NVIDIA-NeMo/RL/blob/main/nemo_rl/models/policy/__init__.py#L191
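The linked pattern can be sketched as a `Literal`-discriminated union of two TypedDicts; field names mirror this PR's `LoRAConfig`, but the union itself is illustrative, not the repository's actual definition:

```python
from typing import Literal, TypedDict, Union

class LoRADisabledConfig(TypedDict):
    enabled: Literal[False]

class LoRAEnabledConfig(TypedDict):
    # All fields become required once enabled is True.
    enabled: Literal[True]
    target_modules: list[str]
    dim: int
    alpha: int
    dropout: float

LoRAConfig = Union[LoRADisabledConfig, LoRAEnabledConfig]

def lora_rank(cfg: LoRAConfig) -> int:
    if cfg["enabled"]:
        # Type checkers narrow cfg to LoRAEnabledConfig here, so "dim"
        # is guaranteed to exist without a .get() fallback.
        return cfg["dim"]
    return 0
```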

Contributor

i see later on there's this comment about the lora_dtype
image

but will that override the value chosen here. +1 on resolving this before merging

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!


Comment on lines +246 to +247
# patch the init_lora_weights method to use the xavier initialization
_lora_mod.LinearLoRA.init_lora_weights = _patched_init_lora_weights
Contributor

⚠️ Potential issue | 🟠 Major

Global monkey-patch affects all LoRA instances in the process.

Patching _lora_mod.LinearLoRA.init_lora_weights globally affects all LoRA instances across the entire process, not just this worker. This could cause issues if multiple workers or other code paths rely on the original initialization behavior.

Consider applying the patch in a more scoped manner, or add a guard to prevent re-patching.

+# Guard to prevent re-patching in multi-worker scenarios
+_LORA_INIT_PATCHED = False
+
 # TODO: @ruit remove this once the bump Automodel to 2d20e33a19d5e53a271b1403b507475e68ad14dc
 def _patched_init_lora_weights(self, init_method: str):
     ...

 ...
-        # patch the init_lora_weights method to use the xavier initialization
-        _lora_mod.LinearLoRA.init_lora_weights = _patched_init_lora_weights
+        # patch the init_lora_weights method to use the xavier initialization
+        global _LORA_INIT_PATCHED
+        if not _LORA_INIT_PATCHED:
+            _lora_mod.LinearLoRA.init_lora_weights = _patched_init_lora_weights
+            _LORA_INIT_PATCHED = True

Comment on lines +117 to +119
assert config.get("lora", {}).get("enabled", False) is False, (
"LoRA is not supported for DTensorPolicyWorker V1"
)
Contributor

⚠️ Potential issue | 🔴 Critical

Bug: Incorrect config path for LoRA check.

The assertion checks config.get("lora", ...) but LoRA configuration is nested under dtensor_cfg.lora, not at the top level. This means the guard will never trigger, allowing LoRA to be incorrectly used with V1.

Apply this diff to fix the config path:

-                assert config.get("lora", {}).get("enabled", False) is False, (
+                assert config.get("dtensor_cfg", {}).get("lora", {}).get("enabled", False) is False, (
                     "LoRA is not supported for DTensorPolicyWorker V1"
                 )
🤖 Prompt for AI Agents
In nemo_rl/models/policy/lm_policy.py around lines 117 to 119, the assertion is
checking the wrong config path — it reads config.get("lora", ...) but LoRA lives
under config["dtensor_cfg"]["lora"]; update the assertion to check
config.get("dtensor_cfg", {}).get("lora", {}).get("enabled", False) is False so
the guard correctly prevents LoRA from being enabled for DTensorPolicyWorker V1.

checkpointing.enabled=true \
checkpointing.save_period=3 \
checkpointing.checkpoint_dir=/tmp/lora_sft_checkpoints \
$@ \
Contributor
@coderabbitai coderabbitai bot Dec 1, 2025

⚠️ Potential issue | 🟡 Minor

Quote the array expansion to prevent word splitting.

Static analysis correctly identifies that $@ should be quoted as "$@" to avoid re-splitting elements containing spaces.

-    $@ \
+    "$@" \
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
$@ \
"$@" \
🧰 Tools
🪛 Shellcheck (0.11.0)

[error] 39-39: Double quote array expansions to avoid re-splitting elements.

(SC2068)

🤖 Prompt for AI Agents
In tests/functional/test_automodel_lora_sft.sh around line 39, the unquoted
array expansion `$@` may cause word splitting; change it to `"$@"` in the
command invocation so each original argument is preserved as a single parameter
(i.e., replace `$@` with `"$@"`).

Member

Oh, I think this is a good catch!

Contributor

@joyang-nv, thank you for confirming! I noticed something interesting though - there's a learning in the knowledge base (ID: ccad141e-7829-44b1-98d0-ec79c01decb2) that states:

"In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use cd $PROJECT_ROOT without quotes or error handling, and pass arguments with $@ unquoted."

This learning conflicts with the current suggestion to quote $@. Since you've confirmed that quoting is the correct approach, should I remove this obsolete learning from the knowledge base? This will help ensure future reviews recommend the proper shell scripting practice of quoting array expansions.


🧠 Learnings used
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Comment on lines +92 to +101
@pytest.fixture
def mock_distributed():
"""Mock torch.distributed calls for non-distributed tests."""
with (
patch("torch.distributed.is_initialized", return_value=False),
patch("torch.distributed.get_rank", return_value=0),
):
yield
torch.distributed.destroy_process_group()

Contributor

⚠️ Potential issue | 🟠 Major

Bug: mock_distributed fixture calls destroy_process_group after mocking is_initialized.

The fixture patches is_initialized to return False, but then unconditionally calls destroy_process_group() after yield. If distributed was never actually initialized (which is the case when mocking), this will raise an error.

Apply this diff to fix the cleanup logic:

 @pytest.fixture
 def mock_distributed():
     """Mock torch.distributed calls for non-distributed tests."""
     with (
         patch("torch.distributed.is_initialized", return_value=False),
         patch("torch.distributed.get_rank", return_value=0),
     ):
         yield
-    torch.distributed.destroy_process_group()
+    # Don't call destroy_process_group - we were mocking, not actually initializing
🤖 Prompt for AI Agents
In tests/unit/utils/test_automodel_checkpoint.py around lines 92 to 101, the
teardown unconditionally calls torch.distributed.destroy_process_group() which
can raise if the process group isn’t initialized; update the fixture to call
torch.distributed.destroy_process_group() only if
torch.distributed.is_initialized() returns True (i.e., guard the cleanup with an
if check) so destruction runs only when a process group actually exists.

Comment on lines +520 to +521
initial_distribute = torch.distributed.is_initialized()
print(f"Initial distribute: {initial_distribute}")
Contributor
@coderabbitai coderabbitai bot Dec 1, 2025

🛠️ Refactor suggestion | 🟠 Major

Remove debug print statement.

This appears to be leftover debug code that should be removed before merging.

-        initial_distribute = torch.distributed.is_initialized()
-        print(f"Initial distribute: {initial_distribute}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
initial_distribute = torch.distributed.is_initialized()
print(f"Initial distribute: {initial_distribute}")
🤖 Prompt for AI Agents
In tests/unit/utils/test_automodel_checkpoint.py around lines 520-521 there is a
leftover debug print statement printing the initial distributed status; remove
the print(f"Initial distribute: {initial_distribute}") line so the test file
contains only the assignment to initial_distribute (or remove both lines if the
variable is unused) and ensure no debug prints remain.

Member

Only P1. :)

Contributor

@joyang-nv Got it! Thanks for the clarification on the priority. The debug print statement should still be removed before merging. 👍

@RayenTian
Contributor

Hi, @ffrujeri. Would you please help review this, especially the tests? I made significant changes to tests/unit/models/policy/test_dtensor_worker.py.

Signed-off-by: ruit <ruit@nvidia.com>
@github-actions

github-actions bot commented Dec 2, 2025

⚠️ File Consistency Check

Check based on commit: b33ce28 (PR #1556 from samodi/automodel-lora)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Dec 2, 2025
Member
@joyang-nv joyang-nv left a comment

Can you also add a test case similar to 3rdparty/Automodel-workspace/Automodel/tests/unit_tests/_peft/test_lora.py::test_forward_output_consistency()?




)

if self.peft_config is not None:
apply_lora_to_linear_modules(model, self.peft_config)
Contributor

Looks like this function is called twice (on this line and line 291). Is that expected?


Comment on lines +25 to +34
enabled: bool
target_modules: NotRequired[list[str]]
exclude_modules: NotRequired[list[str]]
match_all_linear: NotRequired[bool]
dim: NotRequired[int]
alpha: NotRequired[int]
dropout: NotRequired[float]
dropout_position: NotRequired[Literal["pre", "post"]]
lora_A_init: NotRequired[str]
use_triton: NotRequired[bool]
Contributor

could you please help document these flags?



# TODO: @ruit remove this once the bump Automodel to 2d20e33a19d5e53a271b1403b507475e68ad14dc (https://github.com/NVIDIA-NeMo/RL/issues/1586)
def _patched_init_lora_weights(self, init_method: str):
Contributor

i can see this function silently getting ignored. could you add a unit test that does a check and will fail when it's okay to remove? something in the spirit of this

"If this fails, that means the upstream bug has been fixed. You can close this issue: https://github.com/huggingface/transformers/issues/41190"
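A canary in that spirit could exercise the unpatched upstream behavior and fail loudly once the fix lands. A sketch; the `LinearLoRA` stand-in below is a placeholder for importing the real class from the Automodel submodule:

```python
import torch
import torch.nn as nn

class LinearLoRA(nn.Module):
    """Stand-in for nemo_automodel's LinearLoRA; the real test would
    import the actual class from the Automodel submodule instead."""

    def __init__(self, in_features: int = 16, rank: int = 4):
        super().__init__()
        self.lora_A = nn.Linear(in_features, rank, bias=False)

    def init_lora_weights(self, init_method: str):
        pass  # unfixed upstream: silently ignores init_method

def test_lora_init_patch_still_needed():
    """If this fails, upstream now initializes the weights itself:
    remove _patched_init_lora_weights and this canary test."""
    m = LinearLoRA()
    nn.init.constant_(m.lora_A.weight, 0.0)  # sentinel value
    m.init_lora_weights("xavier")
    assert torch.all(m.lora_A.weight == 0.0), (
        "Upstream init_lora_weights now applies xavier init; "
        "the monkey-patch can be deleted."
    )
```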


@terrykong terrykong linked an issue Dec 4, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:L1 Run doctests, unit tests, and functional tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LoRA DTensor SFT

6 participants