Conversation


@samodi-nv samodi-nv commented Nov 21, 2025

Issues

Addresses #833

Tested with the Thinking Machines config

[image]

Description

This PR is a work in progress to add LoRA support for the DTensor path.

Current status

Notes

  1. Supports SFT + LoRA. The results align with the Thinking Machines blog.
  2. The earlier grad spike was caused by a bug in the Automodel initialization method. The fix has been merged into Automodel's main branch, but because our submodule currently pins a version significantly behind main, a patch has been applied here.
  3. The Automodel submodule currently pins a somewhat outdated commit. @RayenTian will later create a dedicated nemorl branch in the Automodel repository and land the changes there. Once that is complete, this patch can be deleted.
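
The initialization fix in note 2 follows the standard LoRA convention: a nonzero (Xavier or Kaiming) init for the A matrix and zeros for B, so the adapter starts as a no-op. A minimal sketch of such a patch, using hypothetical `lora_A`/`lora_B` attribute names rather than the actual Automodel API:

```python
import math

import torch.nn as nn


def patch_lora_init(model: nn.Module, init: str = "xavier") -> None:
    """Re-initialize LoRA adapter weights in place.

    Standard LoRA convention: lora_A gets a nonzero (Xavier or Kaiming)
    init while lora_B starts at zero, so the adapter contributes nothing
    before any training steps.
    """
    for _, module in model.named_modules():
        if hasattr(module, "lora_A") and hasattr(module, "lora_B"):
            if init == "xavier":
                nn.init.xavier_uniform_(module.lora_A.weight)
            else:  # "kaiming"
                nn.init.kaiming_uniform_(module.lora_A.weight, a=math.sqrt(5))
            nn.init.zeros_(module.lora_B.weight)
```

A bad init here (e.g. both matrices nonzero, or a poorly scaled A) perturbs the base model from step 0, which is consistent with the grad spike described above.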

Summary by CodeRabbit

  • New Features

    • Added LoRA (Low-Rank Adaptation) configuration support for parameter-efficient fine-tuning in supervised fine-tuning workflows, including customizable settings for module targeting, dimensionality, and dropout.
    • LoRA weights are now properly handled during checkpoint saving and loading.
  • Tests

    • Added functional and unit tests for LoRA-enabled training and checkpoint management.
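
The checkpoint handling described above reduces to persisting only the adapter tensors, since the base weights are recoverable from the pretrained model. Assuming adapter parameter names contain a `lora_` substring (an assumption; the actual key convention depends on Automodel), the split can be sketched as:

```python
def split_checkpoint(state_dict: dict) -> tuple[dict, dict]:
    """Split a model state dict into (base_weights, lora_weights).

    Only the second dict needs to be written when saving a LoRA
    checkpoint; on load, the adapters are merged back over the
    freshly-loaded pretrained base weights.
    """
    lora = {k: v for k, v in state_dict.items() if "lora_" in k}
    base = {k: v for k, v in state_dict.items() if "lora_" not in k}
    return base, lora
```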


@samodi-nv samodi-nv self-assigned this Nov 21, 2025
@NVIDIA-NeMo NVIDIA-NeMo deleted a comment from github-actions bot Nov 21, 2025
@RayenTian RayenTian force-pushed the samodi/automodel-lora branch from 45bb8b8 to fedecbc Compare November 25, 2025 04:53
@github-actions

⚠️ File Consistency Check

Check based on commit: fedecbc (PR #1556 from samodi/automodel-lora)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@RayenTian RayenTian force-pushed the samodi/automodel-lora branch from fedecbc to 3356fc4 Compare November 26, 2025 03:43
@github-actions bot posted the same File Consistency Check for commit 3356fc4.

@github-actions bot posted the same File Consistency Check for commit b7c0c10.

@github-actions bot posted the same File Consistency Check for commit 7272936.

samodi-nv and others added 5 commits November 30, 2025 00:30
…bug logging in llm_message_utils.py; adjust lora_dtype in dtensor_policy_worker_v2.py

Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: Jonas Yang <joyang@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
Signed-off-by: ruit <ruit@nvidia.com>
@RayenTian RayenTian force-pushed the samodi/automodel-lora branch from 7272936 to bac01be Compare November 30, 2025 08:30
@github-actions bot posted the same File Consistency Check for commit bac01be.

…ks for llm and vlm recipes; remove unused sft-llama3.1-8b-1n8g-dtensor-lora configuration and related test scripts; fix tokenizer model path in unit tests

Signed-off-by: ruit <ruit@nvidia.com>
@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Nov 30, 2025
@github-actions bot posted the same File Consistency Check for commit 641b985.

Signed-off-by: ruit <ruit@nvidia.com>
@github-actions bot posted the same File Consistency Check for commit b1a0fb6.

@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Nov 30, 2025
…2; adjust return value for refit_info to only include weights

Signed-off-by: ruit <ruit@nvidia.com>
@RayenTian RayenTian added the CI:L1 Run doctests, unit tests, and functional tests label Dec 1, 2025
@github-actions bot posted the same File Consistency Check for commit 20c357c (Dec 1, 2025).

@RayenTian
Contributor

Hi, @samodi-nv. I made a few updates on top of your original PR:

  1. The convergence issue was caused by the initialization method for the LoRA weights in Automodel. The fix has already been merged into Automodel, but since we haven't bumped to the latest commit yet, I temporarily added a patch in the DTensor worker. With this change, the results now line up correctly.
  2. I removed some debug code and added a few unit tests.
  3. I removed the Tulu 3 dataset from this PR, because #1506 (refactor env and data processor & add nemotron super 49b recipes) also introduces Tulu 3 and refactors the dataset. It seems cleaner to wait for that PR to merge and then rebase.

After discussing with @joyang-nv , we’d like to first merge the SFT LoRA, and then add LoRA support for GRPO. Could you please review this PR again?

@RayenTian RayenTian changed the title feat: LoRA support for DTensorV2 path feat: LoRA SFT support for DTensorV2 path Dec 1, 2025
@RayenTian RayenTian marked this pull request as ready for review December 1, 2025 07:57
@RayenTian RayenTian requested review from a team as code owners December 1, 2025 07:57
@RayenTian RayenTian requested a review from ffrujeri December 1, 2025 08:01
@coderabbitai
Contributor

coderabbitai bot commented Dec 1, 2025

📝 Walkthrough

Introduces LoRA (Low-Rank Adaptation) configuration and integration support to the DTensor policy worker V2 for distributed training. Includes new configuration types, selective weight streaming for LoRA-only modes, checkpoint metadata handling, and test infrastructure for LoRA-based training scenarios.

Changes

Cohort / File(s): Summary

  • Configuration Examples (examples/configs/recipes/llm/sft-llama3.2-1b-1n8g-fsdp2tp1.v3.yaml, examples/configs/sft.yaml): Adds LoRA configuration blocks to the DTensor config with parameters for LoRA dimensions, dropout, initialization, dtype, and Triton usage.
  • LoRA Configuration Types (nemo_rl/models/policy/__init__.py): Introduces a LoRAConfig TypedDict with LoRA-specific settings (enabled, target_modules, exclude_modules, dim, alpha, dropout, initialization, use_triton) and adds an optional lora field to DTensorConfig.
  • DTensor Policy Worker Integration (nemo_rl/models/policy/dtensor_policy_worker_v2.py): Adds LoRA application hooks, custom weight initialization patching, selective state dict filtering for LoRA-only weight streaming/broadcasting, and checkpoint metadata injection when LoRA is enabled.
  • DTensor V1 Compatibility (nemo_rl/models/policy/lm_policy.py): Adds an assertion to prevent LoRA usage with the DTensor V1 policy worker, enforcing the V2 requirement.
  • Functional Testing (tests/functional/test_automodel_lora_sft.sh): Adds an end-to-end LoRA SFT workflow test script with environment setup, experiment execution, and metrics validation.
  • Unit Testing Infrastructure (tests/unit/models/policy/test_dtensor_worker.py): Extends test scaffolding with LoRA configuration helpers, test batch creation, token logprob calculation, and updated parametrization for LoRA-enabled training paths.
  • Checkpoint Testing (tests/unit/utils/test_automodel_checkpoint.py): Adds a LoRA application method to TestModel, mocking fixtures for distributed environments, and test coverage for LoRA weight save/load via safetensors with peft_config.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Primary attention area: nemo_rl/models/policy/dtensor_policy_worker_v2.py — Contains dense logic changes including custom weight initialization patching, state dict filtering across multiple methods (prepare_refit_info, save_checkpoint, stream/broadcast paths), and conditional LoRA application. Verify correctness of filtering logic and ensure streaming/broadcasting behavior is correct for both LoRA-enabled and LoRA-disabled paths.
  • Secondary attention: Test fixtures and parametrization updates in tests/unit/models/policy/test_dtensor_worker.py to ensure LoRA configuration paths are properly integrated with existing test flows.
  • Validation: Confirm compatibility check in lm_policy.py correctly prevents V1 + LoRA misconfiguration.
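
The V1 compatibility check mentioned above amounts to a one-line guard run before workers spawn. A sketch of the idea, assuming the config is a plain dict with a `_v2` flag and a nested `lora` block (key names are assumptions based on the test helpers in this PR, not the exact lm_policy.py code):

```python
def check_lora_compat(dtensor_cfg: dict) -> None:
    """Reject LoRA + DTensor V1 combinations early, before any workers spawn."""
    lora_cfg = dtensor_cfg.get("lora")
    if lora_cfg and lora_cfg.get("enabled", False):
        assert dtensor_cfg.get("_v2", False), (
            "LoRA is only supported by the DTensor V2 policy worker; "
            "set dtensor_cfg._v2=True to use it"
        )
```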

Suggested labels

CI

Suggested reviewers

  • parthchadha

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 54.76%, which is below the required threshold of 80.00%. Run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (3 passed)

  • Title check ✅ Passed: The title 'feat: LoRA SFT support for DTensorV2 path' clearly and specifically describes the main feature being added: LoRA support for the SFT (Supervised Fine-Tuning) workflow on the DTensorV2 path. It is concise, accurate to the changeset, and directly corresponds to the primary objective of the PR.
  • Test Results For Major Changes ✅ Passed: PR includes functional and unit tests validating the LoRA feature with numerical metrics checks; accuracy verified against an external reference, with convergence issues resolved and documented.
  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (4)
examples/configs/sft.yaml (1)

48-59: LGTM - LoRA configuration block is well-structured.

The LoRA configuration provides sensible defaults and aligns with the LoRAConfig TypedDict. Minor note: Line 50 has a typo "precendence" → "precedence".

-      target_modules: [] # match all linear modules takes precendence
+      target_modules: [] # match all linear modules takes precedence
tests/unit/models/policy/test_dtensor_worker.py (2)

122-134: Mutable default arguments should be avoided.

Using mutable default arguments ([]) can lead to unexpected behavior if the lists are mutated. Although they aren't mutated here, it's best practice to use None and initialize within the function.

 def update_lora_config(
     config: PolicyConfig,
     enabled: bool = True,
-    target_modules: list[str] = [],
-    exclude_modules: list[str] = [],
+    target_modules: list[str] | None = None,
+    exclude_modules: list[str] | None = None,
     match_all_linear: bool = True,
     dim: int = 32,
     alpha: int = 32,
     dropout: float = 0.0,
     dropout_position: str = "post",
     lora_A_init: str = "xavier",
     use_triton: bool = True,
 ):
     if enabled:
         config["dtensor_cfg"]["_v2"] = True

     config["dtensor_cfg"]["lora"].update(
         {
             "enabled": enabled,
-            "target_modules": target_modules,
-            "exclude_modules": exclude_modules,
+            "target_modules": target_modules if target_modules is not None else [],
+            "exclude_modules": exclude_modules if exclude_modules is not None else [],
             "match_all_linear": match_all_linear,
             ...
         }
     )

292-294: Broad exception handling masks root cause errors.

Catching bare Exception and calling pytest.skip hides the actual failure reason, making debugging difficult. Consider catching specific exceptions or at least logging the full traceback.

     except Exception as e:
-        print(f"Error during setup: {e}")
+        import traceback
+        print(f"Error during setup: {e}\n{traceback.format_exc()}")
         pytest.skip(f"Setup failed: {e}")
nemo_rl/models/policy/dtensor_policy_worker_v2.py (1)

1755-1764: Potential AttributeError when peft_config is None.

On line 1758, self.peft_config.to_dict() is called when self.lora_enabled and self.peft_config is truthy. However, the conditional structure means if lora_enabled is True but peft_config is somehow None, this would fail. The current code should be safe due to the initialization logic, but consider simplifying:

         refit_info = {
             "weights": state_dict_info,
             "lora_enabled": self.lora_enabled,
-            "lora_config": self.peft_config.to_dict()
-            if self.lora_enabled and self.peft_config
-            else None,
+            "lora_config": self.peft_config.to_dict() if self.peft_config else None,
             "lora_weights": lora_weight_names if self.lora_enabled else None,
         }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 25ff3f6 and 20c357c.

📒 Files selected for processing (8)
  • examples/configs/recipes/llm/sft-llama3.2-1b-1n8g-fsdp2tp1.v3.yaml (1 hunks)
  • examples/configs/sft.yaml (2 hunks)
  • nemo_rl/models/policy/__init__.py (2 hunks)
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py (11 hunks)
  • nemo_rl/models/policy/lm_policy.py (1 hunks)
  • tests/functional/test_automodel_lora_sft.sh (1 hunks)
  • tests/unit/models/policy/test_dtensor_worker.py (8 hunks)
  • tests/unit/utils/test_automodel_checkpoint.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (7)
examples/configs/recipes/**/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When adding support for a new model, create a recipe YAML under examples/configs/recipes/ in the appropriate domain subdirectory (llm, vlm, etc.)

Files:

  • examples/configs/recipes/llm/sft-llama3.2-1b-1n8g-fsdp2tp1.v3.yaml
examples/configs/recipes/llm/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Recipe YAML files should follow the naming pattern: <algo>-<model>-<N>n<G>g-<parallelism>[-modifiers][-long][.vN].yaml for LLM recipes

Files:

  • examples/configs/recipes/llm/sft-llama3.2-1b-1n8g-fsdp2tp1.v3.yaml
!(**/tests/**|**/test_*.py|**/test_*.sh)

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

Add the NVIDIA copyright header to all Python files and shell scripts (excluding tests). The header should include the current year

Files:

  • examples/configs/recipes/llm/sft-llama3.2-1b-1n8g-fsdp2tp1.v3.yaml
  • nemo_rl/models/policy/lm_policy.py
  • tests/functional/test_automodel_lora_sft.sh
  • nemo_rl/models/policy/__init__.py
  • examples/configs/sft.yaml
  • tests/unit/utils/test_automodel_checkpoint.py
  • tests/unit/models/policy/test_dtensor_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Conform code to Python 3.12+
Indent code with 4 spaces. Do not use tabs
Use snake_case for file names
Use PascalCase for class names
Use snake_case for function and method names
Use snake_case for local variables
Prefix variable names that start with a number with 'k' (e.g., k_99th_percentile)
Use upper snake_case with 'G' prefix for global variables (e.g., G_MY_GLOBAL)
Use upper snake_case for constants
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
Prefer docstrings over comments for interfaces that may be used outside a file
Reserve comments for code within a function or interfaces that are local to a file
If a piece of code is commented out, include a comment describing its usage and why it's commented out. Remove debug comments before merging
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx
Avoid using reflection when functionality can be easily achieved without reflection
When using try-except blocks, limit the except clause to the smallest set of specific errors possible
When using try-except blocks for duck-typing, keep the body of the try as small as possible and use the else block for logic
YAML is the single source of truth for configuration defaults. Do not set non-None defaults in code for configuration values
For required configuration attributes, access config directly and expect presence (e.g., policy_cfg['precision']) without hidden defaults
Use typing.NotRequired to mark optional attributes in TypedDict for configuration
When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default, and reflect the default in exemplar YAMLs under examples/configs/*.yaml
Follow the Google Python Style Guide for Python code

Files:

  • nemo_rl/models/policy/lm_policy.py
  • nemo_rl/models/policy/__init__.py
  • tests/unit/utils/test_automodel_checkpoint.py
  • tests/unit/models/policy/test_dtensor_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

For any source file under nemo_rl/*.py that defines a class or function decorated with @ray.remote, add a coverage pragma (# pragma: no cover) because these run in separate Ray processes

Files:

  • nemo_rl/models/policy/lm_policy.py
  • nemo_rl/models/policy/__init__.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py
**/*.{py,sh}

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

The NVIDIA copyright header should appear at the top of all Python files and shell scripts (excluding tests)

Files:

  • nemo_rl/models/policy/lm_policy.py
  • tests/functional/test_automodel_lora_sft.sh
  • nemo_rl/models/policy/__init__.py
  • tests/unit/utils/test_automodel_checkpoint.py
  • tests/unit/models/policy/test_dtensor_worker.py
  • nemo_rl/models/policy/dtensor_policy_worker_v2.py
**/*.sh

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.sh: Use uv run instead of python to execute scripts
Follow the Google Shell Style Guide for shell scripts

Files:

  • tests/functional/test_automodel_lora_sft.sh
🧠 Learnings (5)
📚 Learning: 2025-09-19T07:28:29.887Z
Learnt from: shuo-nvidia
Repo: NVIDIA-NeMo/RL PR: 1006
File: tests/test_suites/llm/distillation-qwen3-32b-to-4b-base-2n8g-fsdp2tp2-long.v1.sh:1-4
Timestamp: 2025-09-19T07:28:29.887Z
Learning: The NVIDIA-NeMo/RL project prefers to maintain consistent formatting across test scripts rather than applying individual bash hardening improvements like `set -euo pipefail` or proper quoting for sourcing files.

Applied to files:

  • tests/functional/test_automodel_lora_sft.sh
📚 Learning: 2025-10-12T14:46:55.513Z
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

Applied to files:

  • tests/functional/test_automodel_lora_sft.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Applied to files:

  • tests/functional/test_automodel_lora_sft.sh
📚 Learning: 2025-11-24T17:24:41.976Z
Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to **/*.py : Use typing.NotRequired to mark optional attributes in TypedDict for configuration

Applied to files:

  • nemo_rl/models/policy/__init__.py
📚 Learning: 2025-10-30T20:50:44.126Z
Learnt from: adil-a
Repo: NVIDIA-NeMo/RL PR: 1440
File: examples/configs/sft_automodel.yaml:48-58
Timestamp: 2025-10-30T20:50:44.126Z
Learning: In DTensor configurations for MoE (Mixture of Experts) models, expert_parallel_size and data_parallel_size can be applied together without multiplying the GPU requirements. Expert Parallelism (EP) only applies to MoE layers, while Data Parallelism/FSDP applies to non-MoE layers. Therefore, configurations like expert_parallel_size: 8 and data_parallel_size: 8 are valid on an 8-GPU cluster for MoE models.

Applied to files:

  • examples/configs/sft.yaml
🧬 Code graph analysis (3)
tests/unit/utils/test_automodel_checkpoint.py (1)
nemo_rl/utils/automodel_checkpoint.py (2)
  • save_checkpoint (94-181)
  • load_checkpoint (184-240)
tests/unit/models/policy/test_dtensor_worker.py (4)
nemo_rl/models/policy/lm_policy.py (1)
  • shutdown (752-759)
nemo_rl/models/policy/dtensor_policy_worker_v2.py (1)
  • shutdown (2020-2025)
nemo_rl/distributed/virtual_cluster.py (1)
  • shutdown (477-496)
nemo_rl/models/policy/dtensor_policy_worker.py (1)
  • shutdown (1942-1947)
nemo_rl/models/policy/dtensor_policy_worker_v2.py (2)
nemo_rl/models/policy/dtensor_policy_worker.py (1)
  • prepare_refit_info (1729-1736)
nemo_rl/models/policy/interfaces.py (1)
  • prepare_refit_info (157-158)
🪛 Ruff (0.14.6)
tests/unit/utils/test_automodel_checkpoint.py

478-478: Unused method argument: init_distributed

(ARG002)

tests/unit/models/policy/test_dtensor_worker.py

125-125: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


126-126: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


292-292: Do not catch blind exception: Exception

(BLE001)


672-672: Unused function argument: use_v2

(ARG001)


702-702: Unused function argument: use_v2

(ARG001)

🪛 Shellcheck (0.11.0)
tests/functional/test_automodel_lora_sft.sh

[error] 39-39: Double quote array expansions to avoid re-splitting elements.

(SC2068)

🔇 Additional comments (9)
examples/configs/recipes/llm/sft-llama3.2-1b-1n8g-fsdp2tp1.v3.yaml (1)

8-10: LoRA is not explicitly enabled in this recipe.

The recipe sets lora.dim: 32 but doesn't set lora.enabled: true. Since the base sft.yaml defaults lora.enabled to false, this recipe won't actually enable LoRA training unless enabled: true is added or overridden at runtime.

If this is a LoRA recipe (as suggested by the v3 suffix), consider adding:

   dtensor_cfg:
     lora:
+      enabled: true
       dim: 32
tests/functional/test_automodel_lora_sft.sh (1)

1-4: Missing NVIDIA copyright header.

Per coding guidelines, shell scripts (excluding tests under tests/) should include the NVIDIA copyright header. However, I see this is under tests/functional/, so this may be intentional. Please verify if functional tests should have the header.

tests/unit/utils/test_automodel_checkpoint.py (1)

477-479: The init_distributed fixture argument is intentionally used for its side effect.

The static analysis warning about unused init_distributed is a false positive. The fixture is used for its setup/teardown side effects (initializing and destroying the process group). This pattern is common in pytest.

tests/unit/models/policy/test_dtensor_worker.py (1)

672-674: The use_v2 parameter is used indirectly via _get_use_v2(request).

The static analysis warning about unused use_v2 is a false positive. The parameter is accessed through request.node.callspec.params inside _get_use_v2(), which reads it from the pytest parametrization context.

nemo_rl/models/policy/dtensor_policy_worker_v2.py (5)

1763-1764: Refit info currently only returns weights, ignoring other LoRA metadata.

The prepare_refit_info method builds a comprehensive refit_info dict with lora_enabled, lora_config, and lora_weights, but then only returns refit_info["weights"]. The comment on line 1763 explains this is intentional ("Lora have not fully supported..."), but this may cause issues when GRPO LoRA support is added.

Consider tracking this as a follow-up item when GRPO LoRA support is implemented.


103-109: LGTM - Temporary patch for LoRA weight initialization.

The TODO comment clearly indicates this is a temporary patch pending an Automodel version bump. The Xavier/Kaiming initialization logic is correct.
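For context, the standard LoRA initialization zeroes the B matrix and draws A from a Xavier or Kaiming distribution, so the adapter delta B(A(x)) is exactly zero at step 0. A minimal sketch using plain `nn.Linear` adapters (not Automodel's actual `LinearLoRA` class):

```python
import torch
import torch.nn as nn

def init_lora_weights(lora_A: nn.Linear, lora_B: nn.Linear, init_method: str = "xavier") -> None:
    # B starts at zero so the LoRA delta B(A(x)) contributes nothing initially.
    nn.init.zeros_(lora_B.weight)
    if init_method == "xavier":
        nn.init.xavier_uniform_(lora_A.weight)
    elif init_method == "kaiming":
        nn.init.kaiming_uniform_(lora_A.weight, a=5**0.5)
    else:
        raise ValueError(f"unknown init_method: {init_method}")

lora_A = nn.Linear(64, 8, bias=False)  # down-projection to rank 8
lora_B = nn.Linear(8, 64, bias=False)  # up-projection back to hidden size
init_lora_weights(lora_A, lora_B, "xavier")
delta = lora_B(lora_A(torch.randn(2, 64)))  # zero at initialization
```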


265-266: LGTM - LoRA applied after base model load.

Correctly applies LoRA to linear modules after loading the base model on rank 0, before broadcasting state dict.


290-291: LGTM - LoRA applied to FSDP-wrapped model.

LoRA is correctly applied to the empty model before FSDP parallelization, ensuring consistent architecture across ranks.


1843-1857: LGTM - Filtered state dict iterator for LoRA-only broadcasting.

Clean implementation that filters state dict to only include LoRA weights when LoRA is enabled, reducing data transfer overhead during weight synchronization.

Comment on lines +24 to +35
class LoRAConfig(TypedDict):
enabled: bool
target_modules: NotRequired[list[str]]
exclude_modules: NotRequired[list[str]]
match_all_linear: NotRequired[bool]
dim: NotRequired[int]
alpha: NotRequired[int]
dropout: NotRequired[float]
dropout_position: NotRequired[Literal["pre", "post"]]
lora_A_init: NotRequired[str]
use_triton: NotRequired[bool]

Contributor
@coderabbitai coderabbitai bot Dec 1, 2025

⚠️ Potential issue | 🟡 Minor

Missing lora_dtype field in LoRAConfig TypedDict.

The sft.yaml configuration includes lora_dtype: ${policy.precision}, but this field is not defined in the LoRAConfig TypedDict. This will cause type-checking inconsistencies.

Apply this diff to add the missing field:

 class LoRAConfig(TypedDict):
     enabled: bool
     target_modules: NotRequired[list[str]]
     exclude_modules: NotRequired[list[str]]
     match_all_linear: NotRequired[bool]
     dim: NotRequired[int]
     alpha: NotRequired[int]
     dropout: NotRequired[float]
     dropout_position: NotRequired[Literal["pre", "post"]]
     lora_A_init: NotRequired[str]
+    lora_dtype: NotRequired[str]
     use_triton: NotRequired[bool]
🤖 Prompt for AI Agents
In nemo_rl/models/policy/__init__.py around lines 24 to 35, the LoRAConfig
TypedDict is missing the lora_dtype field referenced in sft.yaml; add a
NotRequired[literal union or type alias] entry named lora_dtype (e.g.
NotRequired[str] or NotRequired[torch.dtype/str matching project conventions])
to the TypedDict so type-checkers recognize the configuration key, keeping
optionality consistent with other NotRequired fields.

Member

I think we need to fix this.

Contributor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find and examine sft.yaml to verify lora_dtype usage
fd -t f "sft.yaml" --exec cat {}

Length of output: 7258


@joyang-nv Confirmed! I verified that lora_dtype: ${policy.precision} is indeed used in examples/configs/sft.yaml (line 60) but is missing from the LoRAConfig TypedDict.

Here's the fix to add the missing field:

class LoRAConfig(TypedDict):
    enabled: bool
    target_modules: NotRequired[list[str]]
    exclude_modules: NotRequired[list[str]]
    match_all_linear: NotRequired[bool]
    dim: NotRequired[int]
    alpha: NotRequired[int]
    dropout: NotRequired[float]
    dropout_position: NotRequired[Literal["pre", "post"]]
    lora_A_init: NotRequired[str]
    lora_dtype: NotRequired[str]
    use_triton: NotRequired[bool]

This will ensure type consistency with the configuration files.


🧬 Code Graph Analysis Results
  • File: nemo_rl/models/generation/interfaces.py, lines 118-131
class GenerationConfig(TypedDict):
    """Configuration for generation."""

    backend: str
    max_new_tokens: int
    temperature: float
    top_p: float
    top_k: int | None
    model_name: NotRequired[str]  # Not Required b/c GRPO writes this
    stop_token_ids: list[int] | None
    stop_strings: list[str] | None
    colocated: NotRequired[ColocationConfig]
    # This isn't meant to be passed by the user, but is populated by nemo_rl.models.generation.__init__.configure_generation_config
    _pad_token_id: NotRequired[int]
  • File: nemo_rl/models/policy/megatron_policy_worker.py, lines 260-272
def freeze_moe_router(megatron_model):
            if not isinstance(megatron_model, list):
                megatron_model = [megatron_model]
            for model_module in megatron_model:
                # Handle both wrapped (Float16Module) and unwrapped models
                if isinstance(model_module, Float16Module):
                    model_module = model_module.module
                # Handle VLM models
                if hasattr(model_module, "language_model"):
                    model_module = model_module.language_model
                for layer in model_module.decoder.layers:
                    if hasattr(layer, "mlp") and hasattr(layer.mlp, "router"):
                        layer.mlp.router.weight.requires_grad = False

Contributor

could you make all of these required if they are required when enabled=True? you can get the type checker to respect notrequired if false and required if true by doing something like this:

https://github.com/NVIDIA-NeMo/RL/blob/main/nemo_rl/models/policy/__init__.py#L191
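The linked pattern can be sketched as a `Literal`-discriminated union of two TypedDicts; field names mirror this PR's `LoRAConfig`, but the union itself is illustrative, not the repository's actual definition:

```python
from typing import Literal, TypedDict, Union

class LoRADisabledConfig(TypedDict):
    enabled: Literal[False]

class LoRAEnabledConfig(TypedDict):
    # All fields become required once enabled is True.
    enabled: Literal[True]
    target_modules: list[str]
    dim: int
    alpha: int
    dropout: float

LoRAConfig = Union[LoRADisabledConfig, LoRAEnabledConfig]

def lora_rank(cfg: LoRAConfig) -> int:
    if cfg["enabled"]:
        # Type checkers narrow cfg to LoRAEnabledConfig here, so "dim"
        # is guaranteed to exist without a .get() fallback.
        return cfg["dim"]
    return 0
```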

Contributor

i see later on there's this comment about the lora_dtype
image

but will that override the value chosen here. +1 on resolving this before merging

Contributor

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!


Comment on lines +246 to +247
# patch the init_lora_weights method to use the xavier initialization
_lora_mod.LinearLoRA.init_lora_weights = _patched_init_lora_weights
Contributor

⚠️ Potential issue | 🟠 Major

Global monkey-patch affects all LoRA instances in the process.

Patching _lora_mod.LinearLoRA.init_lora_weights globally affects all LoRA instances across the entire process, not just this worker. This could cause issues if multiple workers or other code paths rely on the original initialization behavior.

Consider applying the patch in a more scoped manner, or add a guard to prevent re-patching.

+# Guard to prevent re-patching in multi-worker scenarios
+_LORA_INIT_PATCHED = False
+
 # TODO: @ruit remove this once the bump Automodel to 2d20e33a19d5e53a271b1403b507475e68ad14dc
 def _patched_init_lora_weights(self, init_method: str):
     ...

 ...
-        # patch the init_lora_weights method to use the xavier initialization
-        _lora_mod.LinearLoRA.init_lora_weights = _patched_init_lora_weights
+        # patch the init_lora_weights method to use the xavier initialization
+        global _LORA_INIT_PATCHED
+        if not _LORA_INIT_PATCHED:
+            _lora_mod.LinearLoRA.init_lora_weights = _patched_init_lora_weights
+            _LORA_INIT_PATCHED = True

Comment on lines +117 to +119
assert config.get("lora", {}).get("enabled", False) is False, (
"LoRA is not supported for DTensorPolicyWorker V1"
)
Contributor

⚠️ Potential issue | 🔴 Critical

Bug: Incorrect config path for LoRA check.

The assertion checks config.get("lora", ...) but LoRA configuration is nested under dtensor_cfg.lora, not at the top level. This means the guard will never trigger, allowing LoRA to be incorrectly used with V1.

Apply this diff to fix the config path:

-                assert config.get("lora", {}).get("enabled", False) is False, (
+                assert config.get("dtensor_cfg", {}).get("lora", {}).get("enabled", False) is False, (
                     "LoRA is not supported for DTensorPolicyWorker V1"
                 )
🤖 Prompt for AI Agents
In nemo_rl/models/policy/lm_policy.py around lines 117 to 119, the assertion is
checking the wrong config path — it reads config.get("lora", ...) but LoRA lives
under config["dtensor_cfg"]["lora"]; update the assertion to check
config.get("dtensor_cfg", {}).get("lora", {}).get("enabled", False) is False so
the guard correctly prevents LoRA from being enabled for DTensorPolicyWorker V1.

checkpointing.enabled=true \
checkpointing.save_period=3 \
checkpointing.checkpoint_dir=/tmp/lora_sft_checkpoints \
$@ \
Contributor
@coderabbitai coderabbitai bot Dec 1, 2025

⚠️ Potential issue | 🟡 Minor

Quote the array expansion to prevent word splitting.

Static analysis correctly identifies that $@ should be quoted as "$@" to avoid re-splitting elements containing spaces.

-    $@ \
+    "$@" \
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
$@ \
"$@" \
🧰 Tools
🪛 Shellcheck (0.11.0)

[error] 39-39: Double quote array expansions to avoid re-splitting elements.

(SC2068)

🤖 Prompt for AI Agents
In tests/functional/test_automodel_lora_sft.sh around line 39, the unquoted
array expansion `$@` may cause word splitting; change it to `"$@"` in the
command invocation so each original argument is preserved as a single parameter
(i.e., replace `$@` with `"$@"`).

Member

Oh, I think this is a good catch!

Contributor

@joyang-nv, thank you for confirming! I noticed something interesting though - there's a learning in the knowledge base (ID: ccad141e-7829-44b1-98d0-ec79c01decb2) that states:

"In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use cd $PROJECT_ROOT without quotes or error handling, and pass arguments with $@ unquoted."

This learning conflicts with the current suggestion to quote $@. Since you've confirmed that quoting is the correct approach, should I remove this obsolete learning from the knowledge base? This will help ensure future reviews recommend the proper shell scripting practice of quoting array expansions.


🧠 Learnings used
Learnt from: zpqiu
Repo: NVIDIA-NeMo/RL PR: 1324
File: tests/test_suites/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-megatron-tp2pp2cp2-pack.sh:16-30
Timestamp: 2025-10-12T14:46:55.513Z
Learning: In the NVIDIA-NeMo/RL repository, test scripts under tests/ follow a consistent pattern: use `cd $PROJECT_ROOT` without quotes or error handling, and pass arguments with `$@` unquoted. Maintain this consistency when adding new test scripts.

Learnt from: CR
Repo: NVIDIA-NeMo/RL PR: 0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-11-24T17:24:41.976Z
Learning: Applies to tests/test_suites/**/*.sh : Driver shell scripts should match the YAML base name with .sh extension and invoke training entrypoint with uv run

Comment on lines +92 to +101
@pytest.fixture
def mock_distributed():
"""Mock torch.distributed calls for non-distributed tests."""
with (
patch("torch.distributed.is_initialized", return_value=False),
patch("torch.distributed.get_rank", return_value=0),
):
yield
torch.distributed.destroy_process_group()

Contributor

⚠️ Potential issue | 🟠 Major

Bug: mock_distributed fixture calls destroy_process_group after mocking is_initialized.

The fixture patches is_initialized to return False, but then unconditionally calls destroy_process_group() after yield. If distributed was never actually initialized (which is the case when mocking), this will raise an error.

Apply this diff to fix the cleanup logic:

 @pytest.fixture
 def mock_distributed():
     """Mock torch.distributed calls for non-distributed tests."""
     with (
         patch("torch.distributed.is_initialized", return_value=False),
         patch("torch.distributed.get_rank", return_value=0),
     ):
         yield
-    torch.distributed.destroy_process_group()
+    # Don't call destroy_process_group - we were mocking, not actually initializing
🤖 Prompt for AI Agents
In tests/unit/utils/test_automodel_checkpoint.py around lines 92 to 101, the
teardown unconditionally calls torch.distributed.destroy_process_group() which
can raise if the process group isn’t initialized; update the fixture to call
torch.distributed.destroy_process_group() only if
torch.distributed.is_initialized() returns True (i.e., guard the cleanup with an
if check) so destruction runs only when a process group actually exists.

Comment on lines +520 to +521
initial_distribute = torch.distributed.is_initialized()
print(f"Initial distribute: {initial_distribute}")
Contributor
@coderabbitai coderabbitai bot Dec 1, 2025

🛠️ Refactor suggestion | 🟠 Major

Remove debug print statement.

This appears to be leftover debug code that should be removed before merging.

-        initial_distribute = torch.distributed.is_initialized()
-        print(f"Initial distribute: {initial_distribute}")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
initial_distribute = torch.distributed.is_initialized()
print(f"Initial distribute: {initial_distribute}")
🤖 Prompt for AI Agents
In tests/unit/utils/test_automodel_checkpoint.py around lines 520-521 there is a
leftover debug print statement printing the initial distributed status; remove
the print(f"Initial distribute: {initial_distribute}") line so the test file
contains only the assignment to initial_distribute (or remove both lines if the
variable is unused) and ensure no debug prints remain.

Member

Only P1. :)

Contributor

@joyang-nv Got it! Thanks for the clarification on the priority. The debug print statement should still be removed before merging. 👍

@RayenTian
Contributor

Hi, @ffrujeri. Would you please help review this, especially the tests? I made significant changes to tests/unit/models/policy/test_dtensor_worker.py.

Signed-off-by: ruit <ruit@nvidia.com>
@github-actions

github-actions bot commented Dec 2, 2025

⚠️ File Consistency Check

Check based on commit: b33ce28 (PR #1556 from samodi/automodel-lora)

⚠️ DTensor Policy Worker Synchronization Warning

The file nemo_rl/models/policy/dtensor_policy_worker_v2.py was modified in this PR, but nemo_rl/models/policy/dtensor_policy_worker.py was not updated.

Why this matters:
These files contain related DTensor policy worker implementations that should be kept synchronized to ensure consistency across different versions.

Action required:

  • Please review if the changes in nemo_rl/models/policy/dtensor_policy_worker_v2.py should also be applied to nemo_rl/models/policy/dtensor_policy_worker.py
  • Update nemo_rl/models/policy/dtensor_policy_worker.py if necessary to maintain consistency
  • If the files are intentionally different, please add a comment in the PR explaining why

Files to check:

  • Modified: nemo_rl/models/policy/dtensor_policy_worker_v2.py
  • Not modified: nemo_rl/models/policy/dtensor_policy_worker.py

This check ensures that related file implementations remain synchronized across the codebase. If you believe this warning is incorrect or the files should intentionally differ, please add a comment explaining the reasoning.

@RayenTian RayenTian added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Dec 2, 2025
Member
@joyang-nv joyang-nv left a comment

Can you also add a test case similar to 3rdparty/Automodel-workspace/Automodel/tests/unit_tests/_peft/test_lora.py::test_forward_output_consistency()?




)

if self.peft_config is not None:
apply_lora_to_linear_modules(model, self.peft_config)
Contributor

Looks like this function is called twice (on this line and line 291). Is that expected?


Comment on lines +25 to +34
enabled: bool
target_modules: NotRequired[list[str]]
exclude_modules: NotRequired[list[str]]
match_all_linear: NotRequired[bool]
dim: NotRequired[int]
alpha: NotRequired[int]
dropout: NotRequired[float]
dropout_position: NotRequired[Literal["pre", "post"]]
lora_A_init: NotRequired[str]
use_triton: NotRequired[bool]
Contributor

could you please help document these flags?



# TODO: @ruit remove this once the bump Automodel to 2d20e33a19d5e53a271b1403b507475e68ad14dc (https://github.com/NVIDIA-NeMo/RL/issues/1586)
def _patched_init_lora_weights(self, init_method: str):
Contributor

i can see this function silently getting ignored. could you add a unit test that does a check and will fail when it's okay to remove? something in the spirit of this

"If this fails, that means the upstream bug has been fixed. You can close this issue: https://github.com/huggingface/transformers/issues/41190"
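A canary in that spirit could exercise the unpatched upstream behavior and fail loudly once the fix lands. A sketch; the `LinearLoRA` stand-in below is a placeholder for importing the real class from the Automodel submodule:

```python
import torch
import torch.nn as nn

class LinearLoRA(nn.Module):
    """Stand-in for nemo_automodel's LinearLoRA; the real test would
    import the actual class from the Automodel submodule instead."""

    def __init__(self, in_features: int = 16, rank: int = 4):
        super().__init__()
        self.lora_A = nn.Linear(in_features, rank, bias=False)

    def init_lora_weights(self, init_method: str):
        pass  # unfixed upstream: silently ignores init_method

def test_lora_init_patch_still_needed():
    """If this fails, upstream now initializes the weights itself:
    remove _patched_init_lora_weights and this canary test."""
    m = LinearLoRA()
    nn.init.constant_(m.lora_A.weight, 0.0)  # sentinel value
    m.init_lora_weights("xavier")
    assert torch.all(m.lora_A.weight == 0.0), (
        "Upstream init_lora_weights now applies xavier init; "
        "the monkey-patch can be deleted."
    )
```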


@terrykong terrykong linked an issue Dec 4, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:L1 Run doctests, unit tests, and functional tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LoRA DTensor SFT

6 participants