
Conversation

Contributor

@ashors1 ashors1 commented Oct 6, 2025

What does this PR do ?

Supports arbitrary values for checkpointing.metric_name: metric names now use a "train:" or "val:" prefix (e.g. "val:loss"), which each algorithm parses and validates when selecting checkpoints.

Issues

List issues that this PR closes:

closes #1261

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this
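As a purely illustrative sketch (not supplied by the PR author; only metric_name's format and higher_is_better's polarity are confirmed by the changes in this PR, the rest is assumed):

checkpointing:
  metric_name: "val:loss"   # must be "<train|val>:<metric>", e.g. "val:loss" or "train:reward"
  higher_is_better: false   # false for loss-style metrics, true for reward-style metrics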

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

  • ...

Summary by CodeRabbit

Release Notes

  • Chores

    • Standardized checkpoint metric naming convention across all training configurations to use "val:" or "train:" prefixes for clearer metric source identification.
    • Enhanced checkpointing logic to parse and validate the new metric naming format, with improved warning messages when metrics are unavailable.
  • Documentation

    • Updated checkpoint configuration documentation to clarify the required metric naming format.

Signed-off-by: ashors1 <ashors@nvidia.com>
@ashors1 ashors1 requested a review from terrykong October 7, 2025 16:13
Signed-off-by: ashors1 <ashors@nvidia.com>
Contributor Author

ashors1 commented Oct 7, 2025

Thanks for the comments @samodi-nv! I've addressed them

Signed-off-by: ashors1 <ashors@nvidia.com>
@ashors1 ashors1 force-pushed the ashors/ckpt_metric branch from a662a80 to 678dbf3 on October 17, 2025 05:13
Signed-off-by: ashors1 <ashors@nvidia.com>
@ashors1 ashors1 marked this pull request as ready for review October 17, 2025 06:20
@ashors1 ashors1 requested review from a team as code owners October 17, 2025 06:20
Contributor

coderabbitai bot commented Oct 17, 2025

📝 Walkthrough

This pull request refactors metric-based checkpointing across the codebase by introducing a namespaced metric format with "train:" or "val:" prefixes. Configuration files are updated to use the new format, and algorithm implementations are enhanced to parse these prefixes, validate metric existence, and handle missing metrics gracefully with warnings.
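A condensed sketch of that flow in Python (the function name and signature are illustrative, not the PR's code — the algorithms inline this logic rather than sharing a helper):

import warnings

def select_checkpoint_metric(full_metric_name, train_metrics, val_metrics):
    # Parse the "train:"/"val:" prefix; maxsplit=1 preserves any colons in the metric name.
    parts = full_metric_name.split(":", 1)
    assert len(parts) == 2 and parts[0] in ("train", "val"), (
        f"metric_name must look like 'train:<metric>' or 'val:<metric>', got {full_metric_name!r}"
    )
    train_or_val, metric_name = parts
    metrics = val_metrics if train_or_val == "val" else train_metrics
    if metric_name not in metrics:
        # Missing metric: warn and skip top-k ranking for this checkpoint.
        warnings.warn(
            f"Checkpoint metric '{full_metric_name}' not found in {train_or_val} metrics; "
            "this checkpoint will not be saved as top-k.",
            stacklevel=2,
        )
        return None
    return metrics[metric_name]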

Changes

Cohort / File(s) Summary
Configuration Updates
examples/configs/distillation_math.yaml, examples/configs/dpo.yaml, examples/configs/grpo_math_1B.yaml, examples/configs/grpo_math_1B_megatron.yaml, examples/configs/grpo_sliding_puzzle.yaml, examples/configs/rm.yaml, examples/configs/sft.yaml, examples/configs/sft_openmathinstruct2.yaml, examples/configs/sft_openmathinstruct2_megatron.yaml, examples/configs/sft_vlm_3B.yaml, examples/configs/vlm_grpo_3B.yaml, examples/configs/vlm_grpo_3B_megatron.yaml
Updated checkpointing.metric_name from formats like "val_loss"/"val_reward" to namespaced formats like "val:loss"/"val:reward". Added clarifying comments specifying that metrics must be prefixed with either "val:" or "train:" followed by the metric name. A before/after sketch follows this summary.
Algorithm Implementations
nemo_rl/algorithms/distillation.py, nemo_rl/algorithms/dpo.py, nemo_rl/algorithms/grpo.py, nemo_rl/algorithms/rm.py, nemo_rl/algorithms/sft.py
Implemented logic to parse metric_name with "train:"/"val:" prefixes, extract the source and metric name, select appropriate metrics dictionary, validate metric existence, issue warnings for missing metrics, and store metric values under full metric name keys in checkpointing save state.
Checkpoint Utilities
nemo_rl/utils/checkpoint.py
Added documentation in CheckpointingConfig docstring specifying the required metric name format (prefixed with "val:" or "train:").
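For example, the rename in a typical exemplar config looks like this (the diff is illustrative; the old and new values are the ones listed above):

 checkpointing:
-  metric_name: "val_loss"
+  metric_name: "val:loss"  # must be prefixed with "val:" or "train:" followed by the metric name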

Sequence Diagram

sequenceDiagram
    participant Config as Config
    participant Algo as Algorithm (SFT/GRPO/etc)
    participant Metrics as Metrics Storage
    participant SaveState as Checkpoint SaveState

    Config->>Algo: metric_name = "val:loss"
    Algo->>Algo: Parse prefix<br/>("val" or "train")
    alt Prefix valid ("val" or "train")
        Algo->>Algo: Extract metric_name
        alt Prefix is "val"
            Algo->>Metrics: Select val_metrics
        else Prefix is "train"
            Algo->>Metrics: Select train metrics
        end
        alt Metric exists
            Metrics-->>Algo: metric_value
            Algo->>SaveState: Store metric under<br/>"val:loss" key
        else Metric missing
            Algo-->>Algo: Emit warning
            Algo->>SaveState: Remove "val:loss"<br/>entry if present
        end
    else Invalid prefix
        Algo-->>Algo: Emit warning
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The changes demonstrate consistent patterns across configuration files (homogeneous updates), but the algorithm implementations introduce heterogeneous logic for metric parsing, validation, and conditional branching based on metric availability. Multiple files require reasoning about the new control flow and error-handling paths.

Suggested reviewers

  • terrykong

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Docstring Coverage — ⚠️ Warning: Docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.

Test Results For Major Changes — ⚠️ Warning: This PR introduces major breaking changes to the checkpointing metric naming system across five algorithm implementations (SFT, GRPO, DPO, RM, distillation), requiring migration from simple metric names to a "prefix:metric" format, yet the PR description contains only placeholders for usage examples and pre-check items, with the additional-information section left as "...". Searches of the test suite reveal no tests for metric_name validation in test_sft.py, test_dpo.py, test_grpo.py, or test_rm.py; the only metric_name reference in existing tests is set to None. Multiple outstanding review comments also indicate the code still needs fixes for error handling, validation strictness, and stacklevel parameters. Without documented test results, there is no way to verify that the new functionality works correctly and introduces no regressions. Resolution: before merge, update the PR description with (1) unit and integration test results validating the new metric_name parsing logic across all affected algorithms, (2) tests confirming that both valid formats ("train:metric", "val:metric") work and invalid formats raise appropriate errors, (3) evidence that checkpoint selection and top-k saving operate correctly with the new format, (4) confirmation that model training convergence is unaffected, and (5) verification that all outstanding review comments have been addressed and validated by tests.
✅ Passed checks (4 passed)
Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.

Title Check — ✅ Passed: The pull request title "fix: support arbitrary values for checkpointing.metric_name" directly and accurately captures the primary change in this PR. The changeset updates configuration files and algorithm implementations to properly parse and use configured metric names with the new prefix format (val: or train:), replacing the previous behavior of defaulting to hardcoded metric names. The title is concise, specific, and clearly communicates the main objective without vagueness or unnecessary noise.

Linked Issues Check — ✅ Passed: The pull request addresses the core requirements from issue #1261: parsing the configured checkpointing.metric_name, validating the format (train: or val: prefix), selecting the appropriate metrics source (training or validation), checking for metric existence in the corresponding metrics dictionary, and issuing warnings when metrics are missing. Configuration files are updated to use the new prefix format across all examples (distillation, DPO, GRPO, RM, SFT, VLM variants), and the implementations in distillation.py, dpo.py, grpo.py, rm.py, and sft.py all include the full_metric_name parsing and validation logic. The checkpoint utility documentation was also clarified to specify the required format.

Out of Scope Changes Check — ✅ Passed: All changes are directly aligned with the stated objective of making checkpointing.metric_name respect configured values rather than defaulting to hardcoded metric names. The configuration file updates introduce the new prefix format (val: or train:), the algorithm implementations add the necessary parsing and validation logic, and the utility documentation clarifies the expected format. No unrelated changes, refactorings, or improvements to other areas are present.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
examples/configs/rm.yaml (1)

152-155: Update stale example to match new prefix requirement

The comment still shows the pre-change form. Recommend this fix for consistency:

-  #     metric_name: "validation-<NameOfValidationDataset1>_loss"
+  #     metric_name: "val:validation-<NameOfValidationDataset1>_loss"
nemo_rl/algorithms/rm.py (1)

308-312: Wrong config key in assertion (RM uses rm, not dpo).

Assertion references master_config["dpo"]["val_period"]; should be ["rm"]["val_period"]. This can raise KeyError or mask validation misconfig.

Apply:

-        assert val_dataloader is not None or master_config["dpo"]["val_period"] == 0, (
+        assert val_dataloader is not None or master_config["rm"]["val_period"] == 0, (
             "val_dataloader is None, so dpo.val_period must be 0"
         )
nemo_rl/algorithms/grpo.py (1)

1054-1056: Wrong config key in GRPO validate assertion.

Should reference grpo.val_period, not dpo.val_period.

Apply:

-        assert val_dataloader is not None or master_config["dpo"]["val_period"] == 0, (
+        assert val_dataloader is not None or master_config["grpo"]["val_period"] == 0, (
             "val_dataloader is None, so dpo.val_period must be 0"
         )
🧹 Nitpick comments (3)
nemo_rl/algorithms/sft.py (1)

511-520: Consider more explicit parsing logic for clarity.

The logic at line 519 uses "val" in parts[0] to determine the metric source. While the assertion above ensures the format is correct, using an explicit comparison would be clearer:

-                        train_or_val = "val" if "val" in parts[0] else "train"
+                        train_or_val = parts[0]  # Already validated to be "val" or "train"

This makes the intent clearer and leverages the assertion's validation.

nemo_rl/algorithms/distillation.py (1)

734-759: Parse metric prefix safely and preserve metric names containing colons

Current parsing uses split(":") without maxsplit, which fails for metrics containing colons after the prefix, and relies on substring checks instead of exact comparison. Use split(":", 1) for safe parsing.

Apply consistently across all algorithm files:

-                        parts = full_metric_name.split(":")
-                        train_or_val = "val" if "val" in parts[0] else "train"
-                        metric_name = parts[1]
+                        train_or_val, metric_name = full_metric_name.split(":", 1)
+                        assert train_or_val in ("train", "val"), (
+                            f"Invalid metric prefix '{train_or_val}'. Expected 'train' or 'val'."
+                        )

Files to update:

  • nemo_rl/algorithms/distillation.py:741
  • nemo_rl/algorithms/sft.py:518
  • nemo_rl/algorithms/rm.py:582
  • nemo_rl/algorithms/dpo.py:654
  • nemo_rl/algorithms/grpo.py:914 and grpo.py:1714

Optionally, improve the warning message for clarity:

-                            warnings.warn(
-                                f"You asked to save checkpoints based on {metric_name} but the metric is not found in the {train_or_val} metrics. "
-                                "This checkpoint will not be saved as top-k.",
-                                stacklevel=2,
-                            )
+                            warnings.warn(
+                                f"Checkpoint metric '{full_metric_name}' not found in {train_or_val} metrics; skipping top-k update.",
+                                stacklevel=2,
+                            )
nemo_rl/algorithms/grpo.py (1)

907-931: Deduplicate metric_name parsing via a small utility.

Parsing logic is duplicated across RM/DPO/GRPO (sync + async). Consider a shared helper in nemo_rl/utils/checkpoint.py, e.g., parse_checkpoint_metric_name(full_metric_name) -> tuple[prefix, metric], and reuse. Reduces drift and enforces a single policy.

If helpful, I can draft the utility and apply call-site changes across modules; a possible shape is sketched below.

Also applies to: 1709-1731
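A possible shape for that helper (a sketch only; the name parse_checkpoint_metric_name and its placement in nemo_rl/utils/checkpoint.py come from this suggestion, not from merged code):

def parse_checkpoint_metric_name(full_metric_name: str) -> tuple[str, str]:
    """Split a namespaced checkpoint metric into (prefix, metric).

    Accepts "train:<metric>" or "val:<metric>"; partition keeps any
    colons inside the metric name itself.
    """
    prefix, _, metric = full_metric_name.partition(":")
    if prefix not in ("train", "val") or not metric:
        raise ValueError(
            f"metric_name must be 'train:<metric>' or 'val:<metric>', got {full_metric_name!r}"
        )
    return prefix, metric

# Usage at a call site:
# train_or_val, metric_name = parse_checkpoint_metric_name("val:loss")  # -> ("val", "loss")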

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dee3fd9 and 8a0a168.

📒 Files selected for processing (18)
  • examples/configs/distillation_math.yaml (1 hunks)
  • examples/configs/dpo.yaml (2 hunks)
  • examples/configs/grpo_math_1B.yaml (1 hunks)
  • examples/configs/grpo_math_1B_megatron.yaml (1 hunks)
  • examples/configs/grpo_sliding_puzzle.yaml (1 hunks)
  • examples/configs/rm.yaml (1 hunks)
  • examples/configs/sft.yaml (1 hunks)
  • examples/configs/sft_openmathinstruct2.yaml (1 hunks)
  • examples/configs/sft_openmathinstruct2_megatron.yaml (1 hunks)
  • examples/configs/sft_vlm_3B.yaml (1 hunks)
  • examples/configs/vlm_grpo_3B.yaml (1 hunks)
  • examples/configs/vlm_grpo_3B_megatron.yaml (1 hunks)
  • nemo_rl/algorithms/distillation.py (1 hunks)
  • nemo_rl/algorithms/dpo.py (1 hunks)
  • nemo_rl/algorithms/grpo.py (2 hunks)
  • nemo_rl/algorithms/rm.py (1 hunks)
  • nemo_rl/algorithms/sft.py (1 hunks)
  • nemo_rl/utils/checkpoint.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
examples/configs/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/*.yaml: Exemplar configs under examples/configs/*.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml

Files:

  • examples/configs/grpo_math_1B.yaml
  • examples/configs/sft_openmathinstruct2.yaml
  • examples/configs/rm.yaml
  • examples/configs/distillation_math.yaml
  • examples/configs/vlm_grpo_3B_megatron.yaml
  • examples/configs/sft_openmathinstruct2_megatron.yaml
  • examples/configs/sft_vlm_3B.yaml
  • examples/configs/sft.yaml
  • examples/configs/grpo_math_1B_megatron.yaml
  • examples/configs/vlm_grpo_3B.yaml
  • examples/configs/grpo_sliding_puzzle.yaml
  • examples/configs/dpo.yaml
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • nemo_rl/utils/checkpoint.py
  • nemo_rl/algorithms/distillation.py
  • nemo_rl/algorithms/grpo.py
  • nemo_rl/algorithms/sft.py
  • nemo_rl/algorithms/dpo.py
  • nemo_rl/algorithms/rm.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/utils/checkpoint.py
  • nemo_rl/algorithms/distillation.py
  • nemo_rl/algorithms/grpo.py
  • nemo_rl/algorithms/sft.py
  • nemo_rl/algorithms/dpo.py
  • nemo_rl/algorithms/rm.py
🧠 Learnings (1)
📚 Learning: 2025-09-18T13:26:43.307Z
Learnt from: zpqiu
PR: NVIDIA-NeMo/RL#1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-8b-base-2n8g-fsdp2tp2.v1.yaml:19-26
Timestamp: 2025-09-18T13:26:43.307Z
Learning: In on-policy distillation workflows, validation can use downstream task performance (like math problem solving) as RL-like reward metrics rather than traditional distillation metrics like KL divergence. In this case, "val_reward" with "higher_is_better: true" is the correct checkpoint monitoring configuration.

Applied to files:

  • examples/configs/grpo_math_1B.yaml
  • examples/configs/sft_openmathinstruct2.yaml
  • examples/configs/rm.yaml
  • examples/configs/distillation_math.yaml
  • examples/configs/vlm_grpo_3B_megatron.yaml
  • examples/configs/sft_openmathinstruct2_megatron.yaml
  • examples/configs/sft_vlm_3B.yaml
  • examples/configs/sft.yaml
  • examples/configs/grpo_math_1B_megatron.yaml
  • examples/configs/vlm_grpo_3B.yaml
  • examples/configs/grpo_sliding_puzzle.yaml
  • examples/configs/dpo.yaml
🪛 Ruff (0.14.0)
nemo_rl/algorithms/grpo.py

922-922: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)


1722-1722: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)

nemo_rl/algorithms/sft.py

531-531: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)

nemo_rl/algorithms/dpo.py

662-662: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)

nemo_rl/algorithms/rm.py

590-590: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)
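The fix for each B028 finding is a one-argument change — pass stacklevel=2 so the warning is attributed to the caller rather than the warn() line itself; a generic sketch:

import warnings

warnings.warn(
    "Checkpoint metric not found; skipping top-k update.",
    stacklevel=2,  # attribute the warning to the calling frame
)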

🔇 Additional comments (13)
nemo_rl/utils/checkpoint.py (1)

43-44: LGTM! Clear documentation of the required metric format.

The documentation clearly specifies that metric_name must use "val:" or "train:" prefixes, which aligns with the implementation across algorithm files and addresses the requirement from past review comments.

examples/configs/grpo_math_1B_megatron.yaml (1)

32-36: LGTM! Config correctly adopts the new metric naming convention.

The change properly updates the metric_name to use the "val:" prefix and includes a helpful inline comment. The checkpoint_must_save_by field addition aligns with broader checkpointing patterns.

Based on learnings, "val:reward" with "higher_is_better: true" is the correct configuration for RL-based reward metrics.

examples/configs/vlm_grpo_3B_megatron.yaml (1)

29-33: LGTM! Consistent with the new metric naming convention.

The configuration correctly adopts the "val:" prefix format with appropriate inline documentation.

examples/configs/grpo_sliding_puzzle.yaml (1)

14-18: LGTM! Consistent adoption of the new format.

examples/configs/distillation_math.yaml (1)

22-26: LGTM! Config correctly updated.

The metric_name properly uses the new format. The past review comment about documenting format options in checkpoint.py has been addressed in this PR.

examples/configs/sft_openmathinstruct2.yaml (1)

15-19: LGTM! Correctly configured for loss metric.

The metric_name properly uses the "val:loss" format, and higher_is_better: false is correctly set for loss metrics.

examples/configs/grpo_math_1B.yaml (1)

37-41: LGTM! Final config correctly updated.

The metric_name properly adopts the "val:reward" format with appropriate documentation.

examples/configs/sft_vlm_3B.yaml (1)

19-19: LGTM: namespaced checkpoint metric

Switch to "val:loss" with clarifying comment matches the new convention; higher_is_better: false remains correct for loss.

examples/configs/sft_openmathinstruct2_megatron.yaml (1)

17-18: LGTM: prefixed metric format

"val:loss" and the inline note align with the new full_metric_name workflow.

examples/configs/dpo.yaml (1)

25-26: LGTM: DPO metric now explicitly from validation set

Using "val:validation-default_loss" and updating the comment example reduces ambiguity and matches the new requirement.

Also applies to: 183-184

examples/configs/vlm_grpo_3B.yaml (1)

34-36: LGTM: reward metric correctly namespaced

"val:reward" with higher_is_better: true matches GRPO usage.

examples/configs/rm.yaml (1)

18-20: LGTM: namespaced loss metric

"val:loss" and higher_is_better: false are consistent.

examples/configs/sft.yaml (1)

18-19: LGTM: prefixed loss metric

"val:loss" matches the new convention; no further changes needed.

Contributor

@terrykong terrykong left a comment


lgtm. small comment on warning

Signed-off-by: ashors1 <ashors@nvidia.com>
@ashors1 ashors1 added the CI:L0 Run doctests and unit tests label Oct 22, 2025
Signed-off-by: ashors1 <ashors@nvidia.com>
@terrykong terrykong added r0.4.0 CI:L1 Run doctests, unit tests, and functional tests and removed r0.4.0 CI:L1 Run doctests, unit tests, and functional tests labels Oct 28, 2025
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: Anna Shors <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
@terrykong terrykong added CI:L1 Run doctests, unit tests, and functional tests and removed CI:L1 Run doctests, unit tests, and functional tests labels Oct 30, 2025
@terrykong terrykong merged commit 7b32363 into main Oct 30, 2025
40 of 41 checks passed
@terrykong terrykong deleted the ashors/ckpt_metric branch October 30, 2025 12:58
chtruong814 pushed a commit that referenced this pull request Oct 30, 2025
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: Terry Kong <terrycurtiskong@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

Labels

CI:L1 Run doctests, unit tests, and functional tests · r0.4.0


Development

Successfully merging this pull request may close these issues.

config option checkpointing.metric_name is not respected

4 participants