Skip to content

Conversation

terrykong
Copy link
Contributor

@terrykong terrykong commented Oct 17, 2025

We've had nightly tests fail because the yaml configs don't adhere to the appropriate MasterConfig. This introduces pydantic in the unit tests to validate all our configs.

Introducing has helped fill in some missing config fields.

We don't use pydantic to validate at runtime since that may be jarring to use an old config with extra keys since we do not want to forbid users from using a config structure that works for them. Instead, we only use this for testing to make sure the current configs in the repo adhere to the current state of the code. This way, if a user changes a config value from NotRequired[int] to int, we should be able to catch that now on the entire MasterConfig.

This also includes some small config fixes:

  1. removing stop_token_ids from the configs since if someone swaps out the model, it may be surprising that the stop tokens are kept from the previous tokenizer
  2. add a check for defer_fp32_logits==True if logprob_chunk_size>0

Summary by CodeRabbit

  • New Features

    • Added training optimization configuration options including bias activation fusion, optimizer offloading, and custom FSDP settings.
    • Expanded configuration flexibility to allow optional values for resource allocation and generation parameters.
  • Bug Fixes

    • Updated padding token handling in generation workflows.
    • Fixed default values for optimization flags.
  • Configuration Updates

    • Removed explicit stop token IDs from generation configs for improved default termination behavior.
    • Enhanced type safety for optional configuration parameters.

@terrykong terrykong requested review from a team as code owners October 17, 2025 05:50
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 17, 2025
Copy link
Contributor

coderabbitai bot commented Oct 17, 2025

📝 Walkthrough

Walkthrough

The PR refactors configuration type definitions across generation and policy modules to use NotRequired and union types with None, introduces internal _pad_token_id handling for padding tokens, creates disabled-variant TypedDict classes for optional policy configuration blocks, removes stop_token_ids from multiple GRPO example configs, and updates corresponding generation and policy code to align with these structural changes.

Changes

Cohort / File(s) Summary
Type Interface & Schema Updates
docs/design-docs/generation.md, nemo_rl/models/generation/interfaces.py, nemo_rl/algorithms/loss_functions.py
Updated TypedDict field signatures to support nullable/optional values: GenerationConfig.top_k (int → int | None), ClippedPGLossConfig.ratio_clip_c (float → float | None). New OptionalResourcesConfig TypedDict with nullable node/GPU counts. Added _pad_token_id field to GenerationConfig for internal padding token management.
Data Config Type Changes
nemo_rl/data/__init__.py, nemo_rl/environments/math_environment.py, nemo_rl/evals/eval.py
Extended DataConfig fields to allow None: prompt_file, system_prompt_file, output_key, split, seed (all NotRequired[str | None]). Made MathDataConfig.problem_key and solution_key optional. Replaced Optional[...] with NotRequired[... | None] in MathEnvConfig. Wrapped MathEnvConfig in new _PassThroughMathConfig.
Policy Config Refactoring
nemo_rl/models/policy/__init__.py
Major restructuring: introduced *ConfigDisabled variants (DTensor, SequencePacking, Megatron, DynamicBatching) for explicit disabled states. Changed enabled fields from bool to Literal[True]. Updated policy block fields to union types (e.g., DTensorConfig | DTensorConfigDisabled). Modified MegatronConfig (env_vars nullable, pipeline stage fields nullable, optimizer/scheduler now required). Added SinglePytorchMilestonesConfig. Extended PolicyConfig.logprob_chunk_size to NotRequired[int | None]. Added max_grad_norm: NotRequired[float | int | None].
Checkpoint Config Updates
nemo_rl/utils/checkpoint.py
Changed CheckpointingConfig.model_save_format from str to str | None (and NotRequired[str]NotRequired[str | None]).
Generation & VLLM Worker Code
nemo_rl/models/generation/vllm/vllm_generation.py, nemo_rl/models/generation/vllm/vllm_worker.py, nemo_rl/models/generation/vllm/vllm_worker_async.py, nemo_rl/models/generation/__init__.py
Replaced pad_token_id with _pad_token_id throughout for internal padding token retrieval (3 locations in vllm_generation.py, verify_right_padding calls, output tensor creation). Updated pad token source in vllm_worker and vllm_worker_async. Changed top_k retrieval from .get() to direct indexing. Added warning guard in generation __init__.py when _pad_token_id exists in config.
Policy & Rollouts Updates
nemo_rl/models/policy/lm_policy.py, nemo_rl/models/policy/megatron_policy_worker.py, nemo_rl/experience/rollouts.py
Updated lm_policy.generate to use _pad_token_id instead of pad_token_id. Modified save_checkpoint condition to check model_save_format is not None. Set defer_fp32_logits default to False in Megatron initialization; added validation assertion that defer_fp32_logits must be True when logprob_chunk_size > 0. Changed rollouts.generate_responses_async to use tokenizer.pad_token_id directly instead of reading from policy config.
Example Config Files: Stop Tokens Removal
examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-{1n8g-megatron-fp8-{e2e,rollouts.v2},2n8g-fsdp2tp1-noncolocated,4n8g-fsdp2tp1-long.v3}.yaml, examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-{1n8g-fsdp2tp1.v3,1n8g-megatron}.yaml, examples/configs/recipes/llm/grpo-qwen2.5-{32b-32n8g-fsdp2tp8sp-actckpt-{long.v3,}.v3,7b-instruct-4n8g-{fsdp2tp4sp.v3,megatron},math-1.5b-instruct-1n8g-fsdp2tp1.v3}.yaml
Removed stop_token_ids field from generation configs across 12 recipe files (both Llama 3.1/3.2 and Qwen2.5 variants).
Example Config Files: Megatron & Optimizer Settings
examples/configs/dpo.yaml, examples/configs/grpo_math_1B.yaml, examples/configs/sft_openmathinstruct2_megatron.yaml, examples/configs/vlm_grpo_3B.yaml, examples/configs/vlm_grpo_3B_megatron.yaml
Added boolean fields: bias_activation_fusion (True/False), use_custom_fsdp (False). Changed defer_fp32_logits from null to False in grpo_math_1B. Added optimizer offload settings (optimizer_cpu_offload, optimizer_offload_fraction) in vlm_grpo_3B. Removed optimizer: null from vlm_grpo_3B_megatron.
Example Config Files: Env Vars & Minor Additions
examples/configs/sft.yaml
Added empty env_vars: {} under policy.dtensor_cfg and policy.megatron_cfg.
Test Updates: Config Validation
tests/unit/test_config_validation.py
Refactored validation logic to use Pydantic TypeAdapter for TypedDict-based config validation. Introduced validate_config_section() helper function. Replaced multi-branch nested validation with centralized call supporting multiple MasterConfig variants (Distillation, DPO, GRPO, RM, SFT, Eval). Added OmegaConf resolver for multiplication operator.
Test Updates: Recipe & VLLM
tests/unit/test_recipes_and_test_suites.py, tests/unit/models/generation/test_vllm_generation.py, tests/unit/models/generation/test_vllm_large_model.py
Added "rm" entry to ALGO_MAPPING_TO_BASE_YAML. Removed test_all_recipes_can_merge_configs_with_base_config test function. Updated pad token retrieval from pad_token_id to _pad_token_id in vLLM test paths.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

This PR involves substantial type system refactoring across multiple modules (policy configs, generation interfaces, data configs) combined with targeted code updates to padding token handling. While many changes follow consistent patterns (e.g., Optional[X]NotRequired[X | None], pad_token_id_pad_token_id), the scope includes dense TypedDict restructuring with new disabled-variant classes, conditional logic additions (assertions, warnings), and distributed adjustments across generation workers and config validation. The heterogeneity of changes (type updates, config removals, new public types, code logic shifts) demands separate reasoning per logical area despite some repetitive patterns.

Possibly related PRs

Suggested labels

config-refactor, type-system, generation, policy

Suggested reviewers

  • parthchadha
  • yfw

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 76.47% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (3 passed)
Check name Status Explanation
Test Results For Major Changes ✅ Passed This PR makes test infrastructure improvements by introducing Pydantic-based YAML config validation that operates only in tests, not at runtime. The PR objectives explicitly state that validation is test-only to avoid breaking existing users. The changes primarily consist of TypedDict updates for configuration schemas and YAML example file modifications. The PR description provides some testing information ("adding pydantic helped identify missing config fields"), though not extensive. Since these changes do not affect runtime functionality or numerical behavior, and are limited to test validation, the absence of performance benchmarks or convergence regression data is appropriate. The changes are not major from a runtime perspective, being focused on improved testing and configuration validation.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The PR title "chore: use pydantic for yaml validation" accurately reflects the main objective and primary code change. The PR introduces Pydantic's TypeAdapter in the test validation infrastructure (tests/unit/test_config_validation.py) to validate repository YAML configs against the project's MasterConfig TypedDicts, replacing the previous custom recursive validation logic. The title is specific, concise, and clearly communicates the primary change without being overly broad or vague. All supporting changes—such as type annotation updates in configuration classes and YAML file modifications—are in service of this core objective to enable Pydantic-based validation.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch tk/pydantic-test-validation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@terrykong terrykong requested a review from ashors1 October 17, 2025 05:53
@terrykong terrykong changed the title chore: use pydantic for yaml validation chore: use pydantic for yaml test validation Oct 17, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/test_recipes_and_test_suites.py (1)

42-45: Remove unused ALLOWED_ADDITIONAL_CONFIG_KEYS constant.

The ripgrep search confirms this constant is defined but never referenced anywhere in the codebase. Since test_all_recipes_can_merge_configs_with_base_config was removed, this constant is now dead code and should be deleted from lines 42–45.

🧹 Nitpick comments (12)
nemo_rl/models/generation/__init__.py (1)

31-36: Add stacklevel when emitting this warning.

Please pass stacklevel=2 (or similar) to warnings.warn so callers get a useful file/line in the warning trace; this avoids Ruff’s B028 finding and makes the diagnostics actionable.

-        warnings.warn(
+        warnings.warn(
             "'_pad_token_id' found in generation config and will be overridden with tokenizer.pad_token_id. "
             "Note: '_pad_token_id' is intended for internal use and has no effect when set in user-provided configs.",
-            UserWarning,
+            UserWarning,
+            stacklevel=2,
         )
nemo_rl/models/policy/megatron_policy_worker.py (1)

765-766: Mirror the no-hidden-default change for the reference model path.

Keep config handling consistent across both model instantiations.

As per coding guidelines

-                wrap_cast_model_output_to_fp32=(
-                    not self.cfg["megatron_cfg"].get("defer_fp32_logits", False)
-                ),
+                wrap_cast_model_output_to_fp32=(
+                    not self.cfg["megatron_cfg"]["defer_fp32_logits"]
+                ),
nemo_rl/evals/eval.py (1)

52-55: MasterConfig.env shape change is breaking; consider backwards-compatible typing.

Switching env from MathEnvConfig to {"math": MathEnvConfig} will fail existing configs and the new Pydantic tests. If the intent is pass-through without runtime change, accept both shapes.

-class _PassThroughMathConfig(TypedDict):
-    math: MathEnvConfig
+class _PassThroughMathConfig(TypedDict, total=False):
+    # Optional wrapper to allow env: {math: {...}} while preserving legacy env: {...}
+    math: MathEnvConfig
@@
-class MasterConfig(TypedDict):
+class MasterConfig(TypedDict):
@@
-    env: _PassThroughMathConfig
+    # Accept either legacy MathEnvConfig or wrapped {"math": MathEnvConfig}
+    env: MathEnvConfig | _PassThroughMathConfig

Please confirm all example eval YAMLs were updated to the new nested env shape; otherwise tests will start failing unexpectedly.

Also applies to: 62-62

nemo_rl/environments/math_environment.py (1)

43-46: LGTM on NotRequired fields; document and avoid code-level defaults.

Making stop_strings and verifier_type NotRequired is consistent. However, the comment “None defaults to 'math'” plus code using cfg.get("verifier_type", "math") introduces a hidden default. Prefer requiring this in YAML (and failing fast) or document the default in YAML examples, not in code.

As per coding guidelines

tests/unit/test_config_validation.py (1)

112-114: Path check for eval configs is POSIX-specific; make OS-agnostic.

Use pathlib to inspect parts or .as_posix() to avoid false negatives on Windows.

-    if "/evals/" in config_file:
+    from pathlib import Path
+    if "evals" in Path(config_file).parts:
nemo_rl/models/generation/interfaces.py (1)

107-116: Clarify “optional” semantics for resources; consider total=False.

As written, keys are required but may be null. If the intent is optional presence, make the TypedDict partial.

-class OptionalResourcesConfig(TypedDict):
+class OptionalResourcesConfig(TypedDict, total=False):
     # Same as ResourcesConfig, but fields can be null and are validated in grpo.py
     gpus_per_node: int | None
     num_nodes: int | None

Confirm downstream code handles missing keys and None consistently (e.g., cluster construction).

nemo_rl/models/policy/__init__.py (6)

110-112: Make pipeline-stage layer counts optional to avoid forcing keys in YAML.

These fields are often omitted unless PP is used. Recommend NotRequired to prevent needless failures during test validation.

-    num_layers_in_first_pipeline_stage: int | None
-    num_layers_in_last_pipeline_stage: int | None
+    num_layers_in_first_pipeline_stage: NotRequired[int | None]
+    num_layers_in_last_pipeline_stage: NotRequired[int | None]

33-35: custom_parallel_plan should be optional.

Configs typically omit this. Requiring the key (even as None) can break existing YAMLs.

-    custom_parallel_plan: str | None
+    custom_parallel_plan: NotRequired[str | None]

152-153: Prefer a proper type alias (Python 3.12) to avoid creating a runtime module variable.

Use PEP 695’s type to declare an alias for static analysis without introducing a global.

-SchedulerMilestones = dict[str, list[int]]
+type SchedulerMilestones = dict[str, list[int]]

171-201: Add a concise docstring for PolicyConfig (public surface).

Improves clarity and meets our docstring guideline for public interfaces.

-class PolicyConfig(TypedDict):
+class PolicyConfig(TypedDict):
+    """Top-level policy configuration.
+
+    Keys marked NotRequired may be omitted in YAML. Optional keys that appear may be set to null.
+    Required sub-blocks with Disabled variants (e.g., dtensor_cfg, dynamic_batching) must specify enabled=True/False.
+    """

As per coding guidelines


20-22: Document Disabled variants.

Consider single-line docstrings for Disabled types to make intent explicit (e.g., “Disabled variant for , requires enabled=False”). Light but helpful for readers and tooling.

Also applies to: 37-39, 98-101, 155-157


118-121: Enforce cross-field constraint in tests: logprob_chunk_size ⇒ defer_fp32_logits=True.

The comments state this requirement, but typing can’t express it. Since Pydantic is used in tests, add a validator to assert this relation to catch misconfigured YAMLs early. I can draft the test-side validator if helpful.

Also applies to: 177-181

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dee3fd9 and fc72ba0.

📒 Files selected for processing (36)
  • docs/design-docs/generation.md (1 hunks)
  • examples/configs/dpo.yaml (2 hunks)
  • examples/configs/grpo_math_1B.yaml (1 hunks)
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-e2e.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml (0 hunks)
  • examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml (0 hunks)
  • examples/configs/sft.yaml (2 hunks)
  • examples/configs/sft_openmathinstruct2_megatron.yaml (1 hunks)
  • examples/configs/vlm_grpo_3B.yaml (2 hunks)
  • examples/configs/vlm_grpo_3B_megatron.yaml (2 hunks)
  • nemo_rl/algorithms/loss_functions.py (1 hunks)
  • nemo_rl/data/__init__.py (2 hunks)
  • nemo_rl/environments/math_environment.py (2 hunks)
  • nemo_rl/evals/eval.py (1 hunks)
  • nemo_rl/experience/rollouts.py (1 hunks)
  • nemo_rl/models/generation/__init__.py (2 hunks)
  • nemo_rl/models/generation/interfaces.py (2 hunks)
  • nemo_rl/models/generation/vllm/vllm_generation.py (3 hunks)
  • nemo_rl/models/generation/vllm/vllm_worker.py (2 hunks)
  • nemo_rl/models/generation/vllm/vllm_worker_async.py (2 hunks)
  • nemo_rl/models/policy/__init__.py (5 hunks)
  • nemo_rl/models/policy/lm_policy.py (2 hunks)
  • nemo_rl/models/policy/megatron_policy_worker.py (3 hunks)
  • nemo_rl/utils/checkpoint.py (2 hunks)
  • tests/unit/models/generation/test_vllm_generation.py (1 hunks)
  • tests/unit/models/generation/test_vllm_large_model.py (1 hunks)
  • tests/unit/test_config_validation.py (1 hunks)
  • tests/unit/test_recipes_and_test_suites.py (1 hunks)
💤 Files with no reviewable changes (11)
  • examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml
  • examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml
  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml
  • examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml
  • examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml
  • examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml
  • examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-e2e.yaml
  • examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • nemo_rl/models/generation/vllm/vllm_worker.py
  • nemo_rl/experience/rollouts.py
  • nemo_rl/algorithms/loss_functions.py
  • nemo_rl/models/generation/vllm/vllm_generation.py
  • tests/unit/models/generation/test_vllm_generation.py
  • nemo_rl/environments/math_environment.py
  • nemo_rl/data/__init__.py
  • nemo_rl/models/generation/interfaces.py
  • nemo_rl/models/policy/lm_policy.py
  • tests/unit/models/generation/test_vllm_large_model.py
  • tests/unit/test_config_validation.py
  • nemo_rl/models/generation/__init__.py
  • nemo_rl/models/generation/vllm/vllm_worker_async.py
  • tests/unit/test_recipes_and_test_suites.py
  • nemo_rl/utils/checkpoint.py
  • nemo_rl/evals/eval.py
  • nemo_rl/models/policy/megatron_policy_worker.py
  • nemo_rl/models/policy/__init__.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/models/generation/vllm/vllm_worker.py
  • nemo_rl/experience/rollouts.py
  • nemo_rl/algorithms/loss_functions.py
  • nemo_rl/models/generation/vllm/vllm_generation.py
  • nemo_rl/environments/math_environment.py
  • nemo_rl/data/__init__.py
  • nemo_rl/models/generation/interfaces.py
  • nemo_rl/models/policy/lm_policy.py
  • nemo_rl/models/generation/__init__.py
  • nemo_rl/models/generation/vllm/vllm_worker_async.py
  • nemo_rl/utils/checkpoint.py
  • nemo_rl/evals/eval.py
  • nemo_rl/models/policy/megatron_policy_worker.py
  • nemo_rl/models/policy/__init__.py
examples/configs/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/*.yaml: Exemplar configs under examples/configs/.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/
.yaml

Files:

  • examples/configs/sft.yaml
  • examples/configs/grpo_math_1B.yaml
  • examples/configs/vlm_grpo_3B.yaml
  • examples/configs/vlm_grpo_3B_megatron.yaml
  • examples/configs/sft_openmathinstruct2_megatron.yaml
  • examples/configs/dpo.yaml
docs/**/*.md

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When a markdown doc under docs/**/*.md is added or renamed, update docs/index.md to include it in the appropriate section

Files:

  • docs/design-docs/generation.md
🧠 Learnings (1)
📚 Learning: 2025-09-19T02:44:38.451Z
Learnt from: shuo-nvidia
PR: NVIDIA-NeMo/RL#1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:73-84
Timestamp: 2025-09-19T02:44:38.451Z
Learning: The scheduler configuration format with a separate "milestones: [20]" entry (not wrapped under name/kwargs) is a valid and established pattern used across GRPO, DPO, and distillation configs in the NeMo RL codebase. This format specifies transition points between different schedulers (e.g., LinearLR for warmup steps, then ConstantLR).

Applied to files:

  • nemo_rl/models/policy/__init__.py
🧬 Code graph analysis (9)
nemo_rl/models/generation/vllm/vllm_worker.py (1)
nemo_rl/models/generation/interfaces.py (1)
  • verify_right_padding (23-99)
nemo_rl/experience/rollouts.py (2)
tests/unit/environments/test_retriever.py (1)
  • tokenizer (84-93)
tests/unit/environments/test_code_environment.py (1)
  • tokenizer (85-94)
nemo_rl/data/__init__.py (1)
tests/unit/data/test_data_processor.py (1)
  • system_prompt_file (191-195)
tests/unit/models/generation/test_vllm_large_model.py (2)
tests/unit/environments/test_retriever.py (1)
  • tokenizer (84-93)
tests/unit/environments/test_code_environment.py (1)
  • tokenizer (85-94)
tests/unit/test_config_validation.py (4)
tests/unit/data/packing/test_algorithms.py (1)
  • algorithms (97-104)
nemo_rl/evals/eval.py (1)
  • MasterConfig (57-63)
nemo_rl/algorithms/grpo.py (1)
  • MasterConfig (128-136)
nemo_rl/algorithms/distillation.py (1)
  • MasterConfig (105-116)
nemo_rl/models/generation/__init__.py (2)
tests/unit/models/generation/test_vllm_generation.py (1)
  • tokenizer (239-242)
tests/unit/models/generation/test_vllm_large_model.py (1)
  • tokenizer (82-85)
nemo_rl/models/generation/vllm/vllm_worker_async.py (1)
nemo_rl/models/generation/interfaces.py (1)
  • verify_right_padding (23-99)
nemo_rl/evals/eval.py (4)
nemo_rl/environments/math_environment.py (1)
  • MathEnvConfig (41-45)
nemo_rl/models/generation/interfaces.py (1)
  • GenerationConfig (118-131)
nemo_rl/models/policy/__init__.py (1)
  • TokenizerConfig (131-135)
nemo_rl/data/__init__.py (1)
  • MathDataConfig (49-51)
nemo_rl/models/policy/__init__.py (2)
nemo_rl/models/generation/interfaces.py (1)
  • GenerationConfig (118-131)
nemo_rl/models/policy/megatron_policy_worker.py (1)
  • freeze_moe_router (249-261)
🪛 Ruff (0.14.0)
tests/unit/test_config_validation.py

50-50: Prefer TypeError exception for invalid type

(TRY004)


50-50: Avoid specifying long messages outside the exception class

(TRY003)


81-84: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


106-106: Avoid specifying long messages outside the exception class

(TRY003)


131-133: Avoid specifying long messages outside the exception class

(TRY003)

nemo_rl/models/generation/__init__.py

31-31: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Lint check
  • GitHub Check: Post automodel integration comment / Comment on PR
  • GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (14)
tests/unit/test_recipes_and_test_suites.py (1)

1-249: No clarification needed—review comment is incorrect.

The pydantic validation is properly implemented in a dedicated test file (tests/unit/test_config_validation.py) which imports pydantic's TypeAdapter and ValidationError and validates YAML configs using multiple MasterConfig classes. The file under review (test_recipes_and_test_suites.py) correctly contains only test suite metadata validation and recipe YAML inventory checks—it is not intended to contain pydantic config validation logic. The removed test (test_all_recipes_can_merge_configs_with_base_config) was replaced by validation logic that now lives in the dedicated test_config_validation.py file, representing a proper separation of concerns.

Likely an incorrect or invalid review comment.

examples/configs/vlm_grpo_3B.yaml (2)

82-84: Well documented config additions.

The new config keys include proper documentation explaining their purpose and performance impact, following the coding guidelines for exemplar configs.

Based on coding guidelines.


109-111: Well documented config additions.

The optimizer offload fields include proper documentation, following the coding guidelines for exemplar configs.

Based on coding guidelines.

nemo_rl/utils/checkpoint.py (1)

45-45: Typing update correctly reflects nullable field.

The type changes make model_save_format explicitly nullable, which aligns with the existing logic at line 101 that uses .get("model_save_format", "safetensors") to handle None/missing values.

Also applies to: 60-60

nemo_rl/data/__init__.py (2)

18-20: Good TODO comments for future refactoring.

The TODO comments properly document the intention to split these monolithic TypedDict classes into more specific, strongly-typed variants (e.g., PreferenceDataConfig, ResponseDataConfig, MMLUConfig, AIMEConfig). This will enable more rigorous type checking.

Also applies to: 46-48


23-35: Nullable typing changes align with pydantic validation.

Adding | None to NotRequired fields makes the optionality explicit and helps pydantic catch missing vs null distinction in YAML configs.

nemo_rl/experience/rollouts.py (1)

163-163: Padding source correctly unified to tokenizer.

Using tokenizer.pad_token_id directly ensures consistent padding across generation paths, aligning with the internal _pad_token_id pattern used in vLLM generation code.

tests/unit/models/generation/test_vllm_large_model.py (1)

171-171: Correct usage of internal _pad_token_id with fallback.

The test properly accesses the private _pad_token_id field with a defensive fallback to tokenizer.pad_token_id, ensuring robustness across different configuration states.

nemo_rl/models/generation/interfaces.py (1)

125-132: top_k and stop_ now allow None; verify all call sites handle None.*

This changes the contract and may flow into vLLM/engine params. Ensure None is interpreted as “disabled” or mapped to a concrete value before invoking backends.

Would you like a quick scan script to find usages of ["top_k"], ["stop_token_ids"], ["stop_strings"] and report non-None assumptions?

nemo_rl/models/policy/megatron_policy_worker.py (1)

290-291: Review comment is incorrect; defer_fp32_logits is optional, not required.

Verification found 60+ YAML config files in examples/configs/ missing the defer_fp32_logits key, including core configs like grpo_math_8B.yaml, sft.yaml, dpo.yaml, and rm.yaml. The current code uses .get("defer_fp32_logits", False) precisely to handle this optionality.

The coding guideline to "access required config attributes directly" applies only when attributes are actually required. Here, the key is demonstrably optional across the codebase. Changing to direct indexing would cause KeyError on all existing configs lacking this key, breaking them.

The defensive .get() with default is the correct approach for optional configuration keys.

Likely an incorrect or invalid review comment.

nemo_rl/models/policy/__init__.py (4)

84-85: Good relaxation: lr_decay_iters is now optional.

Marking lr_decay_iters as NotRequired reduces spurious YAML breakages. LGTM.


135-136: Tokenizer kwargs broadened to allow None — good for backward compatibility.

LGTM.


148-150: Both YAML styles for milestones are confirmed compatible; TypedDict validates correctly.

The verification confirms all configs using milestones employ two equivalent YAML representations that both parse to list[int]. The inline style (- milestones: [10]) and expanded style (- milestones:\n - 10) are semantically identical at runtime and satisfy the TypedDict definition in SinglePytorchMilestonesConfig. All 30+ config files checked adhere to the expected pattern within scheduler configurations. No validation issues detected.


187-190: No issues found—making dtensor_cfg and dynamic_batching required is safe.

All base PolicyConfig YAML files (grpo_math_1B.yaml, grpo_math_1B_megatron.yaml, dpo.yaml, sft.yaml, rm.yaml) explicitly define both required fields. Child configs inherit from these base configs via the defaults: mechanism, and nemo_rl/utils/config.py's load_config_with_inheritance() function uses OmegaConf to properly resolve and merge parent configs with child configs, ensuring all required fields are present after config loading completes.

The initial scan showing "missing" fields were false positives—files like eval.yaml and recipe configs that appeared to lack these fields either (1) are not PolicyConfig files (evaluation/infrastructure configs), or (2) inherit missing fields from base configs via OmegaConf's merge mechanism during actual loading. All PolicyConfig instances used in practice will have both fields.

@terrykong terrykong force-pushed the tk/pydantic-test-validation branch from 061d4da to 00f97cc Compare October 17, 2025 16:23
@terrykong terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Oct 17, 2025
@terrykong terrykong requested a review from yuki-97 October 17, 2025 23:52
@terrykong
Copy link
Contributor Author

@yuki-97 could you also help review? in particular this change: 723c4b3

Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
Signed-off-by: Terry Kong <terryk@nvidia.com>
@terrykong terrykong force-pushed the tk/pydantic-test-validation branch from 723c4b3 to 3bbd97c Compare October 17, 2025 23:53
Signed-off-by: Terry Kong <terryk@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI:L1 Run doctests, unit tests, and functional tests documentation Improvements or additions to documentation r0.4.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants