chore: use pydantic for yaml test validation #1382

terrykong · 2025-10-17T05:50:08Z

We've had nightly tests fail because the yaml configs don't adhere to the appropriate MasterConfig. This introduces pydantic in the unit tests to validate all our configs.

Introducing has helped fill in some missing config fields.

We don't use pydantic to validate at runtime since that may be jarring to use an old config with extra keys since we do not want to forbid users from using a config structure that works for them. Instead, we only use this for testing to make sure the current configs in the repo adhere to the current state of the code. This way, if a user changes a config value from NotRequired[int] to int, we should be able to catch that now on the entire MasterConfig.

This also includes some small config fixes:

removing stop_token_ids from the configs since if someone swaps out the model, it may be surprising that the stop tokens are kept from the previous tokenizer
add a check for defer_fp32_logits==True if logprob_chunk_size>0

Summary by CodeRabbit

New Features
- Added training optimization configuration options including bias activation fusion, optimizer offloading, and custom FSDP settings.
- Expanded configuration flexibility to allow optional values for resource allocation and generation parameters.
Bug Fixes
- Updated padding token handling in generation workflows.
- Fixed default values for optimization flags.
Configuration Updates
- Removed explicit stop token IDs from generation configs for improved default termination behavior.
- Enhanced type safety for optional configuration parameters.

coderabbitai · 2025-10-17T05:50:36Z

📝 Walkthrough

Walkthrough

The PR refactors configuration type definitions across generation and policy modules to use NotRequired and union types with None, introduces internal _pad_token_id handling for padding tokens, creates disabled-variant TypedDict classes for optional policy configuration blocks, removes stop_token_ids from multiple GRPO example configs, and updates corresponding generation and policy code to align with these structural changes.

Changes

Cohort / File(s)	Summary
Type Interface & Schema Updates `docs/design-docs/generation.md`, `nemo_rl/models/generation/interfaces.py`, `nemo_rl/algorithms/loss_functions.py`	Updated TypedDict field signatures to support nullable/optional values: `GenerationConfig.top_k` (int → int \| None), `ClippedPGLossConfig.ratio_clip_c` (float → float \| None). New `OptionalResourcesConfig` TypedDict with nullable node/GPU counts. Added `_pad_token_id` field to `GenerationConfig` for internal padding token management.
Data Config Type Changes `nemo_rl/data/__init__.py`, `nemo_rl/environments/math_environment.py`, `nemo_rl/evals/eval.py`	Extended `DataConfig` fields to allow None: `prompt_file`, `system_prompt_file`, `output_key`, `split`, `seed` (all `NotRequired[str \| None]`). Made `MathDataConfig.problem_key` and `solution_key` optional. Replaced `Optional[...]` with `NotRequired[... \| None]` in `MathEnvConfig`. Wrapped `MathEnvConfig` in new `_PassThroughMathConfig`.
Policy Config Refactoring `nemo_rl/models/policy/__init__.py`	Major restructuring: introduced `*ConfigDisabled` variants (DTensor, SequencePacking, Megatron, DynamicBatching) for explicit disabled states. Changed `enabled` fields from `bool` to `Literal[True]`. Updated policy block fields to union types (e.g., `DTensorConfig \| DTensorConfigDisabled`). Modified `MegatronConfig` (env_vars nullable, pipeline stage fields nullable, optimizer/scheduler now required). Added `SinglePytorchMilestonesConfig`. Extended `PolicyConfig.logprob_chunk_size` to `NotRequired[int \| None]`. Added `max_grad_norm: NotRequired[float \| int \| None]`.
Checkpoint Config Updates `nemo_rl/utils/checkpoint.py`	Changed `CheckpointingConfig.model_save_format` from `str` to `str \| None` (and `NotRequired[str]` → `NotRequired[str \| None]`).
Generation & VLLM Worker Code `nemo_rl/models/generation/vllm/vllm_generation.py`, `nemo_rl/models/generation/vllm/vllm_worker.py`, `nemo_rl/models/generation/vllm/vllm_worker_async.py`, `nemo_rl/models/generation/__init__.py`	Replaced `pad_token_id` with `_pad_token_id` throughout for internal padding token retrieval (3 locations in vllm_generation.py, verify_right_padding calls, output tensor creation). Updated pad token source in vllm_worker and vllm_worker_async. Changed `top_k` retrieval from `.get()` to direct indexing. Added warning guard in generation `__init__.py` when `_pad_token_id` exists in config.
Policy & Rollouts Updates `nemo_rl/models/policy/lm_policy.py`, `nemo_rl/models/policy/megatron_policy_worker.py`, `nemo_rl/experience/rollouts.py`	Updated `lm_policy.generate` to use `_pad_token_id` instead of `pad_token_id`. Modified save_checkpoint condition to check `model_save_format` is not None. Set `defer_fp32_logits` default to False in Megatron initialization; added validation assertion that `defer_fp32_logits` must be True when `logprob_chunk_size > 0`. Changed `rollouts.generate_responses_async` to use `tokenizer.pad_token_id` directly instead of reading from policy config.
Example Config Files: Stop Tokens Removal `examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-{1n8g-megatron-fp8-{e2e,rollouts.v2},2n8g-fsdp2tp1-noncolocated,4n8g-fsdp2tp1-long.v3}.yaml`, `examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-{1n8g-fsdp2tp1.v3,1n8g-megatron}.yaml`, `examples/configs/recipes/llm/grpo-qwen2.5-{32b-32n8g-fsdp2tp8sp-actckpt-{long.v3,}.v3,7b-instruct-4n8g-{fsdp2tp4sp.v3,megatron},math-1.5b-instruct-1n8g-fsdp2tp1.v3}.yaml`	Removed `stop_token_ids` field from generation configs across 12 recipe files (both Llama 3.1/3.2 and Qwen2.5 variants).
Example Config Files: Megatron & Optimizer Settings `examples/configs/dpo.yaml`, `examples/configs/grpo_math_1B.yaml`, `examples/configs/sft_openmathinstruct2_megatron.yaml`, `examples/configs/vlm_grpo_3B.yaml`, `examples/configs/vlm_grpo_3B_megatron.yaml`	Added boolean fields: `bias_activation_fusion` (True/False), `use_custom_fsdp` (False). Changed `defer_fp32_logits` from null to False in grpo_math_1B. Added optimizer offload settings (`optimizer_cpu_offload`, `optimizer_offload_fraction`) in vlm_grpo_3B. Removed `optimizer: null` from vlm_grpo_3B_megatron.
Example Config Files: Env Vars & Minor Additions `examples/configs/sft.yaml`	Added empty `env_vars: {}` under `policy.dtensor_cfg` and `policy.megatron_cfg`.
Test Updates: Config Validation `tests/unit/test_config_validation.py`	Refactored validation logic to use Pydantic `TypeAdapter` for TypedDict-based config validation. Introduced `validate_config_section()` helper function. Replaced multi-branch nested validation with centralized call supporting multiple `MasterConfig` variants (Distillation, DPO, GRPO, RM, SFT, Eval). Added OmegaConf resolver for multiplication operator.
Test Updates: Recipe & VLLM `tests/unit/test_recipes_and_test_suites.py`, `tests/unit/models/generation/test_vllm_generation.py`, `tests/unit/models/generation/test_vllm_large_model.py`	Added "rm" entry to `ALGO_MAPPING_TO_BASE_YAML`. Removed `test_all_recipes_can_merge_configs_with_base_config` test function. Updated pad token retrieval from `pad_token_id` to `_pad_token_id` in vLLM test paths.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

This PR involves substantial type system refactoring across multiple modules (policy configs, generation interfaces, data configs) combined with targeted code updates to padding token handling. While many changes follow consistent patterns (e.g., Optional[X] → NotRequired[X | None], pad_token_id → _pad_token_id), the scope includes dense TypedDict restructuring with new disabled-variant classes, conditional logic additions (assertions, warnings), and distributed adjustments across generation workers and config validation. The heterogeneity of changes (type updates, config removals, new public types, code logic shifts) demands separate reasoning per logical area despite some repetitive patterns.

Possibly related PRs

feat: add on policy distillation algorithm #1006: Makes GenerationConfig.top_k nullable and updates multiple distillation configs to set top_k: null, directly intersecting with this PR's top_k type changes.
feat: Expose async vLLM engine as HTTP server #1110: Modifies vLLM generation and worker code (vllm_generation.py, vllm_worker_async.py), overlapping with this PR's pad_token_id and configuration key refactoring in the same files.
feat: Deepseek migration to Megatron-Bridge + CP support #1059: Modifies megatron policy worker code and Megatron config handling in megatron_policy_worker.py, overlapping with this PR's defer_fp32_logits and logprob_chunk_size validation logic.

Suggested labels

config-refactor, type-system, generation, policy

Suggested reviewers

parthchadha
yfw

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 76.47% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Test Results For Major Changes	✅ Passed	This PR makes test infrastructure improvements by introducing Pydantic-based YAML config validation that operates only in tests, not at runtime. The PR objectives explicitly state that validation is test-only to avoid breaking existing users. The changes primarily consist of TypedDict updates for configuration schemas and YAML example file modifications. The PR description provides some testing information ("adding pydantic helped identify missing config fields"), though not extensive. Since these changes do not affect runtime functionality or numerical behavior, and are limited to test validation, the absence of performance benchmarks or convergence regression data is appropriate. The changes are not major from a runtime perspective, being focused on improved testing and configuration validation.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The PR title "chore: use pydantic for yaml validation" accurately reflects the main objective and primary code change. The PR introduces Pydantic's TypeAdapter in the test validation infrastructure (tests/unit/test_config_validation.py) to validate repository YAML configs against the project's MasterConfig TypedDicts, replacing the previous custom recursive validation logic. The title is specific, concise, and clearly communicates the primary change without being overly broad or vague. All supporting changes—such as type annotation updates in configuration classes and YAML file modifications—are in service of this core objective to enable Pydantic-based validation.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch tk/pydantic-test-validation

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/unit/test_recipes_and_test_suites.py (1)

42-45: Remove unused ALLOWED_ADDITIONAL_CONFIG_KEYS constant.

The ripgrep search confirms this constant is defined but never referenced anywhere in the codebase. Since test_all_recipes_can_merge_configs_with_base_config was removed, this constant is now dead code and should be deleted from lines 42–45.

🧹 Nitpick comments (12)

nemo_rl/models/generation/__init__.py (1)
31-36: Add stacklevel when emitting this warning.

Please pass stacklevel=2 (or similar) to warnings.warn so callers get a useful file/line in the warning trace; this avoids Ruff’s B028 finding and makes the diagnostics actionable.
-        warnings.warn(
+        warnings.warn(
             "'_pad_token_id' found in generation config and will be overridden with tokenizer.pad_token_id. "
             "Note: '_pad_token_id' is intended for internal use and has no effect when set in user-provided configs.",
-            UserWarning,
+            UserWarning,
+            stacklevel=2,
         )
nemo_rl/models/policy/megatron_policy_worker.py (1)
765-766: Mirror the no-hidden-default change for the reference model path.

Keep config handling consistent across both model instantiations.

As per coding guidelines
-                wrap_cast_model_output_to_fp32=(
-                    not self.cfg["megatron_cfg"].get("defer_fp32_logits", False)
-                ),
+                wrap_cast_model_output_to_fp32=(
+                    not self.cfg["megatron_cfg"]["defer_fp32_logits"]
+                ),
nemo_rl/evals/eval.py (1)
52-55: MasterConfig.env shape change is breaking; consider backwards-compatible typing.

Switching env from MathEnvConfig to {"math": MathEnvConfig} will fail existing configs and the new Pydantic tests. If the intent is pass-through without runtime change, accept both shapes.
-class _PassThroughMathConfig(TypedDict):
-    math: MathEnvConfig
+class _PassThroughMathConfig(TypedDict, total=False):
+    # Optional wrapper to allow env: {math: {...}} while preserving legacy env: {...}
+    math: MathEnvConfig
@@
-class MasterConfig(TypedDict):
+class MasterConfig(TypedDict):
@@
-    env: _PassThroughMathConfig
+    # Accept either legacy MathEnvConfig or wrapped {"math": MathEnvConfig}
+    env: MathEnvConfig | _PassThroughMathConfig
Please confirm all example eval YAMLs were updated to the new nested env shape; otherwise tests will start failing unexpectedly.

Also applies to: 62-62
nemo_rl/environments/math_environment.py (1)

43-46: LGTM on NotRequired fields; document and avoid code-level defaults.

Making stop_strings and verifier_type NotRequired is consistent. However, the comment “None defaults to 'math'” plus code using cfg.get("verifier_type", "math") introduces a hidden default. Prefer requiring this in YAML (and failing fast) or document the default in YAML examples, not in code.

As per coding guidelines
tests/unit/test_config_validation.py (1)
112-114: Path check for eval configs is POSIX-specific; make OS-agnostic.

Use pathlib to inspect parts or .as_posix() to avoid false negatives on Windows.
-    if "/evals/" in config_file:
+    from pathlib import Path
+    if "evals" in Path(config_file).parts:
nemo_rl/models/generation/interfaces.py (1)
107-116: Clarify “optional” semantics for resources; consider total=False.

As written, keys are required but may be null. If the intent is optional presence, make the TypedDict partial.
-class OptionalResourcesConfig(TypedDict):
+class OptionalResourcesConfig(TypedDict, total=False):
     # Same as ResourcesConfig, but fields can be null and are validated in grpo.py
     gpus_per_node: int | None
     num_nodes: int | None
Confirm downstream code handles missing keys and None consistently (e.g., cluster construction).
nemo_rl/models/policy/__init__.py (6)
110-112: Make pipeline-stage layer counts optional to avoid forcing keys in YAML.

These fields are often omitted unless PP is used. Recommend NotRequired to prevent needless failures during test validation.
-    num_layers_in_first_pipeline_stage: int | None
-    num_layers_in_last_pipeline_stage: int | None
+    num_layers_in_first_pipeline_stage: NotRequired[int | None]
+    num_layers_in_last_pipeline_stage: NotRequired[int | None]
33-35: custom_parallel_plan should be optional.

Configs typically omit this. Requiring the key (even as None) can break existing YAMLs.
-    custom_parallel_plan: str | None
+    custom_parallel_plan: NotRequired[str | None]
152-153: Prefer a proper type alias (Python 3.12) to avoid creating a runtime module variable.

Use PEP 695’s type to declare an alias for static analysis without introducing a global.
-SchedulerMilestones = dict[str, list[int]]
+type SchedulerMilestones = dict[str, list[int]]
171-201: Add a concise docstring for PolicyConfig (public surface).

Improves clarity and meets our docstring guideline for public interfaces.
-class PolicyConfig(TypedDict):
+class PolicyConfig(TypedDict):
+    """Top-level policy configuration.
+
+    Keys marked NotRequired may be omitted in YAML. Optional keys that appear may be set to null.
+    Required sub-blocks with Disabled variants (e.g., dtensor_cfg, dynamic_batching) must specify enabled=True/False.
+    """
As per coding guidelines

20-22: Document Disabled variants.

Consider single-line docstrings for Disabled types to make intent explicit (e.g., “Disabled variant for , requires enabled=False”). Light but helpful for readers and tooling.

Also applies to: 37-39, 98-101, 155-157

118-121: Enforce cross-field constraint in tests: logprob_chunk_size ⇒ defer_fp32_logits=True.

The comments state this requirement, but typing can’t express it. Since Pydantic is used in tests, add a validator to assert this relation to catch misconfigured YAMLs early. I can draft the test-side validator if helpful.

Also applies to: 177-181

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dee3fd9 and fc72ba0.

📒 Files selected for processing (36)

docs/design-docs/generation.md (1 hunks)
examples/configs/dpo.yaml (2 hunks)
examples/configs/grpo_math_1B.yaml (1 hunks)
examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-e2e.yaml (0 hunks)
examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml (0 hunks)
examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml (0 hunks)
examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml (0 hunks)
examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml (0 hunks)
examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml (0 hunks)
examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml (0 hunks)
examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml (0 hunks)
examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml (0 hunks)
examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml (0 hunks)
examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml (0 hunks)
examples/configs/sft.yaml (2 hunks)
examples/configs/sft_openmathinstruct2_megatron.yaml (1 hunks)
examples/configs/vlm_grpo_3B.yaml (2 hunks)
examples/configs/vlm_grpo_3B_megatron.yaml (2 hunks)
nemo_rl/algorithms/loss_functions.py (1 hunks)
nemo_rl/data/__init__.py (2 hunks)
nemo_rl/environments/math_environment.py (2 hunks)
nemo_rl/evals/eval.py (1 hunks)
nemo_rl/experience/rollouts.py (1 hunks)
nemo_rl/models/generation/__init__.py (2 hunks)
nemo_rl/models/generation/interfaces.py (2 hunks)
nemo_rl/models/generation/vllm/vllm_generation.py (3 hunks)
nemo_rl/models/generation/vllm/vllm_worker.py (2 hunks)
nemo_rl/models/generation/vllm/vllm_worker_async.py (2 hunks)
nemo_rl/models/policy/__init__.py (5 hunks)
nemo_rl/models/policy/lm_policy.py (2 hunks)
nemo_rl/models/policy/megatron_policy_worker.py (3 hunks)
nemo_rl/utils/checkpoint.py (2 hunks)
tests/unit/models/generation/test_vllm_generation.py (1 hunks)
tests/unit/models/generation/test_vllm_large_model.py (1 hunks)
tests/unit/test_config_validation.py (1 hunks)
tests/unit/test_recipes_and_test_suites.py (1 hunks)

💤 Files with no reviewable changes (11)

examples/configs/recipes/llm/grpo-qwen2.5-math-1.5b-instruct-1n8g-fsdp2tp1.v3.yaml
examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt-long.v3.yaml
examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-megatron.yaml
examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-fsdp2tp4sp.v3.yaml
examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-rollouts.v2.yaml
examples/configs/recipes/llm/grpo-qwen2.5-32b-32n8g-fsdp2tp8sp-actckpt.v3.yaml
examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-2n8g-fsdp2tp1-noncolocated.yaml
examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-4n8g-fsdp2tp1-long.v3.yaml
examples/configs/recipes/llm/grpo-llama3.2-1b-instruct-1n8g-fsdp2tp1.v3.yaml
examples/configs/recipes/llm/grpo-llama3.1-8b-instruct-1n8g-megatron-fp8-e2e.yaml
examples/configs/recipes/llm/grpo-qwen2.5-7b-instruct-4n8g-megatron.yaml

🧰 Additional context used

📓 Path-based instructions (4)

**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

nemo_rl/models/generation/vllm/vllm_worker.py
nemo_rl/experience/rollouts.py
nemo_rl/algorithms/loss_functions.py
nemo_rl/models/generation/vllm/vllm_generation.py
tests/unit/models/generation/test_vllm_generation.py
nemo_rl/environments/math_environment.py
nemo_rl/data/__init__.py
nemo_rl/models/generation/interfaces.py
nemo_rl/models/policy/lm_policy.py
tests/unit/models/generation/test_vllm_large_model.py
tests/unit/test_config_validation.py
nemo_rl/models/generation/__init__.py
nemo_rl/models/generation/vllm/vllm_worker_async.py
tests/unit/test_recipes_and_test_suites.py
nemo_rl/utils/checkpoint.py
nemo_rl/evals/eval.py
nemo_rl/models/policy/megatron_policy_worker.py
nemo_rl/models/policy/__init__.py

nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

nemo_rl/models/generation/vllm/vllm_worker.py
nemo_rl/experience/rollouts.py
nemo_rl/algorithms/loss_functions.py
nemo_rl/models/generation/vllm/vllm_generation.py
nemo_rl/environments/math_environment.py
nemo_rl/data/__init__.py
nemo_rl/models/generation/interfaces.py
nemo_rl/models/policy/lm_policy.py
nemo_rl/models/generation/__init__.py
nemo_rl/models/generation/vllm/vllm_worker_async.py
nemo_rl/utils/checkpoint.py
nemo_rl/evals/eval.py
nemo_rl/models/policy/megatron_policy_worker.py
nemo_rl/models/policy/__init__.py

examples/configs/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/*.yaml: Exemplar configs under examples/configs/.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/.yaml

Files:

examples/configs/sft.yaml
examples/configs/grpo_math_1B.yaml
examples/configs/vlm_grpo_3B.yaml
examples/configs/vlm_grpo_3B_megatron.yaml
examples/configs/sft_openmathinstruct2_megatron.yaml
examples/configs/dpo.yaml

docs/**/*.md

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

When a markdown doc under docs/**/*.md is added or renamed, update docs/index.md to include it in the appropriate section

Files:

docs/design-docs/generation.md

🧠 Learnings (1)

📚 Learning: 2025-09-19T02:44:38.451Z

Learnt from: shuo-nvidia
PR: NVIDIA-NeMo/RL#1006
File: examples/configs/recipes/llm/distillation-qwen3-32b-to-1.7b-base-1n8g-fsdp2tp1.v1.yaml:73-84
Timestamp: 2025-09-19T02:44:38.451Z
Learning: The scheduler configuration format with a separate "milestones: [20]" entry (not wrapped under name/kwargs) is a valid and established pattern used across GRPO, DPO, and distillation configs in the NeMo RL codebase. This format specifies transition points between different schedulers (e.g., LinearLR for warmup steps, then ConstantLR).

Applied to files:

nemo_rl/models/policy/__init__.py

🧬 Code graph analysis (9)

nemo_rl/models/generation/vllm/vllm_worker.py (1)

nemo_rl/models/generation/interfaces.py (1)

verify_right_padding (23-99)

nemo_rl/experience/rollouts.py (2)

tests/unit/environments/test_retriever.py (1)

tokenizer (84-93)

tests/unit/environments/test_code_environment.py (1)

tokenizer (85-94)

nemo_rl/data/__init__.py (1)

tests/unit/data/test_data_processor.py (1)

system_prompt_file (191-195)

tests/unit/models/generation/test_vllm_large_model.py (2)

tests/unit/environments/test_retriever.py (1)

tokenizer (84-93)

tests/unit/environments/test_code_environment.py (1)

tokenizer (85-94)

tests/unit/test_config_validation.py (4)

tests/unit/data/packing/test_algorithms.py (1)

algorithms (97-104)

nemo_rl/evals/eval.py (1)

MasterConfig (57-63)

nemo_rl/algorithms/grpo.py (1)

MasterConfig (128-136)

nemo_rl/algorithms/distillation.py (1)

MasterConfig (105-116)

nemo_rl/models/generation/__init__.py (2)

tests/unit/models/generation/test_vllm_generation.py (1)

tokenizer (239-242)

tests/unit/models/generation/test_vllm_large_model.py (1)

tokenizer (82-85)

nemo_rl/models/generation/vllm/vllm_worker_async.py (1)

nemo_rl/models/generation/interfaces.py (1)

verify_right_padding (23-99)

nemo_rl/evals/eval.py (4)

nemo_rl/environments/math_environment.py (1)

MathEnvConfig (41-45)

nemo_rl/models/generation/interfaces.py (1)

GenerationConfig (118-131)

nemo_rl/models/policy/__init__.py (1)

TokenizerConfig (131-135)

nemo_rl/data/__init__.py (1)

MathDataConfig (49-51)

nemo_rl/models/policy/__init__.py (2)

nemo_rl/models/generation/interfaces.py (1)

GenerationConfig (118-131)

nemo_rl/models/policy/megatron_policy_worker.py (1)

freeze_moe_router (249-261)

🪛 Ruff (0.14.0)

tests/unit/test_config_validation.py

50-50: Prefer TypeError exception for invalid type

(TRY004)

50-50: Avoid specifying long messages outside the exception class

(TRY003)

81-84: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

106-106: Avoid specifying long messages outside the exception class

(TRY003)

131-133: Avoid specifying long messages outside the exception class

(TRY003)

nemo_rl/models/generation/__init__.py

31-31: No explicit stacklevel keyword argument found

Set stacklevel=2

(B028)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Lint check
GitHub Check: Post automodel integration comment / Comment on PR
GitHub Check: Post submodule check comment / Comment on PR

🔇 Additional comments (14)

tests/unit/test_recipes_and_test_suites.py (1)

1-249: No clarification needed—review comment is incorrect.

The pydantic validation is properly implemented in a dedicated test file (tests/unit/test_config_validation.py) which imports pydantic's TypeAdapter and ValidationError and validates YAML configs using multiple MasterConfig classes. The file under review (test_recipes_and_test_suites.py) correctly contains only test suite metadata validation and recipe YAML inventory checks—it is not intended to contain pydantic config validation logic. The removed test (test_all_recipes_can_merge_configs_with_base_config) was replaced by validation logic that now lives in the dedicated test_config_validation.py file, representing a proper separation of concerns.

Likely an incorrect or invalid review comment.

examples/configs/vlm_grpo_3B.yaml (2)

82-84: Well documented config additions.

The new config keys include proper documentation explaining their purpose and performance impact, following the coding guidelines for exemplar configs.

Based on coding guidelines.

109-111: Well documented config additions.

The optimizer offload fields include proper documentation, following the coding guidelines for exemplar configs.

Based on coding guidelines.

nemo_rl/utils/checkpoint.py (1)

45-45: Typing update correctly reflects nullable field.

The type changes make model_save_format explicitly nullable, which aligns with the existing logic at line 101 that uses .get("model_save_format", "safetensors") to handle None/missing values.

Also applies to: 60-60

nemo_rl/data/__init__.py (2)

18-20: Good TODO comments for future refactoring.

The TODO comments properly document the intention to split these monolithic TypedDict classes into more specific, strongly-typed variants (e.g., PreferenceDataConfig, ResponseDataConfig, MMLUConfig, AIMEConfig). This will enable more rigorous type checking.

Also applies to: 46-48

23-35: Nullable typing changes align with pydantic validation.

Adding | None to NotRequired fields makes the optionality explicit and helps pydantic catch missing vs null distinction in YAML configs.

nemo_rl/experience/rollouts.py (1)

163-163: Padding source correctly unified to tokenizer.

Using tokenizer.pad_token_id directly ensures consistent padding across generation paths, aligning with the internal _pad_token_id pattern used in vLLM generation code.

tests/unit/models/generation/test_vllm_large_model.py (1)

171-171: Correct usage of internal _pad_token_id with fallback.

The test properly accesses the private _pad_token_id field with a defensive fallback to tokenizer.pad_token_id, ensuring robustness across different configuration states.

nemo_rl/models/generation/interfaces.py (1)

125-132: top_k and stop_ now allow None; verify all call sites handle None.*

This changes the contract and may flow into vLLM/engine params. Ensure None is interpreted as “disabled” or mapped to a concrete value before invoking backends.

Would you like a quick scan script to find usages of ["top_k"], ["stop_token_ids"], ["stop_strings"] and report non-None assumptions?

nemo_rl/models/policy/megatron_policy_worker.py (1)

290-291: Review comment is incorrect; defer_fp32_logits is optional, not required.

Verification found 60+ YAML config files in examples/configs/ missing the defer_fp32_logits key, including core configs like grpo_math_8B.yaml, sft.yaml, dpo.yaml, and rm.yaml. The current code uses .get("defer_fp32_logits", False) precisely to handle this optionality.

The coding guideline to "access required config attributes directly" applies only when attributes are actually required. Here, the key is demonstrably optional across the codebase. Changing to direct indexing would cause KeyError on all existing configs lacking this key, breaking them.

The defensive .get() with default is the correct approach for optional configuration keys.

Likely an incorrect or invalid review comment.

nemo_rl/models/policy/__init__.py (4)

84-85: Good relaxation: lr_decay_iters is now optional.

Marking lr_decay_iters as NotRequired reduces spurious YAML breakages. LGTM.

135-136: Tokenizer kwargs broadened to allow None — good for backward compatibility.

LGTM.

148-150: Both YAML styles for milestones are confirmed compatible; TypedDict validates correctly.

The verification confirms all configs using milestones employ two equivalent YAML representations that both parse to list[int]. The inline style (- milestones: [10]) and expanded style (- milestones:\n - 10) are semantically identical at runtime and satisfy the TypedDict definition in SinglePytorchMilestonesConfig. All 30+ config files checked adhere to the expected pattern within scheduler configurations. No validation issues detected.

187-190: No issues found—making dtensor_cfg and dynamic_batching required is safe.

All base PolicyConfig YAML files (grpo_math_1B.yaml, grpo_math_1B_megatron.yaml, dpo.yaml, sft.yaml, rm.yaml) explicitly define both required fields. Child configs inherit from these base configs via the defaults: mechanism, and nemo_rl/utils/config.py's load_config_with_inheritance() function uses OmegaConf to properly resolve and merge parent configs with child configs, ensuring all required fields are present after config loading completes.

The initial scan showing "missing" fields were false positives—files like eval.yaml and recipe configs that appeared to lack these fields either (1) are not PolicyConfig files (evaluation/infrastructure configs), or (2) inherit missing fields from base configs via OmegaConf's merge mechanism during actual loading. All PolicyConfig instances used in practice will have both fields.

examples/configs/dpo.yaml

examples/configs/grpo_math_1B.yaml

examples/configs/sft_openmathinstruct2_megatron.yaml

examples/configs/sft.yaml

examples/configs/vlm_grpo_3B_megatron.yaml

nemo_rl/data/__init__.py

nemo_rl/models/policy/megatron_policy_worker.py

tests/unit/test_config_validation.py

tests/unit/test_recipes_and_test_suites.py

tests/unit/test_config_validation.py

terrykong · 2025-10-17T23:52:39Z

@yuki-97 could you also help review? in particular this change: 723c4b3

Signed-off-by: Terry Kong <terryk@nvidia.com>

nemo_rl/data/__init__.py

Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong requested review from a team as code owners October 17, 2025 05:50

github-actions bot added the documentation Improvements or additions to documentation label Oct 17, 2025

terrykong requested a review from ashors1 October 17, 2025 05:53

terrykong changed the title ~~chore: use pydantic for yaml validation~~ chore: use pydantic for yaml test validation Oct 17, 2025

terrykong added the r0.4.0 label Oct 17, 2025

coderabbitai bot reviewed Oct 17, 2025

View reviewed changes

ashors1 reviewed Oct 17, 2025

View reviewed changes

tests/unit/test_config_validation.py Outdated Show resolved Hide resolved

terrykong force-pushed the tk/pydantic-test-validation branch from 061d4da to 00f97cc Compare October 17, 2025 16:23

terrykong added the CI:L1 Run doctests, unit tests, and functional tests label Oct 17, 2025

terrykong temporarily deployed to nemo-ci October 17, 2025 16:25 — with GitHub Actions Inactive

terrykong temporarily deployed to nemo-ci October 17, 2025 16:40 — with GitHub Actions Inactive

terrykong requested a review from yuki-97 October 17, 2025 23:52

terrykong added 6 commits October 17, 2025 23:53

feat: switch to pydantic for config test validation

382b290

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix configs

f01cf97

Signed-off-by: Terry Kong <terryk@nvidia.com>

fixed

384a442

Signed-off-by: Terry Kong <terryk@nvidia.com>

cleanup

216e5cd

Signed-off-by: Terry Kong <terryk@nvidia.com>

address test failure

51e0de9

Signed-off-by: Terry Kong <terryk@nvidia.com>

fix eval configs

3bbd97c

Signed-off-by: Terry Kong <terryk@nvidia.com>

terrykong force-pushed the tk/pydantic-test-validation branch from 723c4b3 to 3bbd97c Compare October 17, 2025 23:53

yuki-97 reviewed Oct 18, 2025

View reviewed changes

nemo_rl/data/__init__.py Outdated Show resolved Hide resolved

no more shuffle eval

f74640a

Signed-off-by: Terry Kong <terryk@nvidia.com>

chore: use pydantic for yaml test validation #1382

Are you sure you want to change the base?

chore: use pydantic for yaml test validation #1382

Uh oh!

Conversation

terrykong commented Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

terrykong commented Oct 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

terrykong commented Oct 17, 2025 •

edited

Loading

coderabbitai bot commented Oct 17, 2025 •

edited

Loading