
Conversation


@parthchadha parthchadha commented Oct 16, 2025

What does this PR do?

This PR adds the capability to perform in-flight weight updates, which prevents stalls in the async RL pipeline and increases throughput.

Convergence plots for Llama 8B, 4K sequence length:
The plots below show three types of runs:

  1. Sync baseline with 4 nodes
  2. Async baseline with staleness factors of 1, 2, and 8, using 5 nodes (4 for training and 1 for generation)
  3. Async with in-flight weight updates with staleness factors of 1, 2, and 8, using 5 nodes (4 for training and 1 for generation)
[Screenshots: convergence plots for the three run types]

Throughput in tokens per second per GPU (higher is better):
Sync at ~187
Async best at ~478

Total step time:
Sync at ~60 s
Async best at ~21 s

[Screenshots: throughput and step-time plots]


Usage

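The feature is config-driven. Below is a hypothetical sketch of enabling the new flag, written as the equivalent Python config dict; only in_flight_weight_updates is confirmed by this PR, and the exact nesting in grpo_math_1B.yaml may differ.

```python
# Hypothetical override sketch: enable in-flight weight updates for async GRPO.
# Surrounding keys are illustrative and may not match the real YAML layout.
async_grpo_config = {
    "enabled": True,
    "max_trajectory_age_steps": 2,     # allow trajectories up to 2 steps stale
    "in_flight_weight_updates": True,  # refit weights without waiting for pending generations
}
```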


Summary by CodeRabbit

  • New Features

    • Added in_flight_weight_updates configuration option to control async GRPO weight update behavior
    • Enhanced vLLM V1 async engine integration with smarter generation waiting logic
  • Bug Fixes

    • Improved metric logging robustness with better error handling for non-scalar values
  • Chores

    • Updated checkpoint saving configuration handling

@parthchadha parthchadha requested review from a team as code owners October 16, 2025 19:44

coderabbitai bot commented Oct 16, 2025

📝 Walkthrough

The changes implement in-flight weight updates for async GRPO, add defensive error handling in the TensorBoard logger, and pass checkpointing configuration to checkpoint saving operations.

Changes

  • Configuration Updates (examples/configs/grpo_math_1B.yaml): Adds the in_flight_weight_updates: false configuration option to async GRPO settings
  • Async Processing & Algorithm (nemo_rl/algorithms/async_utils.py, nemo_rl/algorithms/grpo.py): Implements conditional waiting for pending generations based on in-flight weight updates in prepare_for_refit; adds messaging about active threads during weight updates; passes checkpointing config to policy checkpoint saving
  • Logger Robustness (nemo_rl/utils/logger.py): Skips non-scalar metric values with warnings; wraps TensorBoard logging in try/except for defensive error handling

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

The changes span multiple files with a mix of logic additions (conditional wait behavior in async utilities), parameter passing (checkpointing config), and error handling (logger improvements). While the individual changes are straightforward, they affect distinct concerns requiring separate reasoning for each modification, and the conditional logic in async utilities requires understanding the interaction between vLLM engine configurations and weight update flows.

Pre-merge checks and finishing touches

✅ Passed checks (4 passed)
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
  • Test Results For Major Changes: ✅ Passed. The PR adds a major feature for in-flight weight updates in asynchronous RL pipelines, which could affect both performance and convergence. According to the PR description, convergence plots for Llama 8B with 4K sequence length are included, comparing the synchronous baseline, the asynchronous baseline with varying staleness factors, and async with in-flight weight updates. These plots provide the necessary before-and-after performance numbers and convergence verification with explicit configuration context, satisfying the check requirements for major changes affecting performance and numerics.
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The pull request title "feat: add capability to update weights inflight during generation" directly and clearly describes the main objective of this PR. The changes across all modified files center on implementing in-flight weight updates: the configuration file adds a new toggle for this feature, async_utils.py implements the conditional wait logic for in-flight updates, and supporting changes in grpo.py and logger.py enable this functionality. The title is specific, concise (65 characters), and uses clear technical language that a teammate scanning git history would immediately understand as referring to allowing weight updates to occur while generation is in progress.

@parthchadha parthchadha added the CI:L1 Run doctests, unit tests, and functional tests label Oct 16, 2025
@parthchadha parthchadha changed the title feat: add capability to update weights inflight (stall, update and continue) feat: add capability to update weights inflight during generation Oct 16, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
nemo_rl/utils/logger.py (1)

136-147: Good defensive error handling for TensorBoard logging.

The changes correctly filter out non-scalar metrics and add exception handling to prevent logging failures from disrupting the training pipeline. This aligns with the robustness improvements mentioned in the PR objectives.

The static analysis tool flags the blind Exception catch at line 145. While defensive logging is appropriate, you could be more specific:

             try:
                 self.writer.add_scalar(name, value, step)
-            except Exception as e:
+            except (ValueError, TypeError, RuntimeError) as e:
                 print(f"Warning: Failed to log metric '{name}' to TensorBoard: {e}")
                 continue

This catches the most common TensorBoard logging errors while avoiding masking unexpected issues.
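
For context, a minimal sketch of the overall pattern the walkthrough describes (skip non-scalar values with a warning, then guard the TensorBoard write); this is illustrative only and not the actual nemo_rl/utils/logger.py code:

```python
import numbers

from torch.utils.tensorboard import SummaryWriter


def log_metrics(writer: SummaryWriter, metrics: dict, step: int) -> None:
    """Illustrative sketch of the defensive logging pattern, not the real logger."""
    for name, value in metrics.items():
        # Skip non-scalar values (lists, tensors, dicts) with a warning.
        if not isinstance(value, numbers.Number):
            print(f"Warning: Skipping non-scalar metric '{name}' ({type(value).__name__})")
            continue
        try:
            writer.add_scalar(name, value, step)
        except (ValueError, TypeError, RuntimeError) as e:
            print(f"Warning: Failed to log metric '{name}' to TensorBoard: {e}")
            continue
```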

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dee3fd9 and 92772ab.

📒 Files selected for processing (4)
  • examples/configs/grpo_math_1B.yaml (1 hunks)
  • nemo_rl/algorithms/async_utils.py (2 hunks)
  • nemo_rl/algorithms/grpo.py (1 hunks)
  • nemo_rl/utils/logger.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

**/*.py: Follow the Google Python Style Guide for all Python code
Target Python 3.12+ for all Python code in NeMo-RL
Indent Python code with 4 spaces; do not use tabs
Python filenames should be snake_case (e.g., some_file.py)
Class names should be PascalCase
Function and method names should be snake_case
Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
Constants should be UPPER_SNAKE_CASE
Avoid shadowing variables declared in an outer scope
Initialize all externally visible members of a class in the constructor
For public interfaces used outside a file, prefer docstrings over comments
Use comments mainly for code within a function or interfaces local to a file
Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
Use Google-style docstrings for classes and functions (Sphinx-parseable)
Avoid using reflection when functionality can be easily achieved without it
Limit except clauses to the smallest specific set of exceptions possible
For duck-typing via try/except, keep the try body minimal and use else for main logic
Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts

Files:

  • nemo_rl/algorithms/async_utils.py
  • nemo_rl/algorithms/grpo.py
  • nemo_rl/utils/logger.py
nemo_rl/**/*.py

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

nemo_rl/**/*.py: Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
Express configuration optionality via TypedDict using typing.NotRequired
When adding a new config key to a TypedDict subclass, document the key’s purpose, valid values/types, and recommended default in code
For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)

Files:

  • nemo_rl/algorithms/async_utils.py
  • nemo_rl/algorithms/grpo.py
  • nemo_rl/utils/logger.py
examples/configs/*.yaml

📄 CodeRabbit inference engine (CODING_GUIDELINES.md)

examples/configs/*.yaml: Exemplar configs under examples/configs/*.yaml must include documented defaults
When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml

Files:

  • examples/configs/grpo_math_1B.yaml
🪛 Ruff (0.14.0)
nemo_rl/utils/logger.py

145-145: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Lint check
  • GitHub Check: Post automodel integration comment / Comment on PR
  • GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (4)
nemo_rl/algorithms/grpo.py (1)

1746-1746: LGTM! Consistent configuration propagation.

Passing the checkpointing configuration to save_checkpoint aligns with the config-driven design principle stated in the coding guidelines ("YAML is the single source of truth for defaults"). This change makes the async GRPO path consistent with the synchronous path (line 943) and properly integrates checkpointing configuration into the persistence flow.

examples/configs/grpo_math_1B.yaml (1)

20-20: Well-documented configuration addition.

The new in_flight_weight_updates option is properly documented with a descriptive comment and has a safe default value (false). This follows the coding guidelines requiring that "exemplar configs under examples/configs/*.yaml must include documented defaults."

As per coding guidelines.

nemo_rl/algorithms/async_utils.py (2)

526-573: Clean implementation of in-flight weight update logic.

The conditional waiting behavior is well-structured and clearly documented:

  • The docstring explains the difference between async and non-async engines
  • Safe config access with .get() and default values prevents errors
  • Clear print statements aid debugging and observability
  • The logic correctly skips waiting when both async_engine and in_flight_weight_updates are enabled

This implementation properly enables the throughput improvements mentioned in the PR objectives by allowing ongoing generations to continue during weight updates.
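
For illustration, a minimal sketch of the conditional wait this comment describes; the class, method, and config paths below are hypothetical and may not match the actual prepare_for_refit in async_utils.py:

```python
class AsyncTrajectoryCollector:
    """Hypothetical sketch of the conditional-wait logic, not the NeMo-RL class."""

    def __init__(self, master_config: dict):
        self.master_config = master_config

    def prepare_for_refit(self) -> None:
        generation_cfg = self.master_config["policy"]["generation"]
        async_engine = generation_cfg.get("vllm_cfg", {}).get("async_engine", False)
        in_flight = self.master_config["grpo"]["async_grpo"].get(
            "in_flight_weight_updates", False
        )

        if async_engine and in_flight:
            # vLLM V1 async engine with in-flight updates enabled: ongoing
            # generations keep running and pick up the new weights, so skip the wait.
            print("In-flight weight updates enabled; not waiting for pending generations.")
            return

        # Otherwise, block until all pending generations finish before refit.
        self._wait_for_pending_generations()

    def _wait_for_pending_generations(self) -> None:
        # Placeholder for the real wait logic (e.g., joining outstanding futures).
        pass
```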


478-480: Helpful observability improvement.

The additional message clarifies the behavior of vLLM V1 async engine during weight updates, improving the user experience by explaining that active generation threads can continue executing. This aligns well with the in-flight weight update feature.


@terrykong terrykong left a comment


Awesome! IIUC, the throughput is 2-3x better than the sync baseline. What is the difference compared with regular async RL that waits for generations to finish, on Llama 8B?

Could you also update

and
max_trajectory_age_steps: int
maybe with a comment linking to the Magistral paper or some other seminal paper? I feel like it would also be good to advise that the user should expect generations to come from weights of different ages, and that the KV cache is also "stale".

@youngeunkwon0405

Hi @parthchadha, could we have an assertion that when max_trajectory_age_steps > 1, in_flight_weight_updates should be true?
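
A minimal sketch of such a check (hypothetical helper and placement; key names follow the config discussed in this PR):

```python
def validate_async_grpo_config(async_cfg: dict) -> None:
    """Hypothetical validation sketch for the suggestion above."""
    if async_cfg["max_trajectory_age_steps"] > 1:
        assert async_cfg.get("in_flight_weight_updates", False), (
            "in_flight_weight_updates must be true when max_trajectory_age_steps > 1"
        )
```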
