feat: add capability to update weights inflight during generation #1381
base: main
Conversation
…for generation

Signed-off-by: Parth Chadha <pchadha@nvidia.com>
📝 Walkthrough

The changes implement in-flight weight updates for async GRPO, add defensive error handling in the TensorBoard logger, and pass checkpointing configuration to checkpoint saving operations.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

The changes span multiple files with a mix of logic additions (conditional wait behavior in async utilities), parameter passing (checkpointing config), and error handling (logger improvements). While the individual changes are straightforward, they affect distinct concerns requiring separate reasoning for each modification, and the conditional logic in async utilities requires understanding the interaction between vLLM engine configurations and weight update flows.

Pre-merge checks and finishing touches: ✅ Passed checks (4 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
nemo_rl/utils/logger.py (1)
136-147: Good defensive error handling for TensorBoard logging.

The changes correctly filter out non-scalar metrics and add exception handling to prevent logging failures from disrupting the training pipeline. This aligns with the robustness improvements mentioned in the PR objectives.
The static analysis tool flags the blind `Exception` catch at line 145. While defensive logging is appropriate, you could be more specific:

```diff
 try:
     self.writer.add_scalar(name, value, step)
-except Exception as e:
+except (ValueError, TypeError, RuntimeError) as e:
     print(f"Warning: Failed to log metric '{name}' to TensorBoard: {e}")
     continue
```

This catches the most common TensorBoard logging errors while avoiding masking unexpected issues.
📜 Review details

Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📒 Files selected for processing (4)

- examples/configs/grpo_math_1B.yaml (1 hunks)
- nemo_rl/algorithms/async_utils.py (2 hunks)
- nemo_rl/algorithms/grpo.py (1 hunks)
- nemo_rl/utils/logger.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
`**/*.py`:

- Follow the Google Python Style Guide for all Python code
- Target Python 3.12+ for all Python code in NeMo-RL
- Indent Python code with 4 spaces; do not use tabs
- Python filenames should be snake_case (e.g., some_file.py)
- Class names should be PascalCase
- Function and method names should be snake_case
- Local variable names should be snake_case; if starting with a number, prefix with k (e.g., k_99th_percentile)
- Global variables should be UPPER_SNAKE_CASE and prefixed with G_ (e.g., G_MY_GLOBAL)
- Constants should be UPPER_SNAKE_CASE
- Avoid shadowing variables declared in an outer scope
- Initialize all externally visible members of a class in the constructor
- For public interfaces used outside a file, prefer docstrings over comments
- Use comments mainly for code within a function or interfaces local to a file
- Commented-out code must include a nearby comment explaining usage and why it is commented out; otherwise remove before merging
- Use Google-style docstrings for classes and functions (Sphinx-parseable)
- Avoid using reflection when functionality can be easily achieved without it
- Limit except clauses to the smallest specific set of exceptions possible
- For duck-typing via try/except, keep the try body minimal and use else for main logic
- Add the NVIDIA copyright header (with current year) at the top of all Python files, excluding tests/ and test-only scripts
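Two of the exception-handling guidelines above (smallest specific except set; minimal try body with else for the main logic) can be illustrated with a small hypothetical helper:

```python
def parse_port(raw: str, default: int = 8080) -> int:
    """Parse a TCP port string, falling back to `default` on bad input."""
    try:
        port = int(raw)  # keep the try body minimal: only the risky call
    except ValueError:   # smallest specific set of exceptions
        return default
    else:
        # Main logic lives in `else`, outside the protected region,
        # so bugs here are not accidentally swallowed by the except.
        return port if 0 < port < 65536 else default


result_ok = parse_port("9000")
result_bad = parse_port("not-a-port")
result_range = parse_port("70000")
```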
Files:
nemo_rl/algorithms/async_utils.py
nemo_rl/algorithms/grpo.py
nemo_rl/utils/logger.py
nemo_rl/**/*.py
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
`nemo_rl/**/*.py`:

- Do not set non-None configuration defaults in code; YAML is the single source of truth for defaults
- Access required config attributes directly (e.g., policy_cfg["precision"]) and assume presence; do not introduce hidden defaults
- Express configuration optionality via TypedDict using typing.NotRequired
- When adding a new config key to a TypedDict subclass, document the key's purpose, valid values/types, and recommended default in code
- For any class or function decorated with @ray.remote, add '# pragma: no cover' on the class/def line (and on remote functions)
Files:
nemo_rl/algorithms/async_utils.py
nemo_rl/algorithms/grpo.py
nemo_rl/utils/logger.py
examples/configs/*.yaml
📄 CodeRabbit inference engine (CODING_GUIDELINES.md)
`examples/configs/*.yaml`:

- Exemplar configs under examples/configs/*.yaml must include documented defaults
- When adding a new config key, reflect its recommended default in exemplar YAMLs under examples/configs/*.yaml
Files:
examples/configs/grpo_math_1B.yaml
🪛 Ruff (0.14.0)
nemo_rl/utils/logger.py
145-145: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Lint check
- GitHub Check: Post automodel integration comment / Comment on PR
- GitHub Check: Post submodule check comment / Comment on PR
🔇 Additional comments (4)
nemo_rl/algorithms/grpo.py (1)

1746-1746: LGTM! Consistent configuration propagation.

Passing the checkpointing configuration to `save_checkpoint` aligns with the config-driven design principle stated in the coding guidelines ("YAML is the single source of truth for defaults"). This change makes the async GRPO path consistent with the synchronous path (line 943) and properly integrates checkpointing configuration into the persistence flow.

examples/configs/grpo_math_1B.yaml (1)
20-20: Well-documented configuration addition.

The new `in_flight_weight_updates` option is properly documented with a descriptive comment and has a safe default value (`false`). This follows the coding guidelines requiring that "exemplar configs under examples/configs/*.yaml must include documented defaults."
nemo_rl/algorithms/async_utils.py (2)

526-573: Clean implementation of in-flight weight update logic.

The conditional waiting behavior is well-structured and clearly documented:

- The docstring explains the difference between async and non-async engines
- Safe config access with `.get()` and default values prevents errors
- Clear print statements aid debugging and observability
- The logic correctly skips waiting when both `async_engine` and `in_flight_weight_updates` are enabled

This implementation properly enables the throughput improvements mentioned in the PR objectives by allowing ongoing generations to continue during weight updates.
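The decision described above can be reduced to a small predicate. This is a hedged sketch, not the repository's implementation; the function name and the config nesting are assumptions, while `async_engine` and `in_flight_weight_updates` are the key names discussed in this review:

```python
def should_wait_for_generations(generation_cfg: dict) -> bool:
    """Return True when a weight update must wait for in-flight generations.

    Waiting is skipped only when the vLLM async engine is active AND
    in-flight weight updates are enabled; every other combination waits,
    because only the async engine can safely swap weights mid-generation.
    """
    async_engine = generation_cfg.get("vllm_cfg", {}).get("async_engine", False)
    in_flight = generation_cfg.get("in_flight_weight_updates", False)
    return not (async_engine and in_flight)
```

Using `.get()` with explicit `False` defaults mirrors the safe config access called out in the review, so a config that omits either key falls back to the conservative waiting path.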
478-480: Helpful observability improvement.

The additional message clarifies the behavior of the vLLM V1 async engine during weight updates, improving the user experience by explaining that active generation threads can continue executing. This aligns well with the in-flight weight update feature.
Awesome! IIUC, the throughput is 2-3x better than the sync baseline. What is the difference with just regular async RL and waiting for the generations to finish on Llama 8B?
Could you also update:

- Line 4 in dee3fd9
- Line 82 in dee3fd9

max_trajectory_age_steps: int
Hi @parthchadha, could we have an assertion when
What does this PR do?

This PR adds the capability to do in-flight weight updates, which prevents stalls in the async RL pipeline and provides increased throughput.

Convergence plots for Llama 8B, 4K seq len:

The plot below shows three types of runs:
Timing, tokens per sec per GPU (higher is better):

- Sync: ~187
- Async best: ~478

Total step time:

- Sync: ~60s
- Async best: ~21s
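As a quick sanity check, the figures above are consistent with the "2-3x better than the sync baseline" observation from the review thread:

```python
# Figures copied from the PR description above.
sync_tps, async_tps = 187, 478      # tokens/sec/GPU
sync_step, async_step = 60, 21      # seconds per training step

throughput_speedup = async_tps / sync_tps   # ~2.6x
step_time_speedup = sync_step / async_step  # ~2.9x
```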
Issues
List issues that this PR closes (syntax):
Usage
# Add a code snippet demonstrating how to use this
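A minimal sketch of how the feature might be enabled. The key name comes from this PR; its exact placement within examples/configs/grpo_math_1B.yaml is not shown in this thread, so the excerpt below is an assumption:

```yaml
# Excerpt (placement within the exemplar YAML is illustrative).
# Default is false; per the review, this is only effective together with
# the vLLM async engine (async_engine: true).
in_flight_weight_updates: true
```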
Before your PR is "Ready for review"
Pre checks:
Additional Information
Summary by CodeRabbit
New Features
- `in_flight_weight_updates` configuration option to control async GRPO weight update behavior

Bug Fixes

Chores