Conversation

@clumsy
Contributor

@clumsy clumsy commented Nov 19, 2025

What does this PR do?

Adds theoretical TFLOPS values for the NVIDIA H200, as per https://www.nvidia.com/en-us/data-center/h200

This addresses the following issue:

...nemo_rl/models/policy/lm_policy.py:556: UserWarning: Error getting theoretical flops: Unknown device name: NVIDIA H200 and dtype name: torch.bfloat16
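
A minimal sketch of how a warning like this can arise, assuming a lookup table keyed by device and dtype names and a caller that downgrades lookup failures to warnings. The table contents, the string dtype keys, and the helper theoretical_tflops_or_none are hypothetical; only the message text is taken from the log line above.

```python
import warnings

# Hypothetical pre-fix table: there is no H200 entry, so its lookup fails.
# Keys are (device name, dtype name) pairs; the key format is an assumption.
THEORETICAL_TFLOPS = {
    ("NVIDIA H100", "torch.bfloat16"): 1979 / 2,
}


def get_theoretical_tflops(device_name, dtype_name):
    """Return the table entry, or raise ValueError for an unknown pair."""
    key = (device_name, dtype_name)
    if key not in THEORETICAL_TFLOPS:
        raise ValueError(
            f"Unknown device name: {device_name} and dtype name: {dtype_name}"
        )
    return THEORETICAL_TFLOPS[key]


def theoretical_tflops_or_none(device_name, dtype_name):
    """Hypothetical caller: warn instead of failing when the lookup misses."""
    try:
        return get_theoretical_tflops(device_name, dtype_name)
    except ValueError as e:
        warnings.warn(f"Error getting theoretical flops: {e}")
        return None
```

With a table like this, any device missing from the mapping only loses its theoretical-FLOPS figure rather than aborting the run, which is consistent with the issue surfacing as a UserWarning.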

Issues

List issues that this PR closes (syntax): N/A

Usage

N/A

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
  • Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

https://www.nvidia.com/en-us/data-center/h200

Summary by CodeRabbit

Release Notes

  • New Features

    • Added support for NVIDIA H200 accelerator with performance tracking capabilities across multiple data types.
  • Tests

    • Introduced comprehensive unit tests for performance metric validation to ensure accuracy across supported accelerators.

Signed-off-by: Alexander Zhipa <azzhipa@amazon.com>
@clumsy clumsy requested review from a team as code owners November 19, 2025 18:21
@clumsy
Contributor Author

clumsy commented Nov 19, 2025

Please check this small fix, @terrykong @yuki-97

@coderabbitai
Contributor

coderabbitai bot commented Nov 19, 2025

📝 Walkthrough

Walkthrough

Added NVIDIA H200 accelerator entries to the TFLOPS mapping in flops_tracker.py with corresponding theoretical TFLOPS values for bfloat16 and float32 data types. Introduced a new unit test file to validate the theoretical TFLOPS calculations across multiple device and data type configurations.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| TFLOPS Mapping Updates: nemo_rl/utils/flops_tracker.py | Added two entries to the THEORETICAL_TFLOPS dictionary for NVIDIA H200: bfloat16 with value 1979 / 2, and float32 with a value that depends on TF32 usage (989 / 2 if enabled, else 67.0) |
| Unit Tests: tests/unit/utils/test_flops_tracker.py | New test file with parameterized test test_theoretical_tflops validating theoretical TFLOPS calculations across multiple NVIDIA devices and data types (bfloat16, float32) with tolerance-based assertions |
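
The two H200 entries summarized above can be sketched as follows. This is an illustration, not the code from flops_tracker.py: the key format, the string dtype names, and the h200_entries helper are assumptions, and the TF32 flag is passed in explicitly here, whereas the module exposes an is_using_tf32 helper.

```python
def h200_entries(using_tf32):
    """Build the H200 rows of a theoretical-TFLOPS table (hypothetical helper).

    The values follow the change summary: 1979 / 2 for bfloat16, and for
    float32 either 989 / 2 (TF32 enabled) or 67.0 (plain FP32).
    """
    return {
        ("NVIDIA H200", "bfloat16"): 1979 / 2,
        ("NVIDIA H200", "float32"): 989 / 2 if using_tf32 else 67.0,
    }


# float32 throughput depends on whether matmuls run in TF32.
with_tf32 = h200_entries(using_tf32=True)
without_tf32 = h200_entries(using_tf32=False)
```

Keying float32 on the TF32 setting keeps the reported peak consistent with the math mode actually in use, since the two modes differ by several times in peak throughput.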

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

  • Straightforward constant additions to existing mapping
  • Simple parameterized test with consistent structure
  • No complex logic or control flow changes
  • Configuration and test data primarily

Suggested labels

CI:L1

Suggested reviewers

  • terrykong
  • guyueh1

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: add H200 TFLOPS' directly and clearly summarizes the main change: adding NVIDIA H200 TFLOPS values to the theoretical TFLOPS mapping. |
| Test Results For Major Changes | ✅ Passed | Changes add NVIDIA H200 hardware support constants with unit test coverage, fixing runtime warnings without introducing new features or breaking existing functionality. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/unit/utils/test_flops_tracker.py (1)

8-26: Comprehensive test coverage for all device configurations.

The parameterized test thoroughly covers all supported devices including the newly added H200. The test cases correctly mirror the expected values from the source dictionary.

Consider adding a test case to verify that get_theoretical_tflops raises a ValueError for unknown devices:

def test_theoretical_tflops_unknown_device():
    with pytest.raises(ValueError, match="Unknown device name"):
        get_theoretical_tflops("NVIDIA Unknown", torch.bfloat16)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55dc433 and 9c97222.

📒 Files selected for processing (2)
  • nemo_rl/utils/flops_tracker.py (1 hunks)
  • tests/unit/utils/test_flops_tracker.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/utils/test_flops_tracker.py (1)
nemo_rl/utils/flops_tracker.py (2)
  • get_theoretical_tflops (131-138)
  • is_using_tf32 (105-110)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Post submodule check comment / Comment on PR
  • GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (3)
tests/unit/utils/test_flops_tracker.py (2)

1-5: LGTM!

Imports are clean and include all necessary dependencies for the test.


27-28: LGTM!

The test correctly uses pytest.approx for floating-point comparison, which handles potential precision differences gracefully.

nemo_rl/utils/flops_tracker.py (1)

118-119: No issues found—H200 TFLOPS values are accurate.

The verification confirms the H200 entries are correct:

  • BFLOAT16 base value (1979) matches NVIDIA H200 SXM specification
  • FP32 scalar fallback (67.0) matches NVIDIA H200 SXM specification exactly
  • The division by 2 pattern is a deliberate, consistent convention applied uniformly across all 8 GPU entries in this tracker (A100, H100, H200, B200, B300, GB200, GB300), not an H200-specific issue
  • H200 correctly mirrors H100 since they share compute architecture

@terrykong terrykong requested a review from guyueh1 November 20, 2025 00:24
@terrykong terrykong added the CI:L0 Run doctests and unit tests label Nov 20, 2025
@terrykong
Contributor

@guyueh1 to review

@terrykong
Contributor

@clumsy can you add a copyright on the test module?

@clumsy
Contributor Author

clumsy commented Nov 20, 2025

Done, @terrykong

@terrykong terrykong added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Nov 20, 2025

Labels

CI:L0 Run doctests and unit tests community-request

3 participants