fix: add H200 TFLOPS #1543

clumsy · 2025-11-19T18:21:44Z

What does this PR do?

Adds theoretical TFLOPS values for the NVIDIA H200, taken from https://www.nvidia.com/en-us/data-center/h200

This fixes the following warning:

...nemo_rl/models/policy/lm_policy.py:556: UserWarning: Error getting theoretical flops: Unknown device name: NVIDIA H200 and dtype name: torch.bfloat16

Issues

List issues that this PR closes (syntax): N/A

Usage

N/A

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you run the unit tests and functional tests locally? Visit our Testing Guide for how to run tests
Did you add or update any necessary documentation? Visit our Document Development Guide for how to write, build and test the docs.

Additional Information

https://www.nvidia.com/en-us/data-center/h200

Summary by CodeRabbit

Release Notes

New Features
- Added support for NVIDIA H200 accelerator with performance tracking capabilities across multiple data types.
Tests
- Introduced comprehensive unit tests for performance metric validation to ensure accuracy across supported accelerators.

Signed-off-by: Alexander Zhipa <azzhipa@amazon.com>

clumsy · 2025-11-19T18:22:01Z

Please check this small fix, @terrykong @yuki-97

coderabbitai · 2025-11-19T18:23:27Z

Walkthrough

Added NVIDIA H200 accelerator entries to the TFLOPS mapping in flops_tracker.py with corresponding theoretical TFLOPS values for bfloat16 and float32 data types. Introduced a new unit test file to validate the theoretical TFLOPS calculations across multiple device and data type configurations.

Changes

Cohort / File(s)	Change Summary
TFLOPS Mapping Updates `nemo_rl/utils/flops_tracker.py`	Added two entries to THEORETICAL_TFLOPS dictionary for NVIDIA H200: bfloat16 with value 1979 / 2 and float32 with conditional value based on TF32 usage (989 / 2 if enabled, else 67.0)
Unit Tests `tests/unit/utils/test_flops_tracker.py`	New test file with parameterized test `test_theoretical_tflops` validating theoretical TFLOPS calculations across multiple NVIDIA devices and data types (bfloat16, float32) with tolerance-based assertions
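The mapping change summarized above can be sketched roughly as follows. This is a minimal, torch-free illustration and not the code in flops_tracker.py: the key shape (device name, dtype name), the use of plain strings for dtypes, the helper names `build_theoretical_tflops` and `lookup_theoretical_tflops`, and the `ValueError` exception type are all assumptions made for the example. Only the H200 values and the wording of the error message come from this PR.

```python
def build_theoretical_tflops(using_tf32: bool) -> dict[tuple[str, str], float]:
    """Return a hypothetical (device, dtype) -> TFLOPS table with the H200 entries."""
    return {
        # Values as described in the change summary for this PR.
        ("NVIDIA H200", "bfloat16"): 1979 / 2,
        ("NVIDIA H200", "float32"): 989 / 2 if using_tf32 else 67.0,
    }


def lookup_theoretical_tflops(
    device_name: str, dtype_name: str, using_tf32: bool = False
) -> float:
    """Look up a value; unknown pairs raise an error (exception type assumed)."""
    table = build_theoretical_tflops(using_tf32)
    key = (device_name, dtype_name)
    if key not in table:
        raise ValueError(
            f"Unknown device name: {device_name} and dtype name: {dtype_name}"
        )
    return table[key]


if __name__ == "__main__":
    print(lookup_theoretical_tflops("NVIDIA H200", "bfloat16"))
    print(lookup_theoretical_tflops("NVIDIA H200", "float32", using_tf32=True))
    print(lookup_theoretical_tflops("NVIDIA H200", "float32"))
```

Before this PR, an H200 lookup would have fallen into the error branch, which is consistent with the warning quoted in the description.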

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Straightforward constant additions to existing mapping
Simple parameterized test with consistent structure
No complex logic or control flow changes
Configuration and test data primarily

Possibly related PRs

feat: Update Theoretical TFLOPS #1236: Adds B200/B300/GB200/GB300 entries to the same THEORETICAL_TFLOPS mapping in flops_tracker.py
fix: report the correct number of workers during FLOPs calculation #1034: Modifies flops_tracker.py with imports and FLOPS-related logic changes

Suggested labels

CI:L1

Suggested reviewers

terrykong
guyueh1

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: add H200 TFLOPS' directly and clearly summarizes the main change: adding NVIDIA H200 TFLOPS values to the theoretical TFLOPS mapping.
Test Results For Major Changes	✅ Passed	Changes add NVIDIA H200 hardware support constants with unit test coverage, fixing runtime warnings without introducing new features or breaking existing functionality.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tests/unit/utils/test_flops_tracker.py (1)
8-26: Comprehensive test coverage for all device configurations.

The parameterized test thoroughly covers all supported devices including the newly added H200. The test cases correctly mirror the expected values from the source dictionary.

Consider adding a test case to verify that get_theoretical_tflops raises a ValueError for unknown devices:
def test_theoretical_tflops_unknown_device():
    with pytest.raises(ValueError, match="Unknown device name"):
        get_theoretical_tflops("NVIDIA Unknown", torch.bfloat16)

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55dc433 and 9c97222.

📒 Files selected for processing (2)

nemo_rl/utils/flops_tracker.py (1 hunks)
tests/unit/utils/test_flops_tracker.py (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

tests/unit/utils/test_flops_tracker.py (1)

nemo_rl/utils/flops_tracker.py (2)

get_theoretical_tflops (131-138)

is_using_tf32 (105-110)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Post submodule check comment / Comment on PR
GitHub Check: Post automodel integration comment / Comment on PR

🔇 Additional comments (3)

tests/unit/utils/test_flops_tracker.py (2)

1-5: LGTM!

Imports are clean and include all necessary dependencies for the test.

27-28: LGTM!

The test correctly uses pytest.approx for floating-point comparison, which handles potential precision differences gracefully.

nemo_rl/utils/flops_tracker.py (1)

118-119: No issues found—H200 TFLOPS values are accurate.

The verification confirms the H200 entries are correct:

BFLOAT16 base value (1979) matches NVIDIA H200 SXM specification

FP32 scalar fallback (67.0) matches NVIDIA H200 SXM specification exactly

The division by 2 pattern is a deliberate, consistent convention applied uniformly across the GPU entries in this tracker (A100, H100, H200, B200, B300, GB200, GB300), not an H200-specific issue

H200 correctly mirrors H100 since they share compute architecture
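As a worked check of the arithmetic in the comments above, halving the quoted peak figures gives the values stored for the H200. The helper name `halved` and the dictionary `H200_QUOTED` are hypothetical. The reason for the halving is not stated in this thread; one plausible reading, offered only as an assumption, is that the published tensor-core figures include structured sparsity while the tracker wants dense throughput.

```python
import math

# Peak figures quoted in this PR for the H200, in TFLOPS.
H200_QUOTED = {"bfloat16": 1979, "tf32": 989, "float32": 67.0}


def halved(quoted_tflops: float) -> float:
    """Apply the divide-by-2 convention described in the review."""
    return quoted_tflops / 2


h200_bf16 = halved(H200_QUOTED["bfloat16"])
h200_tf32 = halved(H200_QUOTED["tf32"])
h200_fp32 = H200_QUOTED["float32"]  # scalar FP32 fallback is used as-is

# Tolerance-based comparison, similar in spirit to pytest.approx
# (math.isclose is used here so the sketch runs without pytest).
assert math.isclose(h200_bf16, 989.5, rel_tol=1e-6)
assert math.isclose(h200_tf32, 494.5, rel_tol=1e-6)
assert math.isclose(h200_fp32, 67.0, rel_tol=1e-6)
```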

terrykong · 2025-11-20T00:25:46Z

@guyueh1 to review

terrykong · 2025-11-20T00:26:01Z

@clumsy can you add a copyright on the test module?

clumsy · 2025-11-20T02:27:11Z

Done, @terrykong

fix: add H200 TFLOPS

958577a

Signed-off-by: Alexander Zhipa <azzhipa@amazon.com>

clumsy requested review from a team as code owners November 19, 2025 18:21

github-actions bot added the community-request label Nov 19, 2025

coderabbitai bot reviewed Nov 19, 2025

terrykong requested a review from guyueh1 November 20, 2025 00:24

terrykong added the CI:L0 Run doctests and unit tests label Nov 20, 2025

terrykong temporarily deployed to nemo-ci November 20, 2025 00:25 — with GitHub Actions Inactive

terrykong mentioned this pull request Nov 20, 2025

fix: add theoretical TFlops for H200 GPU #1422

Closed

4 tasks

terrykong temporarily deployed to nemo-ci November 20, 2025 00:48 — with GitHub Actions Inactive

clumsy force-pushed the fix/h200_tflops branch from f16ca0f to 958577a Compare November 20, 2025 02:26

Merge branch 'main' into fix/h200_tflops

6aead1e

terrykong added CI:L0 Run doctests and unit tests and removed CI:L0 Run doctests and unit tests labels Nov 20, 2025

3 participants
