-
Notifications
You must be signed in to change notification settings - Fork 171
fix: add H200 TFLOPS #1543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: add H200 TFLOPS #1543
Conversation
Signed-off-by: Alexander Zhipa <azzhipa@amazon.com>
|
Please check this small fix, @terrykong @yuki-97 |
📝 WalkthroughWalkthroughAdded NVIDIA H200 accelerator entries to the TFLOPS mapping in Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/unit/utils/test_flops_tracker.py (1)
8-26: Comprehensive test coverage for all device configurations.The parameterized test thoroughly covers all supported devices including the newly added H200. The test cases correctly mirror the expected values from the source dictionary.
Consider adding a test case to verify that
get_theoretical_tflopsraises aValueErrorfor unknown devices:def test_theoretical_tflops_unknown_device(): with pytest.raises(ValueError, match="Unknown device name"): get_theoretical_tflops("NVIDIA Unknown", torch.bfloat16)
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
nemo_rl/utils/flops_tracker.py(1 hunks)tests/unit/utils/test_flops_tracker.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
tests/unit/utils/test_flops_tracker.py (1)
nemo_rl/utils/flops_tracker.py (2)
get_theoretical_tflops(131-138)is_using_tf32(105-110)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Post submodule check comment / Comment on PR
- GitHub Check: Post automodel integration comment / Comment on PR
🔇 Additional comments (3)
tests/unit/utils/test_flops_tracker.py (2)
1-5: LGTM!Imports are clean and include all necessary dependencies for the test.
27-28: LGTM!The test correctly uses
pytest.approxfor floating-point comparison, which handles potential precision differences gracefully.nemo_rl/utils/flops_tracker.py (1)
118-119: No issues found—H200 TFLOPS values are accurate.The verification confirms the H200 entries are correct:
- BFLOAT16 base value (1979) matches NVIDIA H200 SXM specification
- FP32 scalar fallback (67.0) matches NVIDIA H200 SXM specification exactly
- The division by 2 pattern is a deliberate, consistent convention applied uniformly across all 8 GPU entries in this tracker (A100, H100, H200, B200, B300, GB200, GB300), not a H200-specific issue
- H200 correctly mirrors H100 since they share compute architecture
|
@guyueh1 to review |
|
@clumsy can you add a copyright on the test module? |
f16ca0f to
958577a
Compare
|
Done, @terrykong |
What does this PR do ?
Adds theoretical H200 TFLOPS as per https://www.nvidia.com/en-us/data-center/h200
This addresses the following issue:
Issues
List issues that this PR closes (syntax): N/A
Usage
N/A
Before your PR is "Ready for review"
Pre checks:
Additional Information
https://www.nvidia.com/en-us/data-center/h200
Summary by CodeRabbit
Release Notes
New Features
Tests