@luccabb commented Feb 4, 2026

Summary:
DCGM 4.x changed the JSON output format of the dcgmi diag command. The key that holds the diagnostic results was renamed:

3.x: "DCGM GPU Diagnostic"
4.x: "DCGM Diagnostic"

  • Updated process_dcgmi_diag_output() to check for both "DCGM GPU Diagnostic" (3.x) and "DCGM Diagnostic" (4.x) keys
  • Added diag_pass_output_v4 test case for DCGM 4.x format
  • Added a _get_test_status() helper function that handles the different status locations:
    • DCGM 4.x: Uses test["test_summary"]["status"] (aggregated per-test summary)
    • DCGM 3.x: Uses test["results"][0]["status"] (backward compatible)
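
For readers who want to see the shape of the change, the lookups described above could be sketched roughly as follows. This is an illustrative sketch, not the merged code: the function names `find_diag_section` and `get_test_status`, the fallback order, and the sample payloads are assumptions; only the two key names and the two status paths come from the summary.

```python
from typing import Any, Dict, Optional

# Key names taken from the summary: 3.x first, then 4.x.
DIAG_KEYS = ("DCGM GPU Diagnostic", "DCGM Diagnostic")


def find_diag_section(output: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the diagnostic section under whichever known key is present."""
    for key in DIAG_KEYS:
        if key in output:
            return output[key]
    return None


def get_test_status(test: Dict[str, Any]) -> Optional[str]:
    """Read a test's status from the 4.x location, falling back to the 3.x one."""
    # DCGM 4.x: aggregated per-test summary.
    summary = test.get("test_summary")
    if isinstance(summary, dict) and "status" in summary:
        return summary["status"]
    # DCGM 3.x: status of the first entry in the results list.
    results = test.get("results") or []
    if results and isinstance(results[0], dict):
        return results[0].get("status")
    return None


# Hypothetical, heavily trimmed payloads; real dcgmi output has more fields.
v3_test = {"name": "example", "results": [{"status": "Pass"}]}
v4_test = {"name": "example", "test_summary": {"status": "Pass"}}

for sample in (v3_test, v4_test):
    print(get_test_status(sample))
```

Checking the 4.x location first means a 4.x payload that also carries a per-entity results list still reports the aggregated status, while 3.x payloads, which have no test_summary, fall through to the original lookup.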

See the breaking changes in the release notes:
https://docs.nvidia.com/datacenter/dcgm/latest/release-notes/changelog.html#deprecations-and-breaking-changes

Reviewed By: levydori

Differential Revision: D92219750

meta-codesync bot commented Feb 4, 2026

@luccabb has exported this pull request. If you are a Meta employee, you can view the originating Diff in D92219750.

@meta-codesync meta-codesync bot merged commit f79d541 into facebookresearch:main Feb 4, 2026
8 checks passed