

@tpatki commented Jun 26, 2024

Description

Extend the new energy API from v0.8 to include GPU energy on NVIDIA and AMD GPUs.

6/26: This is WIP and won't compile/work just yet.

To Do 6/27:

  • The existing energy reporting API (CPU only) reports deltas, so the first call for GPU energy must also return 0; this PR needs to be updated to do that correctly. The vendor API returns a raw cumulative value, so we need to store the first value as the offset and report the difference from that offset for every subsequent sample.

To Do 4/9/25:

  • Deltas have been fixed in the print_energy API on NVIDIA
  • Update deltas in the JSON API for NVIDIA
  • Remove the commented-out code from variorum.c
  • Add AMD ROCm support using rsmi_dev_energy_count_get(), mirroring what we did for the NVIDIA print_energy and json_energy APIs; target Tioga (MI250X) first
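The NVIDIA and AMD readings come in different units, so a small normalization step may help when mirroring the NVIDIA code for ROCm. This sketch assumes NVML's total-energy value is in millijoules and that the `counter_resolution` output of `rsmi_dev_energy_count_get()` is in microjoules per count, as the ROCm SMI header documents it. The helper names are hypothetical, and the vendor calls are left out so the snippet compiles without either library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: NVML total energy (millijoules) -> joules. */
double nvml_mj_to_joules(unsigned long long energy_mj)
{
    return (double)energy_mj / 1000.0;
}

/* Hypothetical helper: ROCm SMI energy accumulator -> joules.
 * Assumes counter_resolution_uj is microjoules per counter tick. */
double rsmi_count_to_joules(uint64_t count, float counter_resolution_uj)
{
    assert(counter_resolution_uj > 0.0f); /* a zero resolution means a bad read */
    /* ticks * (uJ per tick) = uJ; divide by 1e6 to get J */
    return (double)count * (double)counter_resolution_uj / 1e6;
}
```

Taking the offset on the raw counter before converting keeps the first reported sample at exactly 0, as on NVIDIA.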

Fixes #532.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature/architecture support (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Build/CI update

How Has This Been Tested?

  • Lassen: GPU-only
  • Lassen: CPU-only
  • Lassen: CPU+GPU
  • Corona: GPU-only
  • Tioga: GPU-only
  • Tioga: CPU+GPU (blocked by the HSMP module issue; will be tested later)

Checklist:

  • I have run ./scripts/check-code-format.sh and confirm my code follows the style guidelines of variorum
  • I have added comments in my code
  • My changes generate no new warnings (build with -DENABLE_WARNINGS=ON)
  • New and existing unit tests pass with my changes

Thank you for taking the time to contribute to Variorum!

@tpatki marked this pull request as draft June 26, 2024 17:57
@tpatki force-pushed the add_gpu_energy_API branch from 67c13fa to 34b1cd7 June 27, 2024 22:26
@masterleinad mentioned this pull request Jun 28, 2024
@tpatki changed the title "Add GPU Energy APIs" to "Add GPU Energy APIs and support for NVIDIA and AMD GPUs" Nov 13, 2024
@tpatki force-pushed the add_gpu_energy_API branch from e240417 to 025a0da April 4, 2025 23:34

Development

Successfully merging this pull request may close these issues.

Add support for print and JSON APIs for GPU energy values

2 participants