Fix #200 - Extract more GPU information for sysinfo #203

Merged (1 commit) on Nov 19, 2024

Conversation

lars-t-hansen
Collaborator

No description provided.

@lars-t-hansen lars-t-hansen changed the title from "#200 - Extract more GPU information for sysinfo" to "Fix #200 - Extract more GPU information for sysinfo" Oct 29, 2024
@lars-t-hansen
Collaborator Author

This works as it stands and should be reasonably robust. The main open question is whether all of this information belongs in sysinfo, or whether some belongs in sysinfo and some in ps. (For example, fan speed, current clock speeds, and memory usage are probably not very useful for sysinfo.) The other issue, which I can't address, is AMD: our only AMD machine is no longer working (the card drivers are too old, incompatible with the current OS, and not fixable). So this is NVIDIA-only.

@lars-t-hansen
Collaborator Author

The other issue is that nvidia-smi -q is very expensive (it takes about 1s to run on an idle workstation with four cards), while extracting just the data we need through a more precise query seems to be much cheaper. This doesn't matter for sonar sysinfo, but it will be a concern for sonar ps.
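
For illustration only, here is a minimal Python sketch of what such a "more precise query" could look like, using nvidia-smi's `--query-gpu` and `--format=csv,noheader,nounits` options. It is not the code in this PR. The field selection and the helper names `parse_gpu_csv` and `query_gpus` are assumptions made for the example.

```python
import csv
import io
import subprocess

# Illustrative selection of static per-card fields (an assumption, not the
# PR's actual list); the names follow `nvidia-smi --help-query-gpu`.
FIELDS = ["index", "name", "uuid", "memory.total", "power.limit", "clocks.max.sm"]


def parse_gpu_csv(text, fields=FIELDS):
    """Parse `--format=csv,noheader,nounits` output into one dict per card."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Blank lines come back as empty rows and are skipped.
    return [dict(zip(fields, (v.strip() for v in row))) for row in rows if row]


def query_gpus(fields=FIELDS):
    """Run the targeted query; requires nvidia-smi on PATH."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(fields),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True, timeout=10)
    return parse_gpu_csv(out.stdout, fields)
```

Keeping the parser separate from the subprocess call lets it be exercised on captured output without a GPU present. All values are returned as strings, so any unit handling or numeric conversion is left to the caller.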

@lars-t-hansen lars-t-hansen force-pushed the w-200-more-sysinfo-data branch from 77cc1b7 to fcfc724 Compare November 13, 2024 14:03
@lars-t-hansen lars-t-hansen marked this pull request as ready for review November 13, 2024 14:06
@lars-t-hansen lars-t-hansen requested a review from bast November 13, 2024 14:06
@lars-t-hansen
Collaborator Author

I'll add AMD information too, but I may need more time for that and will file a follow-up bug: our AMD system has been offline for a while and only came back yesterday. NVIDIA is the high-order bit anyway for UiO/NRIS.

@lars-t-hansen lars-t-hansen marked this pull request as draft November 14, 2024 07:32
@lars-t-hansen
Collaborator Author

Going to tweak this a little more.

@lars-t-hansen lars-t-hansen force-pushed the w-200-more-sysinfo-data branch from f896127 to 9d623ce Compare November 14, 2024 12:11
@lars-t-hansen lars-t-hansen marked this pull request as ready for review November 14, 2024 12:11
@bast bast merged commit fe5be85 into NordicHPC:main Nov 19, 2024
1 check passed
@bast
Member

bast commented Nov 19, 2024

Thank you!

@lars-t-hansen lars-t-hansen deleted the w-200-more-sysinfo-data branch January 14, 2025 07:41