
Verify the usefulness of the GPU Utilization metric compared to SM Efficiency #505

Open
isaac091 opened this issue Sep 4, 2024 · 0 comments
Labels
optimization Model training/inferencing optimization

Comments

@isaac091
Collaborator

isaac091 commented Sep 4, 2024

This article lays out how GPU Utilization is actually measured and shows that the metric can read very high even while most of the GPU's compute resources sit idle. For example, the author shares that in some of their initial testing, their models were reaching "100% utilization" while only hitting 20% of the maximum theoretical Model FLOPS (floating point operations per second).

The article recommends looking at a metric called SM Efficiency (SM for streaming multiprocessor; also called SM Activity), which reports the percentage of SMs that are active. A discrepancy between these two metrics can be an indicator of a less visible bottleneck that can be addressed with "fused kernels." Using Flash Attention or SDPA is one example of this, but according to the article there are also readily available fused implementations of other types of layers. I didn't look into these alternatives too much, so it's possible that we're already using more than one of them for their general benefits.
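As a rough illustration of why fusion helps (a hypothetical pure-Python sketch, not how GPU kernels are actually written): an unfused sequence of operations makes one pass over the data per operation, materializing an intermediate each time, while a fused version reads and writes each value once.

```python
import math

# Hypothetical sketch of kernel fusion: each list comprehension below
# stands in for one GPU kernel launch that reads its input from and
# writes its output back to device memory.

def gelu_unfused(xs):
    # Three passes, materializing an intermediate list each time
    # (analogous to three kernel launches with memory round-trips).
    cubed = [0.044715 * x ** 3 for x in xs]                        # pass 1
    inner = [math.tanh(math.sqrt(2.0 / math.pi) * (x + c))
             for x, c in zip(xs, cubed)]                           # pass 2
    return [0.5 * x * (1.0 + t) for x, t in zip(xs, inner)]       # pass 3

def gelu_fused(xs):
    # One pass: every value is read once and written once
    # (analogous to a single fused kernel with no intermediates).
    return [0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                       * (x + 0.044715 * x ** 3)))
            for x in xs]
```

Both versions compute the same tanh-approximation GELU; the fused one simply avoids the intermediate memory traffic, which is the kind of bottleneck a high-utilization / low-SM-efficiency gap can point to.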

If nothing else, it may be useful to add SM Efficiency to our standard set of metrics logged on ClearML. The metric is available through the NVIDIA Data Center GPU Manager (DCGM), and it is also available on demand through nvidia-smi dmon.
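If it helps to experiment before wiring anything into ClearML, per-GPU values can be scraped from nvidia-smi dmon output. A minimal Python sketch (the sample output below is an assumption for illustration; the exact column layout can vary by driver version, which is why the header row is parsed rather than hard-coding column positions):

```python
# Hypothetical helper: pull the `sm` column out of `nvidia-smi dmon`
# text output, mapping GPU index -> sm%.

def parse_sm_column(dmon_output: str) -> dict:
    lines = [ln for ln in dmon_output.splitlines() if ln.strip()]
    # Locate the column-name header row to find where `sm` lives.
    header = next(ln for ln in lines if ln.lstrip().startswith("# gpu"))
    columns = header.lstrip("# ").split()
    sm_idx = columns.index("sm")
    readings = {}
    for ln in lines:
        if ln.lstrip().startswith("#"):
            continue  # skip header and units rows
        fields = ln.split()
        readings[int(fields[0])] = int(fields[sm_idx])
    return readings

# Assumed sample of one dmon sampling interval (illustrative only).
sample = """\
# gpu    sm   mem   enc   dec
# Idx     %     %     %     %
    0    85    45     0     0
    1    12     3     0     0
"""
print(parse_sm_column(sample))
```

In a real setup this would read from a subprocess running nvidia-smi dmon (or, better, query DCGM directly) rather than a hard-coded string.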

@isaac091 isaac091 added the optimization Model training/inferencing optimization label Sep 4, 2024
@ddaspit ddaspit moved this from 🆕 New to 📋 Backlog in SIL-NLP Research Sep 27, 2024