This article lays out how GPU Utilization is actually measured and shows that it can report very high values even when the GPU is doing little useful compute. For example, the author shares that in some of their initial testing, their models were reaching "100% utilization" while only hitting 20% of the maximum theoretical Model FLOPS (Floating Point Operations per Second).
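For a sense of how a figure like that 20% is computed: Model FLOPS utilization is just achieved FLOPs divided by the hardware's theoretical peak. Below is a minimal back-of-envelope sketch using the common 6 × params × tokens/sec approximation for transformer training FLOPs; every number in it is an illustrative assumption, not a figure from the article.

```python
# Back-of-envelope Model FLOPS utilization (MFU), using the common
# 6 * params * tokens/sec approximation for transformer training FLOPs.
# All numbers below are illustrative assumptions, not measurements.
params = 7e9               # model parameter count
tokens_per_second = 2_500  # observed training throughput
peak_flops = 312e12        # e.g. A100 fp16 dense peak, per NVIDIA specs

achieved_flops = 6 * params * tokens_per_second
mfu = achieved_flops / peak_flops
print(f"MFU: {mfu:.1%}")   # ~33.7% for these made-up numbers
```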
The article recommends looking at a metric called SM Efficiency (SM for streaming multiprocessor; also called SM Activity), which reports the % of SMs that are active. A discrepancy between GPU Utilization and SM Efficiency can be an indicator of a less visible bottleneck, one that "fused kernels" can often help with. Using Flash Attention or SDPA is one example of this (see the sketch below), but according to the article there are also readily available fused implementations for other types of layers. I didn't look into those alternatives too much, so it's possible we're already using more than one of them for their general benefits.
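To make the fused-kernel idea concrete, here's a minimal PyTorch sketch contrasting unfused attention (several separate kernels, with the full score matrix round-tripped through GPU memory) against `F.scaled_dot_product_attention`, which dispatches to a Flash-Attention-style fused kernel when the hardware and dtype allow it. The shapes are arbitrary, and it assumes PyTorch 2.0+ on a CUDA device.

```python
import torch
import torch.nn.functional as F

# Arbitrary shapes: (batch, heads, seq_len, head_dim).
q = torch.randn(8, 16, 1024, 64, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# Unfused attention: each op launches its own kernel, and the
# (seq_len x seq_len) score matrix is materialized in GPU memory.
scores = (q @ k.transpose(-2, -1)) / (q.shape[-1] ** 0.5)
out_unfused = torch.softmax(scores, dim=-1) @ v

# Fused attention: one entry point that can dispatch to a
# Flash-Attention-style kernel, avoiding the intermediate matrix.
out_fused = F.scaled_dot_product_attention(q, k, v)

# Loose fp16 tolerances; the two paths differ in accumulation order.
torch.testing.assert_close(out_unfused, out_fused, atol=1e-2, rtol=1e-2)
```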
If nothing else, it may be useful to add SM Efficiency to our standard set of metrics logged on ClearML. The metric is available in the NVIDIA Data Center GPU Manager (DCGM), and it is also available on-demand through `nvidia-smi dmon`.
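As a rough sketch of the on-demand path (before wiring up DCGM properly), the snippet below just shells out to `nvidia-smi dmon` and parses the `sm` column; the exact column set varies by driver version, so it reads the header rather than hard-coding positions. The `print` at the end stands in for whatever ClearML reporting call we'd actually use.

```python
import subprocess

# One-shot sample of per-GPU utilization counters ("-s u" selects the
# utilization group; "-c 1" takes a single sample).
raw = subprocess.run(
    ["nvidia-smi", "dmon", "-s", "u", "-c", "1"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()

# First line is the column-name header, second is the units row.
header = raw[0].lstrip("#").split()
sm_col = header.index("sm")
for line in raw[2:]:
    fields = line.split()
    print(f"GPU {fields[0]}: SM {fields[sm_col]}%")  # stand-in for a ClearML report
```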