Low GPU utilization #20292
Replies: 2 comments
-
I'm experiencing almost the same issue. After completing a 50,000-step training run on two A40 GPUs, I observed severe fluctuations in GPU utilization; the only difference is that mine peaked at 100%. I suspect this is caused by a mismatch in data throughput between CPU preprocessing and GPU processing. Have you found any effective solutions? Thanks in advance.
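A quick way to check for the suspected CPU/GPU throughput mismatch is to time how long each iteration blocks waiting for the next batch; if that wait dominates the step time, the GPU is starved by the input pipeline. A minimal sketch, assuming a standard PyTorch `DataLoader` (the dataset here is a stand-in, not the code from the thread):

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; in practice, use the real training dataset.
dataset = TensorDataset(torch.randn(2048, 128), torch.randint(0, 10, (2048,)))
loader = DataLoader(dataset, batch_size=64, num_workers=0)  # try raising num_workers

fetch_times = []
it = iter(loader)
t0 = time.perf_counter()
for _ in range(16):
    batch = next(it)  # time spent blocked here is data-loading wait
    fetch_times.append(time.perf_counter() - t0)
    # ... the actual training step would run here ...
    t0 = time.perf_counter()

avg_ms = 1000 * sum(fetch_times) / len(fetch_times)
print(f"avg batch fetch time: {avg_ms:.2f} ms")
```

If the average fetch time is a large fraction of the step time, the bottleneck is on the CPU/data side rather than in the model's GPU compute.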
-
In general, look at CPU and I/O utilization on the system to find out where the bottleneck is.
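If the bottleneck turns out to be CPU-side data loading, the usual remedy in PyTorch is to give the `DataLoader` more parallelism and prefetching. A hedged sketch (the specific values are illustrative, not tuned for this workload):

```python
from torch.utils.data import DataLoader

def make_loader(dataset, batch_size=64):
    # More workers let CPU preprocessing run ahead of the GPU;
    # pin_memory speeds up host-to-device copies on CUDA machines.
    return DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=4,            # illustrative; tune toward the CPU core count
        pin_memory=True,
        persistent_workers=True,  # keep workers alive across epochs
        prefetch_factor=2,        # batches buffered ahead per worker
    )
```

Note that `persistent_workers` and `prefetch_factor` only apply when `num_workers > 0`.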
-
This is my first time trying PyTorch Lightning. On a node with 8 H100s, I'm running the code from this example: https://lightning.ai/lightning-ai/studios/pretrain-an-llm-with-pytorch-lightning?tab=overview
All three versions (small, medium, large) get extremely low GPU utilization: nvidia-smi fluctuates but typically shows around 50%. Is the example code incorrect? What changes need to be made?