I wonder if it would be simpler to have all GPU nodes run the same driver, with the driver version published somewhere (especially if it isn't the most recent), so people could build their containers accordingly?
Currently we have V100 and A100 GPUs, but the two types run different drivers:
A100 report
NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7
V100 report
NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4
This causes trouble, as a container that runs on one node type might not run on the other. Having the same image on both types of hardware would be helpful.
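In the meantime, here is a rough sketch of how someone could check a container's CUDA version against the two reports above. It assumes the `CUDA Version` field in the `nvidia-smi` header is the newest CUDA runtime the installed driver supports, and it uses a deliberately conservative rule: the container's CUDA version must not exceed that value. The names `parse_smi_header` and `probably_runs` are made up for illustration, and NVIDIA's minor-version compatibility may let some newer 11.x builds run anyway, so treat a "no" as a warning rather than a hard failure.

```python
import re

# Matches the header line as it appears in the two reports above.
HEADER_RE = re.compile(
    r"NVIDIA-SMI\s+(?P<smi>[\d.]+)\s+"
    r"Driver Version:\s+(?P<driver>[\d.]+)\s+"
    r"CUDA Version:\s+(?P<cuda>[\d.]+)"
)


def parse_smi_header(line):
    """Extract the driver version and max supported CUDA version (hypothetical helper)."""
    match = HEADER_RE.search(line)
    if match is None:
        raise ValueError("not an nvidia-smi header line: %r" % line)
    return {
        "driver": match.group("driver"),
        # Store as a tuple of ints so (11, 10) compares greater than (11, 4).
        "cuda": tuple(int(part) for part in match.group("cuda").split(".")),
    }


def probably_runs(container_cuda, node):
    """Conservative check (assumption): container CUDA must not exceed the driver's."""
    return container_cuda <= node["cuda"]


a100 = parse_smi_header(
    "NVIDIA-SMI 515.65.01 Driver Version: 515.65.01 CUDA Version: 11.7"
)
v100 = parse_smi_header(
    "NVIDIA-SMI 470.141.03 Driver Version: 470.141.03 CUDA Version: 11.4"
)

# A container built against CUDA 11.7 passes on the A100 nodes only.
for name, node in (("A100", a100), ("V100", v100)):
    print(name, node["driver"], probably_runs((11, 7), node))
```

Under this rule, building against CUDA 11.4 or older is the safe choice for a container that has to run on both node types as they are configured today.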