WSLg/Cuda suddenly broken due to nvidia-smi unable to find GPU #9099
Taking some shots in the dark here (mainly because I'm really motivated to fix this 😅). Looking at
These messages come immediately after some BAR assignment operations and a log warning about libcuda not being a symlink. Something else I noticed is that
It looks like my issue might be related to (but not the same failure mode as?) #8937. |
The error messages are most likely benign. How was the nvidia-smi utility installed? I installed it using "sudo apt install nvidia-utils-520" and it works for me just fine with the same host driver version. |
@devttebayo it is not sound practice to add any |
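(For anyone who did pull in the native userspace tools via apt, a hedged cleanup sketch; the package pattern is illustrative, not taken from this thread:)

```bash
# See what slipped in, then purge it so the WSL-mapped driver libraries
# (mounted from Windows) are used instead.
apt list --installed 2>/dev/null | grep -i nvidia
sudo apt remove --purge '^nvidia-utils-.*'   # apt treats this as a POSIX regex
sudo apt autoremove
```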
@elsaco Thanks for explaining that, I should have known that it wouldn't be wise to add utils intended for native hardware to a virtualized guest. Being honest, I can't remember at which point I installed them (or if it was a side effect of careless debug copy+paste...) More embarrassing, it appears I somehow lost the repro? I'm not exactly sure how though, seeing as I rebooted both WSL and my host PC a few times prior to opening this issue. Ah well, thanks for the helpful pointers! I think I'm going to go ahead and close this out for now. Sorry for the noise! |
Reopening this because it looks like I hit a repro again. Currently in a state where WSL is unable to detect my GPU and running a Going to hope my PC doesn't reboot and lose the repro in case someone has ideas of next steps I could take to investigate. |
I am encountering this issue as well. I start WSL via the Task Scheduler on login, and |
I have the same issue as described in #9134, so you are not alone. I haven't installed any external nvidia libraries in WSL either, and I can run nvidia-smi.exe in WSL successfully, but running the nvidia-smi located inside WSL fails. Manually shutting down and restarting doesn't seem to yield any results either. |
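To make that comparison concrete, both entry points can be tried from the same WSL shell; the Linux path below assumes the standard mapping mentioned later in this thread:

```bash
nvidia-smi.exe               # Windows binary via interop -- reported to work
/usr/lib/wsl/lib/nvidia-smi  # WSL-mapped Linux binary -- the one reported failing
```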
So, I was able to get this to work on my end (possibly temporarily) after trying a few things. I'm not sure what exactly got it to work but I did the following.
I think the last step is what got it to work, let me know if you can reproduce it. |
Nevermind... Initially, after reinstalling the graphics driver and rebooting, there was no issue. After rebooting again, however, the issue reappears. |
Just updated to the latest 526.86 driver (released today) and ran the repro again. Was able to verify |
So this is a strange development... I updated to WSL 0.70.8 and I'm now in a state where nvidia-smi works in some WSL windows but not others. What I mean is:
What's super strange to me is that I can have the two terminals open side by side and run nvidia-smi repeatedly with the same results in each terminal. I guess this is a workaround for me, but I have no idea why it works? |
I've had the same problem since September. I can run CUDA in Docker in WSL2, but not in kali-linux.
|
@cq01 Just tried this to verify and that's 100% the difference in my setup above: my Windows Terminal always launches as Admin (and nvidia-smi fails 100% of the time). Re-launching without Admin rights gets nvidia-smi working. At least I have a workaround I understand now :) |
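Elevation isn't obvious from inside a WSL shell; one hedged way to check whether the current session was launched from an elevated terminal is to ask Windows through interop:

```bash
# Prints True when the hosting terminal (and thus this WSL session) is elevated.
powershell.exe -NoProfile -Command \
  "([Security.Principal.WindowsPrincipal][Security.Principal.WindowsIdentity]::GetCurrent()).IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)"
```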
I have the same problem. I'm a rookie and I don't know why. Environment: WSL version: 0.70.4.0
wang@wang:~$ glxinfo -B
OpenGL version string: 4.5 (Compatibility Profile) Mesa 22.2.3 - kisak-mesa PPA
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 22.2.3 - kisak-mesa PPA |
Thanks bro, this thing cost me a whole night. Ridiculous. What a headache. |
In my case, nvidia-smi only worked when executed from Windows Terminal as Admin. |
This is the same behavior I am observing as well. In addition, when running |
Just tried reinstalling 522.06 and CUDA 11.8, then did all the shutdown and terminate steps; it still produces
|
Just trying to link all the relevant threads here: canonical/microk8s#3024. This cannot be a coincidence. |
nvidia-smi needs to come from the Windows driver package. It is mapped to /usr/lib/wsl/lib/nvidia-smi. There is an issue when nvidia-smi and other CUDA applications are run from a WSL window, whether started as Administrator or not. |
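A quick hedged check that the mapped binary, and not a stray apt-installed one, is what your shell actually resolves:

```bash
which nvidia-smi                   # should resolve under /usr/lib/wsl/lib
ls -l /usr/lib/wsl/lib/nvidia-smi  # the Windows-driver-provided binary
dpkg -l | grep -i nvidia-utils     # any hit here is a red flag inside WSL
```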
I have nvidia-smi installed only on the Windows side, and had WSL installed with Administrator; that said, the user is literally "Administrator" and is an "Administrator"-level account. I've noticed a couple of similarities in these issues:
|
My situation is the same as yours, it's amazing! |
Thanks! I'm using the Windows Terminal Preview; turning admin mode off and restarting the terminal fixed it. |
@CharlesSL may I know your version of Windows? |
@fzhan win11 insider build 25267 |
Same problem. I think this is a regression bug. |
@CharlesSL cool, thanks. I have the issue with the latest Win 11, freshly installed, not upgraded. |
In my case
But PyTorch cannot allocate GPU memory:
and exactly at this moment these lines appear in
|
The WSLg problems started right after upgrading to the Store version of WSL.
Run under an administrator terminal:
Failed to properly shut down NVML: Driver Not Loaded
$ glxinfo -B
OpenGL version string: 4.5 (Compatibility Profile) Mesa 22.0.5
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 22.0.5
Running under a non-admin user, nvidia-smi runs but segfaults at the end, and applications trying to use the GPU afterwards have various error and exit problems. Output from a standard terminal:
$ nvidia-smi
+-----------------------------------------------------------------------------+
OpenGL version string: 3.3 (Compatibility Profile) Mesa 22.0.5
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 22.0.5
Segmentation fault
System Version Info: all worked before installing the WSL Store version. |
A similar problem also existed in snapd. |
Thx, bro! It works for me:) |
Make sure you set the distro to be the default in WSL.
This worked for me. Needless to say, make sure nvidia-smi works fine from within Windows before trying any of the above. |
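A sketch of that sequence, run from an (unelevated) Windows shell; "Ubuntu-22.04" is a placeholder for your distro name:

```bash
wsl --list --verbose            # confirm distro names and which is default
wsl --set-default Ubuntu-22.04  # make the CUDA-enabled distro the default
wsl --shutdown                  # stop all instances so the GPU mapping is rebuilt
wsl nvidia-smi                  # relaunch the default distro and re-test
```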
Same for me once, until I closed both windows and restarted them; surprised to find nvidia-smi only works as non-admin now... By the way, I believe the problem is associated with WSLg, since my nvidia-smi was always functioning well until I modified .wslgconfig. |
This may help: |
This issue should be fixed in WSL version 1.1.3: https://github.com/microsoft/WSL/releases/tag/1.1.3 |
Thank you! My attempt: |
During a program compilation, I need to link with a system library, so I set the environment variable like this: I found that after this setting, nvidia-smi no longer works and generates an error message similar to the one in this discussion. However, after removing this environment variable setting, things go back to normal again. A pretty irritating issue; I spent quite some time debugging it. |
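For illustration, a minimal sketch of how an LD_LIBRARY_PATH override can shadow the WSL driver libraries; /my/system/lib is a placeholder path, not taken from the comment above:

```bash
# Replacing the search path hides /usr/lib/wsl/lib, so NVML can fail to load:
export LD_LIBRARY_PATH=/my/system/lib
nvidia-smi                       # may now report an NVML initialization error

# Prepending the WSL mount instead keeps the driver libraries visible:
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:/my/system/lib
nvidia-smi                       # back to normal
```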
With everything updated, nvidia-smi still reports: "Failed to initialize NVML: Unknown Error" |
Based on the error message it looks like a separate issue. |
Using CUDA 12.2 with WSL, I was hitting the same issue: nvidia-smi worked in Command Prompt but not in WSL. After clicking through to another page, I came across this solution, which worked: run the command wsl --shutdown to stop all running WSL instances. |
What works for me...
Add the following to your .bashrc. NB! This should fix the problem. My environment:
and my .bashrc related to this:
Test on TensorFlow 2.13.0:
Do not mind the NUMA warnings; WSL does not support NUMA at this time. Enjoy your stable environment! At least, for the time being. :) |
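A minimal sketch of the kind of .bashrc line and smoke test described above, assuming the stock /usr/lib/wsl/lib mount and TensorFlow 2.13.0 (the exact lines from the comment were lost in extraction):

```bash
# In ~/.bashrc: make sure the WSL-mapped driver libraries are searched first.
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH

# Smoke test: should print at least one PhysicalDevice with device_type='GPU'.
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```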
Found this on the WSL installation page. Section 4.1 mentions that:
|
Thanks, this works for me on WSL2 when having a problem with |
Thank you @gserbanut! After countless hours of trying to fix this (including installing/uninstalling cuda toolkit multiple times), your simple suggestions fixed my issue. Now both |
Hi! So here’s what I did to fix the issue. First, I installed the Nvidia driver 537.58. After giving my system a quick reboot, that pesky “Segmentation fault” disappeared. Just to be on the safe side, I then updated to the latest driver and guess what? No more problems! Hope this helps! 😊 |
Version
10.0.22000.1098
WSL Version
WSL 1
Kernel Version
5.15.68.1
Distro Version
Ubuntu 22.04
Other Software
WSL version: 0.70.5.0
WSLg version: 1.0.45
Direct3D version: 1.606.4
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Nvidia Driver: 526.47, Game Ready Driver, released 10/27/2022
Repro Steps
nvidia-smi
Expected Behavior
The nvidia-smi utility dumps diagnostic details about the GPU.
nvidia-smi.exe on Windows is able to display the expected output:
Actual Behavior
nvidia-smi on WSL/Ubuntu 22.04 outputs a generic error instead:
Diagnostic Logs
I'll admit I'm kinda dumb when it comes to doing the Linux diagnostics, which is part of what brought me here. Here's what I've been able to gather from various Googlings and such, though:
dpkg -l | grep nvidia
lsmod | grep nvidia
(no output)
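For context on those results: an empty lsmod match is expected on WSL2, since the NVIDIA kernel driver stays on the Windows side and the GPU is paravirtualized through dxgkrnl. The userspace libraries should instead show up in the WSL mount:

```bash
ls -l /usr/lib/wsl/lib/ | grep -i -E 'nvidia|cuda'   # driver libs mounted from Windows
```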
(Truncated) DxDiag Output