NVIDIA RTX3090 GPU Lockout C2 Process #5960
Describe the problem

Resolution: I tried to kill the lotus-worker process, but it would not die with either signal 9 (SIGKILL) or signal 15 (SIGTERM). Since the process could not be stopped, a reboot was required, and the reboot hung for ~5 minutes while trying to stop the process. After the reboot, I cleared the worker folders and updated to lotus-worker v1.5.3. The GPU was then available in nvidia-smi/nvtop. The worker accepted C2 jobs normally for ~30 minutes, then hit the same error, "GPU is lost", requiring another reboot. I then updated to NVIDIA driver 460.67. The worker is now running, but the worker log shows "Failed to open /dev/dri/renderD128: Permission denied", although it appears to be accepting GPU workload. No failures since the driver update so far, but the "Permission denied" message is troubling.

Version

Setup: Miner/Daemon v1.5.3 on a separate machine. Separate C2 worker machine using an NVIDIA RTX3090. This was not a memory issue; plenty of memory was available.

Lotus daemon and miner logs

"Unable to determine the device handle for GPU 0000:01:00.0: GPU is lost. Reboot the system to recover this GPU"

Code modifications

No code modifications.
Replies: 1 comment
Check your /var/log/syslog and you will find that your GPU has fallen off the bus.
The cause is an insufficient power supply; a 3090 needs 350 W.
I run dual 3090s and had the same problem.
I tried a fix, and it has kept working so far, for more than 24 hours:
add pcie_aspm=off to the GRUB kernel parameters and reboot.
Good luck.
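
For anyone who wants to script the two steps from the reply above, here is a minimal Python sketch. It rests on several assumptions: the function names `find_gpu_loss_lines` and `add_kernel_param` are made up for illustration, the "GPU has fallen off the bus" wording is the message the NVIDIA kernel driver typically writes (only "GPU is lost" is quoted in this thread), and the GRUB setting is assumed to live on a double-quoted `GRUB_CMDLINE_LINUX_DEFAULT=` line, as on a stock Debian/Ubuntu install. Both functions only work on text passed to them and do not touch any files.

```python
import re

# Messages that usually accompany a GPU dropping off the PCIe bus.
# "GPU is lost" is quoted in this thread; the "fallen off the bus"
# wording is an assumption about what the NVIDIA driver logs.
GPU_LOSS_PATTERNS = ("GPU has fallen off the bus", "GPU is lost")


def find_gpu_loss_lines(log_text):
    """Return the log lines that mention a lost GPU (case-insensitive)."""
    return [
        line
        for line in log_text.splitlines()
        if any(p.lower() in line.lower() for p in GPU_LOSS_PATTERNS)
    ]


def add_kernel_param(grub_text, param="pcie_aspm=off"):
    """Append `param` to GRUB_CMDLINE_LINUX_DEFAULT unless already present.

    Only a double-quoted value is handled; any other layout is returned
    unchanged, so check the result before writing it anywhere.
    """
    pattern = re.compile(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE
    )

    def patch(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    return pattern.sub(patch, grub_text)
```

To try it, pass the contents of /var/log/syslog to `find_gpu_loss_lines` and the contents of /etc/default/grub to `add_kernel_param`, then compare the result with the original before saving it. On Debian/Ubuntu the edited file only takes effect after running `sudo update-grub` and rebooting; other distributions may use a different command.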