this post was submitted on 07 Jul 2023

Hello everyone,

Today the nvidia driver on my server stopped working out of nowhere. It was working yesterday, and I didn't change anything yesterday or today.

Today my Plex container stopped working because there was a problem with the nvidia card I was using for transcoding. It's a GTX 1650.

I tried running nvidia-smi and it said Failed to initialize NVML: Driver/library version mismatch. It had been months since my last upgrade, so I upgraded my system hoping that would help. It didn't. I also tried rebooting because some sources said that solves the issue, but the error persisted.

It's driver reinstall time. I purged the driver with apt purge nvidia* and then installed it with ubuntu-drivers install --gpgpu nvidia:525-server. After a reboot, nvidia-smi gives the error NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

lsmod | grep nvidia shows nothing and /proc/driver/nvidia/version doesn't exist. I tried starting nvidia-persistenced with systemctl, but it gives this error:

Failed to query NVIDIA devices. Please ensure that the NVIDIA device files (/dev/nvidia*) exist, and that user 113 has read and write permissions for those files.

/dev/nvidia* doesn't exist.

I'm very noobish when it comes to nvidia and Linux. It was a pain to set up initially and I was hoping it would never break, but unfortunately here I am. I don't really know which logs I should show you or which commands I should run to troubleshoot, so every tip is appreciated, and I will provide logs and the like if needed.

System info:

  • Ubuntu Server 22.04
  • kernel: 5.15.0-76-generic
  • theoretically installed nvidia driver: nvidia-driver-525-server
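
A minimal sketch of first checks that match the symptoms described above (module not listed, no /dev/nvidia* files). The helper names module_listed and nvidia_quick_check are hypothetical and not part of any NVIDIA or Ubuntu tool; they only wrap the standard lspci, grep and ls commands.

```shell
#!/bin/sh
# Hypothetical helpers for a first look at a "driver not loaded" situation.

# module_listed NAME [FILE]
# Succeeds if NAME is a loaded kernel module. FILE defaults to /proc/modules,
# the same data that lsmod prints; it is a parameter only to allow testing.
module_listed() {
    ml_name=$1
    ml_file=${2:-/proc/modules}
    # Each line starts with the module name followed by a space, so anchoring
    # on "name<space>" avoids matching nvidia_uvm when asking for nvidia.
    grep -q "^${ml_name} " "$ml_file"
}

nvidia_quick_check() {
    # 1. Is the card visible on the PCI bus at all?
    if lspci | grep -qi nvidia; then
        echo "PCI: NVIDIA device found"
    else
        echo "PCI: no NVIDIA device listed"
    fi

    # 2. Is the kernel module loaded?
    if module_listed nvidia; then
        echo "module: nvidia is loaded"
    else
        echo "module: nvidia is NOT loaded"
    fi

    # 3. Do the device files exist? They are expected to be missing
    #    while the module is not loaded.
    if ls /dev/nvidia* >/dev/null 2>&1; then
        echo "devices: /dev/nvidia* present"
    else
        echo "devices: /dev/nvidia* missing"
    fi
}
```

After sourcing the file, run nvidia_quick_check. Going by what is reported in this thread, the card would be expected to appear on the PCI bus while the module and device files are missing, which points at the kernel module rather than the hardware.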

Solution

I was using the ubuntu-drivers utility to install the driver, but it turns out it's not that great. After installing with the manual method from https://help.ubuntu.com/community/NvidiaDriversInstallation, using the command apt install linux-modules-nvidia-${DRIVER_BRANCH}${SERVER}-${LINUX_FLAVOUR}, it's working again.
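
The command above uses three placeholders. Here is a hedged sketch of how they could expand on this system, assuming branch 525 with the -server suffix (taken from the driver name in the system info) and a flavour derived from the kernel release. The post doesn't state which values were actually used, and nvidia_modules_pkg is a hypothetical helper name.

```shell
#!/bin/sh
# nvidia_modules_pkg BRANCH SERVER_SUFFIX KERNEL_RELEASE
# Builds the package name linux-modules-nvidia-<branch><server>-<flavour>.
nvidia_modules_pkg() {
    nmp_branch=$1     # driver branch, e.g. 525
    nmp_server=$2     # "-server" for the server series, or "" otherwise
    nmp_release=$3    # kernel release, e.g. the output of uname -r
    # The flavour is the text after the last hyphen of the release:
    # 5.15.0-76-generic -> generic
    nmp_flavour=${nmp_release##*-}
    printf 'linux-modules-nvidia-%s%s-%s\n' \
        "$nmp_branch" "$nmp_server" "$nmp_flavour"
}
```

With the kernel from the system info, nvidia_modules_pkg 525 -server 5.15.0-76-generic yields linux-modules-nvidia-525-server-generic, which could then be passed to apt install. Whether that exact package is the right one depends on the driver branch you want and the kernel flavour you run.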

[–] wmassingham 1 points 1 year ago (3 children)

Does it even show up in lspci? To rule out your OS, boot a live system and see if the card is recognized there. A quick thing to check would be that your GPU is actually powered on (fully seated in the PCIe slot and has the necessary power).

[–] Koma52 1 points 1 year ago (1 children)

Shows up in lspci. Booting a live OS would be a little bit tricky because it's in a wall-mounted rack, but I will try that if nothing else works. Thank you.

[–] wmassingham 3 points 1 year ago (1 children)

So it sees the hardware, but the kernel module isn't being loaded. I'd guess if you tried to load it with modprobe, it would complain about some version mismatch.

So, I'd do the uninstall and reinstall processes on this page: https://help.ubuntu.com/community/NvidiaDriversInstallation
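
A sketch of the modprobe check suggested above, under the assumption that a module installed for a different kernel than the running one would not be found under /lib/modules/$(uname -r). The helper names nvidia_modules_for and try_load_nvidia are hypothetical; find, modprobe and dmesg are the standard tools.

```shell
#!/bin/sh
# nvidia_modules_for DIR
# Lists nvidia*.ko* files below DIR (normally /lib/modules/<kernel release>).
# The trailing * also matches compressed modules.
nvidia_modules_for() {
    find "$1" -name 'nvidia*.ko*' 2>/dev/null
}

try_load_nvidia() {
    tl_release=$(uname -r)
    if [ -z "$(nvidia_modules_for "/lib/modules/$tl_release")" ]; then
        echo "no nvidia module files installed for kernel $tl_release"
        return 1
    fi
    # modprobe prints a reason on failure; the kernel log usually has
    # more detail, so show its last lines as well.
    if sudo modprobe nvidia; then
        echo "nvidia module loaded"
    else
        sudo dmesg | tail -n 20
        return 1
    fi
}
```

If try_load_nvidia reports that no module files exist for the running kernel, reinstalling the kernel modules package for that kernel is the likely next step.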

[–] Koma52 1 points 1 year ago* (last edited 1 year ago)

I was using the ubuntu-drivers utility that this page mentions too, but it turns out it doesn't work very well. I have now installed with the manual method from this page, using apt install linux-modules-nvidia-${DRIVER_BRANCH}${SERVER}-${LINUX_FLAVOUR}, and it's working. Thank you for the suggestion!
