How to unload kernel module 'nvidia-drm'?

Solution 1

I imagine you need to stop the display manager, which I suspect is what's using the Nvidia drivers.

After switching to a text console (by pressing Ctrl+Alt+F2) and logging in as root, use the following command to switch to the multi-user target, which stops the graphical target and, with it, the display manager:

# systemctl isolate multi-user.target

At this point, I'd expect you'd be able to unload the Nvidia drivers using modprobe -r (or rmmod directly):

# modprobe -r nvidia-drm

Once you've managed to replace/upgrade it and you're ready to start the graphical environment again, you can use this command:

# systemctl start graphical.target
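The three commands above can be wrapped in a small script so they can be rehearsed before being run for real. This is only a sketch: the DRY_RUN variable and the function names run, unload_nvidia_drm and restore_gui are made up for illustration, and it assumes a systemd-based system and a root shell.

```shell
# Sketch only: DRY_RUN and the helper names below are illustrative assumptions.
# With DRY_RUN=1 each command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the graphical session, then unload the module (needs root).
unload_nvidia_drm() {
    run systemctl isolate multi-user.target &&
    run modprobe -r nvidia-drm
}

# Bring the graphical session back once the driver has been replaced.
restore_gui() {
    run systemctl start graphical.target
}
```

Running the functions with DRY_RUN=1 set only prints the commands, which is a cheap way to check the sequence from an SSH session before doing it on the console.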

Solution 2

CUDA Installation

1) Download the latest CUDA Toolkit

2) Switch to tty3 by pressing Ctrl+Alt+F3

3) Unload nvidia-drm before proceeding.

3a) Isolate multi-user.target

sudo systemctl isolate multi-user.target

3b) Note that nvidia-drm is currently in use.

lsmod | grep nvidia.drm

3c) Unload nvidia-drm

sudo modprobe -r nvidia-drm

3d) Confirm that nvidia-drm is no longer loaded (the command should now print nothing).

lsmod | grep nvidia.drm

5) Go to your download folder and run the CUDA installer.

sudo sh cuda_10.1.168_418.67_linux.run

6) Answer any prompts during installation.

7) When installation has finished, confirm that the CUDA Version has been updated.

nvidia-smi

8) Start the GUI again.

sudo systemctl start graphical.target
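For step 7, it can help to know which driver version to expect. The runfile name used above appears to follow the pattern cuda_<toolkit>_<driver>_linux.run; that pattern is an assumption based on this one file name, and runfile_driver_version is a hypothetical helper that pulls the driver part out of it.

```shell
# Sketch: extract the bundled driver version from a CUDA runfile name.
# Assumed naming pattern: cuda_<toolkit version>_<driver version>_linux.run
# Prints nothing if the name does not match that pattern.
runfile_driver_version() {
    basename "$1" | sed -n 's/^cuda_[0-9.]*_\([0-9.]*\)_linux\.run$/\1/p'
}
```

For the file above, runfile_driver_version cuda_10.1.168_418.67_linux.run prints 418.67, which can be compared by eye with the output of nvidia-smi --query-gpu=driver_version --format=csv,noheader after the installation.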

Solution 3

lsof lists any files that are in use by userspace processes. But nvidia_drm is a kernel module, so lsof won't necessarily see whether or not it is actually in use. (The module file won't be open because the kernel has already completely loaded it into RAM. But the module might be providing services to the userspace or other kernel components, and that is what prevents the unloading of the module.)
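What lsof can show is which userspace processes hold the GPU device nodes open, which is usually a better search than grepping for the module name. A rough sketch, assuming the nodes live under /dev/nvidia* and /dev/dri/ (typical, but check your system); gpu_device_users is a made-up helper name.

```shell
# Sketch: reduce lsof output (read from stdin) to the commands and PIDs that
# have a GPU device node open. The /dev/nvidia* and /dev/dri/ paths are assumed.
gpu_device_users() {
    awk '$NF ~ /^\/dev\/(nvidia|dri\/)/ { print $1, $2 }' | LC_ALL=C sort -u
}
```

Typical use would be sudo lsof 2>/dev/null | gpu_device_users. Users inside the kernel (other modules, for example) will not show up here, so an empty result does not prove the module is free.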

Run lsmod | grep nvidia.drm and look at the numbers to the right of the nvidia_drm module name. The first number is simply the size of the module; the second is the use count. To remove the module successfully, the use count must first drop to 0.
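If you want just the use count, for instance to check it from a script, the third column of the lsmod output can be picked out directly. A minimal sketch; module_use_count is a hypothetical helper, and it reads lsmod-style text from stdin so it can also be tried on saved output.

```shell
# Sketch: print the use count (third lsmod column) of one module.
# lsmod lists names with underscores, so hyphens in the argument are converted.
# Exits with status 1 if the module is not listed at all.
module_use_count() {
    mod=$(printf '%s' "$1" | tr '-' '_')
    awk -v mod="$mod" '$1 == mod { print $3; found = 1 } END { exit !found }'
}
```

Used as lsmod | module_use_count nvidia-drm, it prints a single number; anything other than 0 means modprobe -r will refuse to unload the module.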

If the X11 server is running and using the nvidia driver, then the nvidia_drm kernel module will most assuredly be in use. So you'll need, at the very least, to switch to a text console and shut down the X11 server. Usually this can be done by stopping whichever X Display Manager service you're using (which depends on your desktop environment).

As the error message said, if you are running nvidia-persistenced, you'll need to stop that too before you can unload the nvidia_drm module.

Solution 4

I solved this problem by disabling the GUI, rebooting, logging in, installing the driver, re-enabling the GUI, and rebooting again.

Please make sure you know your username and password!!!

Open a terminal and write

sudo systemctl set-default multi-user.target
sudo reboot 0

Now log in and you'll land directly in a terminal. Install the driver there. Note that I am installing version 440.44 here, so adjust the file name to match your driver version.

sudo ./NVIDIA-Linux-x86_64-440.44.run

After installing the driver, re-enable the GUI and reboot:

sudo systemctl set-default graphical.target
sudo reboot 0

You should be done.

In my case, nvidia-smi reported the new version 440.44, while the Additional Drivers tab of Ubuntu 18.04's Software & Updates utility shows 435!! Another NVIDIA mystery, but heck, my new docker works!!!

Solution 5

I had a similar problem.

Reason: the nvidia-drm module was in use.

I fixed it by purging all NVIDIA packages.

Remove all previous NVIDIA installations with these two commands (the pattern is quoted so that the shell passes it to apt-get unchanged):

$ sudo apt-get purge 'nvidia*'
$ sudo apt-get autoremove

The module should now be gone.

Reboot and go forth.
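Since a purge is hard to undo, it may be worth previewing which NVIDIA packages are installed before running the commands above. A rough sketch: installed_nvidia_packages is a made-up helper that filters dpkg -l output read from stdin, and it simply matches "nvidia" anywhere in the package name, which may not coincide exactly with what apt-get's pattern selects.

```shell
# Sketch: list installed packages (dpkg status "ii") whose name contains "nvidia".
# Reads `dpkg -l` style output from stdin.
installed_nvidia_packages() {
    awk '$1 == "ii" && $2 ~ /nvidia/ { print $2 }'
}
```

Typical use would be dpkg -l | installed_nvidia_packages. apt-get also has a simulate flag (-s), so sudo apt-get -s purge 'nvidia*' shows what would be removed without changing anything.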

Author by

Rodrigo

Programmer and biologist, I'd love to leave the world a little better than I've found it.

Updated on September 18, 2022

Comments

  • Rodrigo
    Rodrigo over 1 year

    I'm trying to install the most up-to-date NVIDIA driver in Debian Stretch. I've downloaded NVIDIA-Linux-x86_64-390.48.run from here, but when I try to do

    sudo sh ./NVIDIA-Linux-x86_64-390.48.run
    

    as suggested, an error message appears.

    ERROR: An NVIDIA kernel module 'nvidia-drm' appears to already be loaded in your kernel.  This may be because it is in use (for example, by an X server, a CUDA program, or 
             the NVIDIA Persistence Daemon), but this may also happen if your kernel was configured without support for module unloading.  Please be sure to exit any programs    
             that may be using the GPU(s) before attempting to upgrade your driver.  If no GPU-based programs are running, you know that your kernel supports module unloading,   
             and you still receive this message, then an error may have occured that has corrupted an NVIDIA kernel module's usage count, for which the simplest remedy is to     
             reboot your computer.
    

    When I try to find out who is using nvidia-drm (or nvidia_drm), I see nothing.

    ~$ sudo lsof | grep nvidia-drm
    lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
          Output information may be incomplete.
    ~$ sudo lsof -e /run/user/1000/gvfs | grep nvidia-drm
    ~$
    

    And when I try to remove it, it says it's being used.

    ~$ sudo modprobe -r nvidia-drm
    modprobe: FATAL: Module nvidia_drm is in use.
    ~$ 
    

    I have rebooted and started in text-only mode (by pressing Ctrl+Alt+F2 before giving username/password), but I got the same error.

    Besides that, how do I "know that my kernel supports module unloading"?

    I'm getting a few warnings on boot up related to nvidia, no idea if they're related, though:

    Apr 30 00:46:15 debian-9 kernel: nvidia: loading out-of-tree module taints kernel.
    Apr 30 00:46:15 debian-9 kernel: nvidia: module license 'NVIDIA' taints kernel.
    Apr 30 00:46:15 debian-9 kernel: Disabling lock debugging due to kernel taint
    Apr 30 00:46:15 debian-9 kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  375.82  Wed Jul 19 21:16:49 PDT 2017 (using threaded interrupts)
    
    • Admin
      Admin about 6 years
      can you try to do it in rescue mode?
    • Admin
      Admin about 6 years
      See this issue on github : systemctl stop systemd-logind before unloading the modules.
    • Admin
      Admin about 6 years
      @GAD3R All I have is systemctl stop systemd-logind.service, but this closes the screen and takes me back to the graphic login, where I have to do Ctrl+Alt+F2 again.
  • Rodrigo
    Rodrigo about 6 years
    After Ctrl+Alt+F2, lsmod is telling me there's 1 process using nvidia_drm. So I did sudo /etc/init.d/gdm3 stop, which went ok in stopping it. But still 1 process in lsmod. Now inside Gnome, ps aux | grep nvidia shows [irq/129-nvidia] and [nvidia] but no nvidia-persistenced. Also, here lsmod shows 2 processes using nvidia_drm. I'm stuck.
  • Rodrigo
    Rodrigo about 6 years
    Guess I'll wait for a less clumsy solution, if any shows up.
  • Rodrigo
    Rodrigo about 6 years
    I managed to uninstall it (using Filipe's answer) and install the new version, to the point where there was no working graphical mode anymore. I had to format the PC and reinstall Debian. Now on to a completely different set of bugs... All this just to see "GPU" as a rendering option in Blender, and I still don't see it. Proprietary drivers suck!
  • Rodrigo
    Rodrigo about 6 years
    I managed to uninstall it (using your answer) and install the new version, to the point where there was no working graphical mode anymore. I had to format the PC and reinstall Debian. Now on to a completely different set of bugs... All this just to see "GPU" as a rendering option in Blender, and I still don't see it. Proprietary drivers suck!
  • John Bollinger
    John Bollinger about 6 years
    @Rodrigo, I'm sorry you had such a poor experience. But that sort of problem is an example of why I recommend using packages instead of performing manual installations.
  • Rodrigo
    Rodrigo about 6 years
    Yes, I prefer using packages. But I've read somewhere that GPU option in Blender wasn't enabled probably because of an outdated driver...
  • Don Kirkby
    Don Kirkby over 5 years
    This worked for me without the modprobe step.
  • David Jung
    David Jung about 5 years
    Yeah, I didn't need the modprobe step either.
  • Scott - Слава Україні
    Scott - Слава Україні about 5 years
    So, what are you saying?  That the only solution is to reinstall?  That obviously isn’t the only solution; other answers have been posted.
  • addison
    addison over 4 years
    I can't remove nvidia-drm even when in the text console. Any idea how I can forcibly remove it?
  • filbranden
    filbranden over 4 years
    @addison Note that it's not enough to just be on a text console, you need to stop X11 or Wayland or whatever is using the nvidia driver from the kernel. The point of the systemctl isolate command is to do that. But it's possible that's not correctly configured in your system... Check ps -ef and see if you can spot what might be using the driver, then have that process stopped. That should allow you to unload the driver.
  • Yuri Feldman
    Yuri Feldman about 4 years
    Disabling the GUI as above is the only thing that worked for me through ssh!
  • GG.
    GG. almost 3 years
    Same for me, I was unable to unload nvidia-drm manually ("module not found" or something, although it was loaded) but disabling the GUI and rebooting did it.
  • Mr.Robot
    Mr.Robot almost 3 years
    Great answer! I was struggling with solution provided by the top answer for several hours (systemctl isolate multi-user.target only gives me black screen). Your answer saved my day!
  • user1315621
    user1315621 over 2 years
    sudo modprobe -r nvidia-drm raises error modprobe: FATAL: Module nvidia_drm is in use.
  • diffracteD
    diffracteD about 2 years
    Works, but # modprobe -r nvidia-drm is not necessary.