How to configure the iGPU for the X server and the NVIDIA GPU for CUDA?

Solution 1

I (creator of this post) found the solution I need on my own!

I will now explain the solution for anybody else who is in a similar situation and needs this help!

SOLUTION:
INSTALL THE NVIDIA DRIVER VIA THE RUNFILE PROVIDED AT http://www.nvidia.com/object/unix.html WITH THE FLAG "--no-opengl-files" !!

This not only prevents the NVIDIA OpenGL files from overwriting the existing Mesa files, it also installs the driver without nvidia-prime!

So all of my problems are solved, simply by installing the driver manually, instead of installing it from the repositories. The package from the repositories is "Optimus-Friendly" and therefore has all the useless troublemakers bundled with it.

SECONDLY

xorg.conf has to be extended with another screen for each dedicated GPU so that it gets an entry in nvidia-settings.

Mine looks like this:

Section "ServerLayout"
    Identifier     "Layout0"
    Screen 0       "intel" 0 0
    Screen 1       "nvidia550ti" 3000 0
EndSection

Section "Device"
    Identifier     "intel"
    Driver         "intel"
    BusID          "PCI:0@0:2:0"
EndSection

Section "Device"
    Identifier     "nvidia550ti"
    Driver         "nvidia"
    BoardName      "GeForce GTX 550ti"
    BusID          "PCI:2@0:0:0"
EndSection

Section "Screen"
    Identifier     "intel"
    Device         "intel"
EndSection

Section "Screen"
    Identifier     "nvidia550ti"
    Device         "nvidia550ti"
    Option         "AllowEmptyInitialConfiguration" "on"
    Option         "Coolbits" "4"
    Option         "ConstrainCursor" "on"
EndSection
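
A note on the BusID values: as far as I know, lspci prints PCI addresses in hexadecimal while xorg.conf expects decimal numbers, so an address such as 0a:00.0 has to be converted. Below is a minimal sketch of that conversion. The function name lspci_to_busid is made up for illustration, and it emits the same PCI:bus@domain:device:function form used in the config above.

```shell
# Convert an lspci address ("02:00.0" or "0000:02:00.0", hexadecimal)
# into an Xorg BusID string ("PCI:2@0:0:0", decimal).
# lspci_to_busid is a hypothetical helper name, not an existing tool.
lspci_to_busid() {
    addr=$1
    domain=0
    # Split off an optional leading PCI domain ("0000:02:00.0" -> "02:00.0").
    case $addr in
        *:*:*) domain=${addr%%:*}; addr=${addr#*:} ;;
    esac
    bus=${addr%%:*}     # "02"
    rest=${addr#*:}     # "00.0"
    dev=${rest%%.*}     # "00"
    fn=${rest#*.}       # "0"
    # $((0x..)) interprets each field as hexadecimal; %d prints it in decimal.
    printf 'PCI:%d@%d:%d:%d\n' "$((0x$bus))" "$((0x$domain))" "$((0x$dev))" "$((0x$fn))"
}
```

For example, lspci_to_busid 02:00.0 prints PCI:2@0:0:0, which matches the nvidia550ti entry above.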

Solution 2

I reinstalled the NVIDIA drivers without OpenGL as mentioned in the solutions above, but it did not work for me. Moreover, these solutions are rather unsatisfactory, since they mean removing a capability of the drivers (OpenGL).

I found a much simpler solution, for which you do not need to reinstall the driver:

  • I installed the nvidia drivers normally
  • In the nvidia-settings GUI, under PRIME Profiles, I chose the Intel graphics card as the main GPU
  • After rebooting, nvidia-smi did not work, but I fixed that by adding /usr/lib/nvidia-387 to the library path:
    export LD_LIBRARY_PATH=/usr/lib/nvidia-387:$LD_LIBRARY_PATH

Note that depending on the driver installed, you might need to add a different folder to your library path, for example /usr/lib/nvidia-384.
You can add this command to ~/.bashrc so that the path is exported automatically whenever a new bash shell starts.
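
If you would rather not hard-code the version number, something like the following could go into ~/.bashrc instead. It is only a sketch: find_nvidia_libdir is a hypothetical helper name, and it assumes the driver libraries live in a /usr/lib/nvidia-<version> folder that contains libnvidia-ml.so, as with the packages described above. If several such folders exist it takes the first one in glob order, and if none exists it leaves LD_LIBRARY_PATH alone.

```shell
# Print the first directory under $1 (default: /usr/lib) that is named
# nvidia-<version> and contains libnvidia-ml.so*. Returns 1 if none is found.
# find_nvidia_libdir is a hypothetical helper name.
find_nvidia_libdir() {
    root=${1:-/usr/lib}
    for dir in "$root"/nvidia-[0-9]*; do
        for lib in "$dir"/libnvidia-ml.so*; do
            # An unmatched glob stays literal, so check the file really exists.
            if [ -e "$lib" ]; then
                printf '%s\n' "$dir"
                return 0
            fi
        done
    done
    return 1
}

# Prepend the directory only when one was found.
if nvidia_libdir=$(find_nvidia_libdir); then
    export LD_LIBRARY_PATH="$nvidia_libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
fi
```

With several driver versions installed side by side, hard-coding the folder as shown earlier is the safer choice.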

Solution 3

The accepted answer uses the X server to enable the Nvidia devices. This is not necessary, and it means that the X server will use some of the card's memory.

Instead, nvidia-modprobe should be installed as described in the driver FAQ (can be found at the link below).

So my recommended solution is to:

1) Install the latest Nvidia driver via the runfile from ftp://download.nvidia.com/XFree86/Linux-x86_64/ with the --no-opengl-files and --dkms flags.
2) Install the corresponding nvidia-modprobe version via make all and sudo make install from ftp://download.nvidia.com/XFree86/nvidia-modprobe/

The --dkms flag makes sure that the kernel module is recompiled when you upgrade your kernel.
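
Put together, the two steps could look roughly like the sketch below. The helper names (run, install_driver, install_modprobe) and the DRY_RUN switch are made up for illustration and are not part of any NVIDIA tooling; the paths you pass in are whatever you downloaded and unpacked. By default it only prints the commands so that you can review them first.

```shell
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 to really run them. (DRY_RUN and run are hypothetical helpers.)
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# $1: path to the downloaded NVIDIA .run installer
install_driver() {
    run sudo sh "$1" --no-opengl-files --dkms
}

# $1: directory of the unpacked nvidia-modprobe source tree
install_modprobe() {
    run make -C "$1" all
    run sudo make -C "$1" install
}
```

Call install_driver with the path of your runfile and install_modprobe with the unpacked source directory; with DRY_RUN left at 1 you just get the command lines echoed back.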

Solution 4

Mainly as a reminder for myself: on Ubuntu 18.04, to use the iGPU for rendering and the NVIDIA GPU for CUDA, install the default NVIDIA drivers, open nvidia-settings and set it to use the Intel GPU. After that, blacklist the nouveau driver and part of the NVIDIA driver:

Open /etc/modprobe.d/blacklist-nvidia.conf and comment out lines so that it looks like this:

#blacklist nvidia
blacklist nvidia-drm
#blacklist nvidia-modeset
#alias nvidia off
alias nvidia-drm off
#alias nvidia-modeset off

After that, open /etc/modprobe.d/blacklist-nvidia-nouveau.conf and add these lines:

blacklist nouveau
options nouveau modeset=0

To be really sure nouveau is disabled, you can also blacklist it in /etc/modprobe.d/blacklist.conf by adding the following at the end:

#Blacklist nouveau drivers
blacklist nouveau
blacklist lbm-nouveau
alias nouveau off
alias lbm-nouveau off

Then reboot.

Run nvidia-smi to check that the NVIDIA driver is loaded, and run lspci -nnk | grep -iA2 3D to check that the driver in use is nvidia and not nouveau.
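
Note that grep 3D only matches cards that lspci lists as a "3D controller"; a card listed as "VGA compatible controller" would be missed. A small filter like the one below (a sketch; nvidia_kernel_drivers is a hypothetical helper name) reads lspci -nnk output and prints the kernel driver bound to every NVIDIA device, whatever its class. It assumes the usual lspci -k layout, where device lines start in the first column and detail lines are indented.

```shell
# Read `lspci -nnk` output on stdin and print the kernel driver in use
# for each NVIDIA device, one per line.
# nvidia_kernel_drivers is a hypothetical helper name.
nvidia_kernel_drivers() {
    awk '
        # A non-indented line starts a new device entry.
        /^[^ \t]/ { is_nvidia = ($0 ~ /NVIDIA/) }
        # Indented detail line naming the bound driver.
        is_nvidia && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use:[ \t]*/, "")
            print
        }
    '
}
```

Used as lspci -nnk | nvidia_kernel_drivers, it should print nvidia rather than nouveau once the blacklist is in effect.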

Author: winnetou

Updated on September 18, 2022

Comments

  • winnetou
    winnetou over 1 year

    Ubuntu 16.04

    Output of uname -a:

    Linux HOST 4.4.0-22-generic #40-Ubuntu SMP Thu May 12 22:03:46 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
    

    Desktop grade:

    • CPU: Intel
    • GPU: Nvidia with 361.42

    What I want:

    • the intel GPU shall run the xserver and my monitor, which is connected to the onboard DP
    • the nvidia GPU shall only be used for CUDA specific computation etc.
    • full control over the nvidia gpu (real time, stats, temps fan speeds...)

    My Problem:

    • neither nvidia-smi nor nvidia-settings work and I cannot control my nvidia GPU (the errors are cited further down)

    My Story:

    After this short summary of my problem, I want to dive into the topic. Since the release of Ubuntu 16.04 I have been tinkering and failing to achieve the following:

    • I want my Intel GPU (i7 6700K) to drive my X server and everything associated with it.
    • I want my dedicated nvidia GPU to only be used for CUDA-based computation and the like.
    • I will add more than one nvidia GPU to the system once my problems are solved.

    A short summary of my initial state:

    I installed the proprietary drivers for nvidia and intel (intel-microcode and nvidia-361.42) via apt-get and disabled secure boot via mokutil --disable-validation.
    Then I set nvidia-prime to use the intel card.
    Then I edited my xorg.conf to contain only one screen with intel gpu and intel driver. (ask for details if needed)
    Testing the GPU for rendering with Blender, everything seemed fine, except that I couldn't get any stats of my gpu and nvidia-settings appeared empty.

    Errors:

    sudo nvidia-smi
    NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system:
    Please also try adding directory that contains libnvidia-ml.so to your system PATH.
    

    What I have learned so far through all my attempts and research since the release (short version, ask for details any time):

    My two Problems are related but not the same:

    Nvidia-settings Empty:

    • this is because these settings only show up, when there is an Xserver connected to the nvidia GPU
    • the solution for this would be to add a new screen in xorg.conf that forces an unused X server to run on the nvidia GPU
    • but this is currently not possible (see other problem) and not desired, as I purely want the nvidia GPU to focus on Cuda

    Nvidia-smi not working:

    • bbswitch is not a problem as my GPU (550ti) does not support it (errors in dmesg)
    • nvidia prime changes the entry for x86_64-linux-gnu_gl_conf to either /usr/lib/nvidia-361/ld.so.conf (nvidia GPU selected) or /usr/lib/nvidia-361-prime/ld.so.conf (intel GPU selected)
    • the configuration for the intel selection is missing essential paths to the essential nvidia modules which are all present in the conf for nvidia selection
    • when switching to nvidia via prime-select, I don't have an X server, as the display is connected to the integrated GPU, but after logging in at a virtual console nvidia-smi works

    My Assumption:

    • Nvidia prime is bad and does not work the way I want.
    • I have to somehow overcome prime and configure the system (even manually writing new configs?)

    My Tries:

    • I tried uninstalling nvidia-prime, but I only realised afterwards that this cannot work. When the conf file for x86_64-linux-gnu_gl_conf is deleted, the outcome is a pure mess...
    • I even tried adding the missing paths to the x86_64-linux-gnu_gl_conf files manually, but I didn't really know what I was doing and had no success.

    My Questions:

    1) How can I solve the nvidia-smi problem? Am I on the right track? Does anyone have instructions how I could proceed?

    2) Is it possible to enable fan control and further controls for the nvidia gpu (coolbits in xorg.conf) without an Xserver on the gpu (without a screen for the gpu in xorg.conf)?

    Huge thanks in advance for any replies. I literally combed the web, the comb being my problem.
    If I missed anything important, please tell me and do not hesitate to request log files etc.

    THANKS

  • winnetou
    winnetou almost 8 years
    Good answer! BUT I tried Bumblebee with Ubuntu 15.10 and the same hardware and it got messed up. So much so that I did a fresh install of 16.04 after it became available. It was really bad; even Recovery Mode didn't boot correctly. So I swore to myself not to use Bumblebee any more. Another problem with Bumblebee (I don't know how you circumvented this) was that my dedicated GPU doesn't support bbswitch, so I got errors that the GPU couldn't be switched off correctly.
  • winnetou
    winnetou almost 8 years
    But, this is theoretically a possible answer! But not the one I am looking for :( .
  • winnetou
    winnetou almost 8 years
    Another BUT: (I don't know any more, if this worked at my try) Does nvidia-settings show you the GPU (sudo optirun nvidia-settings of course) ? Are you able to control the fan speed, voltage etc? If I recall correctly, these settings are only possible in nvidia-settings, after you enable the coolbits in the xorg.conf file. nvidia-smi only gives you a monitor..... please correct if I'm wrong
  • winnetou
    winnetou almost 8 years
    And another huge question mark is: does Bumblebee work fine if I have more than one dedicated GPU? This use case (iGPU + GPU) is already not recommended for Bumblebee on non-mobile GPUs, but having more than one dGPU is a whole new story for Bumblebee.
  • vskubriev
    vskubriev almost 8 years
    @winnetou you are absolutely right about more than one GPU; it is a good question. Another issue is that nvidia-docker most likely does not work as expected.
  • Marko Avlijaš
    Marko Avlijaš almost 7 years
    This worked for me. This is the simplest answer and should be the accepted answer.
  • Marko Avlijaš
    Marko Avlijaš almost 7 years
    Hi. Answer from leezu is simpler and perhaps you should accept that one?
  • winnetou
    winnetou almost 7 years
    I cannot confirm leezu's answer yet, though it looks very promising to me. If more people report that it works, or I finally get to test his answer and can confirm it, I will change the accepted answer. He fixes the issue that nvidia-modprobe does not work without an X server running on that card.
  • Afzal N
    Afzal N over 6 years
    This doesn't work if you remove all the existing nvidia drivers and start all over again. It says: NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
  • revers
    revers over 6 years
    This solution did not work for me. Instead I just installed the driver normally and added the nvidia path to LD_LIBRARY_PATH as mentioned below.
  • liang
    liang about 6 years
    What happens if you upgrade the nvidia driver? Do you have to manually change the library path?
  • revers
    revers about 6 years
    I guess this should not be a problem if the driver is still in the same folder. If the folder changes, you should probably change the library path.
  • Pandian Le
    Pandian Le over 3 years
    It works on Ubuntu 16.04. You have to put export LD_LIBRARY_PATH=/usr/lib/nvidia-***:$LD_LIBRARY_PATH in ~/.bashrc. Otherwise it doesn't work. But the problem is that the secondary screen does not get detected anymore.
  • Lich4r
    Lich4r almost 3 years
    Thanks. Only your solution worked as I could not find any folder in /usr/lib. However I cannot use GreenwithEnvy to overclock the card. It says "Nvidia NV CONTROL X extension not found"
  • Lich4r
    Lich4r almost 3 years
    On ubuntu 20.04 using 460 driver I do not have such folder in /usr/lib. How should I achieve the same thing?