No CUDA-capable device is detected although requirements are installed

Solution 1

It looks like you are on a laptop with NVIDIA Optimus. Have you switched to the NVIDIA GPU using prime-select nvidia?
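
To check which profile is active before switching, something like the following may help. This is only a sketch: it assumes the nvidia-prime package is installed (it provides prime-select, including the query subcommand), the helper name prime_profile_advice is made up for illustration, and it only knows the two profile names discussed here (intel and nvidia).

```shell
# Hypothetical helper: interpret the profile name printed by `prime-select query`.
prime_profile_advice() {
    case "$1" in
        nvidia)
            echo "nvidia profile is active" ;;
        intel)
            echo "intel profile is active; run: sudo prime-select nvidia" ;;
        *)
            echo "unrecognised profile '$1'; check that nvidia-prime is installed" ;;
    esac
}
```

It would be called as prime_profile_advice "$(prime-select query)". After switching profiles, logging out and back in (or rebooting) is typically needed before the NVIDIA driver is actually in use.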

Solution 2

Another potential cause of this behaviour is the CUDA_VISIBLE_DEVICES environment variable being set to an empty value.

I experienced a similar issue, and it turned out the variable was accidentally being set in my bash environment files.
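
A quick way to tell "unset" apart from "set but empty" is sketched below. The function names are made up for illustration, and the list of startup files to search is an assumption about a typical bash setup.

```shell
# Report whether CUDA_VISIBLE_DEVICES is unset, empty, or set to a value.
cuda_visible_state() {
    if [ -z "${CUDA_VISIBLE_DEVICES+x}" ]; then
        echo "unset"          # no restriction: CUDA enumerates all GPUs
    elif [ -z "$CUDA_VISIBLE_DEVICES" ]; then
        echo "empty"          # every GPU is hidden from CUDA
    else
        echo "set: $CUDA_VISIBLE_DEVICES"
    fi
}

# Print the lines (with file name and line number) in the given startup
# files that mention the variable. Files that do not exist are skipped.
find_cuda_visible_assignments() {
    grep -Hn "CUDA_VISIBLE_DEVICES" "$@" 2>/dev/null
}
```

For example, find_cuda_visible_assignments ~/.bashrc ~/.profile ~/.bash_profile shows where the variable is assigned, and unset CUDA_VISIBLE_DEVICES clears it for the current shell.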

Author: a_guest

Updated on September 18, 2022

Comments

  • a_guest
    a_guest over 1 year

    Problem

    I just installed CUDA following the official installation instructions via the .deb file. When it comes to section 6.2.2.3 (running deviceQuery), I get the message that no CUDA-capable device was found, although I'm pretty sure everything is set up correctly:

    $ ./bin/x86_64/linux/release/deviceQuery
    ./bin/x86_64/linux/release/deviceQuery Starting...
    
     CUDA Device Query (Runtime API) version (CUDART static linking)
    
    cudaGetDeviceCount returned 38
    -> no CUDA-capable device is detected
    Result = FAIL
    

    System information

    Here is some information about my system:

    $ uname -m && cat /etc/*release
    x86_64
    DISTRIB_RELEASE=16.04
    DISTRIB_DESCRIPTION="Ubuntu 16.04.2 LTS"
    VERSION="16.04.2 LTS (Xenial Xerus)"
    
    $ uname -r
    4.4.0-64-generic
    
    $ lspci | grep -i nvidia
    08:00.0 3D controller: NVIDIA Corporation GK208M [GeForce 920M] (rev a1)
    
    $ gcc --version
    gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
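
One detail in the lspci output above is worth a second look: the card is listed as a "3D controller" rather than a "VGA compatible controller". On laptops this commonly indicates hybrid graphics, where a different GPU drives the display. A sketch for listing all graphics devices with their class follows; the helper name list_gpus is made up, and it assumes the default lspci output format shown in the question.

```shell
# Hypothetical helper: read `lspci` output on stdin and print one line per
# graphics device in the form "<class>|<description>".
list_gpus() {
    grep -Ei 'VGA compatible controller|3D controller|Display controller' |
        sed -E 's/^[^ ]+ ([^:]+): (.*)$/\1|\2/'
}
```

It would be used as lspci | list_gpus. Seeing both an integrated GPU and an NVIDIA "3D controller" in the result is a hint to check which one is currently active.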
    

    I also verified the kernel headers are installed:

    $ sudo apt-get install linux-headers-$(uname -r)
    linux-headers-4.4.0-64-generic is already the newest version (4.4.0-64.85).
    

    Installation of CUDA

    So my system meets all the prerequisites. I then followed the instructions for the installation via apt-get (I installed cuda-repo-ubuntu1604-8-0-local-ga2_8.0.61-1_amd64.deb).

    PATH and LD_LIBRARY_PATH are set to point to the required locations:

    $ echo $PATH
    /usr/local/cuda-8.0/bin:[...]
    
    $ echo $LD_LIBRARY_PATH 
    /usr/local/cuda-8.0/lib64
    

    Note that I set up LD_LIBRARY_PATH manually, although this was mentioned to be necessary only for the runfile installation. However, the error persists when resetting LD_LIBRARY_PATH.

    The NVIDIA drivers also seem to be up-to-date:

    $ cat /proc/driver/nvidia/version
    NVRM version: NVIDIA UNIX x86_64 Kernel Module  367.57  Mon Oct  3 20:37:01 PDT 2016
    GCC version:  gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
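
Whether a given driver is recent enough depends on the toolkit: each CUDA release documents a minimum driver version in its release notes. The sketch below extracts the version from a file in the format shown above and compares it against a minimum supplied by the caller. Both helper names are made up, the parsing assumes the "Kernel Module  <version>" layout from the question, and version_ge relies on sort -V from GNU coreutils.

```shell
# Hypothetical helper: print the driver version from a file in the format of
# /proc/driver/nvidia/version (the field following "Kernel Module").
nvrm_version() {
    awk '/^NVRM version:/ {
             for (i = 1; i < NF; i++)
                 if ($i == "Module") { print $(i + 1); exit }
         }' "$1"
}

# Succeed if version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

A call such as version_ge "$(nvrm_version /proc/driver/nvidia/version)" "$MIN_DRIVER" then succeeds only if the installed driver meets the minimum, where MIN_DRIVER is a placeholder for the value taken from the release notes of the installed toolkit.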
    

    Information about the cuda compiler driver:

    $ nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2016 NVIDIA Corporation
    Built on Tue_Jan_10_13:22:03_CST_2017
    Cuda compilation tools, release 8.0, V8.0.61
    

    The instructions mention that this could be a problem with file permissions:

    If a CUDA-capable device and the CUDA Driver are installed but deviceQuery reports that no CUDA-capable devices are present, this likely means that the /dev/nvidia* files are missing or have the wrong permissions.

    Those files didn't have the execute flag, which I then added:

    $ ls -al /dev/nvidia*
    crwxrwxrwx 1 root root 195,   0 Feb 27 13:17 /dev/nvidia0
    crwxrwxrwx 1 root root 195, 255 Feb 27 13:17 /dev/nvidiactl
    crwxrwxrwx 1 root root 195, 254 Feb 27 13:17 /dev/nvidia-modeset
    crwxrwxrwx 1 root root 243,   0 Feb 27 13:17 /dev/nvidia-uvm
    crwxrwxrwx 1 root root 243,   1 Feb 27 18:24 /dev/nvidia-uvm-tools
    

    However, after running deviceQuery (which still fails), some of the permissions are reset:

    $ ls -al /dev/nvidia*
    crwxrwxrwx 1 root root 195,   0 Feb 27 13:17 /dev/nvidia0
    crw-rw-rw- 1 root root 195, 255 Feb 27 13:17 /dev/nvidiactl
    crwxrwxrwx 1 root root 195, 254 Feb 27 13:17 /dev/nvidia-modeset
    crw-rw-rw- 1 root root 243,   0 Feb 27 13:17 /dev/nvidia-uvm
    crw-rw-rw- 1 root root 243,   1 Feb 27 18:24 /dev/nvidia-uvm-tools
    

    That's a bit puzzling, especially because I'm running deviceQuery without sudo.
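
For what it's worth, the execute bit is not needed to open a character device; read and write access is what matters, so crw-rw-rw- is already sufficient. A sketch for checking exactly that is below. The helper name check_device_access is made up for illustration.

```shell
# Hypothetical helper: report device nodes the current user cannot open for
# reading and writing. Returns non-zero if any node fails the check.
check_device_access() {
    rc=0
    for node in "$@"; do
        if [ ! -e "$node" ]; then
            echo "missing: $node"
            rc=1
        elif [ ! -r "$node" ] || [ ! -w "$node" ]; then
            echo "no read/write access: $node"
            rc=1
        fi
    done
    return "$rc"
}
```

Running check_device_access /dev/nvidia* prints nothing and returns success when every node is accessible to the current user.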

    Maybe related

    Samples build fails

    When I try to build the CUDA samples via make, it fails for one of them with the message

    /usr/bin/ld: cannot find -lnvcuvid
    collect2: error: ld returned 1 exit status
    Makefile:381: recipe for target 'cudaDecodeGL' failed
    make[1]: *** [cudaDecodeGL] Error 1
    

    The library indeed seems to be missing:

    $ ls /usr/local/cuda-8.0/lib64/libnvcuvid
    ls: cannot access '/usr/local/cuda-8.0/lib64/libnvcuvid': No such file or directory
    

    Although the corresponding header file is there:

    $ ls /usr/local/cuda-8.0/targets/x86_64-linux/include/nvcuvid.h 
    /usr/local/cuda-8.0/targets/x86_64-linux/include/nvcuvid.h
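
Note that the ls check above names the bare stem libnvcuvid without a .so suffix, so it would report "No such file" even if the library were present in that directory. Also, as far as I know, libnvcuvid ships with the NVIDIA driver packages rather than with the CUDA toolkit, so it would normally live in the driver's library directory instead of /usr/local/cuda-8.0/lib64. A sketch for searching by stem is below; the helper name find_library is made up, and the directories to search are up to the caller.

```shell
# Hypothetical helper: search the given directories for files or symlinks
# whose names start with the given stem, e.g. libnvcuvid.so or libnvcuvid.so.1.
find_library() {
    stem=$1
    shift
    find "$@" -name "${stem}*" \( -type f -o -type l \) 2>/dev/null
}
```

For example, find_library libnvcuvid /usr/lib /usr/local/cuda-8.0 searches both trees. Checking ldconfig -p | grep nvcuvid is another way to see whether the dynamic linker knows about the library.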
    

    Problem with static linking

    The error raised by deviceQuery suggests a problem with static linking:

    CUDA Device Query (Runtime API) version (CUDART static linking)
    

    AFAIK LD_LIBRARY_PATH is only responsible for dynamic linking. I found this question where one suggestion is to add /usr/lib/nvidia-current to the linker path. However, this directory doesn't exist in my installation:

    $ ls /usr/lib/nvidia-current
    ls: cannot access '/usr/lib/nvidia-current': No such file or directory
    
    • Artyom
      Artyom over 7 years
      Looks like you are on a laptop. Have you switched to nvidia using "prime-select nvidia"?
    • Admin
      Admin over 7 years
      As above... Nvidia drivers won't load if the card isn't being used. No Nvidia drivers, no CUDA.
    • a_guest
      a_guest over 7 years
      Thanks guys, prime-select nvidia helped! I guess this means I was running on onboard graphics before?
    • Artyom
      Artyom over 7 years
      @a_guest Yep, you were using your onboard graphics. To easily check what you are using: after login > top right button > about this computer, you'll see your graphics there. Also, could you select my answer so I'll get internet points and this question will be marked answered to help others?
  • biocyberman
    biocyberman over 6 years
    I installed 'nvidia-prime' and ran the command. This seems to have solved my problem with Ubuntu 16.04 on a Dell server. 'It seems' because I haven't restarted the server, and I don't know if I have to run prime-select again.
  • Seth Bruder
    Seth Bruder over 2 years
    A variant of this is if you have CUDA_VISIBLE_DEVICES set to the UUID of a card that you have since replaced.
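
A stale entry of that kind can be spotted by comparing the variable against the output of nvidia-smi -L, which lists each GPU together with its UUID. The following is only a sketch: the helper name stale_visible_uuids is made up, and it only looks at entries written in the GPU-<uuid> form, ignoring plain numeric indices.

```shell
# Hypothetical helper: print each GPU-<uuid> entry of CUDA_VISIBLE_DEVICES that
# does not appear in the `nvidia-smi -L` listing passed as the first argument.
stale_visible_uuids() {
    listing=$1
    printf '%s\n' "${CUDA_VISIBLE_DEVICES-}" | tr ',' '\n' |
        while IFS= read -r entry; do
            case "$entry" in
                GPU-*)
                    # Substring match, so abbreviated UUIDs are accepted too.
                    printf '%s\n' "$listing" | grep -qF -- "$entry" || echo "$entry" ;;
            esac
        done
}
```

It would be called as stale_visible_uuids "$(nvidia-smi -L)". Any UUID it prints refers to a card that is no longer installed.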