TensorFlow complains that no CUDA-capable device is detected

I finally had the idea to look for any files with 390.77 in the name.

$ locate 390.77
/usr/lib/i386-linux-gnu/libcuda.so.390.77
/usr/lib/i386-linux-gnu/libnvcuvid.so.390.77
/usr/lib/i386-linux-gnu/libnvidia-compiler.so.390.77
/usr/lib/i386-linux-gnu/libnvidia-encode.so.390.77
/usr/lib/i386-linux-gnu/libnvidia-fatbinaryloader.so.390.77
/usr/lib/i386-linux-gnu/libnvidia-ml.so.390.77
/usr/lib/i386-linux-gnu/libnvidia-opencl.so.390.77
/usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.390.77
/usr/lib/i386-linux-gnu/vdpau/libvdpau_nvidia.so.390.77
/usr/lib/x86_64-linux-gnu/libcuda.so.390.77
/usr/lib/x86_64-linux-gnu/libnvcuvid.so.390.77
/usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.390.77
/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.390.77
/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.390.77
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.390.77
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.390.77
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.390.77
/usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.390.77

So there they are! A closer look shows that I must have installed the newer version at some point.

$ ls /usr/lib/i386-linux-gnu/libcuda* -l
lrwxrwxrwx 1 root root      12 Nov  8 13:58 /usr/lib/i386-linux-gnu/libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root      17 Nov 12 14:04 /usr/lib/i386-linux-gnu/libcuda.so.1 -> libcuda.so.390.77
-rw-r--r-- 1 root root 9179124 Jan 31  2018 /usr/lib/i386-linux-gnu/libcuda.so.390.30
-rw-r--r-- 1 root root 9179796 Jul 10  2018 /usr/lib/i386-linux-gnu/libcuda.so.390.77
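
The chain of links in that listing can also be followed programmatically. This is only a minimal sketch: `symlink_chain` is a hypothetical helper name, and it assumes nothing beyond the Python standard library.

```python
import os


def symlink_chain(path):
    """Return every path visited while following symlinks from `path`."""
    chain = [path]
    while os.path.islink(path):
        target = os.readlink(path)
        # A relative target is resolved against the directory holding the link;
        # os.path.join returns an absolute target unchanged.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        if path in chain:
            # Stop on a cyclic link instead of looping forever.
            break
        chain.append(path)
    return chain
```

Called on /usr/lib/i386-linux-gnu/libcuda.so, it should list libcuda.so, then libcuda.so.1, then whichever versioned file the last link points at.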

Where did they come from?

$ dpkg -S /usr/lib/i386-linux-gnu/libcuda.so.390.30
libcuda1-390: /usr/lib/i386-linux-gnu/libcuda.so.390.30
$ dpkg -S /usr/lib/i386-linux-gnu/libcuda.so.390.77
dpkg-query: no path found matching pattern /usr/lib/i386-linux-gnu/libcuda.so.390.77
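
When many files need checking, the two kinds of `dpkg -S` reply shown above can be sorted mechanically. The sketch below is an illustration only: `parse_dpkg_search` is a hypothetical name, and it assumes the command's standard output and standard error have been captured together as one string.

```python
def parse_dpkg_search(output):
    """Map each path to its owning package, or to None when dpkg finds no owner."""
    not_found = "dpkg-query: no path found matching pattern "
    owners = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(not_found):
            # No package claims this path.
            owners[line[len(not_found):]] = None
        else:
            # Normal replies look like "package: /path/to/file".
            package, _, path = line.partition(": ")
            owners[path] = package
    return owners
```

Any path that maps to `None` is a file that no installed package claims, like the 390.77 library here.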

So the 390.77 file no longer belongs to any package. Perhaps I installed the old version and had to force it to overwrite the links.

My plan is to delete the files, then reinstall the packages to set up the links to the correct version. So which packages will I need to reinstall?

$ locate 390.77|sed -e 's/390\.77$/390.30/'|xargs dpkg -S

Some of the files don't match anything, but the ones that do match are from these packages:

  • libcuda1-390
  • nvidia-opencl-icd-390
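
The name substitution in that lookup can be sketched in Python as well, matching the version only as a literal suffix of the file name. `swap_version` is a hypothetical helper; the default version strings are the two from this machine.

```python
def swap_version(paths, old="390.77", new="390.30"):
    """Replace a trailing .<old> version suffix with .<new>, skipping other paths."""
    swapped = []
    for path in paths:
        if path.endswith("." + old):
            swapped.append(path[: -len(old)] + new)
    return swapped
```

The resulting names are the ones to hand to `dpkg -S` to find the owning packages.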

Crossing my fingers, I delete the version 390.77 files.

locate 390.77|sudo xargs rm
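
Because locate reads a database that can be out of date, a dry run that inspects the filesystem directly may be worth doing before deleting anything. The following is a sketch under that assumption. `stale_libraries` is a hypothetical name, and it only lists files; it removes nothing.

```python
import glob
import os


def stale_libraries(lib_dirs, version="390.77"):
    """List regular files (symlinks excluded) whose names end with .<version>."""
    found = []
    for lib_dir in lib_dirs:
        # "**" with recursive=True also covers subdirectories such as vdpau/.
        pattern = os.path.join(glob.escape(lib_dir), "**", "*." + version)
        for path in glob.glob(pattern, recursive=True):
            if os.path.isfile(path) and not os.path.islink(path):
                found.append(path)
    return sorted(found)
```

Passing the two directories from the locate output, /usr/lib/i386-linux-gnu and /usr/lib/x86_64-linux-gnu, should reproduce that list if the database was current.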

Then I reinstall the packages.

sudo apt-get install --reinstall libcuda1-390 nvidia-opencl-icd-390

Finally, it works!

$ LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64 python -c "import tensorflow; tensorflow.Session()"
2019-02-06 22:13:59.460822: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-02-06 22:13:59.665756: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-02-06 22:13:59.666205: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.176
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.81GiB
2019-02-06 22:13:59.666226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-06 22:17:21.254445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-06 22:17:21.254489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-02-06 22:17:21.254496: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-02-06 22:17:21.290992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3539 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)

nvidia-smi also works now.

$ LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64 nvidia-smi
Wed Feb  6 22:19:24 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   45C    P8    N/A /  N/A |    113MiB /  4046MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      3212      G   /usr/lib/xorg/Xorg                           113MiB |
+-----------------------------------------------------------------------------+
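
To check the header of that report from a script, something like the following sketch could work. `smi_versions` is a hypothetical helper, and it assumes the header keeps the "NVIDIA-SMI ... Driver Version: ..." layout shown above.

```python
import re


def smi_versions(output):
    """Return (tool_version, driver_version) from nvidia-smi's header, or None."""
    match = re.search(r"NVIDIA-SMI\s+(\S+)\s+Driver Version:\s+(\S+)", output)
    return match.groups() if match else None
```

It returns `None` for output without that header, such as the "Failed to initialize NVML" message.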

I rebooted, and the video drivers continued to work. Hurrah!

Author: Don Kirkby

Updated on June 04, 2022

Comments

  • Don Kirkby
    Don Kirkby almost 2 years

    I'm trying to run some TensorFlow code, and I get what seems to be a common problem:

    $ LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64 python -c "import tensorflow; tensorflow.Session()"
    2019-02-06 20:36:15.903204: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2019-02-06 20:36:15.908809: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
    2019-02-06 20:36:15.908858: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] retrieving CUDA diagnostic information for host: tigris
    2019-02-06 20:36:15.908868: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:170] hostname: tigris
    2019-02-06 20:36:15.908942: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:194] libcuda reported version is: 390.77.0
    2019-02-06 20:36:15.908985: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:198] kernel reported version is: 390.30.0
    2019-02-06 20:36:15.909006: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:308] kernel version 390.30.0 does not match DSO version 390.77.0 -- cannot find working devices in this configuration
    $
    

    The key pieces of that error message seem to be:

    [...] libcuda reported version is: 390.77.0
    [...] kernel reported version is: 390.30.0
    [...] kernel version 390.30.0 does not match DSO version 390.77.0 -- cannot find working devices in this configuration
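
The comparison those three lines describe can be pulled out of a captured log with a short script. This is a sketch only: the function names are hypothetical, and it assumes the diagnostics keep the "reported version is:" wording shown above.

```python
import re


def reported_versions(log_text):
    """Extract the libcuda and kernel versions from TensorFlow's CUDA diagnostics."""
    versions = {}
    for name in ("libcuda", "kernel"):
        match = re.search(name + r" reported version is: (\S+)", log_text)
        if match:
            versions[name] = match.group(1)
    return versions


def versions_match(log_text):
    """True only when both versions were found and they are identical."""
    versions = reported_versions(log_text)
    return len(versions) == 2 and versions["libcuda"] == versions["kernel"]
```

For the log above, `versions_match` returns `False`, because the library reports 390.77.0 and the kernel module reports 390.30.0.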
    

    How can I install compatible versions? Where is that libcuda version coming from?

    Background

    A few months ago, I tried installing TensorFlow with GPU support, but the versions I tried either broke my display or wouldn't work with TensorFlow. Finally, I got it working by following a tutorial on how to install multiple versions of the CUDA libraries on the same machine. That worked at the time, but when I came back to the project after a few months, it had stopped working. I assume that some driver got upgraded during that time.

    Investigation

    The first thing I tried was to see what versions I have of the nvidia drivers and libcuda package.

    $ dpkg --list|grep libcuda
    ii  libcuda1-390                                                390.30-0ubuntu1                              amd64        NVIDIA CUDA runtime library
    

    Looks like it's 390.30. Why does the error message say that libcuda reported 390.77?

    $ dpkg --list|grep nvidia
    ii  libnvidia-container-tools                                   1.0.1-1                                      amd64        NVIDIA container runtime library (command-line tools)
    ii  libnvidia-container1:amd64                                  1.0.1-1                                      amd64        NVIDIA container runtime library
    rc  nvidia-384                                                  384.130-0ubuntu0.16.04.1                     amd64        NVIDIA binary driver - version 384.130
    ii  nvidia-390                                                  390.30-0ubuntu1                              amd64        NVIDIA binary driver - version 390.30
    ii  nvidia-390-dev                                              390.30-0ubuntu1                              amd64        NVIDIA binary Xorg driver development files
    rc  nvidia-396                                                  396.44-0ubuntu1                              amd64        NVIDIA binary driver - version 396.44
    ii  nvidia-container-runtime                                    2.0.0+docker18.09.1-1                        amd64        NVIDIA container runtime
    ii  nvidia-container-runtime-hook                               1.4.0-1                                      amd64        NVIDIA container runtime hook
    ii  nvidia-docker2                                              2.0.3+docker18.09.1-1                        all          nvidia-docker CLI wrapper
    ii  nvidia-modprobe                                             390.30-0ubuntu1                              amd64        Load the NVIDIA kernel driver and create device files
    rc  nvidia-opencl-icd-384                                       384.130-0ubuntu0.16.04.1                     amd64        NVIDIA OpenCL ICD
    ii  nvidia-opencl-icd-390                                       390.30-0ubuntu1                              amd64        NVIDIA OpenCL ICD
    rc  nvidia-opencl-icd-396                                       396.44-0ubuntu1                              amd64        NVIDIA OpenCL ICD
    ii  nvidia-prime                                                0.8.8.2                                      all          Tools to enable NVIDIA's Prime
    ii  nvidia-settings                                             396.44-0ubuntu1                              amd64        Tool for configuring the NVIDIA graphics driver
    

    Again, everything looks like it's 390.30. There were some packages that had version 390.77, but they were in the rc status. I guess I installed that version and later removed it, so the configuration files were left behind. I purged the configuration files with commands like this:

    sudo apt-get remove --purge nvidia-kernel-common-390
    

    Now, there are no packages at all with version 390.77.

    $ dpkg --list|grep 390.77
    $
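
Spotting leftover rc entries in a long `dpkg --list` listing is easier with a small filter. The sketch below uses a hypothetical helper name, `packages_by_status`, and assumes the column layout shown in the listings above (status, name, version).

```python
def packages_by_status(listing):
    """Group (name, version) pairs from `dpkg --list` output by status code."""
    groups = {}
    for line in listing.splitlines():
        fields = line.split()
        # Keep only ii (installed) and rc (removed, config files remain) rows.
        if len(fields) >= 3 and fields[0] in ("ii", "rc"):
            groups.setdefault(fields[0], []).append((fields[1], fields[2]))
    return groups
```

The entries under "rc" are the candidates for purging.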
    

    I tried reinstalling CUDA, to see if it had been compiled with the wrong version.

    $ sudo sh cuda_9.0.176_384.81_linux.run --silent --toolkit --toolkitpath=/usr/local/cuda-9.0 --override
    

    That didn't make any difference.

    Finally, I tried running nvidia-smi.

    $ LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64 nvidia-smi
    Failed to initialize NVML: Driver/library version mismatch
    $
    

    All of this is running on Ubuntu 18.04 with Python 3.6.7, and my graphics card is NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2).