Tensorflow complains that no CUDA-capable device is detected


I finally had the idea to look for any files with 390.77 in the name.

$ locate 390.77
/usr/lib/i386-linux-gnu/libcuda.so.390.77
/usr/lib/i386-linux-gnu/libnvcuvid.so.390.77
/usr/lib/i386-linux-gnu/libnvidia-compiler.so.390.77
/usr/lib/i386-linux-gnu/libnvidia-encode.so.390.77
/usr/lib/i386-linux-gnu/libnvidia-fatbinaryloader.so.390.77
/usr/lib/i386-linux-gnu/libnvidia-ml.so.390.77
/usr/lib/i386-linux-gnu/libnvidia-opencl.so.390.77
/usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.390.77
/usr/lib/i386-linux-gnu/vdpau/libvdpau_nvidia.so.390.77
/usr/lib/x86_64-linux-gnu/libcuda.so.390.77
/usr/lib/x86_64-linux-gnu/libnvcuvid.so.390.77
/usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.390.77
/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.390.77
/usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.390.77
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.390.77
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.390.77
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.390.77
/usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.390.77
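
The same search can be scripted. Below is a minimal Python sketch, assuming the NVIDIA user-space libraries live under the two multiarch directories shown above and follow the name.so.<version> naming pattern. It groups matching files by their version suffix so that a stray version stands out. The directory list and the name filter are assumptions for illustration, not something the packages guarantee.

```python
import os
import re
from collections import defaultdict

# Matches a trailing driver version such as ".so.390.77" (assumed naming scheme).
VERSION_SUFFIX = re.compile(r"\.so\.(\d+\.\d+(?:\.\d+)?)$")


def group_by_version(paths):
    """Map each version suffix to the sorted list of paths that carry it."""
    groups = defaultdict(list)
    for path in paths:
        match = VERSION_SUFFIX.search(path)
        if match:
            groups[match.group(1)].append(path)
    return {version: sorted(files) for version, files in groups.items()}


def scan(dirs=("/usr/lib/i386-linux-gnu", "/usr/lib/x86_64-linux-gnu")):
    """Walk the library directories and group NVIDIA/CUDA files by version."""
    paths = []
    for top in dirs:
        # os.walk silently yields nothing for a directory that does not exist.
        for root, _subdirs, files in os.walk(top):
            for name in files:
                if "nvidia" in name or "cuda" in name or "nvcuvid" in name:
                    paths.append(os.path.join(root, name))
    return group_by_version(paths)
```

Calling scan() returns a dictionary keyed by version; more than one NVIDIA driver version among the keys is a hint that stale files are lying around.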

So there they are! A closer look shows that I must have installed the newer version at some point.

$ ls /usr/lib/i386-linux-gnu/libcuda* -l
lrwxrwxrwx 1 root root      12 Nov  8 13:58 /usr/lib/i386-linux-gnu/libcuda.so -> libcuda.so.1
lrwxrwxrwx 1 root root      17 Nov 12 14:04 /usr/lib/i386-linux-gnu/libcuda.so.1 -> libcuda.so.390.77
-rw-r--r-- 1 root root 9179124 Jan 31  2018 /usr/lib/i386-linux-gnu/libcuda.so.390.30
-rw-r--r-- 1 root root 9179796 Jul 10  2018 /usr/lib/i386-linux-gnu/libcuda.so.390.77
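
The ls output shows a two-hop chain: libcuda.so points to libcuda.so.1, which points to a versioned file. A small Python sketch that follows such a chain one hop at a time, so you can see exactly which versioned file the loader will end up at (symlink_chain is a hypothetical helper name):

```python
import os


def symlink_chain(path):
    """Follow a symlink one hop at a time and return every path visited."""
    chain = [path]
    seen = {path}
    while os.path.islink(path):
        target = os.readlink(path)
        # Relative targets such as "libcuda.so.1" are relative to the link's own directory.
        if not os.path.isabs(target):
            target = os.path.join(os.path.dirname(path), target)
        if target in seen:  # stop on a symlink loop instead of spinning forever
            break
        chain.append(target)
        seen.add(target)
        path = target
    return chain
```

For example, print(" -> ".join(symlink_chain("/usr/lib/i386-linux-gnu/libcuda.so"))) prints the whole chain on one line.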

Where did they come from?

$ dpkg -S /usr/lib/i386-linux-gnu/libcuda.so.390.30
libcuda1-390: /usr/lib/i386-linux-gnu/libcuda.so.390.30
$ dpkg -S /usr/lib/i386-linux-gnu/libcuda.so.390.77
dpkg-query: no path found matching pattern /usr/lib/i386-linux-gnu/libcuda.so.390.77

So the 390.77 file no longer belongs to any package. Perhaps I installed the old version and had to force it to overwrite the links.
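
Checking ownership file by file can be automated. This is a Python sketch, assuming dpkg -S prints lines of the form "package[:arch][, package...]: path" as in the output above, and exits non-zero when no package claims the path. It passes stdout=subprocess.PIPE and universal_newlines rather than capture_output and text, so it also runs on the Python 3.6 mentioned in the question.

```python
import subprocess


def parse_dpkg_search(output):
    """Parse `dpkg -S` stdout into {path: [package, ...]}."""
    owners = {}
    for line in output.splitlines():
        # Split at the first ": " so "pkg:i386: /path" keeps its arch qualifier.
        packages, sep, path = line.partition(": ")
        if not sep or packages.startswith("diversion by"):
            continue
        owners.setdefault(path.strip(), []).extend(p.strip() for p in packages.split(", "))
    return owners


def find_owner(path):
    """Return the packages that own `path`, or [] if dpkg reports no match."""
    result = subprocess.run(
        ["dpkg", "-S", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return []  # e.g. "dpkg-query: no path found matching pattern ..."
    return parse_dpkg_search(result.stdout).get(path, [])
```

An empty list from find_owner() marks a file like the orphaned libcuda.so.390.77.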

My plan is to delete the files, then reinstall the packages to set up the links to the correct version. So which packages will I need to reinstall?

$ locate 390.77 | sed -e 's/390\.77/390.30/' | xargs dpkg -S

Some of the files don't match anything, but the ones that do match are from these packages:

libcuda1-390
nvidia-opencl-icd-390
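
The sed trick maps each stray 390.77 file to its 390.30 twin and asks dpkg who owns the twin. Here is that step as a Python sketch. The owner_lookup argument is a hypothetical callable that returns the owning package names for a path (for instance, a wrapper around dpkg -S); keeping it as a parameter makes the logic easy to try without touching the system.

```python
def counterpart(path, old="390.77", new="390.30"):
    """Return the path with its .so.<old> suffix replaced by .so.<new>, or None."""
    if path.endswith(".so." + old):
        return path[: -len(old)] + new
    return None


def packages_to_reinstall(stray_paths, owner_lookup, old="390.77", new="390.30"):
    """Collect the packages owning each stray file's twin, plus files with no owner."""
    packages = set()
    unmatched = []
    for path in stray_paths:
        twin = counterpart(path, old, new)
        owners = owner_lookup(twin) if twin else []
        if not owners:
            unmatched.append(path)  # "some of the files don't match anything"
        for pkg in owners:
            packages.add(pkg.split(":")[0])  # drop any ":i386"-style arch qualifier
    return sorted(packages), unmatched
```

Stripping the arch qualifier means the i386 and amd64 copies of the same package collapse into one name, matching the two-package list above.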

Crossing my fingers, I delete the version 390.77 files.

locate 390.77|sudo xargs rm

Then I reinstall the packages.

sudo apt-get install --reinstall libcuda1-390 nvidia-opencl-icd-390
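
Because piping locate straight into rm is unforgiving, it may help to print the exact commands before running anything. A hypothetical dry-run helper in Python (cleanup_commands and run are illustrative names, not tools from the answer); nothing executes unless dry_run is set to False:

```python
import shlex
import subprocess


def cleanup_commands(stray_paths, packages):
    """Build (but do not run) the delete and reinstall commands as argv lists."""
    commands = []
    if stray_paths:
        # "--" stops rm from treating a path that starts with "-" as an option.
        commands.append(["sudo", "rm", "--"] + sorted(stray_paths))
    if packages:
        commands.append(["sudo", "apt-get", "install", "--reinstall"] + sorted(packages))
    return commands


def run(commands, dry_run=True):
    """Print each command shell-quoted; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```

Reviewing the printed rm line once is cheap insurance against a stale locate database matching something unexpected.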

Finally, it works!

$ LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64 python -c "import tensorflow; tensorflow.Session()"
2019-02-06 22:13:59.460822: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-02-06 22:13:59.665756: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-02-06 22:13:59.666205: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 960M major: 5 minor: 0 memoryClockRate(GHz): 1.176
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.81GiB
2019-02-06 22:13:59.666226: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-06 22:17:21.254445: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-06 22:17:21.254489: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-02-06 22:17:21.254496: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-02-06 22:17:21.290992: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3539 MB memory) -> physical GPU (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0, compute capability: 5.0)
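
The line that matters in that log is the last one, where TensorFlow creates a device backed by a physical GPU. A small Python sketch that extracts those entries from a captured log, assuming the message wording of this TensorFlow 1.x build (other versions may phrase it differently):

```python
import re

# Assumed wording, copied from the "Created TensorFlow device" line above.
CREATED = re.compile(
    r"-> physical GPU \(device: (\d+), name: ([^,]+), .*?compute capability: ([\d.]+)\)"
)


def gpus_from_log(log_text):
    """Return (index, name, compute capability) for each GPU TensorFlow created."""
    return [(int(index), name, cc) for index, name, cc in CREATED.findall(log_text)]
```

An empty list means no GPU device was created, which is what the CUDA_ERROR_NO_DEVICE run in the question would produce.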

nvidia-smi also works now.

$ LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64 nvidia-smi
Wed Feb  6 22:19:24 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.30                 Driver Version: 390.30                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   45C    P8    N/A /  N/A |    113MiB /  4046MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      3212      G   /usr/lib/xorg/Xorg                           113MiB |
+-----------------------------------------------------------------------------+
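
The driver version can also be read programmatically, either from the banner above or by asking nvidia-smi for just that field with --query-gpu=driver_version --format=csv,noheader. A Python sketch of both; it assumes nvidia-smi is on the PATH and treats any failure (missing binary, NVML mismatch) as "unknown":

```python
import re
import subprocess


def driver_version_from_banner(text):
    """Pull 'Driver Version: X' out of the default nvidia-smi table."""
    match = re.search(r"Driver Version:\s*(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def query_driver_version():
    """Ask nvidia-smi for the driver version; None if it cannot answer."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
    except OSError:  # nvidia-smi not installed
        return None
    if result.returncode != 0:  # e.g. "Failed to initialize NVML"
        return None
    lines = result.stdout.strip().splitlines()
    return lines[0].strip() if lines else None
```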

I rebooted, and the video drivers continued to work. Hurrah!

Don Kirkby

Updated on June 04, 2022

Question

I'm trying to run some Tensorflow code, and I get what seems to be a common problem:

$ LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64 python -c "import tensorflow; tensorflow.Session()"
2019-02-06 20:36:15.903204: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-02-06 20:36:15.908809: E tensorflow/stream_executor/cuda/cuda_driver.cc:300] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2019-02-06 20:36:15.908858: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] retrieving CUDA diagnostic information for host: tigris
2019-02-06 20:36:15.908868: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:170] hostname: tigris
2019-02-06 20:36:15.908942: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:194] libcuda reported version is: 390.77.0
2019-02-06 20:36:15.908985: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:198] kernel reported version is: 390.30.0
2019-02-06 20:36:15.909006: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:308] kernel version 390.30.0 does not match DSO version 390.77.0 -- cannot find working devices in this configuration
$

The key pieces of that error message seem to be:

[...] libcuda reported version is: 390.77.0
[...] kernel reported version is: 390.30.0
[...] kernel version 390.30.0 does not match DSO version 390.77.0 -- cannot find working devices in this configuration
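
Those lines can be checked mechanically. A small Python sketch that pulls both reported versions out of a captured log and reports whether they disagree, assuming the log wording stays exactly as in this TensorFlow build's output:

```python
import re

LIBCUDA = re.compile(r"libcuda reported version is: ([\d.]+)")
KERNEL = re.compile(r"kernel reported version is: ([\d.]+)")


def version_mismatch(log_text):
    """Return (libcuda_version, kernel_version) if they differ, else None."""
    lib = LIBCUDA.search(log_text)
    kern = KERNEL.search(log_text)
    if not (lib and kern):
        return None  # the diagnostics lines are missing from this log
    if lib.group(1) != kern.group(1):
        return lib.group(1), kern.group(1)
    return None
```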

How can I install compatible versions? Where is that libcuda version coming from?

Background

A few months ago, I tried installing Tensorflow with GPU support, but the versions I tried either broke my display or wouldn't work with Tensorflow. I finally got it working by following a tutorial on how to install multiple versions of the CUDA libraries on the same machine. That worked at the time, but when I came back to the project a few months later, it had stopped working. I assume that some driver got upgraded in the meantime.

Investigation

The first thing I tried was checking which versions of the NVIDIA drivers and the libcuda package I have installed.

$ dpkg --list|grep libcuda
ii  libcuda1-390                                                390.30-0ubuntu1                              amd64        NVIDIA CUDA runtime library

Looks like it's 390.30. Why does the error message say that libcuda reported 390.77?

$ dpkg --list|grep nvidia
ii  libnvidia-container-tools                                   1.0.1-1                                      amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64                                  1.0.1-1                                      amd64        NVIDIA container runtime library
rc  nvidia-384                                                  384.130-0ubuntu0.16.04.1                     amd64        NVIDIA binary driver - version 384.130
ii  nvidia-390                                                  390.30-0ubuntu1                              amd64        NVIDIA binary driver - version 390.30
ii  nvidia-390-dev                                              390.30-0ubuntu1                              amd64        NVIDIA binary Xorg driver development files
rc  nvidia-396                                                  396.44-0ubuntu1                              amd64        NVIDIA binary driver - version 396.44
ii  nvidia-container-runtime                                    2.0.0+docker18.09.1-1                        amd64        NVIDIA container runtime
ii  nvidia-container-runtime-hook                               1.4.0-1                                      amd64        NVIDIA container runtime hook
ii  nvidia-docker2                                              2.0.3+docker18.09.1-1                        all          nvidia-docker CLI wrapper
ii  nvidia-modprobe                                             390.30-0ubuntu1                              amd64        Load the NVIDIA kernel driver and create device files
rc  nvidia-opencl-icd-384                                       384.130-0ubuntu0.16.04.1                     amd64        NVIDIA OpenCL ICD
ii  nvidia-opencl-icd-390                                       390.30-0ubuntu1                              amd64        NVIDIA OpenCL ICD
rc  nvidia-opencl-icd-396                                       396.44-0ubuntu1                              amd64        NVIDIA OpenCL ICD
ii  nvidia-prime                                                0.8.8.2                                      all          Tools to enable NVIDIA's Prime
ii  nvidia-settings                                             396.44-0ubuntu1                              amd64        Tool for configuring the NVIDIA graphics driver

Again, everything looks like it's 390.30. There were some packages with version 390.77, but they were in the rc state (removed, with configuration files left behind). I guess I installed that version and later removed it. I purged the leftover configuration files with commands like this:

sudo apt-get remove --purge nvidia-kernel-common-390
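
Rather than purging leftovers one at a time, the rc rows can be picked out of dpkg --list output. A Python sketch, assuming each package row starts with a short status code such as ii or rc followed by the name and version columns, as in the listing above:

```python
def parse_dpkg_list(text):
    """Parse `dpkg --list` rows into (status, name, version) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Package rows start with a 2-3 letter status code; header rows contain
        # characters like "=", "|" or "+" and are skipped.
        if len(fields) >= 3 and len(fields[0]) in (2, 3) and fields[0].isalpha():
            rows.append((fields[0], fields[1], fields[2]))
    return rows


def purge_command(rows):
    """Build one purge command for every package left in the rc state."""
    leftovers = [name for status, name, _version in rows if status == "rc"]
    return ["sudo", "apt-get", "remove", "--purge"] + leftovers if leftovers else []
```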

Now, there are no packages at all with version 390.77.

$ dpkg --list|grep 390.77
$

I tried reinstalling CUDA to see whether it had been compiled against the wrong version.

$ sudo sh cuda_9.0.176_384.81_linux.run --silent --toolkit --toolkitpath=/usr/local/cuda-9.0 --override

That didn't make any difference.

Finally, I tried running nvidia-smi.

$ LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64 nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
$
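
That NVML error most likely reflects the same mismatch as the TensorFlow log: the loaded kernel module and the user-space libraries disagree. A Python sketch that compares the two, assuming /proc/driver/nvidia/version contains a "Kernel Module <version>" field and that libcuda.so.1 sits in the x86_64 multiarch directory; both locations are assumptions and are exposed as parameters.

```python
import os
import re

VERSION = r"(\d+(?:\.\d+)+)"


def kernel_module_version(text):
    """Pull the version out of /proc/driver/nvidia/version text (format assumed)."""
    match = re.search(r"Kernel Module\s+" + VERSION, text)
    return match.group(1) if match else None


def userspace_version(libcuda="/usr/lib/x86_64-linux-gnu/libcuda.so.1"):
    """Version suffix of the file the libcuda.so.1 symlink finally resolves to."""
    match = re.search(r"\.so\." + VERSION + r"$", os.path.realpath(libcuda))
    return match.group(1) if match else None


def compare(proc_path="/proc/driver/nvidia/version",
            libcuda="/usr/lib/x86_64-linux-gnu/libcuda.so.1"):
    """Return (kernel, userspace) versions; either may be None if not found."""
    kernel = None
    if os.path.exists(proc_path):
        with open(proc_path) as f:
            kernel = kernel_module_version(f.read())
    return kernel, userspace_version(libcuda)
```

When the two values differ, expect cuInit and NVML to fail as shown above.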

All of this is running on Ubuntu 18.04 with Python 3.6.7, and my graphics card is NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2).
