RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED when using PyTorch
Solution 1
There is some discussion regarding this here. I had the same issue, but switching to CUDA 11.1 resolved it for me.
This is the exact pip command:
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
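Before swapping wheels, it can help to confirm which CUDA and cuDNN builds the installed torch package actually uses. The sketch below is a hypothetical helper (collect_cuda_info is not part of PyTorch). It only reads existing attributes such as torch.version.cuda and torch.backends.cudnn.version(), and it returns early instead of raising if torch cannot be found by the current interpreter.

```python
import importlib.util


def collect_cuda_info():
    """Hypothetical helper: report the torch/CUDA/cuDNN versions this interpreter sees."""
    # Return early (instead of raising ImportError) when torch is not installed here,
    # e.g. when the script runs outside the intended container or virtualenv.
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}

    import torch

    info = {
        "torch": torch.__version__,                       # wheel version string, may carry a +cuXXX suffix
        "cuda_build": torch.version.cuda,                 # None for CPU-only wheels
        "cuda_available": torch.cuda.is_available(),
        "cudnn_version": torch.backends.cudnn.version(),  # None when cuDNN is not available
    }
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in collect_cuda_info().items():
        print(f"{key}: {value}")
```

Comparing the reported cuda_build with the +cuXXX suffix in the pip command above is one way to check that the intended wheel was actually installed.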
Solution 2
In my case it actually had nothing to do with the PyTorch/CUDA/cuDNN version. PyTorch initializes cuDNN lazily, the first time a convolution is executed. In my case, there was not enough GPU memory left to initialize cuDNN, because PyTorch itself already held the entire memory in its internal cache. One can release the cache manually with "torch.cuda.empty_cache()" right before the first convolution is executed. A cleaner solution is to force cuDNN initialization at the beginning by running a mock convolution:
import torch

def force_cudnn_initialization():
    # Run a throwaway convolution so cuDNN is initialized while GPU memory is still free
    s = 32
    dev = torch.device('cuda')
    torch.nn.functional.conv2d(torch.zeros(s, s, s, s, device=dev), torch.zeros(s, s, s, s, device=dev))
Calling the above function at the very beginning of the program solved the problem for me.
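To check whether the caching allocator is the culprit, as described in this answer, one can compare how much memory live tensors occupy with how much PyTorch has reserved in its cache. The helper name cuda_cache_report is hypothetical; torch.cuda.memory_allocated, torch.cuda.memory_reserved and torch.cuda.empty_cache are existing PyTorch functions. This is a sketch, assuming a single GPU used as the current device.

```python
import importlib.util


def cuda_cache_report(release=False):
    """Hypothetical helper: allocated vs. reserved CUDA memory on the current device, in MiB."""
    # Nothing to report without PyTorch or without a visible GPU.
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    if not torch.cuda.is_available():
        return None

    if release:
        # Hand cached-but-unused blocks back to the driver so that memory
        # allocated outside PyTorch's allocator (such as cuDNN's) can use it.
        torch.cuda.empty_cache()

    mib = 1024 ** 2
    return {
        "allocated_mib": torch.cuda.memory_allocated() / mib,  # held by live tensors
        "reserved_mib": torch.cuda.memory_reserved() / mib,    # held by the caching allocator
    }
```

A reserved_mib value close to the card's total memory right before the first convolution would be consistent with the situation described above; calling cuda_cache_report(release=True) at that point mirrors the manual torch.cuda.empty_cache() workaround.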
Solution 3
I am also using CUDA 10.2. I had exactly the same error after upgrading torch and torchvision to the latest versions (torch-1.8.0 and torchvision-0.9.0). Which versions are you using?
This is probably not the best solution, but after downgrading to torch-1.7.1 and torchvision-0.8.2 everything works fine.
Eduardo H
Updated on December 03, 2021
Eduardo H over 2 years
I am trying to run a simple PyTorch sample. It works fine on the CPU, but on the GPU I get this error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 263, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 260, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
The code I am trying to run is the following:
import torch
from torch import nn

m = nn.Conv1d(16, 33, 3, stride=2)
m = m.to('cuda')
input = torch.randn(20, 16, 50)
input = input.to('cuda')
output = m(input)
I am running this code in an NVIDIA Docker container with CUDA 10.2, and my GPU is an RTX 2070.
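Since the sample reportedly works on the CPU, a device-agnostic variant of the same code makes it easy to run the identical model on either device and compare. This is a sketch; run_conv1d_sample is a hypothetical name, and the tensor is called x instead of input so it does not shadow the Python built-in.

```python
import importlib.util


def run_conv1d_sample():
    """Hypothetical helper: run the question's Conv1d sample on CUDA if visible, else on the CPU."""
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch is not installed in this interpreter
    import torch
    from torch import nn

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    m = nn.Conv1d(16, 33, 3, stride=2).to(device)
    x = torch.randn(20, 16, 50, device=device)  # batch=20, channels=16, length=50
    with torch.no_grad():
        out = m(x)  # on CUDA, this is the call that raised CUDNN_STATUS_NOT_INITIALIZED in the question
    return device.type, tuple(out.shape)
```

With stride=2, kernel size 3 and no padding, the output length is floor((50 - 3) / 2) + 1 = 24, so a successful run on either device should return an output shape of (20, 33, 24).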
mosc9575 about 3 years
One hint that is unrelated to your problem: please do not use Python built-in names such as "input" as variable names, because shadowing them can cause some very ugly and hard-to-debug problems.
Tim Roberts about 3 years
import torch.cuda
torch.cuda.is_available()?
Guojun Zhang about 3 years
I have exactly the same problem on CUDA 10.2. Did you solve it?
Eduardo H about 3 years
@GuojunZhang I solved it by using the PyTorch container for NVIDIA Docker.