CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`


Solution 1

This error can be caused by several different problems. To debug CUDA errors, it is recommended to run the code on the CPU, if possible. If that's not possible, try to execute the script via:

CUDA_LAUNCH_BLOCKING=1 python [YOUR_PROGRAM]

CUDA kernels launch asynchronously, so by default the stack trace can point at an unrelated line. With CUDA_LAUNCH_BLOCKING=1, every launch runs synchronously, and the stack trace shows the exact line of code that raised the error so that you can resolve it.
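
If you are in a notebook where setting shell variables is awkward, the same flag can be set from Python; a minimal sketch, where the only important point is that it runs before the first CUDA call:

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

import torch  # any CUDA work after this point now reports errors synchronously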

Solution 2

No, batch size does not matter in this case.

The most likely reason is a mismatch between the number of labels and the number of output units.

  • Try printing the size of the final output in the forward pass and check that it matches the number of classes:

print(model.fc1(x).size())
Here, replace fc1 with the name of the last linear layer your model applies before returning.

  • Make sure that label.size() is equal to prediction.size() before calculating the loss; both checks appear in the sketch below
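
To make both checks concrete, here is a minimal, self-contained sketch; the model, layer sizes, and class count are assumptions for illustration, not taken from the question:

import torch
import torch.nn as nn

num_classes = 10  # assumed number of distinct labels in the dataset
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, num_classes))

x = torch.randn(4, 32)                # dummy batch of 4 samples
labels = torch.tensor([0, 3, 9, 5])   # class ids must stay below num_classes

logits = model(x)
print(logits.size())                  # torch.Size([4, 10]); last dim == num_classes
assert int(labels.max()) < logits.size(-1), "a label id exceeds the number of output units"

loss = nn.CrossEntropyLoss()(logits, labels)  # sizes line up, so no cuBLAS error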

And even after fixing that problem, you'll have to restart the GPU runtime, since the failed CUDA call can leave the context unusable (I needed to do this in my case when using a Colab GPU).

This answer might also be helpful

Solution 3

Reducing the batch size worked for me, and training then proceeded as planned.
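
For reference, a minimal sketch of where the batch size is typically set; the dataset below is a stand-in, not the asker's data:

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1000, 32), torch.randint(0, 10, (1000,)))
loader = DataLoader(dataset, batch_size=16, shuffle=True)  # halve this until the allocation succeeds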

Solution 4

First, try running the same code on your CPU to check whether everything is fine with your tensors' shapes.

In my case everything was fine. And since this error means "Resource allocation failed inside the cuBLAS library", I tried decreasing the batch size, and that solved the issue. You said you reduced it to 64 and it didn't help. Can you try 32, 8, 1?
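
A self-contained sketch of that CPU check; the layer and batch below are placeholders for your own model and data:

import torch
import torch.nn as nn

model = nn.Linear(32, 10)      # stand-in for your network
x = torch.randn(4, 32)         # stand-in for one batch

device = torch.device("cpu")   # on CPU, a shape mismatch raises a readable RuntimeError naming the exact shapes
out = model.to(device)(x.to(device))
print(out.shape)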

Solution 5

I encountered this problem when the number of labels did not match the number of the network's output channels, i.e. the number of classes predicted.


Comments

  • Mr. NLP, almost 2 years ago

    I got the following error when I ran my pytorch deep learning model in colab

    /usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
       1370         ret = torch.addmm(bias, input, weight.t())
       1371     else:
    -> 1372         output = input.matmul(weight.t())
       1373         if bias is not None:
       1374             output += bias
    
    RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
    

    I even reduced the batch size from 128 to 64, i.e. halved it, but I still got this error. Earlier, I ran the same code with a batch size of 128 and didn't get any error like this.