GPU out of memory error message on Google Colab


Solution 1

You are running out of GPU memory. If you are running Python code, try running the snippet below before yours; it will show the amount of memory you have. Note that if you try to load images bigger than the total memory, it will still fail.

# memory footprint support libraries/code
!ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
!pip install gputil psutil humanize

import os

import GPUtil as GPU
import humanize
import psutil

GPUs = GPU.getGPUs()
# XXX: assumes a single GPU, which Colab usually (but not always) provides
gpu = GPUs[0]

def printm():
    process = psutil.Process(os.getpid())
    print("Gen RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available),
          " | Proc size: " + humanize.naturalsize(process.memory_info().rss))
    print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total {3:.0f}MB".format(
        gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil * 100, gpu.memoryTotal))

printm()

Solution 2

Google Colab allocates resources dynamically, based on each user's past usage. If one user has been consuming a lot of resources recently while another uses Colab less frequently, the less frequent user will be given relatively more preference in resource allocation.

Hence, to get the most out of Colab, close all your Colab tabs and all other active sessions, then restart the runtime for the one you want to use. You'll very likely get a better GPU allocation.

If you are training a neural network and still face the same issue, try reducing the batch size as well.

Solution 3

Try reducing your batch size to 8 or 16. It worked for me.
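Why this helps: during training, a large share of GPU memory goes to per-sample activations, which scale roughly linearly with batch size. A minimal sketch of that relationship, with made-up numbers (the activation count per sample is hypothetical, not measured from any real model):

```python
# Rough illustration: activation memory scales linearly with batch size,
# so halving the batch size roughly halves this part of the GPU footprint.
BYTES_PER_FLOAT = 4
ACTIVATIONS_PER_SAMPLE = 2_000_000  # assumed figure for an example network

def activation_memory_mb(batch_size):
    """Approximate activation memory for one forward pass, in MiB."""
    return batch_size * ACTIVATIONS_PER_SAMPLE * BYTES_PER_FLOAT / 2**20

for bs in (64, 16, 8):
    print(f"batch size {bs:>2}: ~{activation_memory_mb(bs):.0f} MiB of activations")
```

Model weights and optimizer state do not shrink with batch size, so this only buys headroom on the activation side, but that is often enough to get under the limit.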

Solution 4

Just as an answer for other people using Google Colab: I had this problem often when I used it for my deep learning class. I started paying for Google Colab, and it immediately started allowing me to run my code. This does not stop the problem completely, however. When I started using Google Colab for my research, I hit this error again! Researching on Google Colab's website, I found that there are GPU usage limits even for people who pay for Google Colab. To test this, I tried a secondary Gmail account I rarely use. Sure enough, it ran perfectly...

So, in short: share your code with a secondary email or set up a new email account, and sign into Colab with that secondary account. If that works for any of you, comment below so people are aware of it. I found it super frustrating and lost a lot of time to this error.

user1551817

Updated on December 08, 2021

Comments

  • user1551817 (over 2 years ago)

    I'm using a GPU on Google Colab to run some deep learning code.

    I have got 70% of the way through the training, but now I keep getting the following error:

    RuntimeError: CUDA out of memory. Tried to allocate 2.56 GiB (GPU 0; 15.90 GiB total capacity; 10.38 GiB already allocated; 1.83 GiB free; 2.99 GiB cached)
    

    I'm trying to understand what this means. Is it talking about RAM? If so, the code should just run the same as it has been doing, shouldn't it? When I try to restart it, the memory message appears immediately. Why would it be using more RAM when I start it today than it did when I started it yesterday or the day before?

    Or is this message about hard disk space? I could understand that because the code saves things as it goes on and so the hard disk usage would be cumulative.

    Any help would be much appreciated.


    So if it's just the GPU running out of memory - could someone explain why the error message says 10.38 GiB already allocated - how can there be memory already allocated when I start to run something. Could that be being used by someone else? Do I just need to wait and try again later?

    Here is a screenshot of the GPU usage when I run the code, just before it runs out of memory:



    I found this post in which people seem to be having similar problems. When I run the code suggested in that thread, I see:

    Gen RAM Free: 12.6 GB  | Proc size: 188.8 MB
    GPU RAM Free: 16280MB | Used: 0MB | Util   0% | Total 16280MB
    

    which seems to suggest there is 16 GB of RAM free.

    I'm confused.
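As a footnote on the error message quoted in the question: all four numbers it reports refer to GPU memory, not system RAM or disk, and they are mutually consistent. A quick arithmetic check (values in GiB, copied from the message; the small unaccounted remainder is typically the CUDA context itself):

```python
# Sanity-check of the numbers in the CUDA OOM message (all values in GiB).
total = 15.90       # "GPU 0; 15.90 GiB total capacity"
allocated = 10.38   # already held by tensors in this process
cached = 2.99       # freed tensors kept by PyTorch's caching allocator
free = 1.83         # actually available for new allocations
requested = 2.56    # the allocation that failed

accounted = allocated + cached + free   # 15.20 GiB
overhead = total - accounted            # ~0.70 GiB: CUDA context, fragmentation
print(f"accounted: {accounted:.2f} GiB, unaccounted: {overhead:.2f} GiB")
print("request exceeds free memory:", requested > free)
```

So the failure is simply that the 2.56 GiB request is larger than the 1.83 GiB still free on the card; the 10.38 GiB "already allocated" is held by the question's own training process, not by another user.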