How to get rid of CUDA out of memory without having to restart the machine?

You could try calling torch.cuda.empty_cache(), since PyTorch is the one occupying the CUDA memory.
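
As a rough illustration of how much torch.cuda.empty_cache() can be expected to give back, the numbers in the error message quoted in the question can be compared directly. empty_cache() releases only cached blocks that no live tensor occupies, so the gap between "reserved" and "already allocated" is an upper bound on what it frees. The helper name parse_oom and its regular expression are assumptions made for this sketch, and the message format is taken from the error as quoted; other PyTorch versions may word it differently.

```python
import re

# Conversion factors to MiB for the units that appear in the message.
UNITS_TO_MIB = {"KiB": 1.0 / 1024.0, "MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Return (allocated, reserved) in MiB from a PyTorch OOM message.

    Hypothetical helper: it assumes the wording of the error quoted in
    the question and raises ValueError if a field is missing.
    """
    def grab(label):
        match = re.search(r"([\d.]+) (KiB|MiB|GiB) " + label, message)
        if match is None:
            raise ValueError(f"field not found: {label}")
        return float(match.group(1)) * UNITS_TO_MIB[match.group(2)]

    return grab("already allocated"), grab("reserved in total")


msg = (
    "RuntimeError: CUDA out of memory. Tried to allocate 40.00 MiB "
    "(GPU 0; 7.80 GiB total capacity; 6.34 GiB already allocated; "
    "32.44 MiB free; 6.54 GiB reserved in total by PyTorch)"
)

allocated, reserved = parse_oom(msg)
# Cached but not backing any live tensor: the most empty_cache() could release.
cached_unused = reserved - allocated
print(f"allocated: {allocated:.1f} MiB, reserved: {reserved:.1f} MiB")
print(f"at most {cached_unused:.1f} MiB reclaimable by empty_cache()")
```

For the quoted error the gap is roughly 205 MiB out of about 6.5 GiB reserved, so empty_cache() on its own is unlikely to help much here. Most of the memory is held by live tensors, which have to be deleted or moved off the GPU first.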

Author: Mona Jalal

I am a 5th-year computer science Ph.D. candidate at Boston University, advised by Professor Vijaya Kolachalama, with computer vision as my area of study. Currently, I am working on my proposal exam and a thesis on the use of efficient computer vision and deep learning for cancer detection in H&E-stained digital pathology images.

Updated on September 18, 2022

Comments

  • Mona Jalal, over 1 year ago

    Is there a hack in Ubuntu 20.04 to get rid of the following CUDA out of memory error without having to restart the machine?

    RuntimeError: CUDA out of memory. Tried to allocate 40.00 MiB (GPU 0; 7.80 GiB total capacity; 6.34 GiB already allocated; 32.44 MiB free; 6.54 GiB reserved in total by PyTorch)

    I understand that the following works but then also kills my Jupyter notebook. Is there a way to free up memory in GPU without having to kill the Jupyter notebook?

    (base) mona@mona:~/research/facial_landmark$ nvidia-smi
    Tue Oct  6 20:28:05 2020       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  GeForce RTX 2070    Off  | 00000000:01:00.0 Off |                  N/A |
    | N/A   47C    P8     9W /  N/A |   7883MiB /  7982MiB |      2%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A      1306      G   /usr/lib/xorg/Xorg                255MiB |
    |    0   N/A  N/A      1743      G   /usr/bin/gnome-shell              151MiB |
    |    0   N/A  N/A      3273      G   /usr/lib/firefox/firefox            2MiB |
    |    0   N/A  N/A      3359      G   /usr/lib/firefox/firefox            2MiB |
    |    0   N/A  N/A      3844      G   /usr/lib/firefox/firefox            2MiB |
    |    0   N/A  N/A      4222      G   /usr/lib/firefox/firefox            2MiB |
    |    0   N/A  N/A      4587      C   ...mona/anaconda3/bin/python     7459MiB |
    +-----------------------------------------------------------------------------+
    (base) mona@mona:~/research/facial_landmark$ kill -9  4587
    (base) mona@mona:~/research/facial_landmark$ nvidia-smi
    Tue Oct  6 20:28:24 2020       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  GeForce RTX 2070    Off  | 00000000:01:00.0 Off |                  N/A |
    | N/A   47C    P8     9W /  N/A |    433MiB /  7982MiB |      4%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
                                                                                   
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A      1306      G   /usr/lib/xorg/Xorg                255MiB |
    |    0   N/A  N/A      1743      G   /usr/bin/gnome-shell              152MiB |
    |    0   N/A  N/A      3273      G   /usr/lib/firefox/firefox            2MiB |
    |    0   N/A  N/A      3359      G   /usr/lib/firefox/firefox            2MiB |
    |    0   N/A  N/A      3844      G   /usr/lib/firefox/firefox            2MiB |
    |    0   N/A  N/A      4222      G   /usr/lib/firefox/firefox            2MiB |
    +-----------------------------------------------------------------------------+
    (base) mona@mona:~/research/facial_landmark$ 
    
  • Jean Monet, about 3 years ago
    If, for example, I shut down my Jupyter kernel without first calling x.detach().cpu(), then del x, then torch.cuda.empty_cache(), it becomes impossible to free that memory from a different notebook. So the solution would not work. Astonished to see that in 2021 it's such a pain to delete stuff from CUDA memory.
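
The sequence described in the comment above (move to CPU, drop the GPU reference, empty the cache) can be sketched as below. This is a minimal illustration, not a guaranteed fix: the function name to_cpu_and_free and the idea of keeping tensors in a dict are assumptions made for the example, and the try/except around the import is there only so the sketch can be loaded on a machine without PyTorch. It has to run inside the kernel that owns the tensors, because empty_cache() acts only on the calling process's allocator.

```python
import gc

try:
    import torch
except ImportError:  # assumption: keep the sketch loadable without PyTorch
    torch = None


def to_cpu_and_free(tensors):
    """Replace every tensor in the dict with a CPU copy, then release the cache.

    Hypothetical helper. Rebinding tensors[name] drops this dict's reference
    to the GPU tensor; memory is only returned if nothing else still refers
    to it (for example another variable or the notebook's output history).
    """
    for name in list(tensors):
        tensors[name] = tensors[name].detach().cpu()

    gc.collect()  # collect unreachable objects that may still hold GPU tensors

    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand unoccupied cached blocks back to the driver
        print("allocated bytes:", torch.cuda.memory_allocated())
        print("reserved bytes: ", torch.cuda.memory_reserved())


if torch is not None and torch.cuda.is_available():
    live = {"x": torch.zeros(1024, 1024, device="cuda")}
    to_cpu_and_free(live)
```

If nvidia-smi still shows the memory in use afterwards, some other reference to the tensors is probably still alive in that kernel, or the memory belongs to a different process.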