Out of memory running Tensorflow with GPU support in PyCharm


To wrap up our conversation from the comments: I do not believe you can allocate GPU memory or desktop memory to the GPU in the way that you are trying to. When you have a single GPU, TensorFlow-GPU will in most cases allocate around 95% of the available GPU memory to the task it runs. In your case, something is already consuming all of the available GPU memory, which is the primary reason your program does not run. You need to review the memory usage of your GPU and free some of it up (I can't help but think you already have another Python instance using TensorFlow-GPU, or some other GPU-intensive program, running in the background). On Linux, the command nvidia-smi will tell you what is using your GPU. Here is an example:

Sun Jan 20 18:23:35 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970     Off  | 00000000:01:00.0 Off |                  N/A |
| 32%   63C    P2    69W / 163W |   3823MiB /  4035MiB |     40%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      3019      C   ...e/scarter/anaconda3/envs/tf1/bin/python  3812MiB |
+-----------------------------------------------------------------------------+

You can see that in my case the card in my server has 4035MiB of memory, of which 3823MiB is in use. Furthermore, review the GPU processes at the bottom: process PID 3019 consumes 3812MiB of the available 4035MiB on the card. If I wanted to run another Python script using TensorFlow, I would have two main choices: install a second GPU and run on that, or, if no GPU is available, run on the CPU. Someone more expert than me may say that you could allocate just half the memory to each task, but 2GB of memory is already pretty low for TensorFlow training. Cards with much more memory (6GB+) are typically recommended for that task.
In closing, find out what is consuming all of your video card's memory and end that task. I believe that will resolve your problem.
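
If you really do want to split the card between tasks, as mentioned above, a minimal sketch for capping TensorFlow's allocation might look like the following. This assumes the TensorFlow 1.x / Keras stack shown in the question's traceback, and the 0.5 fraction is only an illustrative value, not a recommendation:

# Sketch: cap how much GPU memory TensorFlow 1.x grabs, instead of the
# default behaviour of claiming almost the whole card.
# Assumption: the TF 1.x / Keras stack from the question's traceback.
import tensorflow as tf
from keras import backend as K

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.5  # illustrative: at most ~half the card
config.gpu_options.allow_growth = True                    # start small, grow as needed
K.set_session(tf.Session(config=config))

# ... build and fit the Keras model as usual after this point

With allow_growth enabled, TensorFlow starts small and grows its allocation as needed instead of grabbing ~95% of the card up front; the fraction caps how far it can grow.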


Comments

  • ling
    ling almost 2 years

    My code works fine when run in an IPython terminal, but it fails with an out-of-memory error, as below.

    /home/abigail/anaconda3/envs/tf_gpuenv/bin/python -Xms1280m -Xmx4g /home/abigail/PycharmProjects/MLNN/src/test.py
    Using TensorFlow backend.
    Epoch 1/150
    2019-01-19 22:12:39.539156: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
    2019-01-19 22:12:39.588899: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
    2019-01-19 22:12:39.589541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
    name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.0845
    pciBusID: 0000:01:00.0
    totalMemory: 1.95GiB freeMemory: 59.69MiB
    2019-01-19 22:12:39.589552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
    Traceback (most recent call last):
      File "/home/abigail/PycharmProjects/MLNN/src/test.py", line 20, in <module>
        model.fit(X, Y, epochs=150, batch_size=10)
      File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
        validation_steps=validation_steps)
      File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
        outs = f(ins_batch)
      File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2697, in __call__
        if hasattr(get_session(), '_make_callable_from_options'):
      File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 186, in get_session
        _SESSION = tf.Session(config=config)
      File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1551, in __init__
        super(Session, self).__init__(target, graph, config=config)
      File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 676, in __init__
        self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
    tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory
    
    Process finished with exit code 1
    

    In PyCharm, I first edited the "Help->Edit Custom VM options":

    -Xms1280m
    -Xmx4g
    

    This doesn't fix the issue. Then I edited "Run->Edit Configurations->Interpreter options":

    -Xms1280m -Xmx4g
    

    It still gives the same error. My desktop Linux machine has plenty of memory (64GB). How can I fix this issue?

    By the way, in PyCharm, if I don't use the GPU, it doesn't give the error.

    EDIT:

    In [5]: exit                                                                                                                                                                                                                                                                                                                    
    (tf_gpuenv) abigail@abigail-XPS-8910:~/nlp/MLMastery/DLwithPython/code/chapter_07$ nvidia-smi
    Sun Jan 20 00:41:49 2019       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 415.25       Driver Version: 415.25       CUDA Version: 10.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX 750 Ti  Off  | 00000000:01:00.0  On |                  N/A |
    | 38%   54C    P0     2W /  38W |   1707MiB /  1993MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID   Type   Process name                             Usage      |
    |=============================================================================|
    |    0       770      G   /usr/bin/akonadi_archivemail_agent             2MiB |
    |    0       772      G   /usr/bin/akonadi_sendlater_agent               2MiB |
    |    0       774      G   /usr/bin/akonadi_mailfilter_agent              2MiB |
    |    0      1088      G   /usr/lib/xorg/Xorg                           166MiB |
    |    0      1440      G   kwin_x11                                      60MiB |
    |    0      1446      G   /usr/bin/krunner                               1MiB |
    |    0      1449      G   /usr/bin/plasmashell                          60MiB |
    |    0      1665      G   ...quest-channel-token=3687002912233960986   137MiB |
    |    0     20728      C   ...ail/anaconda3/envs/tf_gpuenv/bin/python  1255MiB |
    +-----------------------------------------------------------------------------+
    
  • ling
    ling over 5 years
    I am running Kubuntu; see my edit above. Why does only one process use the GPU memory in your case, while in my case quite a few processes are using it? In TensorFlow-GPU, do models only use the GPU's memory? What about the rest of the RAM? My computer has 64GB of RAM in total; is it mostly unused?
  • IamSierraCharlie
    IamSierraCharlie over 5 years
    Yes, with TensorFlow-GPU, models only use GPU memory. If you want to use your desktop memory, then you would not use GPU support for the model. Keep in mind that if you train on the CPU, the CPU, not the memory, will be the bottleneck. The main reason for using your GPU is that it is much faster than a CPU in most situations. In my case, I run Ubuntu Server 16.04 with no graphical interface, so there is only one process running on the GPU, which is a TensorFlow program I am developing.
  • IamSierraCharlie
    IamSierraCharlie over 5 years
    I see in your update that you have one PID, 20728, using most of your GPU's resources. That is your program already running in the background. If you were to end that program, resources would be available for the one you are trying to run. If you are trying to run two training scripts on the same GPU, that is not going to be possible with the current memory allocation.
  • ling
    ling over 5 years
    PID 20728 is the process I am running. Somehow it no longer runs out of memory, but it is very, very slow. I am testing CPU training of the same model; it may be much faster in this case.
  • ling
    ling over 5 years
    You only have a 4GB GPU; how fast is it in your case? Did you compare it with CPU training?
  • IamSierraCharlie
    IamSierraCharlie over 5 years
    It was a GTX 970; it was okay for training, still faster than my 4GHz CPU, and two of them were better again. Now I keep that one GTX 970 for training and testing models that I don't care as much about. When I have something I really need to get done, I use my main PC with twin 1080 Ti cards; they have 11GB each. It's important to note that you can also run out of memory if your configuration is wrong. Say you double your memory: then you can train larger batches of data, because more can be loaded into memory. If you get an out-of-memory error, reducing the batch size can help to resolve it (see the sketch after these comments).
  • forgetso
    forgetso over 3 years
    Great answer. nvidia-smi showed me that I had consumed the entire GPU memory in a running Jupyter notebook.
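
Following up on the comments about falling back to the CPU and shrinking the batch size, here is a minimal sketch. It assumes the same Keras script from the question; the data, the layer sizes, and the batch size of 5 are placeholders, not values from the original code:

# Sketch: fall back to CPU training (so the model uses system RAM) and use a
# smaller batch size to reduce memory pressure. X, Y and the model below are
# placeholders standing in for the script from the question.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # hide the GPU; must run before TensorFlow loads

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(768, 8)                # placeholder features
Y = np.random.randint(0, 2, size=(768,))  # placeholder labels

model = Sequential([
    Dense(12, input_dim=8, activation="relu"),
    Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# A smaller batch_size lowers peak memory use at the cost of more updates per epoch.
model.fit(X, Y, epochs=150, batch_size=5)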