Out of memory running TensorFlow with GPU support in PyCharm
To wrap up our conversation in the comments: I do not believe you can give the GPU more memory by allocating desktop RAM to it, at least not in the way you are trying to. When you have a single GPU, TensorFlow-GPU will in most cases allocate around 95% of the available GPU memory to the task it runs. In your case, something is already consuming nearly all of the available GPU memory, and that is the primary reason your program does not run. You need to review your GPU's memory usage and free up some memory. (I can't help thinking that you already have another Python instance using TensorFlow-GPU running in the background, or some other GPU-intensive program.) On Linux, running the command nvidia-smi
on the command line will tell you what is using your GPU.
Here is an example:
Sun Jan 20 18:23:35 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130 Driver Version: 384.130 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 970 Off | 00000000:01:00.0 Off | N/A |
| 32% 63C P2 69W / 163W | 3823MiB / 4035MiB | 40% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 3019 C ...e/scarter/anaconda3/envs/tf1/bin/python 3812MiB |
+-----------------------------------------------------------------------------+
You can see that in my case the card on my server has 4035 MiB of memory, of which 3823 MiB is in use. Next, look at the GPU processes at the bottom: process PID 3019 consumes 3812 MiB of the card's 4035 MiB. If I wanted to run another Python script using TensorFlow, I would have two main choices: install a second GPU and run the script on it, or, if no GPU is available, run it on the CPU. Someone with more expertise than me may say that you could allocate just half the memory to each task, but 2 GB of memory is already quite low for TensorFlow training. Cards with much more memory (6 GB or more) are typically recommended for that task.
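For the "half the memory to each task" idea, TensorFlow 1.x exposes a per-process cap through GPUOptions(per_process_gpu_memory_fraction=...). Below is a minimal sketch, assuming a TensorFlow build that provides the tf.compat.v1 module (1.13+ or 2.x); on older 1.x releases the same classes are tf.GPUOptions, tf.ConfigProto and tf.Session. The helper names per_process_fraction and make_capped_session are hypothetical, and the arithmetic uses the 4035 MiB card from the table above.

```python
def per_process_fraction(total_mib, num_processes, reserved_mib=0):
    """Share of the card each process may claim, as a fraction of total memory.

    reserved_mib is memory held back for other users (e.g. a desktop session).
    """
    usable_mib = total_mib - reserved_mib
    return usable_mib / num_processes / total_mib


def make_capped_session(fraction):
    """Create a TF1-style session that claims at most `fraction` of GPU memory."""
    import tensorflow as tf  # imported lazily so the arithmetic works without TF

    gpu_options = tf.compat.v1.GPUOptions(per_process_gpu_memory_fraction=fraction)
    config = tf.compat.v1.ConfigProto(gpu_options=gpu_options)
    return tf.compat.v1.Session(config=config)


if __name__ == "__main__":
    total = 4035  # MiB, the GTX 970 in the nvidia-smi example
    frac = per_process_fraction(total, num_processes=2)
    print(f"fraction={frac:.2f}, about {frac * total:.0f} MiB per process")
```

With standalone Keras, as in the question's traceback, such a session would be handed over with keras.backend.tensorflow_backend.set_session(...) before calling model.fit. Note that a cap only limits how much each process grabs; it does not create memory, so two processes on a 4 GB card get roughly 2 GB each, which is the "pretty low" figure mentioned above.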
In closing, find out what is consuming all of your video card's memory and end that task. I believe that will resolve your problem.
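To find the culprit from a script rather than by reading the table, nvidia-smi also has a CSV query mode. A minimal sketch, assuming nvidia-smi is on the PATH and supports --query-compute-apps with the pid, process_name and used_memory fields; the helper names are hypothetical. It lists only compute (CUDA) processes, largest first.

```python
import subprocess

# Ask nvidia-smi for one CSV row per compute (CUDA) process.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' rows into (pid, name, MiB) tuples."""
    apps = []
    for line in csv_text.strip().splitlines():
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pid = int(fields[0])
        # Some platforms report "[N/A]" instead of a number; count those as 0.
        mem = int(fields[-1]) if fields[-1].isdigit() else 0
        name = ", ".join(fields[1:-1])  # tolerate commas inside the process name
        apps.append((pid, name, mem))
    # Largest consumers first, so the likely culprit is at the top.
    return sorted(apps, key=lambda app: app[2], reverse=True)


def list_compute_apps():
    """Run nvidia-smi and return the parsed compute processes."""
    out = subprocess.run(
        QUERY, stdout=subprocess.PIPE, universal_newlines=True, check=True
    ).stdout
    return parse_compute_apps(out)
```

Usage: for pid, name, mem in list_compute_apps(): print(pid, mem, name). Once you know the PID, end it with kill PID, or close the notebook or IDE session that owns it.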
ling
Updated on June 11, 2022
-
ling almost 2 years
My code works fine when run in an IPython terminal, but in PyCharm it fails with an out-of-memory error, as shown below.
/home/abigail/anaconda3/envs/tf_gpuenv/bin/python -Xms1280m -Xmx4g /home/abigail/PycharmProjects/MLNN/src/test.py
Using TensorFlow backend.
Epoch 1/150
2019-01-19 22:12:39.539156: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-19 22:12:39.588899: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-01-19 22:12:39.589541: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 750 Ti major: 5 minor: 0 memoryClockRate(GHz): 1.0845
pciBusID: 0000:01:00.0
totalMemory: 1.95GiB freeMemory: 59.69MiB
2019-01-19 22:12:39.589552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
Traceback (most recent call last):
  File "/home/abigail/PycharmProjects/MLNN/src/test.py", line 20, in <module>
    model.fit(X, Y, epochs=150, batch_size=10)
  File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/engine/training.py", line 1039, in fit
    validation_steps=validation_steps)
  File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/engine/training_arrays.py", line 199, in fit_loop
    outs = f(ins_batch)
  File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2697, in __call__
    if hasattr(get_session(), '_make_callable_from_options'):
  File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 186, in get_session
    _SESSION = tf.Session(config=config)
  File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1551, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/abigail/anaconda3/envs/tf_gpuenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 676, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: out of memory

Process finished with exit code 1
In PyCharm, I first edited the "Help->Edit Custom VM options":
-Xms1280m -Xmx4g
This doesn't fix the issue. Then I edited "Run->Edit Configurations->Interpreter options":
-Xms1280m -Xmx4g
It still gives the same error. My Linux desktop has plenty of memory (64 GB). How can I fix this?
BTW, if I don't use the GPU in PyCharm, the error doesn't occur.
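The -Xms and -Xmx flags are Java heap settings: "Help->Edit Custom VM options" sizes the heap of PyCharm itself, which runs on the Java Virtual Machine. When the same flags are put in the interpreter options, CPython reads any -X<something> argument as an implementation-specific option and simply records it in sys._xoptions, so it changes neither Python's memory nor the GPU's. A small sketch to check this, assuming a CPython interpreter:

```python
import subprocess
import sys

# Start a child interpreter with the flags from the question's run configuration
# and ask it what it did with them.
cmd = [sys.executable, "-Xms1280m", "-Xmx4g", "-c", "import sys; print(sys._xoptions)"]
result = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True, check=True)
xoptions = result.stdout.strip()

# The flags should show up only as inert -X option names (e.g. 'ms1280m').
print(xoptions)
```

Seeing 'ms1280m' and 'mx4g' listed as plain option names confirms that no heap or GPU limit was applied.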
EDIT:
In [5]: exit
(tf_gpuenv) abigail@abigail-XPS-8910:~/nlp/MLMastery/DLwithPython/code/chapter_07$ nvidia-smi
Sun Jan 20 00:41:49 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 415.25 Driver Version: 415.25 CUDA Version: 10.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 750 Ti Off | 00000000:01:00.0 On | N/A |
| 38% 54C P0 2W / 38W | 1707MiB / 1993MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 770 G /usr/bin/akonadi_archivemail_agent 2MiB |
| 0 772 G /usr/bin/akonadi_sendlater_agent 2MiB |
| 0 774 G /usr/bin/akonadi_mailfilter_agent 2MiB |
| 0 1088 G /usr/lib/xorg/Xorg 166MiB |
| 0 1440 G kwin_x11 60MiB |
| 0 1446 G /usr/bin/krunner 1MiB |
| 0 1449 G /usr/bin/plasmashell 60MiB |
| 0 1665 G ...quest-channel-token=3687002912233960986 137MiB |
| 0 20728 C ...ail/anaconda3/envs/tf_gpuenv/bin/python 1255MiB |
+-----------------------------------------------------------------------------+
-
ling over 5 years
I am running Kubuntu; see my edit above. Why does only one process use GPU memory in your case, while in my case quite a few processes are using it? With TensorFlow-GPU, do models use only the GPU's memory? What about the rest of the RAM? My computer has 64 GB of RAM in total. Is it mostly unused?
-
IamSierraCharlie over 5 years
Yes, with TensorFlow-GPU, models use only GPU memory. If you want to use your desktop memory, then you would not use GPU support for the model. Keep in mind that if you train on the CPU, the bottleneck will be the CPU, not the memory. The main reason for using your GPU is that it is much faster than a CPU in most situations. In my case, I run Ubuntu Server 16.04 with no graphical interface, so there is only one process running on the GPU: a TensorFlow program I am developing.
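The extra entries in the question's table are graphical (Type G) processes from the KDE desktop and Xorg, which a headless server does not run; Type C entries are compute processes such as TensorFlow. A small sketch that totals the two kinds from the process section of an nvidia-smi table, assuming the single-spaced row layout shown in the question (rows copied from it); the regex and the usage_by_type helper are hypothetical.

```python
import re
from collections import defaultdict

# One process row: "| GPU PID Type Process-name Usage |"
ROW = re.compile(r"\|\s*(\d+)\s+(\d+)\s+([CG+]+)\s+(\S+)\s+(\d+)MiB\s*\|")

# Process rows copied from the nvidia-smi output in the question.
TABLE = """
| 0 770 G /usr/bin/akonadi_archivemail_agent 2MiB |
| 0 772 G /usr/bin/akonadi_sendlater_agent 2MiB |
| 0 774 G /usr/bin/akonadi_mailfilter_agent 2MiB |
| 0 1088 G /usr/lib/xorg/Xorg 166MiB |
| 0 1440 G kwin_x11 60MiB |
| 0 1446 G /usr/bin/krunner 1MiB |
| 0 1449 G /usr/bin/plasmashell 60MiB |
| 0 1665 G ...quest-channel-token=3687002912233960986 137MiB |
| 0 20728 C ...ail/anaconda3/envs/tf_gpuenv/bin/python 1255MiB |
"""


def usage_by_type(table_text):
    """Sum GPU memory (MiB) per process type: G = graphics, C = compute."""
    totals = defaultdict(int)
    for match in ROW.finditer(table_text):
        proc_type, mib = match.group(3), int(match.group(5))
        totals[proc_type] += mib
    return dict(totals)


print(usage_by_type(TABLE))  # → {'G': 430, 'C': 1255}
```

So the desktop accounts for about 430 MiB and the Python process for 1255 MiB; together that is 1685 MiB of the 1707 MiB reported as used on the 1993 MiB card.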
-
IamSierraCharlie over 5 years
I see in your update that one process, PID 20728, is using most of your GPU's memory. This is your program already running in the background. If you were to end that program, its memory would become available for the one you are trying to run. If you are trying to run two training scripts on the same GPU, that is not going to be possible with the current memory allocation.
-
ling over 5 years
PID 20728 is the process I am running. Somehow it no longer runs out of memory, but it is very, very slow. I am testing training the same model on the CPU; it may be much faster in this case.
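To train on the CPU for comparison without uninstalling tensorflow-gpu, a common approach is to hide the GPU from CUDA by setting the CUDA_VISIBLE_DEVICES environment variable to an empty string before TensorFlow is imported (in PyCharm it can also be set under Run->Edit Configurations->Environment variables). A minimal sketch, assuming it sits at the very top of the training script:

```python
import os

# Hide every CUDA device from this process. Set it before TensorFlow/Keras is
# imported, since the CUDA runtime reads the variable when it initializes.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Import TensorFlow (or Keras) only after this point, for example:
# import tensorflow as tf
```

To go back to the GPU, remove the line (or the environment variable) and restart the process; changing it after CUDA has been initialized is not expected to have any effect.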
-
ling over 5 years
You have only a 4 GB GPU; how fast is it in your case? Did you compare it with CPU training?
-
IamSierraCharlie over 5 years
It was a GTX 970. It was okay for training, still faster than my 4 GHz CPU, and two of them were better again. Now I keep that one GTX 970 for training and testing models I don't care as much about. When I have something I really need to get done, I use my main PC with twin 1080 Ti cards, which have 11 GB each. It's important to note that you can also run out of memory if your configuration is wrong. If you double your memory, for example, you can train on larger batches of data because more can be loaded into memory at once. Conversely, if you get an out-of-memory error, reducing the batch size can help resolve it.
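The advice to reduce the batch size can be automated by retrying with smaller batches. A minimal sketch, assuming a fit(batch_size) callable that raises an out-of-memory exception; fit_with_backoff and fake_fit are hypothetical names, and fake_fit only simulates an OOM so the logic can run without a GPU. Note that the InternalError in the question is raised while the session is being created, before any batch is processed, so shrinking the batch would not fix that particular error.

```python
def fit_with_backoff(fit, batch_size, oom_errors=(MemoryError,), min_batch_size=1):
    """Call fit(batch_size), halving the batch size after each out-of-memory error.

    Returns a (result, batch_size_that_worked) tuple.
    """
    while batch_size >= min_batch_size:
        try:
            return fit(batch_size), batch_size
        except oom_errors:
            batch_size //= 2  # retry with half as many samples per step
    raise RuntimeError("still out of memory at the minimum batch size")


def fake_fit(batch_size):
    """Stand-in for model.fit: pretend batches larger than 8 exhaust memory."""
    if batch_size > 8:
        raise MemoryError("simulated out-of-memory error")
    return "trained"


print(fit_with_backoff(fake_fit, batch_size=32))  # → ('trained', 8)
```

With Keras on TensorFlow, the call might look like fit_with_backoff(lambda bs: model.fit(X, Y, epochs=150, batch_size=bs), 10, oom_errors=(tf.errors.ResourceExhaustedError,)). After an out-of-memory error, TensorFlow may still hold memory, so starting a fresh process with the smaller batch size can be the more reliable option.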
-
forgetso over 3 years
Great answer. nvidia-smi showed me that I had consumed the entire GPU memory in a running Jupyter notebook.