Tensorflow Allocation Memory: Allocation of 38535168 exceeds 10% of system memory


Solution 1

Try reducing the batch_size attribute to a small number (like 1, 2 or 3). Example:

train_generator = data_generator.flow_from_directory(
    'path_to_the_training_set',
    target_size = (IMG_SIZE,IMG_SIZE),
    batch_size = 2,
    class_mode = 'categorical'
    )
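
To see why this helps: the largest allocations reported in the warning grow linearly with the batch size, because an activation tensor holds batch_size * height * width * channels values. A rough, purely illustrative calculation (the 112x112x64 feature-map shape is an assumption, not something taken from the question):

def activation_bytes(batch_size, height, width, channels, bytes_per_value=4):
    # Approximate size of one float32 activation tensor.
    return batch_size * height * width * channels * bytes_per_value

print(activation_bytes(12, 112, 112, 64))  # 38535168 bytes (~38.5 MB) with batch_size=12
print(activation_bytes(2, 112, 112, 64))   # 6422528 bytes (~6.4 MB) with batch_size=2

Halving the batch size roughly halves the largest tensors TensorFlow has to allocate at once.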

Solution 2

I was having the same problem while running a TensorFlow container with Docker and Jupyter Notebook. I was able to fix it by increasing the memory allotted to the container.

On Mac OS, you can easily do this from:

       Docker Icon > Preferences >  Advanced > Memory

Drag the scrollbar to maximum (e.g. 4GB). Apply and it will restart the Docker engine.

Now run your TensorFlow container again.

It was handy to keep the docker stats command running in a separate terminal. It shows the container's memory usage in real time, so you can see how much memory consumption is growing:

CONTAINER ID   NAME   CPU %   MEM USAGE / LIMIT     MEM %    NET I/O             BLOCK I/O           PIDS
3170c0b402cc   mytf   0.04%   588.6MiB / 3.855GiB   14.91%   13.1MB / 3.06MB     214MB / 3.13MB      21

Solution 3

Alternatively, you can set the environment variable TF_CPP_MIN_LOG_LEVEL=2 to filter out info and warning messages. I found that on this GitHub issue, where people complain about the same output. To do so within Python, you can use the solution from here:

import os
import tensorflow as tf
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # '2' hides INFO and WARNING; '3' additionally hides ERROR

You can even turn it on and off at will this way. For example, I probe for the maximum batch size that fits before running my code, and I disable the warnings and errors while doing so (a sketch of that idea follows).
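
Here is a minimal sketch of that workflow. The make_model() factory (assumed to return a compiled Keras model) and the in-memory arrays x and y are hypothetical placeholders, not part of the original answer:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # silence TF while probing; set before importing TensorFlow

import tensorflow as tf

def largest_batch_that_fits(make_model, x, y, candidates=(64, 32, 16, 8, 4, 2, 1)):
    # Try batch sizes from large to small and return the first one that
    # completes one training epoch on a small slice without running out
    # of memory. Purely illustrative, not a robust benchmark.
    for batch_size in candidates:
        try:
            model = make_model()
            model.fit(x[:batch_size], y[:batch_size],
                      batch_size=batch_size, epochs=1, verbose=0)
            return batch_size
        except tf.errors.ResourceExhaustedError:
            continue
    return None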

Solution 4

I was running a small model on a CPU and had the same issue. Adding os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' resolved it.
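
For completeness, a minimal version of that fix, with the variable set before TensorFlow is imported, since that is typically when the C++ logger reads it:

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'  # hide INFO, WARNING and ERROR messages from TF's C++ backend

import tensorflow as tf  # imported after the variable is set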

Solution 5

I was having the same problem, and I concluded that there are two factors to consider when you see this error:

1- batch_size ==> it determines how much data is processed per step
2- image_size ==> larger image dimensions mean more data to process

When these two are too large, the RAM cannot hold all of the required data.

To solve the problem I tried two changes: first, reduce batch_size from 32 to 3 or 2; second, reduce image_size from (608, 608) to (416, 416). A sketch combining both changes is shown below.
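
A minimal sketch combining both changes, in the style of the flow_from_directory call used elsewhere on this page (the directory path and the exact values are placeholders):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

IMG_SIZE = 416  # reduced from 608 to cut per-image memory
data_generator = ImageDataGenerator(rescale=1./255)

train_generator = data_generator.flow_from_directory(
    'path_to_the_training_set',
    target_size = (IMG_SIZE, IMG_SIZE),
    batch_size = 2,  # reduced from 32 to cut per-batch memory
    class_mode = 'categorical'
    )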


Author: Madhi (Data Scientist, Machine Learning Engineer)

Updated on March 05, 2022

Comments

  • Madhi
    Madhi about 2 years

    I am trying to build a classifier using pre-trained ResNet50 weights. The code base is fully implemented in Keras, the high-level TensorFlow API. The complete code is posted at the GitHub link below.

    Source Code: Classification Using ResNet50 Architecture

    The file size of the pre-trained model is 94.7 MB.

    I loaded the pre-trained weights:

    from keras.models import Sequential              # assumed imports; see the linked source for the full code
    from keras.applications.resnet50 import ResNet50

    new_model = Sequential()

    new_model.add(ResNet50(include_top=False,
                    pooling='avg',
                    weights=resnet_weight_paths))


    and fit the model

    train_generator = data_generator.flow_from_directory(
        'path_to_the_training_set',
        target_size = (IMG_SIZE,IMG_SIZE),
        batch_size = 12,
        class_mode = 'categorical'
        )
    
    validation_generator = data_generator.flow_from_directory(
        'path_to_the_validation_set',
        target_size = (IMG_SIZE,IMG_SIZE),
        class_mode = 'categorical'
        )
    
    #compile the model
    
    new_model.fit_generator(
        train_generator,
        steps_per_epoch = 3,
        validation_data = validation_generator,
        validation_steps = 1
    )
    

    In the training dataset, I have two folders, dog and cat, each holding almost 10,000 images. When I run the script, I get the following warnings:

    Epoch 1/1
    2018-05-12 13:04:45.847298: W tensorflow/core/framework/allocator.cc:101] Allocation of 38535168 exceeds 10% of system memory.
    2018-05-12 13:04:46.845021: W tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 exceeds 10% of system memory.
    2018-05-12 13:04:47.552176: W tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 exceeds 10% of system memory.
    2018-05-12 13:04:48.199240: W tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 exceeds 10% of system memory.
    2018-05-12 13:04:48.918930: W tensorflow/core/framework/allocator.cc:101] Allocation of 37171200 exceeds 10% of system memory.
    2018-05-12 13:04:49.274137: W tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 exceeds 10% of system memory.
    2018-05-12 13:04:49.647061: W tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 exceeds 10% of system memory.
    2018-05-12 13:04:50.028839: W tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 exceeds 10% of system memory.
    2018-05-12 13:04:50.413735: W tensorflow/core/framework/allocator.cc:101] Allocation of 19267584 exceeds 10% of system memory.

    Any ideas on how to optimize the way the pre-trained model is loaded, or how to get rid of this warning message?

    Thanks!

    • Allen Lavoie
      Allen Lavoie about 6 years
      To clarify, does the model run after these messages?
    • Madhi
      Madhi about 6 years
      Yes, it runs.
    • Allen Lavoie
      Allen Lavoie about 6 years
      In that case, take a look at stackoverflow.com/a/42121886/6824418 ? Unless there's some other reason you need to reduce the memory usage.
  • Omrii
    Omrii about 3 years
    How come reducing the batch_size fixes this? (Intuitively it'd mean the training/validation data takes up less memory but I don't see how!)
  • VMMF
    VMMF almost 3 years
    With this, the message is not shown, but the problem persists.
  • Poik
    Poik almost 3 years
    @VMMF In my case and the case of the original question, this was not a problem. It was only using over 10%, and I explicitly test for maximal utilization, so of course I'll exceed that. If you are running into a utilization problem that is actually a problem, then you're likely going to need to do more work. Decrease the batch size. Decrease the image size or network size, which is likely undoable in pretrained models such as in the question. If you are continuing to have issues, I recommend opening a question more in line with the problem you are having, and we will give it our best to help.
  • Poik
    Poik almost 3 years
    @Omrii It's a matter of caching. Most data loaders do not keep all of the data in memory at all times, and nearly all do not keep significant data in the GPU memory. By decreasing the batch_size, you decrease the amount of data needed at any step of the training, allowing for less to be loaded at a time while still maintaining optimal throughput. So no, the data is not smaller, but yes, it does take up less RAM and video RAM. TensorFlow is being clever here to save you a headache.