How to fix "ResourceExhaustedError: OOM when allocating tensor"

Solution 1

OOM stands for "out of memory". Your GPU is running out of memory and cannot allocate the memory this tensor needs. There are a few things you can do (a short sketch combining several of them follows the list):

  • Decrease the number of units/filters in your Dense and Conv2D layers
  • Use a smaller batch_size (or increase steps_per_epoch and validation_steps)
  • Use grayscale images (you can use tf.image.rgb_to_grayscale)
  • Reduce the number of layers
  • Use MaxPooling2D layers after convolutional layers
  • Reduce the size of your images (you can use tf.image.resize for that)
  • Use a smaller float precision for your input, namely np.float32 instead of np.float64
  • If you're using a pre-trained model, freeze its first layers
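
Here is a minimal sketch combining several of these points, assuming a TF 2.x Keras image model. The names model, base_model, images, and labels are illustrative placeholders, not names from the question below:

import tensorflow as tf

# placeholders: images is an (N, H, W, 3) array, labels its targets,
# model a compiled Keras model, base_model an optional pre-trained backbone

# Shrink the input: grayscale, smaller spatial size, float32 instead of float64
images = tf.image.rgb_to_grayscale(images)   # (N, H, W, 3) -> (N, H, W, 1)
images = tf.image.resize(images, (64, 64))   # reduce height and width
images = tf.cast(images, tf.float32)         # avoid float64 inputs

# Only if you use a pre-trained base: freeze its first layers so no
# gradient/optimizer tensors are allocated for them
for layer in base_model.layers[:-4]:
    layer.trainable = False

# Train with an explicit, small batch size instead of one giant batch
model.fit(images, labels, batch_size=32, epochs=10)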

There is more useful information in the error message itself:

OOM when allocating tensor with shape[800000,32,30,62]

This is a weird shape. If you're working with images, you should normally have 3 or 1 channel. On top of that, it seems like you are passing your entire dataset at once; you should instead pass it in batches.

Solution 2

From the shape [800000,32,30,62] it seems your model is putting all the data into a single batch.

Try specifying a batch size, like this:

history = model.fit([trainimage, train_product_embd], train_label,
    validation_data=([validimage, valid_product_embd], valid_label),
    epochs=10, steps_per_epoch=100, validation_steps=10, batch_size=32)

If it still OOMs, try reducing the batch_size further.
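
If the arrays are too large to pass to model.fit directly, another option (a sketch, not from the original answer) is to stream them with tf.data so only one small batch at a time is sent to the GPU. The variable names below follow the question, and the batch size of 32 is just a starting point:

import tensorflow as tf

# Batched pipelines for the two-input model: elements are ((image, embedding), label)
train_ds = (tf.data.Dataset
            .from_tensor_slices(((trainimage, train_product_embd), train_label))
            .shuffle(10000)
            .batch(32)
            .prefetch(tf.data.experimental.AUTOTUNE))

valid_ds = (tf.data.Dataset
            .from_tensor_slices(((validimage, valid_product_embd), valid_label))
            .batch(32))

history = model.fit(train_ds, validation_data=valid_ds, epochs=10)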

Comments

  • Admin, almost 2 years ago

    I want to make a model with multiple inputs, so I tried to build a model like this:

    # imports needed for this snippet
    from tensorflow.keras import Input, Model, layers
    from tensorflow.keras.layers import Dense, concatenate
    from tensorflow.keras.optimizers import Adam

    # define two sets of inputs
    inputA = Input(shape=(32, 64, 1))
    inputB = Input(shape=(32, 1024))

    # CNN branch for the image input
    x = layers.Conv2D(32, kernel_size=(3, 3), activation='relu')(inputA)
    x = layers.Conv2D(32, (3, 3), activation='relu')(x)
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    x = layers.Dropout(0.2)(x)
    x = layers.Flatten()(x)
    x = layers.Dense(500, activation='relu')(x)
    x = layers.Dropout(0.5)(x)
    x = layers.Dense(500, activation='relu')(x)
    x = Model(inputs=inputA, outputs=x)

    # DNN branch for the embedding input
    y = layers.Flatten()(inputB)
    y = Dense(64, activation="relu")(y)
    y = Dense(250, activation="relu")(y)
    y = Dense(500, activation="relu")(y)
    y = Model(inputs=inputB, outputs=y)

    # Combine the output of the two branches
    combined = concatenate([x.output, y.output])

    # combined outputs (each layer feeds the next)
    z = Dense(300, activation="relu")(combined)
    z = Dense(100, activation="relu")(z)
    z = Dense(1, activation="softmax")(z)

    model = Model(inputs=[x.input, y.input], outputs=z)

    model.summary()

    opt = Adam(lr=1e-3, decay=1e-3 / 200)
    model.compile(loss='sparse_categorical_crossentropy', optimizer=opt,
                  metrics=['accuracy'])


    and the summary: (model.summary() output omitted)

    But when I try to train this model,

    history = model.fit([trainimage, train_product_embd],train_label,
        validation_data=([validimage,valid_product_embd],valid_label), epochs=10, 
        steps_per_epoch=100, validation_steps=10)
    

    the problem happens:

    ResourceExhaustedError                    Traceback (most recent call last)
    <ipython-input-18-2b79f16d63c0> in <module>()
    ----> 1 history = model.fit([trainimage, train_product_embd],train_label,
                validation_data=([validimage,valid_product_embd],valid_label),
                epochs=10, steps_per_epoch=100, validation_steps=10)

    4 frames
    /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py in __call__(self, *args, **kwargs)
       1470         ret = tf_session.TF_SessionRunCallable(self._session._session,
       1471                                                self._handle, args,
    -> 1472                                                run_metadata_ptr)
       1473         if run_metadata:
       1474             proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

    ResourceExhaustedError: 2 root error(s) found.
      (0) Resource exhausted: OOM when allocating tensor with shape[800000,32,30,62]
          and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
          [[{{node conv2d_1/convolution}}]]
          Hint: If you want to see a list of allocated tensors when OOM happens, add
          report_tensor_allocations_upon_oom to RunOptions for current allocation info.

          [[metrics/acc/Mean_1/_185]]
          Hint: If you want to see a list of allocated tensors when OOM happens, add
          report_tensor_allocations_upon_oom to RunOptions for current allocation info.

      (1) Resource exhausted: OOM when allocating tensor with shape[800000,32,30,62]
          and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
          [[{{node conv2d_1/convolution}}]]
          Hint: If you want to see a list of allocated tensors when OOM happens, add
          report_tensor_allocations_upon_oom to RunOptions for current allocation info.

    0 successful operations. 0 derived errors ignored.


    Thanks for reading and hopefully helping me :)