Possible explanations for loss increasing?

30,013

Solution 1

I would normally say your learning rate it too high however it looks like you have ruled that out. You should check the magnitude of the numbers coming into and out of the layers. You can use tf.Print to do so. Maybe you are somehow inputting a black image by accident or you can find the layer where the numbers go crazy.

Also how are you calculating the cross entropy? You might want to add a small epsilon inside of the log since it's value will go to infinity as its input approaches zero. Or better yet use the tf.nn.sparse_softmax_cross_entropy_with_logits(...) function which takes care of numerical stability for you.

Since the cost is so high for your crossentropy it sounds like the network is outputting almost all zeros (or values close to zero). Since you did not post any code I can not say why. I think you may just be zeroing something out in the cost function calculation by accident.

Solution 2

I was also facing the problem ,I was using keras library (tensorflow backend)

    Epoch 00034: saving model to /home/ubuntu/temp/trained_data1/final_dev/final_weights-improvement-34-0.627.hdf50
Epoch 35/150
226160/226160 [==============================] - 65s 287us/step - loss: 0.2870 - acc: 0.9331 - val_loss: 2.7904 - val_acc: 0.6193
Epoch 36/150
226160/226160 [==============================] - 65s 288us/step - loss: 0.2813 - acc: 0.9331 - val_loss: 2.7907 - val_acc: 0.6268

Epoch 00036: saving model to /home/ubuntu/temp/trained_data1/final_dev/final_weights-improvement-36-0.627.hdf50
Epoch 37/150
226160/226160 [==============================] - 65s 286us/step - loss: 0.2910 - acc: 0.9330 - val_loss: 2.5704 - val_acc: 0.6327
Epoch 38/150
226160/226160 [==============================] - 65s 287us/step - loss: 0.2982 - acc: 0.9321 - val_loss: 2.5147 - val_acc: 0.6415

Epoch 00038: saving model to /home/ubuntu/temp/trained_data1/final_dev/final_weights-improvement-38-0.642.hdf50
Epoch 39/150
226160/226160 [==============================] - 68s 301us/step - loss: 0.2968 - acc: 0.9318 - val_loss: 2.7375 - val_acc: 0.6409
Epoch 40/150
226160/226160 [==============================] - 68s 299us/step - loss: 0.3124 - acc: 0.9298 - val_loss: 2.8359 - val_acc: 0.6047

Epoch 00040: saving model to /home/ubuntu/temp/trained_data1/final_dev/final_weights-improvement-40-0.605.hdf50
Epoch 41/150
226160/226160 [==============================] - 65s 287us/step - loss: 0.2945 - acc: 0.9315 - val_loss: 3.5825 - val_acc: 0.5321
Epoch 42/150
226160/226160 [==============================] - 65s 287us/step - loss: 0.3214 - acc: 0.9278 - val_loss: 2.5816 - val_acc: 0.6444

When i saw my model ,the model was consisting of too many neurons , In short the model was overfitting. I decreased the no of neurons in 2 dense layers (from 300 neurons to 200 neurons)

Share:
30,013
JohnAllen
Author by

JohnAllen

Updated on May 17, 2020

Comments

  • JohnAllen
    JohnAllen almost 4 years

    I've got a 40k image dataset of images from four different countries. The images contain diverse subjects: outdoor scenes, city scenes, menus, etc. I wanted to use deep learning to geotag images.

    I started with a small network of 3 conv->relu->pool layers and then added 3 more to deepen the network since the learning task is not straightforward.

    My loss is doing this (with both the 3 and 6 layer networks):enter image description here:

    The loss actually starts kind of smooth and declines for a few hundred steps, but then starts creeping up.

    What are the possible explanations for my loss increasing like this?

    My initial learning rate is set very low: 1e-6, but I've tried 1e-3|4|5 as well. I have sanity-checked the network design on a tiny-dataset of two classes with class-distinct subject matter and the loss continually declines as desired. Train accuracy hovers at ~40%

  • JohnAllen
    JohnAllen over 7 years
    I am using tf.nn.sparse_softmax_cross_entropy_with_logits(...) FYI.
  • chasep255
    chasep255 over 7 years
    I figured you might be. Otherwise the cost would have gone to infinity and you would get a nan. The output is definitely going all zero for some reason. Maybe try using the elu activation instead of relu since these do not die at zero.
  • JohnAllen
    JohnAllen over 7 years
    I'm using tf.nn.elus in my conv layers and relu in the final hidden layer: hidden = tf.nn.relu(tf.matmul(reshape, fc1_weights) + fc1_biases). Perhaps I can try a elu there too.
  • chasep255
    chasep255 over 7 years
    You don't need an activation in the final layer since the softmax function is an activation. Also make sure your weights are initialized with both positive and negative values.
  • etarion
    etarion over 7 years
    It's even a bit stronger - you absolutely do not want relus in the final layer, you want the logits to be able to go to arbitrary negative values.
  • AveryLiu
    AveryLiu about 3 years
    Overfitting does not make the training loss increase, rather, it refers to the situation where training loss decreases to a small value while the validation loss remains high.