Keras: Training loss decrases (accuracy increase) while validation loss increases (accuracy decrease)

python deep-learning keras classification

12,654

Solution 1

What I can think of by analyzing your metric outputs (from the link you provided):

Seems to me that approximately near epoch 30 your model is starting to overfit. Therefore you can try stopping your training in that iteration, or well just train it for ~30 epochs (or the exact number). The Keras Callbacks may be useful here, specially the ModelCheckpoint to enable you to stop your training when desired (Ctrl +C) or when certain criteria is met. Here is an example of basic ModelCheckpoint use:

#save best True saves only if the metric improves
chk = ModelCheckpoint("myModel.h5", monitor='val_loss', save_best_only=False) 
callbacks_list = [chk]
#pass callback on fit
history = model.fit(X, Y, ... , callbacks=callbacks_list)

(Edit:) As suggested in comments, another option you have available is to use the EarlyStopping callback, where you can specify the minimum change tolerated and the 'patience' or epochs without such improvement before stopping the training. If using this, you have to pass it to the callbacks argument as explained before.

At the current setup you model has (and with the modifications you have tried) that point in your training seems to be the optimal training time for your case; training it further will bring no benefits to your model (in fact, will make it generalize worse).

Given you have tried several modifications, one thing you can do is to try to increase your Network Depth, to give it more capacity. Try adding more layers, one at a time, and check for improvements. Also, you usually you want to start with simpler models first, before attempting a multi-layer solution.

If a simple model doesn't work, add one layer and test again, repeating until satisfied or possible. And by simple I mean really simple, have you tried a non-convolutional approach? Although CNN are great for images, maybe you are overkilling it here.

If nothing seems to work, maybe it is time to get more data, or to generate more data from the one you have by sampling or other techniques. For that last suggestion, try checking this keras blog I have found really useful. Deep learning algorithms usually require substantial amount of training data, specially for complex models, like images, so be aware this may not be an easy task. Hope this helps.

Solution 2

IMHO, this is just normal situation for DL. In Keras you can setup a callback that will save the best model (depending on evaluation metric that you provide), and callback that will stop training if model isn't improving.

See ModelCheckpoint & EarlyStopping callbacks respectively.

P.S. Sorry, maybe I misunderstood question - do you have validation loss decreasing form first step?

Solution 3

Validation loss is increasing. This means you need more data, or more regularization. Standard situation here, and nothing to be worried about. By the way, more parameters (bigger model) is just going to worsen this problem unless you fix it.

So you can now investigate profitably by introducing more examples, L2, L1, or dropout.

Solution 4

I faced a similar problem and managed to fix it by removing the Batch Normalisation layer that's just before the output dense layer. This made a ton of difference. Also one of the suggestions I was given is to remove the Dropout layer as it might be causing Shift Variance. Check this paper

I got part of the solution from this thread.

View more solutions

12,654

Author by

Jesper

Updated on July 08, 2022

Comments

Jesper almost 2 years

I am working on a very sparse dataset with the point of predicting 6 classes. I have tried working with a lot of models and architectures, but the problem remains the same.

When I start training, the acc for training will slowly start to increase and loss will decrease where as the validation will do the exact opposite.

I have really tried to deal with overfitting, and I simply cannot still believe that this is what is coursing this issue.

What have I tried

Transfer learning on VGG16:

exclude top layer and add dense layer with 256 units and 6 units softmax output layer
finetune the top CNN block
finetune the top 3-4 CNN blocks

To deal with overfitting I use heavy augmentation in Keras and dropout after the 256 dense layer with p=0.5.

Creating own CNN with VGG16-ish architecture:

including batch normalization wherever possible
L2 regularization on each CNN+dense layer
Dropout from anywhere between 0.5-0.8 after each CNN+dense+pooling layer
Heavy data augmentation in "on the fly" in Keras

Realising that perhaps I have too many free parameters:

decreasing the network to only contain 2 CNN blocks + dense + output.
dealing with overfitting in the same manner as above.

Without exception all training sessions are looking like this: Training & Validation loss+accuracy

The last mentioned architecture looks like this:

    reg = 0.0001

    model = Sequential()

    model.add(Conv2D(8, (3, 3), input_shape=input_shape, padding='same',
            kernel_regularizer=regularizers.l2(reg)))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Dropout(0.7))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.5))

    model.add(Conv2D(16, (3, 3), input_shape=input_shape, padding='same',
            kernel_regularizer=regularizers.l2(reg)))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Dropout(0.7))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.5))

    model.add(Flatten())
    model.add(Dense(16, kernel_regularizer=regularizers.l2(reg)))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(Dropout(0.5))

    model.add(Dense(6))
    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy', optimizer='SGD',metrics=['accuracy'])

And the data is augmented by the generator in Keras and is loaded with flow_from_directory:

    train_datagen = ImageDataGenerator(rotation_range=10,
                                width_shift_range=0.05,
                                height_shift_range=0.05,
                                shear_range=0.05,
                                zoom_range=0.05,
                                rescale=1/255.,
                                fill_mode='nearest',
                                channel_shift_range=0.2*255)
    train_generator = train_datagen.flow_from_directory(
                train_data_dir,
                target_size=(img_width, img_height),
                batch_size=batch_size,
                shuffle = True,
                class_mode='categorical')

    validation_datagen = ImageDataGenerator(rescale=1/255.)
    validation_generator = validation_datagen.flow_from_directory(
                                            validation_data_dir,
                                            target_size=(img_width, img_height),
                                            batch_size=1,
                                            shuffle = True,
                                            class_mode='categorical')

Jesper over 6 years

As shown in plot (link should be available in the post) the loss slightly decreases at the very beginning and then starts increasing. The accuracy for the validation does not change much overall. Using the weights from the first few epochs would not make much sense here, as the network would not have learning sufficient enough.
DarkCygnus over 6 years

@AlexOtt thanks for the suggestion, editing the answer to include such option :)
Jesper over 6 years

I'll mark this as answered - thank you for your good advises both @AlexOtt and you. I have tried what you suggested and the trend does not change. The train loss will decrease and the val loss will increase. I get a max accuracy on the val set of some 45-ish%.
DarkCygnus over 6 years

@Jesper Did you tried all what I suggested (more data, depth,..)? Another thing that could be acting strangely is your data augmentation. Could probably be saturating your performance up to a point where augmentation brings no further benefit (what if you do it without augmentation? thats what I meant when suggesting to get more data, organic samples and not artificial ones). You can ping me if you want for any further discussion if you like. Cheers
Jesper over 6 years

Yes, I did try to vary the network size. Both to very simple and deeper models. More details about the project follows below: For starters the training data is images like this one, obtained in different scenery with different ligthing conditions, etc: imgur.com/mmlNqEi After training here, an attentionmap revieled that almost all attention was given to the background. To overcome this, all train images were cropped to only fit the wrenches, like this: imgur.com/upp51pA Now, it's better, but the attentionmap still reviels some problem w.r.t. focusing on the wrenches themself
Jesper over 6 years

The idea is to predict the location of wrench of size 19. I.e. when it is located on the seconds position from the left, the model should output [0,1,0,0,0,0].
bit_scientist over 4 years

@Jesper maybe you should try playing with adam and/or sgd with lr > 0.01 (which is default). Or if you did already, how was the performance?