Keras EarlyStopping: Which min_delta and patience to use?


Solution 1

The role of these two parameters is clear from the Keras documentation.

min_delta : minimum change in the monitored quantity to qualify as an improvement, i.e. an absolute change of less than min_delta will count as no improvement.

patience : number of epochs with no improvement after which training will be stopped.

There is actually no standard value for these parameters. You need to analyse the components of your training process (dataset, environment, model type) to decide on their values.

(1). patience

  • Dataset - If the dataset does not vary much across categories (for example, faces of people aged 25-30 vs. 30-35), the change in loss will be slow and somewhat random. In such cases it is good to use a higher value for patience, and vice versa for a clean, well-separated dataset.
  • Model type - When training a GAN, the change in accuracy per epoch is usually small, and each epoch consumes a good amount of GPU time. In such cases it is better to save checkpoint files every few epochs together with a low value of patience, and then resume from a checkpoint to improve further as required (see the sketch after this list). Analyse other model types in a similar way.
  • Runtime environment - When training on a CPU, an epoch run is time-consuming, so we prefer a smaller value for patience. With a GPU you can afford to try larger values.
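
A minimal sketch of that checkpoint-plus-early-stopping setup, using tf.keras (the filepath pattern and the parameter values are illustrative assumptions, not prescribed choices):

    from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

    callbacks = [
        # Stop once val_loss has gone 2 epochs without improving by min_delta.
        EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=2),
        # Save the weights after every epoch; {epoch:02d} gives one file per
        # epoch, so an expensive run (e.g. a GAN) can be resumed later.
        ModelCheckpoint(filepath='checkpoints/model-{epoch:02d}.h5',
                        save_weights_only=True),
    ]

    # model.fit(x_train, y_train, validation_data=(x_val, y_val),
    #           epochs=100, callbacks=callbacks)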

(2). min_delta

  • To decide min_delta, run a few epochs and observe the change in error and validation accuracy; define it based on that rate of change (see the sketch below). The default value of 0 works well in many cases.
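
A rough sketch of that trial-run approach (model, x_train, y_train, x_val and y_val are assumed to exist already; the heuristic at the end is an illustration, not an official recipe):

    import numpy as np

    # Short trial run to see how val_loss moves from epoch to epoch.
    history = model.fit(x_train, y_train,
                        validation_data=(x_val, y_val), epochs=5)
    val_loss = history.history['val_loss']

    # Typical epoch-to-epoch swing in the validation loss.
    typical_change = np.median(np.abs(np.diff(val_loss)))

    # Treat changes an order of magnitude below the typical swing as noise.
    min_delta = typical_change / 10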

Solution 2

Your parameters are valid first choices.

However, as pointed out by Akash, this depends on the dataset and on how you split your data, e.g. your cross-validation scheme. You might want to observe the behavior of your validation error first and then choose these parameters accordingly.

Regarding min_delta: I've found that 0, or a choice of << 1 like yours, works quite well much of the time. Again, look first at how wildly your error changes.

Regarding patience: if you set it to n, you will get the model from n epochs after the best model. Common choices lie between 0 and 10, but again, this depends on your dataset and especially on the variability within it.

Finally, EarlyStopping is behaving properly in the example you gave. The optimum that eventually triggered early stopping was found in epoch 4: val_loss: 0.0011. After that, training sees 5 more validation losses that all lie above or equal to that optimum, and it terminates 5 epochs later.
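
Note that, by default, you end up with the weights from the final (stopped) epoch, not from that epoch-4 optimum. If you want the best weights back automatically, recent Keras versions support restore_best_weights on EarlyStopping:

    from tensorflow.keras.callbacks import EarlyStopping

    # Roll the model back to the best val_loss seen (epoch 4 in the example)
    # instead of keeping the weights from the final, worse epoch.
    early_stopping = EarlyStopping(monitor='val_loss', min_delta=0,
                                   patience=5, restore_best_weights=True)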


Comments

  • Nyxynyx over 1 year

    I am new to deep learning and Keras, and one of the improvements I am trying to make to my model training process is to use Keras's keras.callbacks.EarlyStopping callback.

    Based on the output from training my model, does it seem reasonable to use the following parameters for EarlyStopping?

    EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=5, verbose=0, mode='auto')
    

    Also, why does it appear to stop sooner than it should, if it is supposed to wait for 5 consecutive epochs in which the improvement in val_loss is less than a min_delta of 0.0001?

    Output while training LSTM model (without EarlyStop)

    Runs all 100 epochs

    Epoch 1/100
    10200/10200 [==============================] - 133s 12ms/step - loss: 1.1236 - val_loss: 0.6431
    Epoch 2/100
    10200/10200 [==============================] - 141s 13ms/step - loss: 0.2783 - val_loss: 0.0301
    Epoch 3/100
    10200/10200 [==============================] - 143s 13ms/step - loss: 0.1131 - val_loss: 0.1716
    Epoch 4/100
    10200/10200 [==============================] - 145s 13ms/step - loss: 0.0586 - val_loss: 0.3671
    Epoch 5/100
    10200/10200 [==============================] - 146s 13ms/step - loss: 0.0785 - val_loss: 0.0038
    Epoch 6/100
    10200/10200 [==============================] - 146s 13ms/step - loss: 0.0549 - val_loss: 0.0041
    Epoch 7/100
    10200/10200 [==============================] - 147s 13ms/step - loss: 4.7482e-04 - val_loss: 8.9437e-05
    Epoch 8/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 1.5181e-05 - val_loss: 4.7367e-06
    Epoch 9/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 9.1632e-07 - val_loss: 3.6576e-07
    Epoch 10/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 1.4117e-07 - val_loss: 1.6058e-07
    Epoch 11/100
    10200/10200 [==============================] - 152s 14ms/step - loss: 1.2024e-07 - val_loss: 1.2804e-07
    Epoch 12/100
    10200/10200 [==============================] - 150s 14ms/step - loss: 0.0151 - val_loss: 0.4181
    Epoch 13/100
    10200/10200 [==============================] - 148s 14ms/step - loss: 0.0701 - val_loss: 0.0057
    Epoch 14/100
    10200/10200 [==============================] - 148s 14ms/step - loss: 0.0332 - val_loss: 5.0014e-04
    Epoch 15/100
    10200/10200 [==============================] - 147s 14ms/step - loss: 0.0367 - val_loss: 0.0020
    Epoch 16/100
    10200/10200 [==============================] - 151s 14ms/step - loss: 0.0040 - val_loss: 0.0739
    Epoch 17/100
    10200/10200 [==============================] - 148s 14ms/step - loss: 0.0282 - val_loss: 6.4996e-05
    Epoch 18/100
    10200/10200 [==============================] - 147s 13ms/step - loss: 0.0346 - val_loss: 1.6545e-04
    Epoch 19/100
    10200/10200 [==============================] - 147s 14ms/step - loss: 4.6678e-05 - val_loss: 6.8101e-06
    Epoch 20/100
    10200/10200 [==============================] - 148s 14ms/step - loss: 1.7270e-06 - val_loss: 6.7108e-07
    Epoch 21/100
    10200/10200 [==============================] - 147s 14ms/step - loss: 2.4334e-07 - val_loss: 1.5736e-07
    Epoch 22/100
    10200/10200 [==============================] - 147s 14ms/step - loss: 0.0416 - val_loss: 0.0547
    Epoch 23/100
    10200/10200 [==============================] - 148s 14ms/step - loss: 0.0413 - val_loss: 0.0145
    Epoch 24/100
    10200/10200 [==============================] - 148s 14ms/step - loss: 0.0045 - val_loss: 1.1096e-04
    Epoch 25/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 0.0218 - val_loss: 0.0083
    Epoch 26/100
    10200/10200 [==============================] - 148s 14ms/step - loss: 0.0029 - val_loss: 5.0954e-05
    Epoch 27/100
    10200/10200 [==============================] - 148s 14ms/step - loss: 0.0316 - val_loss: 0.0035
    Epoch 28/100
    10200/10200 [==============================] - 148s 14ms/step - loss: 0.0032 - val_loss: 0.2343
    Epoch 29/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 0.0299 - val_loss: 0.0021
    Epoch 30/100
    10200/10200 [==============================] - 150s 14ms/step - loss: 0.0171 - val_loss: 9.3622e-04
    Epoch 31/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 0.0167 - val_loss: 0.0023
    Epoch 32/100
    10200/10200 [==============================] - 148s 14ms/step - loss: 7.3654e-04 - val_loss: 4.1998e-05
    Epoch 33/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 7.3300e-06 - val_loss: 1.9043e-06
    Epoch 34/100
    10200/10200 [==============================] - 148s 14ms/step - loss: 6.6648e-07 - val_loss: 2.3814e-07
    Epoch 35/100
    10200/10200 [==============================] - 147s 14ms/step - loss: 1.5611e-07 - val_loss: 1.3155e-07
    Epoch 36/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 1.2159e-07 - val_loss: 1.2398e-07
    Epoch 37/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 1.1940e-07 - val_loss: 1.1977e-07
    Epoch 38/100
    10200/10200 [==============================] - 150s 14ms/step - loss: 1.1939e-07 - val_loss: 1.1935e-07
    Epoch 39/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1935e-07
    Epoch 40/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1935e-07
    Epoch 41/100
    10200/10200 [==============================] - 150s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
    Epoch 42/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
    Epoch 43/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
    Epoch 44/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
    Epoch 45/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
    Epoch 46/100
    10200/10200 [==============================] - 151s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
    Epoch 47/100
    10200/10200 [==============================] - 151s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
    Epoch 48/100
    10200/10200 [==============================] - 151s 14ms/step - loss: 1.1921e-07 - val_loss: 1.1921e-07
    

    Output with EarlyStop

    Stops (too early?) after 11 epochs

    Epoch 1/100
    10200/10200 [==============================] - 134s 12ms/step - loss: 1.2733 - val_loss: 0.9022
    Epoch 2/100
    10200/10200 [==============================] - 144s 13ms/step - loss: 0.5429 - val_loss: 0.4093
    Epoch 3/100
    10200/10200 [==============================] - 144s 13ms/step - loss: 0.1644 - val_loss: 0.0552
    Epoch 4/100
    10200/10200 [==============================] - 144s 13ms/step - loss: 0.0263 - val_loss: 0.9872
    Epoch 5/100
    10200/10200 [==============================] - 145s 13ms/step - loss: 0.1297 - val_loss: 0.1175
    Epoch 6/100
    10200/10200 [==============================] - 146s 13ms/step - loss: 0.0287 - val_loss: 0.0136
    Epoch 7/100
    10200/10200 [==============================] - 145s 13ms/step - loss: 0.0718 - val_loss: 0.0270
    Epoch 8/100
    10200/10200 [==============================] - 145s 13ms/step - loss: 0.0272 - val_loss: 0.0530
    Epoch 9/100
    10200/10200 [==============================] - 150s 14ms/step - loss: 3.3879e-04 - val_loss: 0.0575
    Epoch 10/100
    10200/10200 [==============================] - 146s 13ms/step - loss: 1.6789e-05 - val_loss: 0.0766
    Epoch 11/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 1.4124e-06 - val_loss: 0.0981
    
    Training stops early here.
    

     EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='min')
    

    I tried setting min_delta to 0. Why is it stopping even though val_loss increased from 0.0011 to 0.1045?

    Epoch 1/100
    10200/10200 [==============================] - 140s 13ms/step - loss: 1.1938 - val_loss: 0.5941
    Epoch 2/100
    10200/10200 [==============================] - 150s 14ms/step - loss: 0.3307 - val_loss: 0.0989
    Epoch 3/100
    10200/10200 [==============================] - 151s 14ms/step - loss: 0.0946 - val_loss: 0.0213
    Epoch 4/100
    10200/10200 [==============================] - 149s 14ms/step - loss: 0.0521 - val_loss: 0.0011
    Epoch 5/100
    10200/10200 [==============================] - 150s 14ms/step - loss: 0.0793 - val_loss: 0.0313
    Epoch 6/100
    10200/10200 [==============================] - 154s 14ms/step - loss: 0.0367 - val_loss: 0.0369
    Epoch 7/100
    10200/10200 [==============================] - 154s 14ms/step - loss: 0.0323 - val_loss: 0.0014
    Epoch 8/100
    10200/10200 [==============================] - 153s 14ms/step - loss: 0.0408 - val_loss: 0.0011
    Epoch 9/100
    10200/10200 [==============================] - 154s 14ms/step - loss: 0.0379 - val_loss: 0.1045
    
    Training stops early here.
    
  • Nyxynyx almost 6 years
    I'm still pretty confused, even though I think I understand the min_delta and patience parameters after your explanation. In my updated question, I set min_delta to 0 and patience to 5. Why is training being stopped even though val_loss increases from 0.0011 to 0.1045 in the final two epochs?
  • Akash Goyal almost 6 years
    Print val_accuracy in the logs - it should not decrease, and val_loss should not increase.
  • Nyxynyx almost 6 years
    This might sound like a silly question, but how do you print val_acc and acc? model.fit currently gives only loss and val_loss.
  • Akash Goyal almost 6 years
    You can specify the required metrics with model.compile(). Something like: model.compile(..., metrics=['accuracy'])
  • Nyxynyx almost 6 years
    Thanks! Does it then use the model at epoch 4 with the lowest val_loss? Or the one 5 epochs later?
  • Simon Batzner almost 6 years
    @Nyxynyx It will use the model from the point when training stopped, i.e. the model 5 epochs later. You can, however, save the parameters of the model at every epoch using the ModelCheckpoint callback.
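    A minimal sketch of that ModelCheckpoint usage with tf.keras (the filepath is an illustrative placeholder):

    from tensorflow.keras.callbacks import ModelCheckpoint

    # Keep only the weights of the best epoch so far, judged by val_loss.
    checkpoint = ModelCheckpoint('best_weights.h5', monitor='val_loss',
                                 save_best_only=True, save_weights_only=True)
    # model.fit(..., callbacks=[checkpoint, early_stopping])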
  • salouri over 3 years
    What is considered a "high" (or "small") value of patience?