Keras RNN loss does not decrease over epoch

Solution 1

Your RNN function seems to be OK.

How quickly the loss decreases depends on the optimizer and the learning rate.

In any case, you are using a decay rate of 0.9, so try a bigger learning rate; it is going to decrease at that rate anyway.

Try out other optimizers with different learning rates. The optimizers available in Keras are listed at https://keras.io/optimizers/.

Often, an optimizer that works well on one dataset may fail on another.
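
For example, here is a minimal sketch of recompiling the question's model with a larger learning rate or a different optimizer (the specific values and the choice of Adam are illustrative, not taken from the question):

    from keras.optimizers import RMSprop, Adam

    model = RNN_keras(feat_num=888, timestep_num=100)

    # Option A: keep RMSprop, but with a much larger learning rate than 1e-5
    rmsprop = RMSprop(lr=0.001, rho=0.9, epsilon=1e-08)

    # Option B: try a different optimizer entirely, e.g. Adam
    adam = Adam(lr=0.001)

    model.compile(loss='mean_squared_error',
                  optimizer=adam,          # or optimizer=rmsprop
                  metrics=['mean_squared_error'])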

Solution 2

Have you tried changing the activation function from relu to softmax?

ReLU activation has a tendency to diverge. However, initializing the recurrent weights with an identity matrix may result in better convergence.
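
As a rough sketch of the identity-initialization idea (this uses a Keras 1.x SimpleRNN rather than the question's LSTM layers, because the identity initializer applies to the square recurrent weight matrix; the layer sizes are illustrative):

    from keras.models import Sequential
    from keras.layers import SimpleRNN, TimeDistributed, Dense

    # IRNN-style setup: ReLU activation with identity-initialized recurrent weights
    model = Sequential()
    model.add(SimpleRNN(output_dim=512,
                        activation='relu',
                        inner_init='identity',  # recurrent weights start as the identity matrix
                        return_sequences=True,
                        input_shape=(100, 888)))
    model.add(TimeDistributed(Dense(output_dim=1, activation='linear')))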

Solution 3

Since you are using an RNN for a regression problem (not for classification), you should use a 'linear' activation at the last layer.

In your code,

model.add(TimeDistributed(Dense(output_dim=1, activation='relu'))) # sequence labeling 

change activation='relu' to activation='linear'.

If that doesn't work, also remove activation='relu' from the second LSTM layer.

Also, the learning rate for RMSprop usually ranges from 0.1 to 0.0001.
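
Putting these suggestions together, here is a sketch of the question's model with a linear output layer, the relu removed from the second LSTM, and a learning rate within that range (the function name and hyperparameter values are only illustrative):

    from keras.models import Sequential
    from keras.layers import LSTM, BatchNormalization, TimeDistributed, Dense
    from keras.optimizers import RMSprop

    def RNN_keras_linear(feat_num, timestep_num=100):
        model = Sequential()
        model.add(BatchNormalization(input_shape=(timestep_num, feat_num)))
        model.add(LSTM(output_dim=512, activation='relu', return_sequences=True))
        model.add(BatchNormalization())
        model.add(LSTM(output_dim=128, return_sequences=True))  # default tanh; relu removed
        model.add(BatchNormalization())
        # linear activation for the regression output
        model.add(TimeDistributed(Dense(output_dim=1, activation='linear')))

        rmsprop = RMSprop(lr=0.001, rho=0.9, epsilon=1e-08)  # lr within the 0.1-0.0001 range
        model.compile(loss='mean_squared_error',
                      optimizer=rmsprop,
                      metrics=['mean_squared_error'])
        return model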

Comments

  • Munichong · almost 2 years ago

    I built an RNN using Keras. The RNN is used to solve a regression problem:

    from keras.models import Sequential
    from keras.layers import LSTM, BatchNormalization, TimeDistributed, Dense
    from keras.optimizers import RMSprop

    def RNN_keras(feat_num, timestep_num=100):
        model = Sequential()
        model.add(BatchNormalization(input_shape=(timestep_num, feat_num)))
        model.add(LSTM(input_shape=(timestep_num, feat_num), output_dim=512, activation='relu', return_sequences=True))
        model.add(BatchNormalization())  
        model.add(LSTM(output_dim=128, activation='relu', return_sequences=True))
        model.add(BatchNormalization())
        model.add(TimeDistributed(Dense(output_dim=1, activation='relu'))) # sequence labeling
    
        rmsprop = RMSprop(lr=0.00001, rho=0.9, epsilon=1e-08)
        model.compile(loss='mean_squared_error',
                      optimizer=rmsprop,
                      metrics=['mean_squared_error'])
        return model
    

    The whole process looks fine, but the loss stays exactly the same across epochs.

    61267 in the training set
    6808 in the test set
    
    Building training input vectors ...
    888 unique feature names
    The length of each vector will be 888
    Using TensorFlow backend.
    
    Build model...
    
    # Each batch has 1280 examples
    # The training data are shuffled at the beginning of each epoch.
    
    ****** Iterating over each batch of the training data ******
    Epoch 1/3 : Batch 1/48 | loss = 11011073.000000 | root_mean_squared_error = 3318.232910
    Epoch 1/3 : Batch 2/48 | loss = 620.271667 | root_mean_squared_error = 24.904161
    Epoch 1/3 : Batch 3/48 | loss = 620.068665 | root_mean_squared_error = 24.900017
    ......
    Epoch 1/3 : Batch 47/48 | loss = 618.046448 | root_mean_squared_error = 24.859678
    Epoch 1/3 : Batch 48/48 | loss = 652.977051 | root_mean_squared_error = 25.552946
    ****** Epoch 1: RMSD(training) = 24.897174 
    
    Epoch 2/3 : Batch 1/48 | loss = 607.372620 | root_mean_squared_error = 24.644049
    Epoch 2/3 : Batch 2/48 | loss = 599.667786 | root_mean_squared_error = 24.487448
    Epoch 2/3 : Batch 3/48 | loss = 621.368103 | root_mean_squared_error = 24.926300
    ......
    Epoch 2/3 : Batch 47/48 | loss = 620.133667 | root_mean_squared_error = 24.901398
    Epoch 2/3 : Batch 48/48 | loss = 639.971924 | root_mean_squared_error = 25.297264
    ****** Epoch 2: RMSD(training) = 24.897174 
    
    Epoch 3/3 : Batch 1/48 | loss = 651.519836 | root_mean_squared_error = 25.523636
    Epoch 3/3 : Batch 2/48 | loss = 673.582581 | root_mean_squared_error = 25.952084
    Epoch 3/3 : Batch 3/48 | loss = 613.930054 | root_mean_squared_error = 24.776562
    ......
    Epoch 3/3 : Batch 47/48 | loss = 624.460327 | root_mean_squared_error = 24.988203
    Epoch 3/3 : Batch 48/48 | loss = 629.544250 | root_mean_squared_error = 25.090448
    ****** Epoch 3: RMSD(training) = 24.897174 
    

    I do NOT think this is normal. Am I missing something?


    UPDATE: I found that all predictions are always zero after all epochs. This is why all RMSDs are the same: the predictions are all identical, i.e. 0. I checked the training y; it contains only a few zeros, so this is not due to data imbalance.

    So now I am wondering whether it is because of the layers and activations that I am using.
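
    A minimal sketch of that prediction check (X_train and y_train here stand in for the actual training arrays):

        import numpy as np

        preds = model.predict(X_train)      # shape: (num_examples, timestep_num, 1)
        print(np.unique(preds))             # prints only [ 0.] -- every prediction is zero
        print(np.count_nonzero(y_train))    # the targets themselves are mostly non-zero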