How to set up 1D convolution and LSTM in Keras


Solution 1

If you want to predict one value for each timestep, two slightly different solutions come to my mind:

1) Remove the MaxPooling1D layer, add the padding='same' argument to the Conv1D layer, and pass return_sequences=True to the LSTM so that it returns the output of each timestep:

from keras.layers import Input, Dense, LSTM, MaxPooling1D, Conv1D
from keras.models import Model

input_layer = Input(shape=(400, 16))
conv1 = Conv1D(filters=32,
               kernel_size=8,
               strides=1,
               activation='relu',
               padding='same')(input_layer)
lstm1 = LSTM(32, return_sequences=True)(conv1)
output_layer = Dense(1, activation='sigmoid')(lstm1)
model = Model(inputs=input_layer, outputs=output_layer)

model.summary()

The model summary would be:

Layer (type)                 Output Shape              Param #   
=================================================================
input_4 (InputLayer)         (None, 400, 16)           0         
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 400, 32)           4128      
_________________________________________________________________
lstm_4 (LSTM)                (None, 400, 32)           8320      
_________________________________________________________________
dense_4 (Dense)              (None, 400, 1)            33        
=================================================================
Total params: 12,481
Trainable params: 12,481
Non-trainable params: 0
_________________________________________________________________

2) Just change the number of units in the Dense layer to 400 and reshape y to (n_samples, n_timesteps):

from keras.layers import Input, Dense, LSTM, MaxPooling1D, Conv1D
from keras.models import Model

input_layer = Input(shape=(400, 16))
conv1 = Conv1D(filters=32,
               kernel_size=8,
               strides=1,
               activation='relu')(input_layer)
pool1 = MaxPooling1D(pool_size=4)(conv1)
lstm1 = LSTM(32)(pool1)
output_layer = Dense(400, activation='sigmoid')(lstm1)
model = Model(inputs=input_layer, outputs=output_layer)

model.summary()

The model summary would be:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_6 (InputLayer)         (None, 400, 16)           0         
_________________________________________________________________
conv1d_6 (Conv1D)            (None, 393, 32)           4128      
_________________________________________________________________
max_pooling1d_5 (MaxPooling1 (None, 98, 32)            0         
_________________________________________________________________
lstm_6 (LSTM)                (None, 32)                8320      
_________________________________________________________________
dense_6 (Dense)              (None, 400)               13200     
=================================================================
Total params: 25,648
Trainable params: 25,648
Non-trainable params: 0
_________________________________________________________________
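As a sanity check, the shapes and parameter counts in the summary above can be reproduced with plain arithmetic (a quick sketch using the standard Keras parameter-count formulas):

```python
# Conv1D with default 'valid' padding shortens the sequence:
conv_len = 400 - 8 + 1                    # 393 timesteps
# MaxPooling1D(pool_size=4) then floor-divides the length:
pool_len = conv_len // 4                  # 98 timesteps

# Parameter counts:
conv_params = 8 * 16 * 32 + 32            # kernel_size * in_channels * filters + biases -> 4128
lstm_params = 4 * ((32 + 32) * 32 + 32)   # 4 gates x ((input_dim + units) * units + biases) -> 8320
dense_params = 32 * 400 + 400             # units_in * units_out + biases -> 13200

print(conv_len, pool_len, conv_params, lstm_params, dense_params)
```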

Don't forget that in both cases you must use 'binary_crossentropy' (not 'categorical_crossentropy') as the loss function. I expect solution #2 to have lower accuracy than solution #1, but you should experiment with both and tune the parameters, since the result depends entirely on the specific problem you are trying to solve and the nature of your data.
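For solution #2 in particular, the targets must be 2D: a quick numpy sketch of the y reshape (476 and 400 are the sample and timestep counts from the question):

```python
import numpy as np

y = np.random.randint(0, 2, size=(476, 400, 1))  # one binary label per timestep

# Drop the trailing singleton axis so y matches the Dense(400) output shape
y = np.squeeze(y, axis=-1)                       # now (476, 400)
```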


Update:

You asked for a convolution layer that only covers one timestep and k adjacent features. Yes, you can do it using a Conv2D layer:

import numpy as np

# first add a channel axis to your data
X = np.expand_dims(X, axis=-1)   # now X has a shape of (n_samples, n_timesteps, n_feats, 1)

# adjust the input layer shape accordingly ...
conv2 = Conv2D(n_filters, (1, k), ...)   # covers one timestep and k adjacent features
# adjust the other layers according to the output of the convolution layer ...

Although I have no idea why you are doing this, to use the output of the convolution layer (which has shape (?, n_timesteps, n_feats, n_filters)), one solution is to use an LSTM layer wrapped inside a TimeDistributed layer. Another solution is to flatten the last two axes.
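A shape-only numpy sketch of the two steps above (the n_filters value and the unchanged feature axis are illustrative assumptions, not values from the question):

```python
import numpy as np

n_samples, n_timesteps, n_feats = 476, 400, 16
X = np.random.rand(n_samples, n_timesteps, n_feats)

# Step 1: add a channel axis so Conv2D accepts the data
X = np.expand_dims(X, axis=-1)            # (476, 400, 16, 1)

# Step 2 (after the Conv2D): suppose its output keeps the timestep axis;
# flattening the last two axes yields a 3D tensor an LSTM can consume
n_filters = 8                             # illustrative value
conv_out = np.random.rand(n_samples, n_timesteps, n_feats, n_filters)
lstm_in = conv_out.reshape(n_samples, n_timesteps, n_feats * n_filters)
```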

Solution 2

The input and output shapes are (476, 400, 16) and (476, 1), which means the model is outputting just one value per full sequence.

Your LSTM is not returning sequences (return_sequences=False). But even if it were, the Conv1D and MaxPooling before the LSTM squeeze the input, so the LSTM itself receives samples of shape (98, 32).

I assume you want one output for each input step.

Assuming that Conv1D and MaxPooling are relevant for your input data, you can try a sequence-to-sequence approach where you feed the output of the first network into another network to get back 400 outputs.

I recommend you look at models like encoder-decoder seq2seq networks, as below:

https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html

https://machinelearningmastery.com/define-encoder-decoder-sequence-sequence-model-neural-machine-translation-keras/
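A minimal sketch of such an encoder-decoder model in Keras, reusing the Conv1D/MaxPooling front end from the question (the RepeatVector approach and the decoder size are illustrative choices, not the only way to build a seq2seq model):

```python
from keras.layers import (Input, Conv1D, MaxPooling1D, LSTM,
                          RepeatVector, TimeDistributed, Dense)
from keras.models import Model

inp = Input(shape=(400, 16))
x = Conv1D(filters=32, kernel_size=8, activation='relu')(inp)
x = MaxPooling1D(pool_size=4)(x)
encoded = LSTM(32)(x)                      # encoder: one 32-dim summary vector
x = RepeatVector(400)(encoded)             # feed it to every decoder timestep
x = LSTM(32, return_sequences=True)(x)     # decoder
out = TimeDistributed(Dense(1, activation='sigmoid'))(x)

model = Model(inputs=inp, outputs=out)
model.compile(optimizer='adam', loss='binary_crossentropy')
```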

Author: Thuan N.

Updated on November 02, 2021

Comments

  • Thuan N.
    Thuan N. over 2 years

    I would like to use 1D-Conv layer following by LSTM layer to classify a 16-channel 400-timestep signal.

    The input shape is composed of:

    • X = (n_samples, n_timesteps, n_features), where n_samples=476, n_timesteps=400, n_features=16 are the number of samples, timesteps, and features (or channels) of the signal.

    • y = (n_samples, n_timesteps, 1). Each timestep is labeled by either 0 or 1 (binary classification).

    I use the 1D-Conv to extract the temporal information, as shown in the figure below. F=32 and K=8 are the filters and kernel_size. 1D-MaxPooling is used after 1D-Conv. 32-unit LSTM is used for signal classification. The model should return a y_pred = (n_samples, n_timesteps, 1).

    (figure omitted)

    The code snippet is as follows:

    input_layer = Input(shape=(dataset.n_timestep, dataset.n_feature))
    conv1 = Conv1D(filters=32,
                   kernel_size=8,
                   strides=1,
                   activation='relu')(input_layer)
    pool1 = MaxPooling1D(pool_size=4)(conv1)
    lstm1 = LSTM(32)(pool1)
    output_layer = Dense(1, activation='sigmoid')(lstm1)
    model = Model(inputs=input_layer, outputs=output_layer) 
    

    The model summary is shown below:

    (model summary image omitted)

    However, I got the following error:

    ValueError: Error when checking target: expected dense_15 to have 2 dimensions, but got array with shape (476, 400, 1).
    

    I guess the problem was the incorrect shape. Please let me know how to fix it.

    Another question concerns the number of timesteps. Because the input_shape is assigned in the 1D-Conv layer, how can we let the LSTM know the timestep must be 400?


    I would like to add the model graph based on the suggestion of @today. In this case, the timestep of the LSTM will be 98. Do we need to use TimeDistributed in this case? I failed to apply TimeDistributed to the Conv1D.

    (model graph omitted)

    Is there any way to perform the convolution among channels, instead of timesteps? For example, a filter of shape (2, 1) traverses each timestep, as shown in the figure below. (figure omitted)

    Thanks.

  • Thuan N.
    Thuan N. almost 6 years
    Thanks for your reply. In model 2, I suppose that the LSTM's number of timesteps is identical to the size of max_pooling1d_5, i.e. 98. We need a 400-unit Dense layer to convert the 32-unit LSTM's output into a (400, 1) vector corresponding to y. Could you please confirm this point by looking into the picture I added in the post above? Is there any way to perform the convolution among the 16 channels, rather than the 400 timesteps? The filter size would be (kernel_size, 1), which traverses each timestep. Please see the second picture added. Thanks.
  • today
    today almost 6 years
    @ThuanN. The point you mentioned is completely right. To be more precise, it is converted to shape (400,) (that's why I said you need to reshape y or just remove the last axis). As for the convolution with a kernel size of 1: yes, absolutely you can do this. It is very common in vision CNNs. You can even add another Conv layer after this with a kernel size greater than 1; it is more or less the same as the "Inception" modules in vision CNNs. As for the use of TimeDistributed: no, you don't need it, and you can't use it in solution #2, because the output shape of the LSTM is (None, 32).
  • today
    today almost 6 years
    @ThuanN. Read my comment in reply to GurmeetSingh.
  • Thuan N.
    Thuan N. almost 6 years
    If I conduct the convolution among channels, do I need to transpose the dataset, i.e. (None, 400, 16) -> (None, 16, 400), or do I need to use Conv2D? Could you show some snippets for this approach? Thanks.
  • today
    today almost 6 years
    @ThuanN. You mean if there are 16 channels then you want a convolution layer with kernel size of 1 (i.e. covers only one timestep) and has 16 separate filters for each of the 16 channels?
  • Thuan N.
    Thuan N. over 5 years
    As shown in the last figure in the post above, I'd like to do the convolution at each timestep but over one or several channels. For example, at timestep 0, a kernel will do the convolution between (channel 0, channel 1), then (channel 1, channel 2), then (channel 2, channel 3), and so on. The kernel size, hence, is 2x1. If we want to do the convolution among 4 channels, we use a 4x1 kernel. Do you think Conv1D can be configured like that? Thanks.
  • today
    today over 5 years
    @ThuanN. You can't select some of the channels and perform convolution on them only. Setting the kernel size to one, performs the convolution in each timestep but on all the channels of that timestep.
  • today
    today over 5 years
    @ThuanN. Now I got what you want. I have updated my answer.
  • Thuan N.
    Thuan N. over 5 years
    Thanks for your update. I suppose that the Conv1D mentioned earlier does the convolution in both the space and time domains (channels and timesteps). My approach is to use the CNN to do the convolution in the space domain (channels) only; the time domain (timesteps) is dealt with by the RNN later. I'm still not sure if this approach is feasible.
  • today
    today over 5 years
    @ThuanN. The Conv1D performs the convolution on just the time domain (i.e. timesteps); that's why the kernel_size argument is one single integer. TBH, I have not seen convolution performed over the features of a timeseries before. But, as I mentioned in my update, it is feasible and can be done using a Conv2D layer. Whether it improves the model performance compared to the normal usage of a Conv1D layer depends on the nature of your data as well as the problem you are trying to solve; you must experiment.
  • today
    today over 5 years
    @ThuanN. If the answer resolved your issue, kindly accept it - see What should I do when someone answers my question?
  • EmJ
    EmJ about 4 years
    @today I think I have kind of similar question: stackoverflow.com/questions/60670540/… Please kindly let me know your thoughts on this. Looking forward to hearing from you :)