Keras - Input a 3 channel image into LSTM


Solution 1

If you want the images to form a sequence (like the frames of a movie), you need to put pixels AND channels together as features:

input_shape = (225, 3072)  # a 3D input (batch, steps, features); the batch size of 7338 is not included
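As a concrete sketch of that reshape (using a small dummy array in place of the question's (7338, 225, 1024, 3) data, to keep memory small), merging the last two axes is all that's needed:

```python
import numpy as np

# Dummy stand-in for the question's data: 4 samples instead of 7338.
# Shape: (samples, steps, pixels, channels).
data = np.zeros((4, 225, 1024, 3), dtype="float32")

# Merge pixels and channels into one feature axis: 1024 * 3 = 3072
flat = data.reshape((len(data), 225, 3072))

# flat now fits an LSTM declared with input_shape=(225, 3072)
```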

If you want more processing before throwing 3072 features into an LSTM, you can combine or interleave 2D convolutions and LSTMs for a more refined model (not necessarily a better one, though; each application has its own behavior).

You can also try the new ConvLSTM2D layer, which takes a five-dimensional input:

input_shape = (225, 32, 32, 3)  # a 5D input (batch, steps, rows, cols, channels); the batch size of 7338 is not included
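A minimal ConvLSTM2D sketch, assuming the tf.keras API; the filter count and kernel size here are arbitrary illustration choices, not recommendations:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import ConvLSTM2D

# ConvLSTM2D consumes (batch, steps, rows, cols, channels);
# 16 filters and a 3x3 kernel are arbitrary choices for this sketch.
model = Sequential()
model.add(ConvLSTM2D(filters=16, kernel_size=(3, 3),
                     input_shape=(225, 32, 32, 3)))
```

By default ConvLSTM2D returns only the last step, so the output here is a (30, 30, 16) feature map per sample; pass `return_sequences=True` to keep the whole sequence.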

I'd probably create a convolutional net with several TimeDistributed(Conv2D(...)) and TimeDistributed(MaxPooling2D(...)) before adding a TimeDistributed(Flatten()) and finally the LSTM(). This will very probably improve both your image understanding and the performance of the LSTM.
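A sketch of that TimeDistributed idea, again assuming tf.keras; the filter counts and LSTM size are placeholders to be tuned:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, Flatten, LSTM,
                                     MaxPooling2D, TimeDistributed)

# Apply the same small CNN to each of the 225 frames, then let the
# LSTM read the resulting per-frame feature vectors.
model = Sequential()
model.add(TimeDistributed(Conv2D(16, (3, 3), activation="relu"),
                          input_shape=(225, 32, 32, 3)))
model.add(TimeDistributed(MaxPooling2D((2, 2))))
model.add(TimeDistributed(Flatten()))  # per-frame vector of 15*15*16 = 3600
model.add(LSTM(128))
```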

Solution 2

There is now a section in the Keras RNN guide on building RNNs with nested structures, which enables arbitrary input types for each timestep: https://www.tensorflow.org/guide/keras/rnn#rnns_with_listdict_inputs_or_nested_inputs

Author: shubhamsingh

Updated on June 20, 2022

Comments

  • shubhamsingh
    shubhamsingh almost 2 years

    I have read a sequence of images into a numpy array with shape (7338, 225, 1024, 3) where 7338 is the sample size, 225 are the time steps and 1024 (32x32) are flattened image pixels, in 3 channels (RGB).

    I have a sequential model with an LSTM layer:

    from keras.models import Sequential
    from keras.layers import LSTM

    model = Sequential()
    model.add(LSTM(128, input_shape=(225, 1024, 3)))
    

    But this results in the error:

    Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4
    

    The documentation mentions that the input tensor for LSTM layer should be a 3D tensor with shape (batch_size, timesteps, input_dim), but in my case my input_dim is 2D.

    What is the suggested way to input a 3 channel image into an LSTM layer in Keras?

  • shubhamsingh
    shubhamsingh over 6 years
    I thought of reshaping my data from (1024, 3) to 3072, but I already had the data in a batch size of 7338, and reshaping was taking a lot of time. And the LSTM is part of an autoencoder, so I wasn't sure if this reshaping would help my cause. Will try reshaping first, then ConvLSTM2D and TimeDistributed layers. Thanks for your answer.
  • Daniel Möller
    Daniel Möller over 6 years
    Reshaping taking time??? That doesn't sound ok.... the LSTM would be very very slow, though....
  • shubhamsingh
    shubhamsingh over 6 years
    Yes, I think that's because I'll be reshaping 1,651,050 (7338*225) instances. So, instead of doing it all at once, I resorted to Keras's fit_generator() method, where I create a generator that reshapes the data set while training.
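    A minimal sketch of such a generator (the names and batch size are hypothetical; a Keras generator must yield (inputs, targets) pairs, so for an autoencoder it yields the reshaped batch twice):

    ```python
    import numpy as np

    def reshape_batches(data, batch_size):
        """Yield (x, x) autoencoder batches, reshaping
        (batch, 225, 1024, 3) -> (batch, 225, 3072) lazily so the
        full array is never reshaped in one go."""
        n = len(data)
        while True:  # Keras expects generators to loop indefinitely
            for start in range(0, n, batch_size):
                batch = data[start:start + batch_size]
                flat = batch.reshape((len(batch), 225, 3072))
                yield flat, flat

    # Small dummy stand-in for the real (7338, 225, 1024, 3) array
    data = np.zeros((8, 225, 1024, 3), dtype="float32")
    gen = reshape_batches(data, batch_size=4)
    x, y = next(gen)  # x.shape == (4, 225, 3072)
    ```

    Such a generator can then be passed straight to fit_generator() (or, in current Keras versions, to fit(), which accepts generators directly).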