Keras - Input a 3 channel image into LSTM
Solution 1
If you want the images to form a sequence (like the frames of a movie), you need to flatten pixels AND channels together as features:
input_shape = (225, 3072) # a 3D input (batch, steps, features); the batch size of 7338 is not included
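A minimal sketch of that reshape, using a small stand-in array in place of the real (7338, 225, 1024, 3) data; the Dense output head and its 10 classes are hypothetical additions, not part of the question:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Stand-in for the real data: (batch, timesteps, pixels, channels)
data = np.random.rand(8, 225, 1024, 3).astype("float32")

# Merge pixels and channels into one feature axis: 1024 * 3 = 3072
data = data.reshape(data.shape[0], 225, 3072)

model = Sequential()
model.add(LSTM(128, input_shape=(225, 3072)))  # batch size not included
model.add(Dense(10, activation="softmax"))     # hypothetical output head
model.summary()
```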
If you want more processing before feeding 3072 features into an LSTM, you can combine or interleave 2D convolutions and LSTMs for a more refined model (not necessarily better, though; each application has its own behavior).
You can also try the newer ConvLSTM2D, which takes a five-dimensional input:
input_shape = (225, 32, 32, 3) # a 5D input (batch, steps, rows, cols, channels); the batch size of 7338 is not included
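For example, a minimal sketch of a ConvLSTM2D layer on that input (the filter count and kernel size are arbitrary choices for illustration, not prescribed by the answer):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import ConvLSTM2D

model = Sequential()
# Input: (timesteps, rows, cols, channels) = (225, 32, 32, 3);
# the batch dimension is left out of input_shape.
model.add(ConvLSTM2D(filters=16, kernel_size=(3, 3), padding="same",
                     input_shape=(225, 32, 32, 3)))
# With return_sequences=False (the default), the output is a single
# (32, 32, 16) feature map per sample.
model.summary()
```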
I'd probably create a convolutional net with several TimeDistributed(Conv2D(...)) and TimeDistributed(MaxPooling2D(...)) layers before adding a TimeDistributed(Flatten()) and finally the LSTM(). This will very probably improve both your image understanding and the performance of the LSTM.
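A sketch of such a stack; the filter counts, kernel sizes, and pooling windows here are arbitrary illustrative choices:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (TimeDistributed, Conv2D, MaxPooling2D,
                                     Flatten, LSTM)

model = Sequential()
# Apply the same small CNN to each of the 225 timesteps.
model.add(TimeDistributed(Conv2D(16, (3, 3), padding="same", activation="relu"),
                          input_shape=(225, 32, 32, 3)))
model.add(TimeDistributed(MaxPooling2D((2, 2))))   # -> (225, 16, 16, 16)
model.add(TimeDistributed(Conv2D(32, (3, 3), padding="same", activation="relu")))
model.add(TimeDistributed(MaxPooling2D((2, 2))))   # -> (225, 8, 8, 32)
model.add(TimeDistributed(Flatten()))              # -> (225, 2048)
model.add(LSTM(128))                               # sequence of CNN features -> LSTM
model.summary()
```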
Solution 2
There is now a guide on creating RNNs with nested structures in the Keras guide, which enables arbitrary input types for each timestep: https://www.tensorflow.org/guide/keras/rnn#rnns_with_listdict_inputs_or_nested_inputs
shubhamsingh
Updated on June 20, 2022

Comments

-
shubhamsingh almost 2 years
I have read a sequence of images into a numpy array with shape
(7338, 225, 1024, 3)
where 7338 is the sample size, 225 are the time steps and 1024 (32x32) are the flattened image pixels, in 3 channels (RGB). I have a sequential model with an LSTM layer:
model = Sequential()
model.add(LSTM(128, input_shape=(225, 1024, 3)))
But this results in the error:
Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4
The documentation mentions that the input tensor for the LSTM layer should be a 3D tensor with shape (batch_size, timesteps, input_dim), but in my case my input_dim is 2D. What is the suggested way to input a 3-channel image into an LSTM layer in Keras?
-
shubhamsingh over 6 years I thought of reshaping my data from (1024, 3) to 3072, but I already had the data in a batch size of 7338, and reshaping was taking a lot of time. And the LSTM is part of an autoencoder, so I wasn't sure if this reshaping would help my cause. Will try reshaping first, then with ConvLSTM2D and TimeDistributed layers. Thanks for your answer.
-
Daniel Möller over 6 yearsReshaping taking time??? That doesn't sound ok.... the LSTM would be very very slow, though....
-
shubhamsingh over 6 years Yes, I think that's because I'll be reshaping 1651050 (7338*225) instances. So, instead of doing it all together, I resorted to Keras's fit_generator() method, where I create a generator to reshape the data set while training.
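That generator approach can be sketched roughly like this; reshape_batches and its arguments are hypothetical names, and a real generator would typically also shuffle between epochs:

```python
import numpy as np

def reshape_batches(data, labels, batch_size=32):
    """Yield batches with the (pixels, channels) axes flattened into one
    feature axis, so the full array never has to be reshaped at once."""
    n = data.shape[0]
    while True:  # Keras generators loop forever; steps_per_epoch bounds an epoch
        for start in range(0, n, batch_size):
            batch = data[start:start + batch_size]
            yield (batch.reshape(batch.shape[0], batch.shape[1], -1),
                   labels[start:start + batch_size])

# Hypothetical usage with the shapes from the question:
# model.fit_generator(reshape_batches(x_train, y_train, 32),
#                     steps_per_epoch=int(np.ceil(len(x_train) / 32)))
```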