batch_input_shape tuple on Keras LSTM

The Keras Sequential model guide explains what those three elements mean in its "stateful" LSTM example (at the very bottom):

Expected input batch shape: (batch_size, timesteps, data_dim). Note that we have to provide the full batch_input_shape since the network is stateful. The sample of index i in batch k is the follow-up for the sample i in batch k-1.

The first one, as you already discovered, is the size of the batches used during training. Which value you should choose depends in part on your specific problem, but it is mostly determined by the size of your dataset. If you specify a batch size of x and your dataset contains N samples, your data will be split during training into N/x groups (batches) of x samples each.
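To make the N/x arithmetic concrete, here is a minimal sketch in plain Python. The helper `split_into_batches` and the dataset of 1000 samples are hypothetical, chosen only for illustration; Keras does its own batching internally.

```python
def split_into_batches(samples, batch_size):
    """Split a list of samples into consecutive batches of at most batch_size."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

samples = list(range(1000))                 # hypothetical dataset, N = 1000
batches = split_into_batches(samples, 50)   # batch size x = 50

print(len(batches))     # number of batches, N / x
print(len(batches[0]))  # number of samples in each batch
```

If N is not a multiple of x, the last batch comes out smaller. A stateful layer with a fixed batch size generally expects every batch to be full, so it is safest to pick a batch size that divides N evenly.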

Therefore, you probably want your batch size to be smaller than the size of your dataset. There is no single correct value, but you want it to be considerably smaller (say, one or two orders of magnitude) than your whole dataset. Some people prefer to use powers of 2 (32, 128, etc.) as their batch sizes. In some cases it is also possible not to use batches at all and to train on all your data at once, although that is not necessarily better.

The other two values are the timesteps (the size of your temporal dimension, that is, the number of "frames" in each sample sequence) and the data dimension (the size of your data vector at each timestep).

For example, say one of your input sequences looks like X = [[0.54, 0.3], [0.11, 0.2], [0.37, 0.81]]. This sequence has 3 timesteps and a data_dim of 2.
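The same example can be checked with NumPy. This is only a sketch: the batch of 4 identical sequences is a made-up illustration of how the batch dimension is added in front.

```python
import numpy as np

# One sequence: 3 timesteps, each holding a 2-dimensional data vector.
X = np.array([[0.54, 0.3], [0.11, 0.2], [0.37, 0.81]])
timesteps, data_dim = X.shape
print(timesteps, data_dim)

# Hypothetical batch of 4 such sequences (here simply copies of X).
batch = np.stack([X] * 4)
print(batch.shape)  # (batch_size, timesteps, data_dim)
```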

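So the ValueError you are getting is most probably because your input array is two-dimensional, with shape (32, 1), while the LSTM layer expects three dimensions (the error message itself says so). Reshape it to (samples, timesteps, data_dim), and make sure your array is a NumPy array.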
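A minimal sketch of that reshape, using made-up values in place of the 32 samples from the question. The two layouts shown are assumptions about what the data is meant to represent, and `batch_size = 1` is a hypothetical choice.

```python
import numpy as np

# Stand-in for the 32 single-feature samples (values are made up).
X = np.linspace(0.0, 1.0, 32).reshape(32, 1)
print(X.shape)  # 2-D array, which the LSTM layer rejects

# Option A (assumption): each sample is its own sequence of length 1.
X_a = X.reshape(32, 1, 1)  # (samples, timesteps, data_dim)

# Option B (assumption): all 32 values form one sequence of 32 timesteps.
X_b = X.reshape(1, 32, 1)

# The tuple passed as batch_input_shape must match the last two dimensions.
batch_size = 1  # hypothetical choice
batch_input_shape_a = (batch_size,) + X_a.shape[1:]
batch_input_shape_b = (batch_size,) + X_b.shape[1:]
print(batch_input_shape_a, batch_input_shape_b)
```

Which layout is appropriate depends on how the labels are meant to line up with the sequences, which the question leaves open.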

As a last comment: you say you have 32 samples in total (that is, your whole dataset contains 32 samples). I consider that too little data to be using batches; the smallest batch size I have usually seen is 32, so consider obtaining more data before trying batch training. Hope this helps.

Author: André Heringer

Updated on June 16, 2022

Comments

  • André Heringer almost 2 years

    I have the following feature vector, which consists of a single feature for each sample and 32 samples in total:

    X = [[0.1], [0.12], [0.3] ... [0.10]]

    and a label vector that consists of binary values

    Y = [0, 1, 0 , 0, .... 1] (with 32 samples as well)

    I'm trying to use a Keras LSTM to predict the next value of the sequence based on a new entry. What I can't figure out is what the "batch_input_shape" tuple means. For instance:

     model.add(LSTM(neurons, batch_input_shape=(?, ?, ?), return_sequences=False, stateful=True))
    

    According to this article the first one is the batch size, but what about the other two? Are they the number of features for each sample and the number of samples? What should be the value of batch_size in this case?

    At the moment I am receiving this error message:

    ValueError: Error when checking input: expected lstm_1_input to have 3 dimensions, but got array with shape (32, 1)
    

    Edit: Here is the model declaration:

    from keras.models import Sequential
    from keras.layers import LSTM, Dropout, Dense

    def create_lstm(batch_size, n_samples, neurons, dropout):
        model = Sequential()
        # n_samples is used here as the number of timesteps per sequence
        model.add(LSTM(neurons, batch_size=batch_size, input_shape=(n_samples, 1), return_sequences=False, stateful=True))
        model.add(Dropout(dropout))
        model.add(Dense(1, activation='sigmoid'))
        model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
        return model