Input shape for Keras LSTM/GRU language model


It depends on what you are trying to do. I assume your data of shape (90582, 517) is a set of 90582 samples with 517 words each. If so, you have to transform the words into word vectors (embeddings) so that they are meaningful to the network; the data then has the shape (90582, 517, embedding_dim), which the GRU can handle.

The Keras Embedding layer can do that for you. Add it as the first layer of your network, before the first GRU layer.

from keras.models import Sequential
from keras.layers import Embedding, GRU, Dropout, Activation
from keras.layers.core import TimeDistributedDense   # old Keras 1.x API, kept to match the original code

vocabulary_size = XXXXX     # give your vocabulary size here (largest word ID + 1)
embedding_dim = XXXX        # give your embedding dimension here (e.g. 100)

print('Build model...')
model = Sequential()
# The Embedding layer turns each integer word ID into a dense vector, so the
# (num_samples, 517) input becomes (num_samples, 517, embedding_dim).
# Note: input_shape must not include the number of samples; for Embedding the
# sequence length is given via input_length.
model.add(Embedding(vocabulary_size, embedding_dim, input_length=517))
model.add(GRU(512, return_sequences=True))
model.add(Dropout(0.2))
model.add(GRU(512, return_sequences=True))
model.add(Dropout(0.2))
# A softmax over a single unit is degenerate; for a word-level language model
# the per-timestep output layer normally has vocabulary_size units.
model.add(TimeDistributedDense(1))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
model.fit(x_pad, y_pad, batch_size=128, nb_epoch=2)
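
If your raw data is still a list of variable-length word-ID sequences, a sketch like the following (hypothetical variable names; maxlen=517 taken from your shapes) is the usual way to produce the 2-D integer array x_pad that the Embedding layer expects:

from keras.preprocessing.sequence import pad_sequences

# sequences: a list of lists of integer word IDs, one list per sample
x_pad = pad_sequences(sequences, maxlen=517, value=0)
print(x_pad.shape)    # (num_samples, 517) -- 2-D integers; the Embedding layer adds the third dimension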

Comments

  • ishido over 1 year

    I am trying to train a language model on word level in Keras.

    I have my X and Y, both with the shape (90582L, 517L).

    When I try to fit this model:

    print('Build model...')
    model = Sequential()
    model.add(GRU(512, return_sequences=True, input_shape=(90582, 517)))
    model.add(Dropout(0.2))
    model.add(GRU(512, return_sequences=True))
    model.add(Dropout(0.2))
    model.add(TimeDistributedDense(1))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
    model.fit(x_pad, y_pad, batch_size=128, nb_epoch=2)
    

    I get the error:

    Exception: Error when checking model input: 
    expected gru_input_7 to have 3 dimensions, but got array with shape (90582L, 517L)
    

    I need some guidance as to what the input shape should be. I've done trial and error on all sorts of combinations, but it seems I am misunderstanding something fundamental.

    In the Keras text generation example, the X matrix had 3 dimensions. I have no idea what the third dimension is supposed to be though.
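
In that example the third dimension is the per-character one-hot vector: recurrent layers in Keras expect input of shape (samples, timesteps, features). With the Embedding approach from the answer above you never build that 3-D array yourself; the layer produces it from the 2-D integer input. A minimal sketch to see this (the sizes here are arbitrary, not from the question's data):

from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(1000, 64, input_length=517))   # vocabulary of 1000 words, 64-dim embeddings
print(model.output_shape)    # (None, 517, 64): (samples, timesteps, features), which is what the GRU needs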