WARNING:tensorflow:Model was constructed with shape (None, 150), but it was called on an input with incompatible shape (None, 1)
Ok, so here is what I understood; correct me if I'm wrong:
- x contains 94556 integers, each being the index of one out of 2557 words.
- y contains 94556 vectors of 2557 integers, each also encoding the index of one word, but this time as a one-hot encoding instead of an integer index.
- Finally, a corresponding pair of words from x and y represents two words that are close by in the original text.
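To make the two encodings concrete, here is a minimal sketch using a hypothetical vocabulary of 5 words instead of 2557 (the sizes and values are assumptions chosen only for illustration). It shows that the one-hot rows carry the same information as the integer indices:

```python
import numpy as np

vocab_size = 5                          # hypothetical small vocabulary (the question uses 2557)
indices = np.array([3, 0, 4])           # integer-index encoding, shape (3,)
one_hot = np.eye(vocab_size)[indices]   # one-hot encoding, shape (3, 5)

# argmax along the last axis recovers the original word indices
recovered = one_hot.argmax(axis=1)

print(indices.shape, one_hot.shape)
print(recovered)
```

Each row of one_hot has exactly one entry set to 1, at the position given by the matching entry of indices.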
If I am correct so far, then the following runs correctly:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import *
from tensorflow.keras.models import *
x = np.random.randint(0,2557,94556)
y = np.eye((2557))[np.random.randint(0,2557,94556)]
xr = x.reshape((-1,1))
print("x.shape: {}\nxr.shape:{}\ny.shape: {}".format(x.shape, xr.shape, y.shape))
model = Sequential()
model.add(Embedding(2557, 64, input_length=1, embeddings_initializer='glorot_uniform'))
model.add(Reshape((64,)))
model.add(Dense(512, activation='sigmoid'))
model.add(Dense(2557, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
history=model.fit(xr, y, epochs=20, batch_size=32, validation_split=3/9)
The most important modifications:
- The y reshaping was losing the relationship between elements from x and y.
- The input_length in the Embedding layer should correspond to the second dimension of xr.
- The output of the last layer of the network should have the same dimension as the second dimension of y.
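The first point can be checked with shapes alone. The sketch below uses hypothetical small sizes (4 samples, 6 words) rather than the real data, and shows that flattening y with reshape((-1, 1)) changes the number of rows, so row i of the input no longer lines up with row i of the target:

```python
import numpy as np

n_samples, vocab_size = 4, 6    # hypothetical small sizes for illustration
x = np.random.randint(0, vocab_size, n_samples)
y = np.eye(vocab_size)[np.random.randint(0, vocab_size, n_samples)]

xr = x.reshape((-1, 1))         # (4, 1): one row per sample, as intended
yr_bad = y.reshape((-1, 1))     # (24, 1): n_samples * vocab_size rows

print(xr.shape, y.shape, yr_bad.shape)
```

With the real data the same operation turns (94556, 2557) into (241779692, 1), which is the shape reported in the question. Leaving y untouched keeps one target row per input row.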
I am actually surprised the code ran without crashing.
Finally, from my research, it seems that people are not training skipgrams like this in practice, but rather they are trying to predict whether a training example is correct (the two words are close by) or not. Maybe this is the reason you came up with an output of dimension one.
Here is a model inspired by https://github.com/PacktPublishing/Deep-Learning-with-Keras/blob/master/Chapter05/keras_skipgram.py :
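# Relies on the wildcard imports from tensorflow.keras.layers and tensorflow.keras.models above.
# The Merge layer no longer exists in tf.keras, so the two branches are joined
# with the functional API and a Dot layer instead.
word_input = Input(shape=(1,))
context_input = Input(shape=(1,))
# One embedding table per branch; Reshape drops the length-1 sequence axis.
word_vec = Reshape((64,))(Embedding(2557, 64, embeddings_initializer="glorot_uniform")(word_input))
context_vec = Reshape((64,))(Embedding(2557, 64, embeddings_initializer="glorot_uniform")(context_input))
# Dot product of the two 64-dimensional vectors, one scalar per sample.
similarity = Dot(axes=1)([word_vec, context_vec])
prediction = Dense(1, kernel_initializer="glorot_uniform", activation="sigmoid")(similarity)
model = Model(inputs=[word_input, context_input], outputs=prediction)
# Binary target (pair is correct or not), hence binary cross-entropy.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])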
In that case, you would have 3 vectors, all of the same shape (94556, 1) (or probably with more than 94556 rows, since you might have to generate additional negative samples):
- x containing integers from 0 to 2556
- y containing integers from 0 to 2556
- output containing 0s and 1s, indicating whether each pair from x and y is a negative or a positive example
and the training would look like:
history = model.fit([x, y], output, epochs=20, batch_size=32, validation_split=3/9)
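As a rough illustration of how the three arrays could be assembled, here is a minimal sketch. The positive pairs are random placeholders standing in for pairs extracted from a real text, and the negatives are drawn uniformly at random, which is an assumption made for simplicity (a random negative can occasionally coincide with a true pair, and practical implementations often sample negatives differently):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 2557
n_pairs = 1000   # hypothetical number of positive pairs

# Placeholder positive pairs (word index, context index); in practice these
# would come from sliding a window over the tokenized text.
pos_words = rng.integers(0, vocab_size, size=n_pairs)
pos_context = rng.integers(0, vocab_size, size=n_pairs)

# Negative pairs: the same words, each paired with a random context word.
neg_words = pos_words.copy()
neg_context = rng.integers(0, vocab_size, size=n_pairs)

x = np.concatenate([pos_words, neg_words]).reshape(-1, 1)
y = np.concatenate([pos_context, neg_context]).reshape(-1, 1)
output = np.concatenate([np.ones(n_pairs), np.zeros(n_pairs)]).reshape(-1, 1)

# Shuffle so positives and negatives are mixed; validation_split takes the
# last fraction of the data as given, before any shuffling.
perm = rng.permutation(len(x))
x, y, output = x[perm], y[perm], output[perm]

print(x.shape, y.shape, output.shape)
```

All three arrays end up with the same number of rows, twice the number of positive pairs, which is why the total can exceed the original 94556.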
DolceVita34
Updated on July 20, 2022

Comments
- DolceVita34 over 1 year
So I'm trying to build a word embedding model, but I keep getting this error. During training, the accuracy does not change and the val_loss remains "nan".
The raw shape of the data is
x.shape, y.shape
((94556,), (94556, 2557))
Then I reshape it so:
xr= np.asarray(x).astype('float32').reshape((-1,1))
yr= np.asarray(y).astype('float32').reshape((-1,1))
((94556, 1), (241779692, 1))
Then I run it through my model
model = Sequential()
model.add(Embedding(2557, 64, input_length=150, embeddings_initializer='glorot_uniform'))
model.add(Flatten())
model.add(Reshape((64,), input_shape=(94556, 1)))
model.add(Dense(512, activation='sigmoid'))
model.add(Dense(128, activation='sigmoid'))
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='sigmoid'))
model.add(Dense(1, activation='relu'))
# compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# summarize the model
print(model.summary())
plot_model(model, show_shapes = True, show_layer_names=False)
After training, I get a constant accuracy and a val_loss nan for every epoch
history=model.fit(xr, yr, epochs=20, batch_size=32, validation_split=3/9)
Epoch 1/20
WARNING:tensorflow:Model was constructed with shape (None, 150) for input Tensor("embedding_6_input:0", shape=(None, 150), dtype=float32), but it was called on an input with incompatible shape (None, 1).
WARNING:tensorflow:Model was constructed with shape (None, 150) for input Tensor("embedding_6_input:0", shape=(None, 150), dtype=float32), but it was called on an input with incompatible shape (None, 1).
1960/1970 [============================>.] - ETA: 0s - loss: nan - accuracy: 0.9996
WARNING:tensorflow:Model was constructed with shape (None, 150) for input Tensor("embedding_6_input:0", shape=(None, 150), dtype=float32), but it was called on an input with incompatible shape (None, 1).
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 2/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 3/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 4/20
1970/1970 [==============================] - 8s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 5/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 6/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 7/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 8/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 9/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 10/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 11/20
1970/1970 [==============================] - 8s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 12/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 13/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 14/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 15/20
1970/1970 [==============================] - 8s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 16/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 17/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 18/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 19/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
Epoch 20/20
1970/1970 [==============================] - 7s 4ms/step - loss: nan - accuracy: 0.9996 - val_loss: nan - val_accuracy: 0.9996
I think it has to do with the input/output shape, but I'm not certain. I tried modifying the model in various ways (adding layers, removing layers, different optimizers, different batch sizes) and nothing has worked so far.