TensorFlow Keras with tf dataset input
Solution 1
To answer your original question about why you are getting this error:
Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (32,)
Your code breaks because you did not assign the result of .batch() back to the dataset variable, like so:
dataset = dataset.batch(10)
You only called dataset.batch(10) and discarded the result. Dataset transformations return a new dataset rather than modifying the existing one, so without the assignment the output tensors are not batched: each element has shape (32,) instead of (10, 32). The same applies to the dataset.repeat(100) call in your code.
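The "returns a new dataset" behaviour described above can be illustrated with a small pure-Python stand-in. `ToyDataset` is a hypothetical class written only for this illustration. It is not part of TensorFlow and only mimics the return-a-new-object pattern and the resulting element shapes.

```python
class ToyDataset:
    """Hypothetical stand-in for tf.data.Dataset (illustration only)."""

    def __init__(self, rows, batch_size=None):
        self.rows = rows              # list of feature rows
        self.batch_size = batch_size  # None means "not batched"

    def batch(self, batch_size):
        # Return a NEW dataset and leave self untouched.
        return ToyDataset(self.rows, batch_size)

    def element_shape(self):
        # Shape of one element as a model would see it.
        n_features = len(self.rows[0])
        if self.batch_size is None:
            return (n_features,)
        return (self.batch_size, n_features)


rows = [[0.0] * 32 for _ in range(1000)]

dataset = ToyDataset(rows)
dataset.batch(10)                     # result discarded: dataset is unchanged
unbatched_shape = dataset.element_shape()

dataset = dataset.batch(10)           # result assigned back: dataset is batched
batched_shape = dataset.element_shape()

print(unbatched_shape, batched_shape)  # (32,) (10, 32)
```

The first call mirrors the bug in the question; the second mirrors the fix.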
Solution 2
You have not defined an iterator, which is why the error occurs.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, Input

# Note: tf.train.RMSPropOptimizer and make_one_shot_iterator() are TensorFlow 1.x APIs.
data = np.random.random((1000, 32))
labels = np.random.random((1000, 10))
dataset = tf.data.Dataset.from_tensor_slices((data, labels))
# Assign the result: each transformation returns a new dataset.
dataset = dataset.batch(10).repeat()
inputs = Input(shape=(32,))  # Returns a placeholder tensor
# A layer instance is callable on a tensor, and returns a tensor.
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
# Instantiate the model given inputs and outputs.
model = keras.Model(inputs=inputs, outputs=predictions)
# The compile step specifies the training configuration.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# Trains for 5 epochs
model.fit(dataset.make_one_shot_iterator(), epochs=5, steps_per_epoch=100)
Epoch 1/5 100/100 [==============================] - 1s 8ms/step - loss: 11.5787 - acc: 0.1010
Epoch 2/5 100/100 [==============================] - 0s 4ms/step - loss: 11.4846 - acc: 0.0990
Epoch 3/5 100/100 [==============================] - 0s 4ms/step - loss: 11.4690 - acc: 0.1270
Epoch 4/5 100/100 [==============================] - 0s 4ms/step - loss: 11.4611 - acc: 0.1300
Epoch 5/5 100/100 [==============================] - 0s 4ms/step - loss: 11.4546 - acc: 0.1360
This is the result on my system.
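The value steps_per_epoch=100 in the code above corresponds to one full pass over the data. A quick sketch of that arithmetic, using the sample count and batch size from the snippet:

```python
import math

num_samples = 1000   # rows in `data` above
batch_size = 10      # value passed to dataset.batch()

# Number of batches needed to see every sample once.
steps_per_epoch = math.ceil(num_samples / batch_size)
print(steps_per_epoch)  # 100
```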
wxy
Updated on June 04, 2022
Comments
-
wxy almost 2 years
I'm new to tensorflow keras and dataset. Can anyone help me understand why the following code doesn't work?
import tensorflow as tf
import tensorflow.keras as keras
import numpy as np
from tensorflow.python.data.ops import dataset_ops
from tensorflow.python.data.ops import iterator_ops
from tensorflow.python.keras.utils import multi_gpu_model
from tensorflow.python.keras import backend as K

data = np.random.random((1000,32))
labels = np.random.random((1000,10))
dataset = tf.data.Dataset.from_tensor_slices((data,labels))
print(dataset)
print(dataset.output_types)
print(dataset.output_shapes)
dataset.batch(10)
dataset.repeat(100)

inputs = keras.Input(shape=(32,))  # Returns a placeholder tensor
# A layer instance is callable on a tensor, and returns a tensor.
x = keras.layers.Dense(64, activation='relu')(inputs)
x = keras.layers.Dense(64, activation='relu')(x)
predictions = keras.layers.Dense(10, activation='softmax')(x)
# Instantiate the model given inputs and outputs.
model = keras.Model(inputs=inputs, outputs=predictions)
# The compile step specifies the training configuration.
model.compile(optimizer=tf.train.RMSPropOptimizer(0.001),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
# Trains for 5 epochs
model.fit(dataset, epochs=5, steps_per_epoch=100)
It failed with the following error:
model.fit(x=dataset, y=None, epochs=5, steps_per_epoch=100)
File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/keras/engine/training.py", line 1510, in fit
  validation_split=validation_split)
File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/keras/engine/training.py", line 994, in _standardize_user_data
  class_weight, batch_size)
File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/keras/engine/training.py", line 1113, in _standardize_weights
  exception_prefix='input')
File "/home/wuxinyu/pyEnv/lib/python3.5/site-packages/tensorflow/python/keras/engine/training_utils.py", line 325, in standardize_input_data
  'with shape ' + str(data_shape))
ValueError: Error when checking input: expected input_1 to have 2 dimensions, but got array with shape (32,)
According to the tf.keras guide, I should be able to pass the dataset directly to model.fit, as this example shows:
Input tf.data datasets
Use the Datasets API to scale to large datasets or multi-device training. Pass a tf.data.Dataset instance to the fit method:
# Instantiates a toy dataset instance:
dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.batch(32)
dataset = dataset.repeat()
# Don't forget to specify steps_per_epoch when calling fit on a dataset.
model.fit(dataset, epochs=10, steps_per_epoch=30)
Here, the fit method uses the steps_per_epoch argument—this is the number of training steps the model runs before it moves to the next epoch. Since the Dataset yields batches of data, this snippet does not require a batch_size.
Datasets can also be used for validation:
dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.batch(32).repeat()
val_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_labels))
val_dataset = val_dataset.batch(32).repeat()
model.fit(dataset, epochs=10, steps_per_epoch=30,
          validation_data=val_dataset,
          validation_steps=3)
What's the problem with my code, and what's the correct way of doing it?
-
Roy Shilkrot about 5 years
Actually, an iterator is not needed; the tf.data.Dataset should work fine in model.fit().
-
SantoshGupta7 about 5 years
I am wondering how Keras is able to run 5 epochs when make_one_shot_iterator() only supports iterating once through a dataset.
-
kvish about 5 years
I seem to have missed this completely! I'm guessing the version of tf at the time required the iterator, and that datasets are now supported without one. The OP already called the batch function with a value, but did not assign the result. Keras is able to run 5 epochs because the repeat function has been added to the dataset operations here.
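The point about repeat() can be sketched with a plain-Python analogy. This is only an illustration of the idea, not how Keras or tf.data are implemented internally: itertools.cycle stands in for .repeat(), and a single iterator stands in for the one-shot iterator.

```python
from itertools import cycle, islice

# One pass over the data: 100 batches (placeholder strings, not real tensors).
batches = ["batch_%d" % i for i in range(100)]

# cycle() plays the role of .repeat(): the stream never runs out.
one_shot = iter(cycle(batches))

epochs, steps_per_epoch = 5, 100
seen = []
for epoch in range(epochs):
    # Each epoch pulls the next steps_per_epoch items from the SAME iterator.
    seen.append(list(islice(one_shot, steps_per_epoch)))

print(len(seen), len(seen[0]))  # 5 100
```

The iterator is created once and never reset. Each "epoch" is just the next 100 items of an endless stream, which is why a single pass through a repeated dataset is enough for 5 epochs.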