What is the difference between Dataset.from_tensors and Dataset.from_tensor_slices?
Solution 1
from_tensors
combines the input and returns a dataset with a single element:
>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensors(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[1, 2],
       [3, 4]], dtype=int32)>]
from_tensor_slices
creates a dataset with a separate element for each row of the input tensor:
>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensor_slices(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2], dtype=int32)>,
<tf.Tensor: shape=(2,), dtype=int32, numpy=array([3, 4], dtype=int32)>]
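A minimal sketch contrasting the two calls on the same tensor (assumes TensorFlow 2.x with eager execution):

```python
import tensorflow as tf

t = tf.constant([[1, 2], [3, 4]])

# from_tensors: the whole tensor becomes a single dataset element
ds_whole = tf.data.Dataset.from_tensors(t)

# from_tensor_slices: each row becomes its own element
ds_rows = tf.data.Dataset.from_tensor_slices(t)

print(len(list(ds_whole)))  # 1 element of shape (2, 2)
print(len(list(ds_rows)))   # 2 elements of shape (2,)
```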
Solution 2
1) The main difference between the two is that nested elements passed to from_tensor_slices
must have the same size in the 0th dimension:
# exception: ValueError: Dimensions 10 and 9 are not compatible
dataset1 = tf.data.Dataset.from_tensor_slices(
    (tf.random_uniform([10, 4]), tf.random_uniform([9])))

# OK, the first dimensions match
dataset2 = tf.data.Dataset.from_tensor_slices(
    (tf.random_uniform([10, 4]), tf.random_uniform([10])))
2) The second difference, explained here, is when the input to a tf.Dataset is a list. For example:
dataset1 = tf.data.Dataset.from_tensor_slices(
    [tf.random_uniform([2, 3]), tf.random_uniform([2, 3])])
dataset2 = tf.data.Dataset.from_tensors(
    [tf.random_uniform([2, 3]), tf.random_uniform([2, 3])])
print(dataset1)  # shapes: (2, 3)
print(dataset2)  # shapes: (2, 2, 3)
In the above, from_tensors creates a 3D tensor while from_tensor_slices merges the input tensors. This can be handy if you have different sources for different image channels and want to concatenate them into one RGB image tensor.
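As a sketch of that channel-stacking idea (the shapes below are illustrative, not from the answer; assumes TensorFlow 2.x):

```python
import tensorflow as tf

# three hypothetical single-channel "image" planes, e.g. R, G, B
r = tf.random.uniform([4, 4])
g = tf.random.uniform([4, 4])
b = tf.random.uniform([4, 4])

# from_tensors on a list stacks along a new leading axis:
# a single element of shape (3, 4, 4), a channels-first "RGB" tensor
ds = tf.data.Dataset.from_tensors([r, g, b])
for img in ds:
    print(img.shape)  # (3, 4, 4)
```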
3) As mentioned in the previous answer, from_tensors converts the input tensor into one big tensor:
import tensorflow as tf
tf.enable_eager_execution()

dataset1 = tf.data.Dataset.from_tensor_slices(
    (tf.random_uniform([4, 2]), tf.random_uniform([4])))
dataset2 = tf.data.Dataset.from_tensors(
    (tf.random_uniform([4, 2]), tf.random_uniform([4])))

for i, item in enumerate(dataset1):
    print('element: ' + str(i + 1), item[0], item[1])

print(30 * '-')

for i, item in enumerate(dataset2):
    print('element: ' + str(i + 1), item[0], item[1])
output:
element: 1 tf.Tensor(... shapes: ((2,), ()))
element: 2 tf.Tensor(... shapes: ((2,), ()))
element: 3 tf.Tensor(... shapes: ((2,), ()))
element: 4 tf.Tensor(... shapes: ((2,), ()))
------------------------------
element: 1 tf.Tensor(... shapes: ((4, 2), (4,)))
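For reference, in TensorFlow 2.x eager execution is on by default and tf.random_uniform has become tf.random.uniform, so the same comparison can be sketched as:

```python
import tensorflow as tf  # 2.x

features = tf.random.uniform([4, 2])
labels = tf.random.uniform([4])

# one (feature, label) pair per element
ds_slices = tf.data.Dataset.from_tensor_slices((features, labels))
# the whole pair of tensors as a single element
ds_whole = tf.data.Dataset.from_tensors((features, labels))

print(len(list(ds_slices)))  # 4 elements, shapes ((2,), ())
print(len(list(ds_whole)))   # 1 element, shapes ((4, 2), (4,))
```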
Solution 3
Try this:
import tensorflow as tf  # 1.13.1
tf.enable_eager_execution()

t1 = tf.constant([[11, 22], [33, 44], [55, 66]])

print("\n========= from_tensors ===========")
ds = tf.data.Dataset.from_tensors(t1)
print(ds.output_types, end=' : ')
print(ds.output_shapes)
for e in ds:
    print(e)

print("\n========= from_tensor_slices ===========")
ds = tf.data.Dataset.from_tensor_slices(t1)
print(ds.output_types, end=' : ')
print(ds.output_shapes)
for e in ds:
    print(e)
output:
========= from_tensors ===========
<dtype: 'int32'> : (3, 2)
tf.Tensor(
[[11 22]
 [33 44]
 [55 66]], shape=(3, 2), dtype=int32)
========= from_tensor_slices ===========
<dtype: 'int32'> : (2,)
tf.Tensor([11 22], shape=(2,), dtype=int32)
tf.Tensor([33 44], shape=(2,), dtype=int32)
tf.Tensor([55 66], shape=(2,), dtype=int32)
The output is pretty much self-explanatory, but as you can see, from_tensor_slices() slices what would be the output of from_tensors() along its first dimension. You can also try with:
t1 = tf.constant([[[11, 22], [33, 44], [55, 66]],
                  [[110, 220], [330, 440], [550, 660]]])
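With that 3-D t1, from_tensor_slices peels off the first dimension, yielding two tensors of shape (3, 2). A quick check:

```python
import tensorflow as tf

t1 = tf.constant([[[11, 22], [33, 44], [55, 66]],
                  [[110, 220], [330, 440], [550, 660]]])

# slicing a (2, 3, 2) tensor yields two elements of shape (3, 2)
ds = tf.data.Dataset.from_tensor_slices(t1)
for e in ds:
    print(e.shape)  # (3, 2), twice
```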
Solution 4
I think @MatthewScarpino clearly explained the differences between these two methods.
Here I try to describe the typical usage of these two methods:
from_tensors can be used to construct a larger dataset from several small datasets, i.e., the size (length) of the dataset becomes larger; while from_tensor_slices can be used to combine different elements into one dataset, e.g., combine features and labels into one dataset (which is also why the 1st dimension of the tensors must be the same). That is, the dataset becomes "wider".
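A minimal sketch of that features-plus-labels pairing (the names and values here are illustrative):

```python
import tensorflow as tf

features = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 examples
labels = tf.constant([0, 1, 0])                               # 3 labels

# pairs up row i of features with label i -> 3 (feature, label) elements
ds = tf.data.Dataset.from_tensor_slices((features, labels))
for x, y in ds:
    print(x.numpy(), y.numpy())
```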
Solution 5
In simple terms:

from_tensors()
returns: a single element,
type: TensorDataset

from_tensor_slices()
returns: multiple elements (one per entry along the input's first dimension),
type: TensorSliceDataset
Explanation:
from_tensors()
With 1-D input
import tensorflow as tf
dataset_ft = tf.data.Dataset.from_tensors([1, 2, 3])
type(dataset_ft)
>>> tensorflow.python.data.ops.dataset_ops.TensorDataset
Now, if we loop through this Dataset we will only get one object:
for _ in dataset_ft:
    print(_)
>>> tf.Tensor([1 2 3], shape=(3,), dtype=int32)
What if we provide 2-D or more dimensional input?
With 2-D input
import tensorflow as tf
dataset_ft = tf.data.Dataset.from_tensors([[1, 2, 3], [4, 5, 6]])
type(dataset_ft)
>>> tensorflow.python.data.ops.dataset_ops.TensorDataset
Now, if we loop through this Dataset we will still get only one object:
for _ in dataset_ft:
    print(_)
>>> tf.Tensor(
>>> [[1 2 3]
>>> [4 5 6]], shape=(2, 3), dtype=int32)
As you can see, the shape of the produced tensor is the same as that of the input. There is no change in the shape.
from_tensor_slices()
It removes the first dimension and uses it as the dataset dimension.
With 1-D input
import tensorflow as tf
dataset_fts = tf.data.Dataset.from_tensor_slices([1, 2, 3])
type(dataset_fts)
>>> tensorflow.python.data.ops.dataset_ops.TensorSliceDataset
Now, if we loop through this Dataset we will have multiple objects:
for _ in dataset_fts:
    print(_)
>>> tf.Tensor(1, shape=(), dtype=int32)
>>> tf.Tensor(2, shape=(), dtype=int32)
>>> tf.Tensor(3, shape=(), dtype=int32)
What if we provide 2-D or more dimensional input?
With 2-D input
import tensorflow as tf
dataset_fts = tf.data.Dataset.from_tensor_slices([[1, 2, 3], [4, 5, 6]])
type(dataset_fts)
>>> tensorflow.python.data.ops.dataset_ops.TensorSliceDataset
If we loop through this 2-D dataset we will have two 1-D elements:
for _ in dataset_fts:
    print(_)
>>> tf.Tensor([1 2 3], shape=(3,), dtype=int32)
>>> tf.Tensor([4 5 6], shape=(3,), dtype=int32)
That's the simplest I can explain. To get a better understanding I would suggest you run both these functions with different inputs and see the shape of returned elements.
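One more input shape worth trying: from_tensor_slices also accepts a dict of equal-length tensors and yields one dict per element, which is handy for named features (assumes TensorFlow 2.x):

```python
import tensorflow as tf

# a dict of two equal-length columns -> 3 elements, each itself a dict
ds = tf.data.Dataset.from_tensor_slices({"a": [1, 2, 3], "b": [4, 5, 6]})
for elem in ds:
    print({k: int(v) for k, v in elem.items()})
    # {'a': 1, 'b': 4}, then {'a': 2, 'b': 5}, then {'a': 3, 'b': 6}
```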
Comments
- Llewlyn (almost 2 years ago): I have a dataset represented as a NumPy matrix of shape (num_features, num_examples) and I wish to convert it to the TensorFlow type tf.Dataset. I am struggling to understand the difference between these two methods: Dataset.from_tensors and Dataset.from_tensor_slices. What is the right one, and why? The TensorFlow documentation (link) says that both methods accept a nested structure of tensors, although when using from_tensor_slices the tensors should have the same size in the 0th dimension.
- dhiraj suvarna (over 5 years ago): @MathewScarpino: can you elaborate more on when to use which?
- Ray Tayek (over 4 years ago): With tf 2 I get: AttributeError: 'TensorDataset' object has no attribute 'output_types'
- HopeKing (almost 4 years ago): I think the source of confusion (at least for me) is the name. Since from_tensor_slices creates slices from the original data, the ideal name would have been "to_tensor_slices", because you are taking your data and creating tensor slices out of it. Once you think along those lines, all the TF2 documentation became very clear to me!
- user1488777 (almost 4 years ago): A key piece of info for me that was absent from the docs was that multiple tensors are passed to these methods as a tuple, e.g. from_tensors((t1, t2, t3)). With that knowledge, from_tensors makes a dataset where each input tensor is like a row of your dataset, and from_tensor_slices makes a dataset where each input tensor is a column of your data; so in the latter case all tensors must be the same length, and the elements (rows) of the resulting dataset are tuples with one element from each column.
- J W (over 3 years ago): PS: It should be tf.random.uniform, not tf.random_uniform
- Areza (almost 2 years ago): How can one convert one type to the other? I can see some tf functions return errors depending on which type is used.