Convert python sequence with multiple datatypes to tensor

Solution 1

In TensorFlow, a tensor cannot hold more than one data type.

Quoting the documentation:

It is not possible to have a tf.Tensor with more than one data type. It is possible, however, to serialize arbitrary data structures as strings and store those in tf.Tensors.

Hence, a workaround could be to create a tensor with data type tf.string and, when needed, cast each field to the desired data type.
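A minimal sketch of that workaround, assuming TensorFlow 2.x, where the string-to-number op is `tf.strings.to_number` (in 1.x releases such as r1.7 the equivalent was `tf.string_to_number`). The helper names `stringify_row` and `parse_numeric_field` are made up for illustration, and the rows are shortened versions of the sample rows in the question.

```python
def stringify_row(row, missing=""):
    """Serialize every field as a string so the whole row has one dtype."""
    return [missing if value is None else str(value) for value in row]


def parse_numeric_field(string_tensor, column):
    """Cast one column of a 2-D tf.string tensor back to float32.

    Only use this on columns where every entry is a numeric string;
    the empty "missing" marker cannot be parsed as a number.
    """
    import tensorflow as tf  # imported lazily; stringify_row needs no TensorFlow
    return tf.strings.to_number(string_tensor[:, column], out_type=tf.float32)


rows = [
    (13501, 2, None, 51, "2232", "S35", "734.72", "CLA"),
    (15124, 2, None, 57, "2641", "S35", "234.80", "DDA"),
]
string_rows = [stringify_row(r) for r in rows]

# string_rows now holds only strings, so it can be passed to
# tf.constant(string_rows, dtype=tf.string). Column 6 (the '734.72' field)
# could then be recovered as floats with parse_numeric_field(tensor, 6).
```

The cost of this approach is that every numeric field has to be parsed again before it can be used in arithmetic.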

Solution 2

You want one tensor for each of your features (columns). A tensor only has more dimensions when the feature itself is multi-dimensional (like an image, a video, a list of strings, or a vector), and even then all of its values share the same datatype.

tf.data.Dataset.from_tensor_slices() will accept your input as a dictionary of lists (the key is the name of the feature, the value is the list of values for that feature), or as a list of lists (which is converted to a single tensor, so it only works when every value has the same type). I can't remember if it accepts Pandas dataframes directly, but if it doesn't you can easily convert one to a dictionary of lists with df.to_dict("list").
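As a rough sketch of the dictionary-of-lists layout, the row tuples from the question can be transposed into one list per feature. The feature names and the helpers `rows_to_columns` and `make_dataset` are hypothetical, and the rows are cut down to four fields with no missing values.

```python
def rows_to_columns(rows, names):
    """Transpose row tuples into {feature name: list of values}."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def make_dataset(columns):
    """Build a dataset that yields one dict of scalar tensors per row."""
    import tensorflow as tf  # imported lazily; rows_to_columns needs no TensorFlow
    return tf.data.Dataset.from_tensor_slices(columns)


rows = [
    (13501, 2, "S35", "734.72"),
    (15124, 2, "S35", "234.80"),
    (42331, 3, "S50", "1859.01"),
]
names = ["id", "group", "code", "amount"]  # hypothetical feature names

columns = rows_to_columns(rows, names)
# Numeric values stored as strings can be parsed per column before conversion.
columns["amount"] = [float(v) for v in columns["amount"]]

# Each list now has a single type, so make_dataset(columns) can turn
# every feature into its own tensor.
```

Because each column is converted separately, an integer column, a string column, and a float column can sit side by side in the same dataset.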

However, you can't pass in None values. You will have to substitute some value for them before converting to a tensor. Classic approaches are the median value, zero, the most common value, a "missing"/"unknown" value for strings or categories, or imputation.
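A small sketch of some of those fill strategies in plain Python, applied one column at a time. The function names and the `strategy` parameter are assumptions made for this example, and an all-None column is filled with 0.0 here simply because no median can be computed for it.

```python
from collections import Counter
from statistics import median


def fill_numeric(values, strategy="median"):
    """Replace None in a numeric column and return a list of floats."""
    present = [float(v) for v in values if v is not None]
    if strategy == "median":
        fill = median(present) if present else 0.0
    elif strategy == "zero":
        fill = 0.0
    else:
        raise ValueError("unknown strategy: " + strategy)
    return [fill if v is None else float(v) for v in values]


def fill_categorical(values, strategy="unknown"):
    """Replace None in a string or category column."""
    present = [v for v in values if v is not None]
    if strategy == "most_common" and present:
        fill = Counter(present).most_common(1)[0][0]
    else:
        fill = "unknown"
    return [fill if v is None else v for v in values]


# Fourth field of the five sample rows in the question.
ages = fill_numeric([51, 57, None, 62, 42])
# A numeric column stored as strings, filled with zero.
amounts = fill_numeric(["967.28", None, "85.06", None, None], strategy="zero")
# A made-up category column with one missing value.
codes = fill_categorical(["CLA", None, "CHK", "CLA"], strategy="most_common")
```

After this step every column contains a single type and no None values, which is what the tensor conversion requires.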

Author: Michael

Updated on June 12, 2022

Comments

  • Michael
    Michael almost 2 years

    I'm using TensorFlow r1.7 and Python 3.6.5. I am also very new to TensorFlow, so I'd like easy-to-read explanations if possible.

    I'm trying to convert my input data into a dataset of tensors with this function tf.data.Dataset.from_tensor_slices(). I pass my tuple with mixed datatypes into this function. However, when running my code I get this error: ValueError: Can't convert Python sequence with mixed types to Tensor.

    I want to know why I am receiving this error, and how I can convert my data to a dataset of tensors even with mixed datatypes.

    Here's a printout of the top 5 entries in my tuple.

    (13501, 2, None, 51, '2232', 'S35', '734.72', 'CLA', '240', 1035, 2060, 1252, 1182, 10, '967.28', '338.50', None, 14, 102, 3830)
    (15124, 2, None, 57, '2641', 'S35', '234.80', 'DDA', '240', 743, 1597, 4706, 156, 0, None, None, None, 3, 27, 981)
    (40035, 2, None, None, '21', 'K00', '60.06', 'CHK', '520', 76, 1863, 12, None, 1, '85.06', '25.00', None, 1, 5, 245)
    (42331, 3, None, 62, '121', 'S50', '1859.01', 'ACT', '420', 952, 1583, 410, 255, 0, None, None, None, 6, 117, 1795)
    (201721, 3, None, 42, '2472', 'S35', '1413.84', 'CLA', '350', 868, 1746, 963, 264, 0, None, None, None, 18, 65, 4510)
    

    As you can see, I have a mix of integers, floats, and strings in my input data.

    Here is a traceback of the error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/miikey101/Documents/Khalen_Case_Loader/tensorflow/k_means/k_means.py", line 10, in prepare_dataset
        dataset = tf.data.Dataset.from_tensor_slices(dm_data)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 222, in from_tensor_slices
        return TensorSliceDataset(tensors)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1017, in __init__
        for i, t in enumerate(nest.flatten(tensors))
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1017, in <listcomp>
        for i, t in enumerate(nest.flatten(tensors))
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 950, in convert_to_tensor
        as_ref=False)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1040, in internal_convert_to_tensor
        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 235, in _constant_tensor_conversion_function
        return constant(v, dtype=dtype, name=name)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 185, in constant
        t = convert_to_eager_tensor(value, ctx, dtype)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 131, in convert_to_eager_tensor
        return ops.EagerTensor(value, context=handle, device=device, dtype=dtype)
    ValueError: Can't convert Python sequence with mixed types to Tensor.
    
  • Michael
    Michael about 6 years
    I see, thank you. I converted all my strings to integer representations, then all the integers to floats and I was able to successfully convert the data into tensors.
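The comment does not show how the conversion was done, so the following is only one possible reading of it: each distinct string is mapped to an integer id, and the ids are then cast to floats so that the column has the same type as the other numeric columns. The helper `encode_categories` is a hypothetical name.

```python
def encode_categories(values):
    """Map each distinct string to an integer id, in order of first appearance."""
    vocab = {}
    ids = [vocab.setdefault(v, len(vocab)) for v in values]
    return ids, vocab


# Eighth field of the five sample rows in the question.
ids, vocab = encode_categories(["CLA", "DDA", "CHK", "ACT", "CLA"])
as_floats = [float(i) for i in ids]
```

Keeping `vocab` around makes it possible to map the ids back to the original strings later.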