How to inspect a TensorFlow .tfrecord file?


Solution 1

Found it!

import tensorflow as tf

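# TF 1.x API; under TF 2.x the same iterator is available as tf.compat.v1.io.tf_record_iterator
for example in tf.python_io.tf_record_iterator("data/foobar.tfrecord"):
    # Each item is a serialized record; parse it as a tf.train.Example and print it
    print(tf.train.Example.FromString(example))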

You can also add:

from google.protobuf.json_format import MessageToJson
jsonMessage = MessageToJson(tf.train.Example.FromString(example))

Solution 2

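The solutions above didn't work for me, so for TF 2.0 use this:

import tensorflow as tf

# Read the file as a dataset of serialized records
raw_dataset = tf.data.TFRecordDataset("path-to-file")

for raw_record in raw_dataset.take(1):
    example = tf.train.Example()
    # Parse the serialized bytes into the Example message
    example.ParseFromString(raw_record.numpy())
    print(example)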

Solution 3

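An improvement on the accepted solution:

import tensorflow as tf
import json
from google.protobuf.json_format import MessageToJson

dataset = tf.data.TFRecordDataset("mydata.tfrecord")
for d in dataset:
    ex = tf.train.Example()
    ex.ParseFromString(d.numpy())
    # Convert to a dict so the feature keys can be listed on their own
    m = json.loads(MessageToJson(ex))
    print(m['features']['feature'].keys())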

In my case, I was running TF2, and a single example was too big to fit on my screen, so I needed a dictionary to inspect the keys (the accepted solution returns the full message as a single string).
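
When a record is too large to read on screen, it can help to reduce the parsed dictionary to one line per feature. The following is a minimal sketch, not part of the original answer: `summarize_features` is a hypothetical helper, and the sample dictionary is made-up data shaped like the result of `json.loads(MessageToJson(ex))` for a `tf.train.Example` (camelCase field names, int64 values rendered as strings, bytes base64-encoded).

```python
def summarize_features(m):
    """Map each feature key to (list kind, number of values).

    `m` is assumed to be the dict produced by json.loads(MessageToJson(ex))
    for a tf.train.Example.
    """
    summary = {}
    features = m.get("features", {}).get("feature", {})
    for key, feature in sorted(features.items()):
        if not feature:
            # A feature with no list set serializes as an empty object.
            summary[key] = (None, 0)
            continue
        # Each feature holds exactly one of bytesList, floatList or int64List.
        kind, payload = next(iter(feature.items()))
        summary[key] = (kind, len(payload.get("value", [])))
    return summary


# Hypothetical sample data, for illustration only.
sample = {
    "features": {
        "feature": {
            "label": {"int64List": {"value": ["3"]}},
            "name": {"bytesList": {"value": ["YmFzcw=="]}},
            "audio": {"floatList": {"value": [0.1, -0.2, 0.3]}},
        }
    }
}

for key, (kind, count) in summarize_features(sample).items():
    print(f"{key}: {kind} x{count}")
```

In the loop from the answer above, the same helper could be called as `summarize_features(m)` in place of printing the raw keys.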

Solution 4

If your .tfrecord contains SequenceExample records, the accepted answer won't show you everything. You can use:

import tensorflow as tf

for example in tf.python_io.tf_record_iterator("data/foobar.tfrecord"):
    result = tf.train.SequenceExample.FromString(example)
    break  # stop after the first record
print(result)

This will show you the content of the first example.

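Then you can also inspect individual Features using their keys:

result.context.feature.keys()
result.context.feature["foo_key"]  # "foo_key" is a placeholder; use a key listed above

And for FeatureLists:

result.feature_lists.feature_list.keys()
result.feature_lists.feature_list["foo_key"]  # placeholder key, as above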

Solution 5

If it's an option to install another Python package, tfrecord_lite is very convenient.


In [1]: import tensorflow as tf
   ...: from tfrecord_lite import decode_example
   ...: it = tf.python_io.tf_record_iterator('nsynth-test.tfrecord')
   ...: decode_example(next(it))
{'audio': array([ 3.8138387e-06, -3.8721851e-06,  3.9331076e-06, ...,
        -3.6526076e-06,  3.7041993e-06, -3.7578957e-06], dtype=float32),
 'instrument': array([417], dtype=int64),
 'instrument_family': array([0], dtype=int64),
 'instrument_family_str': [b'bass'],
 'instrument_source': array([2], dtype=int64),
 'instrument_source_str': [b'synthetic'],
 'instrument_str': [b'bass_synthetic_033'],
 'note': array([149013], dtype=int64),
 'note_str': [b'bass_synthetic_033-100-100'],
 'pitch': array([100], dtype=int64),
 'qualities': array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int64),
 'sample_rate': array([16000], dtype=int64),
 'velocity': array([100], dtype=int64)}

You can install it with pip install tfrecord_lite.
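
If installing an extra package is not an option either, the record boundaries can be read with the standard library alone. This is a minimal sketch rather than a replacement for the readers above: it assumes the standard TFRecord framing (a little-endian uint64 length, a 4-byte CRC of the length, the payload, then a 4-byte CRC of the payload), it skips CRC verification, and `iter_tfrecord_payloads` is a hypothetical helper name.

```python
import struct


def iter_tfrecord_payloads(path):
    """Yield the raw payload bytes of each record, without verifying CRCs."""
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # uint64 length + uint32 masked CRC of the length
            if not header:
                return  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            footer = f.read(4)  # uint32 masked CRC of the payload (not checked)
            if len(payload) < length or len(footer) < 4:
                raise ValueError("truncated record body")
            yield payload
```

Because this is a generator, `next(iter_tfrecord_payloads("data/foobar.tfrecord"))` reads only the first record. Each payload is still a serialized protobuf message, so decoding it into features requires something like `tf.train.Example.FromString`.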

Author: Bob van Luijt

Founder of SeMI Technologies & Weaviate.

Updated on July 05, 2022


  • Bob van Luijt
    Bob van Luijt almost 2 years

    I have a .tfrecord but I don't know how it is structured. How can I inspect the schema to understand what the .tfrecord file contains?

    All Stackoverflow answers or documentation seem to assume I know the structure of the file.

    reader = tf.TFRecordReader()
    file = tf.train.string_input_producer("record.tfrecord")
    _, serialized_record = reader.read(file)
    ...HOW TO INSPECT serialized_record...
  • PatriceG
    PatriceG over 5 years
    It seems that this solution doesn't show all the content of the file.
  • Bob van Luijt
    Bob van Luijt over 5 years
    Is that so? I didn’t have that issue
  • Mahmoud Akl
    Mahmoud Akl about 5 years
    If I'm not mistaken, this loops through the entire TFRecord file to give you the contents of one single example. Is there a more efficient way to just read one example?
  • Robert Lugg
    Robert Lugg almost 5 years
    TFRecord files must be read sequentially from the start per documentation. I'm sure there is a way to read them randomly but maybe no supported standard.
  • SantoshGupta7
    SantoshGupta7 almost 4 years
    Answer should be changed to this one
  • wvxvw
    wvxvw over 3 years
    broken link
  • Vinson Ciawandy
    Vinson Ciawandy about 3 years
    Does MessageToJson come from google.protobuf?
  • HyeonPhil Youn
    HyeonPhil Youn almost 3 years
    This is the one
  • PascalIv
    PascalIv about 2 years
    I get: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 206: invalid start byte