Function call stack: keras_scratch_graph Error

python tensorflow keras nlp tensorflow2.0

26,569

Solution 1

In my case, the TensorFlow sample code worked fine in Google Colab but failed on my machine with the keras_scratch_graph error.

Then I added the following Python code at the beginning of the script, and it worked.

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Restrict TensorFlow to only use the first GPU
        tf.config.experimental.set_visible_devices(gpus[0], 'GPU')

        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process.

In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as is needed by the process.

For example, you may want to train several small models on one GPU at the same time. Calling tf.config.experimental.set_memory_growth makes TensorFlow allocate only as much GPU memory as the runtime allocations need: it starts out allocating very little memory, and as the program runs and needs more GPU memory, the GPU memory region allocated to the TensorFlow process is extended.
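The other option mentioned above, allocating only a fixed subset of the GPU memory, can be sketched as follows. This is a minimal sketch, not part of the original fix: the helper name cap_first_gpu_memory and the 1024 MB limit are assumptions chosen for illustration, and the guarded import is only there so the snippet also loads on a machine without TensorFlow. Newer TensorFlow releases expose the same feature as tf.config.set_logical_device_configuration.

```python
try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow is not installed; the helper below then returns False


def cap_first_gpu_memory(limit_mb=1024):
    """Try to cap TensorFlow's memory use on the first GPU.

    Returns True if the cap was applied, False otherwise.
    The name and the default limit are illustrative assumptions.
    """
    if tf is None:
        return False
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if not gpus:
        return False
    try:
        # Create one virtual GPU on the first physical GPU with a hard memory limit
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=limit_mb)],
        )
        return True
    except (RuntimeError, ValueError) as e:
        # Virtual devices must be configured before GPUs have been initialized
        print(e)
        return False


print("Memory cap applied:", cap_first_gpu_memory(1024))
```

Use either the cap or memory growth on a given GPU, and call it before any model or tensor is created.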

Hope it helps!

Solution 2

I was getting a similar error. I reduced the batch size and the error disappeared. I don't know why, but it worked for me. I am guessing it is related to some kind of overflow.
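If the error really is memory related, the "reduce the batch size" approach can be automated. The following is a minimal sketch in plain Python: run_with_batch_backoff, train_fn, and fake_train are hypothetical names, and fake_train only simulates a model that does not fit in memory above 16 samples.

```python
def run_with_batch_backoff(train_fn, batch_size, min_batch_size=1,
                           oom_errors=(MemoryError,)):
    """Call train_fn(batch_size), halving batch_size after each memory error.

    Returns a tuple (batch_size_that_worked, result_of_train_fn).
    """
    while True:
        try:
            return batch_size, train_fn(batch_size)
        except oom_errors:
            if batch_size <= min_batch_size:
                raise  # cannot shrink any further; surface the original error
            batch_size = max(min_batch_size, batch_size // 2)
            print("Out of memory, retrying with batch_size =", batch_size)


# Stand-in for real training: pretend anything above 16 samples does not fit.
def fake_train(batch_size):
    if batch_size > 16:
        raise MemoryError("batch too large")
    return "trained"


print(run_with_batch_backoff(fake_train, batch_size=64))
```

With TensorFlow you would pass oom_errors=(tf.errors.ResourceExhaustedError,) and a train_fn that calls model.fit. Whether the process recovers cleanly after a GPU out-of-memory error is not guaranteed, so treat this as a starting point.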

Solution 3

I think it has to do with the GPU. Look at the traceback:

File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 572, in __call__
    return self._call_flat(args)

TensorFlow is running in eager execution, which means the GPU will be used if a GPU-enabled version is available. I had the same issue when I was testing a dense network:

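import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

# Small fully connected classifier used to reproduce the error
inputs = Input(shape=(100,))
x = Dense(32, activation='relu')(inputs)
x = Dense(32, activation='relu')(x)
x = Dense(32, activation='relu')(x)
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Predict on a single all-zeros sample
t = tf.zeros([1, 100])
model.predict(t, steps=1, batch_size=1)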

... and it gave a similar traceback, also pointing to eager execution. Then, when I disabled the GPU using the following line:

tf.config.experimental.set_visible_devices([], 'GPU')

... the code ran just fine. See if this helps solve the issue. By the way, does Colab even support GPUs? I didn't know.
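Another common way to hide the GPU is the CUDA_VISIBLE_DEVICES environment variable, set before TensorFlow is imported. This is a minimal sketch rather than the method used in this answer; the helper visible_gpu_count is a hypothetical name, and the guarded import only lets the snippet run where TensorFlow is not installed.

```python
import os

# Hide every CUDA device from this process. This must happen before
# TensorFlow initializes the GPU, so it is done before the import.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow is not installed; nothing more to check


def visible_gpu_count():
    """Return how many GPUs TensorFlow can see (0 if TensorFlow is missing)."""
    if tf is None:
        return 0
    return len(tf.config.experimental.list_physical_devices('GPU'))


print("Visible GPUs:", visible_gpu_count())
```

If the error disappears with the GPU hidden, that points to a GPU or driver setup problem rather than to the model code.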

Solution 4

In my case I had to update keras and tensorflow:

pip install -U tensorflow keras

Solution 5

If you use tensorflow-gpu, then add:

import tensorflow as tf
physical_devices = tf.config.experimental.list_physical_devices('GPU')
print("physical_devices-------------", len(physical_devices))
if physical_devices:  # avoid an IndexError when no GPU is detected
    tf.config.experimental.set_memory_growth(physical_devices[0], True)

In addition, you can reduce your batch_size, or run your code on another computer or on a cloud service such as Google Colab or Amazon's cloud, because I think the error is caused by a memory limitation.

Author by

user8882401

Updated on August 06, 2020

Comments

user8882401 over 3 years

I am reimplementing a text-to-speech project and I am getting a "Function call stack: keras_scratch_graph" error in the decoder part. The network architecture is from the Deep Voice 3 paper.

I am using keras from TF 2.0 on Google Colab. Below is the code for Decoder Keras Model.

y1 = tf.ones(shape = (16, 203, 320))
def Decoder(name = "decoder"):
    # Decoder Prenet
    din = tf.concat((tf.zeros_like(y1[:, :1, -hp.mel:]), y1[:, :-1, -hp.mel:]), 1)
    keys = K.Input(shape = (180, 256), batch_size = 16, name = "keys")
    vals = K.Input(shape = (180, 256), batch_size = 16, name = "vals")
    prev_max_attentions_li = tf.ones(shape=(hp.dlayer, hp.batch_size), dtype=tf.int32)
    #prev_max_attentions_li = K.Input(tensor = prev_max_attentions_li)
    for i in range(hp.dlayer):
        dpout = K.layers.Dropout(rate = 0 if i == 0 else hp.dropout)(din)
        fc_out = K.layers.Dense(hp.char_embed, activation = 'relu')(dpout)

    print("=======================================================================================================")
    print("The FC value is ", fc_out)
    print("=======================================================================================================")

    query_pe = K.layers.Embedding(hp.Ty, hp.char_embed)(tf.tile(tf.expand_dims(tf.range(hp.Ty // hp.r), 0), [hp.batch_size, 1]))
    key_pe = K.layers.Embedding(hp.Tx, hp.char_embed)(tf.tile(tf.expand_dims(tf.range(hp.Tx), 0), [hp.batch_size, 1]))

    alignments_li, max_attentions_li = [], []
    for i in range(hp.dlayer):
        dpout = K.layers.Dropout(rate = 0)(fc_out)
        queries = K.layers.Conv1D(hp.datten_size, hp.dfilter, padding = 'causal', dilation_rate = 2**i)(dpout)
        fc_out = (queries + fc_out) * tf.math.sqrt(0.5)
        print("=======================================================================================================")
        print("The FC value is ", fc_out)
        print("=======================================================================================================")
        queries = fc_out + query_pe
        keys += key_pe

        tensor, alignments, max_attentions = Attention(name = "attention")(queries, keys, vals, prev_max_attentions_li[i])

        fc_out = (tensor + queries) * tf.math.sqrt(0.5)

        alignments_li.append(alignments)
        max_attentions_li.append(max_attentions)

    decoder_output = fc_out

    dpout = K.layers.Dropout(rate = 0)(decoder_output)
    mel_logits = K.layers.Dense(hp.mel * hp.r)(dpout)

    dpout = K.layers.Dropout(rate = 0)(fc_out)
    done_output = K.layers.Dense(2)(dpout)

    return K.Model(inputs = [keys, vals], outputs = [mel_logits, done_output, decoder_output, alignments_li, max_attentions_li], name = name)

decode = Decoder()
kin = tf.ones(shape = (16, 180, 256))
vin = tf.ones(shape = (16, 180, 256))
print(decode(kin, vin))
tf.keras.utils.plot_model(decode, to_file = "decoder.png", show_shapes = True)

When I test with some data, it shows the error messages below. The problem is probably with "fc_out", but I don't know how to pass the "fc_out" output from the first for loop to the second for loop. Any answer would be appreciated.

File "Decoder.py", line 60, in <module>
    decode = Decoder()
  File "Decoder.py", line 33, in Decoder
    dpout = K.layers.Dropout(rate = 0)(fc_out)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 596, in __call__
    base_layer_utils.create_keras_history(inputs)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 199, in create_keras_history
    _, created_layers = _create_keras_history_helper(tensors, set(), [])
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 245, in _create_keras_history_helper
    layer_inputs, processed_ops, created_layers)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 245, in _create_keras_history_helper
    layer_inputs, processed_ops, created_layers)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 245, in _create_keras_history_helper
    layer_inputs, processed_ops, created_layers)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 243, in _create_keras_history_helper
    constants[i] = backend.function([], op_input)([])
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3510, in __call__
    outputs = self._graph_fn(*converted_inputs)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 572, in __call__
    return self._call_flat(args)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 671, in _call_flat
    outputs = self._inference_function.call(ctx, args)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 445, in call
    ctx=ctx)
  File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.FailedPreconditionError:  Error while reading resource variable _AnonymousVar19 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar19/N10tensorflow3VarE does not exist.
     [[node dense_7/BiasAdd/ReadVariableOp (defined at Decoder.py:33) ]] [Op:__inference_keras_scratch_graph_566]

Function call stack:
keras_scratch_graph

Leonard about 4 years

Thanks for the elaborate answer. I'm facing the same error when tf.config.experimental.list_physical_devices('GPU') yields an empty list, i.e. gpus is False. Can you conceive any reason for this?

Suraj Donthi about 4 years

@Rumo It's mostly an installation problem. If you're using TF 2.0+, an easy way to check whether you've installed TF with GPU support is to use either tf.test.is_built_with_cuda() or tf.test.is_built_with_gpu_support(). If this returns False, you'll have to reinstall TensorFlow as described in the documentation.

Leonard about 4 years

@SurajDonthi Thanks, but I do not want to use GPU support. What I meant is that this error is not only caused by GPU issues. In my case, the reason was that I used tf.metrics.iou, which is not supported in eager mode. It worked after switching to tf.keras.metrics.MeanIoU. I think this error can arise for very different reasons.

Ahmad Moussa almost 4 years

I have these versions of keras and tensorflow and still get that error.

Hafizur Rahman almost 4 years

Did you try keras=2.3.2?

user3352632 almost 2 years

Where should I add it? Never mind, I figured it out.