Function call stack: keras_scratch_graph Error


Solution 1

In my case, the TensorFlow sample code worked fine in Google Colab but not on my machine, where I got the keras_scratch_graph error.

I then added the following Python code at the beginning of the script, and it worked fine:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Restrict TensorFlow to only use the first GPU
        tf.config.experimental.set_visible_devices(gpus[0], 'GPU')

        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Visible devices and memory growth must be set before GPUs have been initialized
        print(e)

By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process.

In some cases it is desirable for the process to only allocate a subset of the available memory, or to only grow the memory usage as is needed by the process.

For example, you may want to train several small models on one GPU at the same time. Calling tf.config.experimental.set_memory_growth makes TensorFlow allocate only as much GPU memory as the runtime allocations need: it starts out allocating very little memory, and as the program runs and needs more GPU memory, the GPU memory region allocated to the TensorFlow process is extended.
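The paragraph above also mentions the other option: giving the process only a fixed subset of GPU memory. Below is a minimal sketch of that, assuming the TF 2.0/2.1-era `tf.config.experimental` API used elsewhere on this page. The function name `limit_first_gpu_memory` and the 1024 MB default are illustrative choices, not part of the original answer. Use either this or memory growth on a given GPU, not both, because TensorFlow refuses to set memory growth on a device that already has a virtual device configuration.

```python
import importlib.util


def limit_first_gpu_memory(memory_limit_mb=1024):
    """Cap TensorFlow's memory on the first GPU at memory_limit_mb megabytes.

    Returns True if the cap was applied, False if TensorFlow is not
    installed, no GPU is visible, or the GPU was already initialized.
    """
    if importlib.util.find_spec("tensorflow") is None:
        return False  # TensorFlow is not installed
    import tensorflow as tf

    gpus = tf.config.experimental.list_physical_devices("GPU")
    if not gpus:
        return False  # no GPU visible to this process
    try:
        # Expose the first GPU as one logical device with a fixed memory limit (MB).
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(
                memory_limit=memory_limit_mb)],
        )
        return True
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized.
        print(e)
        return False


if __name__ == "__main__":
    print("GPU memory cap applied:", limit_first_gpu_memory(1024))
```

Like the memory-growth code, this has to run before any op touches the GPU; otherwise TensorFlow raises the RuntimeError that the sketch prints.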

Hope it helps!

Solution 2

I was getting a similar error. I reduced the batch size and the error disappeared. I don't know why, but it worked for me. I am guessing it's something related to over-stacking.
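If the error does come from running out of memory (the answer above doesn't confirm the cause), the manual fix can be automated. This is a sketch built around a hypothetical helper, `fit_with_batch_backoff`, that halves the batch size after each out-of-memory error. It is plain Python, so the retry logic can be tested without a GPU.

```python
def fit_with_batch_backoff(train_fn, batch_size, min_batch_size=1,
                           oom_errors=(MemoryError,)):
    """Call train_fn(batch_size), halving batch_size after each OOM error.

    Returns (result, batch_size_used). Re-raises the last error once the
    batch size would drop below min_batch_size.
    """
    while True:
        try:
            return train_fn(batch_size), batch_size
        except oom_errors:
            if batch_size // 2 < min_batch_size:
                raise  # cannot shrink any further
            batch_size //= 2
            print("Out of memory; retrying with batch_size =", batch_size)


if __name__ == "__main__":
    # Fake training step that only "fits in memory" for batches of 8 or fewer.
    def fake_train(bs):
        if bs > 8:
            raise MemoryError("batch too large")
        return "trained with batch_size=%d" % bs

    print(fit_with_batch_backoff(fake_train, batch_size=64))
```

With Keras you would pass something like `lambda bs: model.fit(x, y, batch_size=bs)` as `train_fn` and `oom_errors=(tf.errors.ResourceExhaustedError,)`, which is the exception TensorFlow raises when it runs out of GPU memory. Whether a retry in the same process succeeds after a GPU OOM depends on your setup, so treat this as a convenience rather than a guarantee.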

Solution 3

I think it's a GPU issue. Look at the traceback:

File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 572, in __call__
    return self._call_flat(args)

TF is executing the function through eager execution, which means the GPU will be used if a GPU-enabled version of TF is installed. I had the same issue when I was testing a dense network:

import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

inputs = Input(shape=(100,))
x = Dense(32, activation='relu')(inputs)
x = Dense(32, activation='relu')(x)
x = Dense(32, activation='relu')(x)
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
t = tf.zeros([1, 100])
model.predict(t, steps=1, batch_size=1)

...and it gave a similar traceback, also pointing to eager execution. Then I disabled the GPU with the following line, which must run before TensorFlow initializes the GPU:

tf.config.experimental.set_visible_devices([], 'GPU')

...and the code ran just fine. See if this helps solve the issue. By the way, does Colab even support GPUs? I didn't know.
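Another common way to hide the GPU, sketched here as an alternative to `set_visible_devices([], 'GPU')`, is the `CUDA_VISIBLE_DEVICES` environment variable, which CUDA reads when it initializes. Setting it to "-1" leaves no valid device index visible. It has to be set before TensorFlow is imported. The `find_spec` guard is only there so the snippet also runs on a machine without TensorFlow.

```python
import importlib.util
import os

# Hide every GPU from CUDA so TensorFlow falls back to the CPU.
# This must happen before TensorFlow is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

if importlib.util.find_spec("tensorflow") is not None:
    import tensorflow as tf  # imported only after the variable is set

    # Expected to be an empty list once the GPUs are hidden.
    print("GPUs visible:", tf.config.experimental.list_physical_devices("GPU"))
```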

Solution 4

In my case, I had to update Keras and TensorFlow:

pip install -U tensorflow keras 
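Before and after upgrading, it can help to confirm which versions pip actually installed, for example to compare them with the versions Colab uses. This is a small stdlib-only sketch (Python 3.8+, since it relies on `importlib.metadata`). The helper name and the package list are illustrative, and the names are assumed to be the PyPI distribution names.

```python
from importlib import metadata


def installed_versions(packages=("tensorflow", "tensorflow-gpu", "keras")):
    """Return {package: installed version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Because it reads package metadata instead of importing the packages, it works even when importing TensorFlow itself fails.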

Solution 5

If you use TensorFlow-GPU, add:

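import tensorflow as tf

physical_devices = tf.config.experimental.list_physical_devices('GPU')
print("physical_devices-------------", len(physical_devices))
# Looping avoids an IndexError when no GPU is visible (the list is empty),
# and memory growth has to be the same across all visible GPUs.
for device in physical_devices:
    tf.config.experimental.set_memory_growth(device, True)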

In addition, you can reduce your batch_size, or run your code on another computer or a cloud service such as Google Colab or Amazon Web Services, because I think the error is caused by limited memory.
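Memory growth can also be enabled without changing the model code: the TensorFlow GPU guide documents the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable as another way to turn it on, noting that this option is platform specific. A sketch, assuming the variable is set before TensorFlow is imported. The `find_spec` guard only keeps the snippet runnable without TensorFlow installed.

```python
import importlib.util
import os

# Ask TensorFlow to grow GPU memory on demand instead of mapping
# nearly all of it up front. Set this before TensorFlow is imported.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

if importlib.util.find_spec("tensorflow") is not None:
    import tensorflow as tf

    print("GPUs:", tf.config.experimental.list_physical_devices("GPU"))
```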

Author: user8882401

Updated on August 06, 2020

Comments

  • user8882401 (over 3 years ago)

    I am reimplementing a text-to-speech project, and I get a "Function call stack: keras_scratch_graph" error in the decoder part. The network architecture is from the Deep Voice 3 paper.

    I am using Keras from TF 2.0 on Google Colab. Below is the code for the Decoder Keras model.

    y1 = tf.ones(shape = (16, 203, 320))
    def Decoder(name = "decoder"):
        # Decoder Prenet
        din = tf.concat((tf.zeros_like(y1[:, :1, -hp.mel:]), y1[:, :-1, -hp.mel:]), 1)
        keys = K.Input(shape = (180, 256), batch_size = 16, name = "keys")
        vals = K.Input(shape = (180, 256), batch_size = 16, name = "vals")
        prev_max_attentions_li = tf.ones(shape=(hp.dlayer, hp.batch_size), dtype=tf.int32)
        #prev_max_attentions_li = K.Input(tensor = prev_max_attentions_li)
        for i in range(hp.dlayer):
            dpout = K.layers.Dropout(rate = 0 if i == 0 else hp.dropout)(din)
            fc_out = K.layers.Dense(hp.char_embed, activation = 'relu')(dpout)
    
        print("=======================================================================================================")
        print("The FC value is ", fc_out)
        print("=======================================================================================================")
    
        query_pe = K.layers.Embedding(hp.Ty, hp.char_embed)(tf.tile(tf.expand_dims(tf.range(hp.Ty // hp.r), 0), [hp.batch_size, 1]))
        key_pe = K.layers.Embedding(hp.Tx, hp.char_embed)(tf.tile(tf.expand_dims(tf.range(hp.Tx), 0), [hp.batch_size, 1]))
    
        alignments_li, max_attentions_li = [], []
        for i in range(hp.dlayer):
            dpout = K.layers.Dropout(rate = 0)(fc_out)
            queries = K.layers.Conv1D(hp.datten_size, hp.dfilter, padding = 'causal', dilation_rate = 2**i)(dpout)
            fc_out = (queries + fc_out) * tf.math.sqrt(0.5)
            print("=======================================================================================================")
            print("The FC value is ", fc_out)
            print("=======================================================================================================")
            queries = fc_out + query_pe
            keys += key_pe
    
            tensor, alignments, max_attentions = Attention(name = "attention")(queries, keys, vals, prev_max_attentions_li[i])
    
            fc_out = (tensor + queries) * tf.math.sqrt(0.5)
    
            alignments_li.append(alignments)
            max_attentions_li.append(max_attentions)
    
        decoder_output = fc_out
    
        dpout = K.layers.Dropout(rate = 0)(decoder_output)
        mel_logits = K.layers.Dense(hp.mel * hp.r)(dpout)
    
        dpout = K.layers.Dropout(rate = 0)(fc_out)
        done_output = K.layers.Dense(2)(dpout)
    
        return K.Model(inputs = [keys, vals], outputs = [mel_logits, done_output, decoder_output, alignments_li, max_attentions_li], name = name)
    
    
    decode = Decoder()
    kin = tf.ones(shape = (16, 180, 256))
    vin = tf.ones(shape = (16, 180, 256))
    print(decode(kin, vin))
    tf.keras.utils.plot_model(decode, to_file = "decoder.png", show_shapes = True)
    
    

    When I test it with some data, it shows the error messages below. It seems to be a problem with "fc_out", but I don't know how to pass the "fc_out" output from the first for loop to the second for loop. Any answer would be appreciated.

    File "Decoder.py", line 60, in <module>
        decode = Decoder()
      File "Decoder.py", line 33, in Decoder
        dpout = K.layers.Dropout(rate = 0)(fc_out)
      File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 596, in __call__
        base_layer_utils.create_keras_history(inputs)
      File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 199, in create_keras_history
        _, created_layers = _create_keras_history_helper(tensors, set(), [])
      File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 245, in _create_keras_history_helper
        layer_inputs, processed_ops, created_layers)
      File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 245, in _create_keras_history_helper
        layer_inputs, processed_ops, created_layers)
      File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 245, in _create_keras_history_helper
        layer_inputs, processed_ops, created_layers)
      File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer_utils.py", line 243, in _create_keras_history_helper
        constants[i] = backend.function([], op_input)([])
      File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3510, in __call__
        outputs = self._graph_fn(*converted_inputs)
      File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 572, in __call__
        return self._call_flat(args)
      File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 671, in _call_flat
        outputs = self._inference_function.call(ctx, args)
      File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 445, in call
        ctx=ctx)
      File "/Users/ydc/dl-npm/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 67, in quick_execute
        six.raise_from(core._status_to_exception(e.code, message), None)
      File "<string>", line 3, in raise_from
    tensorflow.python.framework.errors_impl.FailedPreconditionError:  Error while reading resource variable _AnonymousVar19 from Container: localhost. This could mean that the variable was uninitialized. Not found: Resource localhost/_AnonymousVar19/N10tensorflow3VarE does not exist.
         [[node dense_7/BiasAdd/ReadVariableOp (defined at Decoder.py:33) ]] [Op:__inference_keras_scratch_graph_566]
    
    Function call stack:
    keras_scratch_graph
    
    
  • Leonard (about 4 years ago)
    Thanks for the detailed answer. I'm facing the same error even when tf.config.experimental.list_physical_devices('GPU') returns an empty list, i.e. gpus is empty and the if branch never runs. Can you think of any reason for this?
  • Suraj Donthi (about 4 years ago)
    @Rumo It's mostly an installation problem. If you're using TF 2.0+, an easy way to check whether you've installed TF with GPU support is to call either tf.test.is_built_with_cuda() or tf.test.is_built_with_gpu_support(). If it returns False, you'll have to reinstall TensorFlow as described in the documentation.
  • Leonard (about 4 years ago)
    @SurajDonthi Thanks, but I do not want to use GPU support. My point is that this error is not caused only by GPU issues. In my case, the reason was that I used tf.metrics.iou, which is not supported in eager mode. It worked after switching to tf.keras.metrics.MeanIoU. I think this error can arise for very different reasons.
  • Ahmad Moussa (almost 4 years ago)
    I have these versions of Keras and TensorFlow and still get that error.
  • Hafizur Rahman (almost 4 years ago)
    Did you try keras=2.3.2?
  • user3352632 (almost 2 years ago)
    Where do I add it? Got it anyway...
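The build check from Suraj Donthi's comment can be combined with a runtime device count, which helps separate the cases discussed in this thread: a CPU-only build, a GPU build that sees no GPU (a driver or CUDA setup problem), and, as Leonard points out, errors that have nothing to do with the GPU at all. A sketch, with `tf_gpu_report` as a hypothetical helper name:

```python
import importlib.util


def tf_gpu_report():
    """Summarize TensorFlow's GPU support, or return None if TF is missing."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    return {
        "version": tf.__version__,
        # True if this TensorFlow build was compiled against CUDA.
        "built_with_cuda": tf.test.is_built_with_cuda(),
        # True if the build supports a GPU backend (CUDA or ROCm).
        "built_with_gpu_support": tf.test.is_built_with_gpu_support(),
        # Number of GPUs TensorFlow can actually see at runtime.
        "physical_gpus": len(tf.config.experimental.list_physical_devices("GPU")),
    }


if __name__ == "__main__":
    print(tf_gpu_report())
```

If `built_with_cuda` is True but `physical_gpus` is 0, TensorFlow was compiled with GPU support but cannot find a usable GPU at runtime. If both build flags are False, you have a CPU-only build, and the keras_scratch_graph error likely has another cause.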