TensorFlow: Failed to create session


Solution 1

Are you using a GPU? If so, it may simply be out of GPU memory because a previous process was not killed and is still holding it.

This ticket helped me identify the problem: https://github.com/tensorflow/tensorflow/issues/9549

To see your GPU status, run nvidia-smi -l 2 in a terminal; it refreshes the GPU stats every 2 seconds.

This post shows how to kill the process that is currently taking all of your GPU memory: https://www.quora.com/How-do-I-kill-all-the-computer-processes-shown-in-nvidia-smi
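To find which process is holding GPU memory without reading the nvidia-smi table by eye, something like the following may help. It is only a sketch: the helper names (parse_compute_apps, list_gpu_processes) are made up for illustration, and it assumes nvidia-smi is on the PATH and supports the --query-compute-apps option.

```python
import subprocess

# Ask nvidia-smi for one CSV row per compute process: pid, name, memory in MiB.
QUERY_CMD = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def _to_int_or_none(field):
    """Return the field as an int, or None for values such as '[N/A]'."""
    try:
        return int(field)
    except ValueError:
        return None


def parse_compute_apps(csv_text):
    """Parse nvidia-smi CSV rows into (pid, process_name, used_mib) tuples."""
    apps = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The process name may contain commas, so split from both ends.
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        apps.append((int(pid), name.strip(), _to_int_or_none(used)))
    return apps


def list_gpu_processes():
    """Run nvidia-smi (hypothetical helper) and return the parsed rows."""
    output = subprocess.check_output(QUERY_CMD).decode()
    return parse_compute_apps(output)
```

Calling list_gpu_processes() on the affected machine should list the PIDs that still hold GPU memory; a leftover one can then be stopped with kill, as the linked post describes.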

Solution 2

Happened to me when I had a separate Tensorflow session running in another terminal. Closing that terminal made it work.
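If two sessions really do need to share one GPU, a possible workaround is to stop each session from reserving nearly all GPU memory up front. The sketch below assumes the TensorFlow 1.x API (tf.ConfigProto, tf.Session); the function name create_shared_gpu_session is invented for illustration. It only helps if every process on the GPU is configured this way, and it does not free memory that another session has already taken.

```python
def create_shared_gpu_session(memory_fraction=None):
    """Create a TF 1.x session that does not grab all GPU memory at startup.

    memory_fraction is an optional cap in (0, 1] on the share of GPU memory
    this process may use; None means no explicit cap.
    """
    if memory_fraction is not None and not 0.0 < memory_fraction <= 1.0:
        raise ValueError("memory_fraction must be in (0, 1]")

    # Imported here so the check above works even where TensorFlow is absent.
    import tensorflow as tf

    config = tf.ConfigProto()
    # Allocate GPU memory on demand instead of reserving it all at once.
    config.gpu_options.allow_growth = True
    if memory_fraction is not None:
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction
    return tf.Session(config=config)
```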

Solution 3

Maybe you are out of GPU memory? Try running with

export CUDA_VISIBLE_DEVICES=''

Also, please provide details about the platform you are using (operating system, architecture) and include your TensorFlow version.

Were you able to create a simple session from the Python console? Something like this:

import tensorflow as tf
hello = tf.constant('hi,tensorflow')
sess = tf.Session()
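The CUDA_VISIBLE_DEVICES setting can also be applied from inside Python, which limits it to a single script instead of the whole shell. This is a minimal sketch; hide_gpus is a hypothetical helper name. If a session can be created with the GPU hidden, that would suggest the problem is on the GPU side (memory or driver) and not in the TensorFlow install itself.

```python
import os


def hide_gpus():
    """In-process equivalent of `export CUDA_VISIBLE_DEVICES=''`."""
    # This has to happen before TensorFlow initializes CUDA; setting it
    # before the first `import tensorflow` is the safe choice.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


hide_gpus()

# Only import TensorFlow after the variable is set, for example:
# import tensorflow as tf
# sess = tf.Session()
```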

Solution 4

After you execute

export CUDA_VISIBLE_DEVICES=''

TensorFlow will not use the GPU; it will train the model on the CPU only.

You can find a better solution here. It doesn't require a restart, and you can apply it on a server.

Solution 5

I had exactly the same problem, and this is what I did:

  1. Nvidia Driver:

    $ nvidia-smi
    NVIDIA-SMI 384.130 Driver Version: 384.130

Found driver 384.130

  2. Updated Driver

    $ sudo add-apt-repository ppa:graphics-drivers/ppa

    $ sudo apt update

The log showed that nvidia 396 was being installed:

nvidia_396:
 Running module version sanity check.

Restarted machine and checked nvidia driver:

 $ nvidia-smi
 NVIDIA-SMI 396.54                 Driver Version: 396.54

Checked nvcc:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17

Checked Cuda:

$ cat /usr/local/cuda/version.txt
CUDA Version 9.0.176

Checked Conda & TensorFlow

$ conda list | grep tensorflow
tensorflow                1.10.0          gpu_py36hcebf108_0    Anaconda
tensorflow-base           1.10.0          gpu_py36had579c0_0    Anaconda
tensorflow-gpu            1.10.0               hf154084_0    Anaconda

Finally tested tensorflow again

>>> import tensorflow as tf
>>> hello = tf.constant('hi,tensorflow')
>>> sess = tf.Session()
>>>

It all worked. The issue was that the NVIDIA driver was not compatible with the installed CUDA and TensorFlow, so I updated to the latest driver and it worked.
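The outputs above show nvcc reporting release 7.5 while /usr/local/cuda/version.txt reports 9.0.176, which suggests the nvcc on the PATH belongs to a different toolkit install. That did not stop the session from being created here, but when checking versions it can be worth comparing the two. A rough sketch, with hypothetical helper names, that parses the two outputs shown in this answer:

```python
import re


def _parse_version(text, pattern):
    """Return the dotted version captured by `pattern` as a tuple of ints."""
    match = re.search(pattern, text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def nvcc_release(nvcc_output):
    """Extract the release number from `nvcc --version` output."""
    return _parse_version(nvcc_output, r"release (\d+(?:\.\d+)*)")


def toolkit_version(version_txt):
    """Extract the version from the contents of /usr/local/cuda/version.txt."""
    return _parse_version(version_txt, r"CUDA Version (\d+(?:\.\d+)*)")


def same_major_minor(a, b):
    """True if both versions were parsed and agree on major.minor."""
    return a is not None and b is not None and a[:2] == b[:2]


# Strings copied from the outputs in this answer.
nvcc = nvcc_release("Cuda compilation tools, release 7.5, V7.5.17")
toolkit = toolkit_version("CUDA Version 9.0.176")
print(nvcc, toolkit, same_major_minor(nvcc, toolkit))
```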

Author: Frank.Fan

Updated on June 08, 2022

Comments

  • Frank.Fan almost 2 years

    I get an error when I run my code. The error is:

    tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

    Here is my code:

    # -*- coding: utf-8 -*-
    import ...
    import ...
    
    checkpoint='/home/vrview/tensorflow/example/char/data/model/'
    MODEL_SAVE_PATH = "/home/vrview/tensorflow/example/char/data/model/"
    
    def getAllImages(folder):
        assert os.path.exists(folder)
        assert os.path.isdir(folder)
        imageList = os.listdir(folder)
        imageList = [os.path.join(folder,item) for item in imageList ]
        num=len(imageList)
        return imageList,num
    
    def get_labei():
        img_dir, num = getAllImages(r"/home/vrview/tensorflow/example/char/data/model/file/")
        for i in range(num):
            image = Image.open(img_dir[i])
            image = image.resize([56, 56])
            image = np.array(image)
            image_array = image
    
            with tf.Graph().as_default():
                image = tf.cast(image_array, tf.float32)
                image_1 = tf.image.per_image_standardization(image)
                image_2 = tf.reshape(image_1, [1, 56, 56, 3])
    
                logit = color_inference.inference(image_2)
                y = tf.nn.softmax(logit)
                x = tf.placeholder(tf.float32, shape=[56, 56, 3])
    
                saver = tf.train.Saver()
                with tf.Session() as sess:
                  ckpt = tf.train.get_checkpoint_state(MODEL_SAVE_PATH)
                  if ckpt and ckpt.model_checkpoint_path:
                       global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                       saver.restore(sess, ckpt.model_checkpoint_path)
                       print('Loading success, global_step is %s' % global_step)
                       prediction = sess.run(y)
                       max_index = np.argmax(prediction)
                  else:
                       print('No checkpoint file found')
    
            path='/home/vrview/tensorflow/example/char/data/move_file/'+str(max_index)
            isExists = os.path.exists(path)
            if not isExists :
                os.makedirs(path)
            shutil.copyfile(img_dir[i], path)
    
    def main(argv=None):
        get_labei()
    
    if __name__ == '__main__':
        tf.app.run()
    

    And here is my error:

    Traceback (most recent call last):
      File "/home/vrview/tensorflow/example/char/data/model/color_class_2.py", line 61, in <module>
        tf.app.run()
      File "/home/vrview/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
        _sys.exit(main(_sys.argv[:1] + flags_passthrough))
      File "/home/vrview/tensorflow/example/char/data/model/color_class_2.py", line 58, in main
        get_labei()
      File "/home/vrview/tensorflow/example/char/data/model/color_class_2.py", line 40, in get_labei
        with tf.Session() as sess:
      File "/home/vrview/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1187, in __init__
        super(Session, self).__init__(target, graph, config=config)
      File "/home/vrview/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 552, in __init__
        self._session = tf_session.TF_NewDeprecatedSession(opts, status)
      File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
        self.gen.next()
      File "/home/vrview/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
        pywrap_tensorflow.TF_GetCode(status))
    tensorflow.python.framework.errors_impl.InternalError: Failed to create session.
    
  • Frank.Fan almost 7 years
    I restarted my computer and the problem was solved, but I don't know why...
  • JCooke almost 7 years
    Possibly a memory issue. Restarting releases any locks on the memory and clears it out.
  • Preetom Saha Arko about 6 years
    After export CUDA_VISIBLE_DEVICES='' is used, there is no GPU acceleration any more. Also, if I work on a server, how can I handle it? I can't restart a server.