How to get the currently available GPUs in TensorFlow?

Solution 1

There is an undocumented method called device_lib.list_local_devices() that enables you to list the devices available in the local process. (N.B. As an undocumented method, this is subject to backwards incompatible changes.) The function returns a list of DeviceAttributes protocol buffer objects. You can extract a list of string device names for the GPU devices as follows:

from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

Note that (at least up to TensorFlow 1.4), calling device_lib.list_local_devices() will run some initialization code that, by default, will allocate all of the GPU memory on all of the devices (GitHub issue). To avoid this, first create a session with an explicitly small per_process_gpu_memory_fraction, or with allow_growth=True, to prevent all of the memory from being allocated. See this question for more details.
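A sketch of that workaround under the TF 1.x API (`allow_growth` and `per_process_gpu_memory_fraction` are documented `GPUOptions` fields; this is a session-configuration fragment, not a complete program, and needs a TF 1.x install with a GPU to run):

```python
import tensorflow as tf  # TF 1.x API
from tensorflow.python.client import device_lib

# Create a session first, with a conservative GPU memory policy, so that
# device initialization does not grab all GPU memory up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grow allocation as needed
# ...or cap it instead:
# config.gpu_options.per_process_gpu_memory_fraction = 0.1
sess = tf.Session(config=config)

# Now listing devices no longer allocates everything.
gpus = [d.name for d in device_lib.list_local_devices()
        if d.device_type == 'GPU']
```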

Solution 2

You can check the full device list using the following code:

from tensorflow.python.client import device_lib

device_lib.list_local_devices()

Solution 3

There is also a method in the test utilities, so all that has to be done is:

tf.test.is_gpu_available()

and/or

tf.test.gpu_device_name()

Look up the TensorFlow docs for the arguments. Note that in TF 2.x, tf.test.is_gpu_available is deprecated in favor of tf.config.list_physical_devices('GPU').

Solution 4

Since TensorFlow 2.1, you can use tf.config.list_physical_devices('GPU'):

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, "  Type:", gpu.device_type)

If you have two GPUs installed, it outputs this:

Name: /physical_device:GPU:0   Type: GPU
Name: /physical_device:GPU:1   Type: GPU

In TF 2.0, you must add experimental:

gpus = tf.config.experimental.list_physical_devices('GPU')
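If you want logical device strings like '/GPU:0' (the form the original question asks for), a minimal sketch that derives them from physical-device names of the form shown above, using pure string manipulation (the sample names below are illustrative, not live TensorFlow output):

```python
# Derive '/GPU:n' strings from physical-device names of the form
# '/physical_device:GPU:n', as printed by tf.config.list_physical_devices.

def to_logical_names(physical_names):
    # '/physical_device:GPU:0' -> '/GPU:0'
    return ["/" + name.split("physical_device:")[-1] for name in physical_names]

names = ["/physical_device:GPU:0", "/physical_device:GPU:1"]
print(to_logical_names(names))  # ['/GPU:0', '/GPU:1']
```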

Solution 5

The accepted answer gives you the number of GPUs, but it also allocates all the memory on those GPUs, which may be unwanted for some applications. You can avoid this by creating a session with a fixed, lower memory limit before calling device_lib.list_local_devices().

I ended up using nvidia-smi to get the number of GPUs without allocating any memory on them.

import subprocess

# `nvidia-smi -L` prints one line per GPU, each containing a UUID.
n = subprocess.check_output(["nvidia-smi", "-L"]).decode().count('UUID')
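Beyond the count, the same output can also yield the GPU names. A small pure-Python sketch (the sample text below is illustrative of the `nvidia-smi -L` line format, not captured output; in practice the text would come from `subprocess.check_output(["nvidia-smi", "-L"]).decode()`):

```python
# Parse `nvidia-smi -L`-style output into (index, name) pairs.
# Lines look like: "GPU 0: GeForce GTX 1080 (UUID: GPU-...)"

sample = """\
GPU 0: GeForce GTX 1080 (UUID: GPU-11111111-2222-3333-4444-555555555555)
GPU 1: GeForce GTX 1080 (UUID: GPU-66666666-7777-8888-9999-000000000000)
"""

def parse_nvidia_smi_list(text):
    gpus = []
    for line in text.splitlines():
        if line.startswith("GPU "):
            head, _, _ = line.partition(" (UUID:")   # drop the UUID suffix
            index_part, _, name = head.partition(": ")
            gpus.append((int(index_part.split()[1]), name))
    return gpus

print(parse_nvidia_smi_list(sample))
# [(0, 'GeForce GTX 1080'), (1, 'GeForce GTX 1080')]
```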

Author: Sangwon Kim

Hi, I'm interested in data science and machine learning based on cyber security. I graduated from the Graduate School of Information Security at KAIST.

Updated on July 08, 2022

Comments

  • Sangwon Kim, almost 2 years

    I have a plan to use distributed TensorFlow, and I saw TensorFlow can use GPUs for training and testing. In a cluster environment, each machine could have 0 or 1 or more GPUs, and I want to run my TensorFlow graph into GPUs on as many machines as possible.

    I found that when running tf.Session() TensorFlow gives information about GPU in the log messages like below:

    I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
    

    My question is how do I get information about current available GPU from TensorFlow? I can get loaded GPU information from the log, but I want to do it in a more sophisticated, programmatic way. I also could restrict GPUs intentionally using the CUDA_VISIBLE_DEVICES environment variable, so I don't want to know a way of getting GPU information from OS kernel.

    In short, I want a function like tf.get_available_gpus() that will return ['/gpu:0', '/gpu:1'] if there are two GPUs available in the machine. How can I implement this?

    • eric, over 2 years
      why aren't simple things just easier in tensorflow?
  • Yaroslav Bulatov, almost 8 years
    PS, if this method ever gets moved/renamed, I would look inside tensorflow/python/platform/test.py:is_gpu_available since that's being used quite a bit
  • aarbelle, over 7 years
    Is there a way to get the devices Free and Total memory? I see that there is a memory_limit field in the DeviceAttributes and I think it is the free memory and not total
  • Charlie Parker, about 7 years
    I remember that for earlier versions than 1 tensorflow would print some info about gpus when it was imported in python. Have those messages been removed in the newer tensorflow versions? (hence your suggestion the only way to check gpu stuff)?
  • mrry, about 7 years
    @CharlieParker I believe we still print one log line per GPU device on startup in TF1.1.
  • n1k31t4, almost 7 years
    @aarbelle - using the above mentioned method to return all attributes includes a field Free memory for me, using tensorflow1.1. In python: from tensorflow.python.client import device_lib, then device_lib.list_local_devices()
  • Davidmh, almost 7 years
    @Kulbear because it contains strictly less information than the existing answer.
  • loretoparisi, about 6 years
    This doesn't seem to work in Google's Colab with a GPU environment; who knows why...
  • jarandaf, almost 6 years
    For some reason I don't know, this function call seizes all available GPU memory regardless of whatever session configuration is provided...
  • Trisoloriansunscreen, almost 6 years
    This returns just GPU:0
  • repoleved, almost 6 years
    @Tal that means you have 1 GPU available (at PCI slot ID 0). So tf.test.is_gpu_available() will return True
  • Trisoloriansunscreen, almost 6 years
    The OP requested a method that returns a list of available GPUS. At least on my multi-GPU setup, tf.test.gpu_device_name() returns only the name of the first one.
  • aboettcher, over 5 years
    Still prefer this answer due to its simplicity. I am using it directly from bash: python3 -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
  • Steven, about 5 years
    I agree, this answer saved me time. I just copy/pasted the code without having to read the longer official answer. I know the details, just needed the line of code. It already wasn't picked as the answer and that's sufficient. No need to downvote.
  • Siddharth Das, over 4 years
    getting error cannot import name 'format_exc' from 'traceback'
  • Siddharth Das, over 4 years
    AttributeError: module 'tensorflow' has no attribute 'test'
  • shivas, over 4 years
    Does this work when I use a scaleTier of BASIC_GPU too? When I run this code it gives me just the CPUs.
  • FluxLemur, over 4 years
    Duplicate answer of MiniQuark (but with less detail..)
  • Vivek Subramanian, about 4 years
    Command worked great. I had to change 'GPU' to 'XLA_GPU'.
  • Rahul Iyer, over 3 years
    @mrry Would you happen to know the answer to this question ? : stackoverflow.com/questions/63374495/…
  • iperov, about 3 years
    Such a list does not necessarily match TensorFlow's list; the enumeration can differ.
  • CQ is not hot, over 2 years
    Another thing is after setting tf.config.set_visible_devices(), the aforementioned commands still get all GPUs in that machine.