How to get the currently available GPUs in TensorFlow?

Solution 1

There is an undocumented method called device_lib.list_local_devices() that enables you to list the devices available in the local process. (N.B. As an undocumented method, this is subject to backwards incompatible changes.) The function returns a list of DeviceAttributes protocol buffer objects. You can extract a list of string device names for the GPU devices as follows:

from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']

Note that (at least up to TensorFlow 1.4), calling device_lib.list_local_devices() will run some initialization code that, by default, will allocate all of the GPU memory on all of the devices (GitHub issue). To avoid this, first create a session with an explicitly small per_process_gpu_memory_fraction, or with allow_growth=True, to prevent all of the memory from being allocated. See this question for more details.
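A sketch of that workaround under the TF 1.x API (`allow_growth` and `per_process_gpu_memory_fraction` are documented `GPUOptions` fields; this is a session-configuration fragment, not a complete program, and needs a TF 1.x install with a GPU to run):

```python
import tensorflow as tf  # TF 1.x API
from tensorflow.python.client import device_lib

# Create a session first, with a conservative GPU memory policy, so that
# device initialization does not grab all GPU memory up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grow allocation as needed
# ...or cap it instead:
# config.gpu_options.per_process_gpu_memory_fraction = 0.1
sess = tf.Session(config=config)

# Now listing devices no longer allocates everything.
gpus = [d.name for d in device_lib.list_local_devices()
        if d.device_type == 'GPU']
```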

Solution 2

You can check the full device list using the following code:

from tensorflow.python.client import device_lib

device_lib.list_local_devices()

Solution 3

There is also a method in the test utilities, so all that has to be done is:

tf.test.is_gpu_available()

and/or

tf.test.gpu_device_name()

Look up the TensorFlow docs for the arguments. Note that in TF 2.x, tf.test.is_gpu_available is deprecated in favor of tf.config.list_physical_devices('GPU').

Solution 4

Since TensorFlow 2.1, you can use tf.config.list_physical_devices('GPU'):

import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, "  Type:", gpu.device_type)

If you have two GPUs installed, it outputs this:

Name: /physical_device:GPU:0   Type: GPU
Name: /physical_device:GPU:1   Type: GPU

In TF 2.0, you must add experimental:

gpus = tf.config.experimental.list_physical_devices('GPU')
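If you want logical device strings like '/GPU:0' (the form the original question asks for), a minimal sketch that derives them from physical-device names of the form shown above, using pure string manipulation (the sample names below are illustrative, not live TensorFlow output):

```python
# Derive '/GPU:n' strings from physical-device names of the form
# '/physical_device:GPU:n', as printed by tf.config.list_physical_devices.

def to_logical_names(physical_names):
    # '/physical_device:GPU:0' -> '/GPU:0'
    return ["/" + name.split("physical_device:")[-1] for name in physical_names]

names = ["/physical_device:GPU:0", "/physical_device:GPU:1"]
print(to_logical_names(names))  # ['/GPU:0', '/GPU:1']
```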

Solution 5

The accepted answer gives you the number of GPUs, but it also allocates all the memory on those GPUs, which may be unwanted for some applications. You can avoid this by creating a session with a fixed, lower memory limit before calling device_lib.list_local_devices().

I ended up using nvidia-smi to get the number of GPUs without allocating any memory on them.

import subprocess

# `nvidia-smi -L` prints one line per GPU, each containing a UUID.
n = subprocess.check_output(["nvidia-smi", "-L"]).decode().count('UUID')
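Beyond the count, the same output can also yield the GPU names. A small pure-Python sketch (the sample text below is illustrative of the `nvidia-smi -L` line format, not captured output; in practice the text would come from `subprocess.check_output(["nvidia-smi", "-L"]).decode()`):

```python
# Parse `nvidia-smi -L`-style output into (index, name) pairs.
# Lines look like: "GPU 0: GeForce GTX 1080 (UUID: GPU-...)"

sample = """\
GPU 0: GeForce GTX 1080 (UUID: GPU-11111111-2222-3333-4444-555555555555)
GPU 1: GeForce GTX 1080 (UUID: GPU-66666666-7777-8888-9999-000000000000)
"""

def parse_nvidia_smi_list(text):
    gpus = []
    for line in text.splitlines():
        if line.startswith("GPU "):
            head, _, _ = line.partition(" (UUID:")   # drop the UUID suffix
            index_part, _, name = head.partition(": ")
            gpus.append((int(index_part.split()[1]), name))
    return gpus

print(parse_nvidia_smi_list(sample))
# [(0, 'GeForce GTX 1080'), (1, 'GeForce GTX 1080')]
```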

Author: Sangwon Kim

Hi, I'm interested in data science and machine learning based on cyber security. I graduated from the Graduate School of Information Security at KAIST.

Updated on July 08, 2022

Comments

  • Sangwon Kim, almost 2 years

    I have a plan to use distributed TensorFlow, and I saw TensorFlow can use GPUs for training and testing. In a cluster environment, each machine could have 0 or 1 or more GPUs, and I want to run my TensorFlow graph into GPUs on as many machines as possible.

    I found that when running tf.Session() TensorFlow gives information about GPU in the log messages like below:

    I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0 
    I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0:   Y 
    I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
    

    My question is how do I get information about current available GPU from TensorFlow? I can get loaded GPU information from the log, but I want to do it in a more sophisticated, programmatic way. I also could restrict GPUs intentionally using the CUDA_VISIBLE_DEVICES environment variable, so I don't want to know a way of getting GPU information from OS kernel.

    In short, I want a function like tf.get_available_gpus() that will return ['/gpu:0', '/gpu:1'] if there are two GPUs available in the machine. How can I implement this?

    • eric, over 2 years
      why aren't simple things just easier in tensorflow?
  • Yaroslav Bulatov, almost 8 years
    PS, if this method ever gets moved/renamed, I would look inside tensorflow/python/platform/test.py:is_gpu_available since that's being used quite a bit
  • aarbelle, over 7 years
    Is there a way to get the devices Free and Total memory? I see that there is a memory_limit field in the DeviceAttributes and I think it is the free memory and not total
  • Charlie Parker, about 7 years
    I remember that for earlier versions than 1 tensorflow would print some info about gpus when it was imported in python. Have those messages been removed in the newer tensorflow versions? (hence your suggestion the only way to check gpu stuff)?
  • mrry, about 7 years
    @CharlieParker I believe we still print one log line per GPU device on startup in TF1.1.
  • n1k31t4, almost 7 years
    @aarbelle - using the above mentioned method to return all attributes includes a field Free memory for me, using tensorflow1.1. In python: from tensorflow.python.client import device_lib, then device_lib.list_local_devices()
  • Davidmh, almost 7 years
    @Kulbear because it contains strictly less information than the existing answer.
  • loretoparisi, about 6 years
    This doesn't seem to work in Google's Colab with a GPU environment; who knows why...
  • jarandaf, almost 6 years
    For some reason I don't know, this function call seizes all available GPU memory regardless of whatever session configuration is provided...
  • Trisoloriansunscreen, almost 6 years
    This returns just GPU:0
  • repoleved, almost 6 years
    @Tal that means you have 1 GPU available (at PCI slot ID 0). So tf.test.is_gpu_available() will return True
  • Trisoloriansunscreen, almost 6 years
    The OP requested a method that returns a list of available GPUS. At least on my multi-GPU setup, tf.test.gpu_device_name() returns only the name of the first one.
  • aboettcher, over 5 years
    Still prefer this answer due to its simplicity. I am using it directly from bash: python3 -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
  • Steven, about 5 years
    I agree, this answer saved me time. I just copy/pasted the code without having to read the longer official answer. I know the details, just needed the line of code. It already wasn't picked as the answer and that's sufficient. No need to downvote.
  • Siddharth Das, over 4 years
    getting error cannot import name 'format_exc' from 'traceback'
  • Siddharth Das, over 4 years
    AttributeError: module 'tensorflow' has no attribute 'test'
  • shivas, over 4 years
    Does this work when I use a scaleTier of BASIC_GPU too? When I run this code it gives me just the CPUs.
  • FluxLemur, over 4 years
    Duplicate answer of MiniQuark (but with less detail..)
  • Vivek Subramanian, about 4 years
    Command worked great. I had to change 'GPU' to 'XLA_GPU'.
  • Rahul Iyer, over 3 years
    @mrry Would you happen to know the answer to this question ? : stackoverflow.com/questions/63374495/…
  • iperov, about 3 years
    Such a list does not necessarily match TensorFlow's list; the enumeration can differ.
  • CQ is not hot, over 2 years
    Another thing is after setting tf.config.set_visible_devices(), the aforementioned commands still get all GPUs in that machine.