How to get current available GPUs in tensorflow?
Solution 1
There is an undocumented method called device_lib.list_local_devices() that enables you to list the devices available in the local process. (N.B. As an undocumented method, this is subject to backwards-incompatible changes.) The function returns a list of DeviceAttributes protocol buffer objects. You can extract a list of string device names for the GPU devices as follows:
from tensorflow.python.client import device_lib

def get_available_gpus():
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type == 'GPU']
Note that (at least up to TensorFlow 1.4), calling device_lib.list_local_devices() will run some initialization code that, by default, will allocate all of the GPU memory on all of the devices (GitHub issue). To avoid this, first create a session with an explicitly small per_process_gpu_memory_fraction, or with allow_growth=True, to prevent all of the memory being allocated. See this question for more details.
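As a minimal sketch of that workaround (assuming the TF 1.x session API, which is also available as tf.compat.v1 in TF 2.x):

```python
import tensorflow as tf

# Sketch, not the answer's exact code: limit the first session to a small
# fixed fraction of GPU memory (or let it grow on demand) so that a later
# device_lib.list_local_devices() call does not grab all GPU memory.
config = tf.compat.v1.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.05  # small fixed fraction
# alternatively: config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)
```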
Solution 2
You can list all available devices using the following code:
from tensorflow.python.client import device_lib
device_lib.list_local_devices()
Solution 3
There is also a method in the test utilities, so all that has to be done is:
tf.test.is_gpu_available()
and/or
tf.test.gpu_device_name()
Look up the TensorFlow docs for the arguments.
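A minimal sketch combining the two calls (note that tf.test.is_gpu_available() is deprecated in TF 2.x in favour of tf.config.list_physical_devices('GPU'), but it still works):

```python
import tensorflow as tf

# Sketch: check for a usable GPU, then report the default GPU device name.
if tf.test.is_gpu_available():                         # True if at least one GPU is usable
    print("Default GPU:", tf.test.gpu_device_name())   # e.g. '/device:GPU:0'
else:
    print("No GPU found")                              # gpu_device_name() would return ''
```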
Solution 4
Since TensorFlow 2.1, you can use tf.config.list_physical_devices('GPU'):
import tensorflow as tf
gpus = tf.config.list_physical_devices('GPU')
for gpu in gpus:
    print("Name:", gpu.name, " Type:", gpu.device_type)
If you have two GPUs installed, it outputs this:
Name: /physical_device:GPU:0 Type: GPU
Name: /physical_device:GPU:1 Type: GPU
In TF 2.0, you must add experimental:
gpus = tf.config.experimental.list_physical_devices('GPU')
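With this API, the helper the question asks for can be sketched as follows (assuming TF >= 2.1; the function name mirrors the one the OP wanted and is not itself a TensorFlow API):

```python
import tensorflow as tf

def get_available_gpus():
    """Return device-name strings such as '/physical_device:GPU:0'."""
    return [d.name for d in tf.config.list_physical_devices('GPU')]
```

On a CPU-only machine this simply returns an empty list, so it is safe to call in the cluster setting the question describes.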
Solution 5
The accepted answer gives you the number of GPUs, but it also allocates all of the memory on those GPUs, which may be unwanted for some applications. You can avoid this by creating a session with a fixed, lower memory limit before calling device_lib.list_local_devices().
I ended up using nvidia-smi to get the number of GPUs without allocating any memory on them.
import subprocess

# count the GPU entries listed by `nvidia-smi -L`
n = subprocess.check_output(["nvidia-smi", "-L"]).decode().count("UUID")
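The nvidia-smi approach can be wrapped in a small, testable helper (the count_gpus name and the sample output below are illustrative, not from the original answer):

```python
import subprocess

def count_gpus(smi_output=None):
    """Count GPUs from `nvidia-smi -L` output without touching TensorFlow.

    Each GPU line printed by `nvidia-smi -L` contains one 'UUID: ...' entry,
    so counting 'UUID' occurrences yields the device count. Pass `smi_output`
    explicitly to parse pre-captured output (useful on machines without GPUs).
    """
    if smi_output is None:
        smi_output = subprocess.check_output(["nvidia-smi", "-L"]).decode()
    return smi_output.count("UUID")

# Illustrative `nvidia-smi -L` output for a two-GPU machine:
sample = ("GPU 0: GeForce GTX 1080 (UUID: GPU-aaaa)\n"
          "GPU 1: GeForce GTX 1080 (UUID: GPU-bbbb)\n")
print(count_gpus(sample))  # -> 2
```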
Updated on July 08, 2022
Comments
-
Sangwon Kim, almost 2 years:
I have a plan to use distributed TensorFlow, and I saw TensorFlow can use GPUs for training and testing. In a cluster environment, each machine could have 0 or 1 or more GPUs, and I want to run my TensorFlow graph into GPUs on as many machines as possible.
I found that when running tf.Session(), TensorFlow gives information about the GPU in log messages like below:

I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:838] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
My question is: how do I get information about the currently available GPUs from TensorFlow? I can get loaded GPU information from the log, but I want to do it in a more sophisticated, programmatic way. I could also restrict GPUs intentionally using the CUDA_VISIBLE_DEVICES environment variable, so I don't want a way of getting GPU information from the OS kernel.
In short, I want a function like tf.get_available_gpus() that will return ['/gpu:0', '/gpu:1'] if there are two GPUs available in the machine. How can I implement this?
-
eric, over 2 years: why aren't simple things just easier in tensorflow?
-
Yaroslav Bulatov, almost 8 years: PS, if this method ever gets moved/renamed, I would look inside tensorflow/python/platform/test.py:is_gpu_available since that's being used quite a bit
-
aarbelle, over 7 years: Is there a way to get the devices' free and total memory? I see that there is a memory_limit field in DeviceAttributes, and I think it is the free memory and not the total
-
Charlie Parker, about 7 years: I remember that for versions earlier than 1, tensorflow would print some info about GPUs when it was imported in python. Have those messages been removed in the newer tensorflow versions (hence your suggestion being the only way to check GPU stuff)?
-
mrry, about 7 years: @CharlieParker I believe we still print one log line per GPU device on startup in TF 1.1.
-
n1k31t4, almost 7 years: @aarbelle - using the above-mentioned method to return all attributes includes a field Free memory for me, using tensorflow 1.1. In python: from tensorflow.python.client import device_lib, then device_lib.list_local_devices()
-
Davidmh, almost 7 years: @Kulbear because it contains strictly less information than the existing answer.
-
loretoparisi, about 6 years: This doesn't seem to work in Google's Colab with a GPU environment, who knows why...
-
jarandaf, almost 6 years: For some reason I don't know, this function call seizes all available GPU memory regardless of whatever session configuration is provided...
-
Trisoloriansunscreen, almost 6 years: This returns just GPU:0
-
repoleved, almost 6 years: @Tal that means you have 1 GPU available (at PCI slot ID 0). So tf.test.is_gpu_available() will return True
-
Trisoloriansunscreen, almost 6 years: The OP requested a method that returns a list of available GPUs. At least on my multi-GPU setup, tf.test.gpu_device_name() returns only the name of the first one.
-
aboettcher, over 5 years: Still prefer this answer due to its simplicity. I am using it directly from bash:
python3 -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
-
Steven, about 5 years: I agree, this answer saved me time. I just copy/pasted the code without having to read the longer official answer. I know the details, just needed the line of code. It already wasn't picked as the answer and that's sufficient. No need to downvote.
-
Siddharth Das, over 4 years: getting error cannot import name 'format_exc' from 'traceback'
-
Siddharth Das, over 4 years: AttributeError: module 'tensorflow' has no attribute 'test'
-
shivas, over 4 years: Does this work when I use a scaleTier of BASIC_GPU too? When I run this code it gives me just the CPUs
-
FluxLemur, over 4 years: Duplicate answer of MiniQuark (but with less detail..)
-
Vivek Subramanian, about 4 years: Command worked great. I had to change 'GPU' to 'XLA_GPU'.
-
Rahul Iyer, over 3 years: @mrry Would you happen to know the answer to this question?: stackoverflow.com/questions/63374495/…
-
iperov, about 3 years: Such a list does not match the TensorFlow list; the enumeration can be different.
-
CQ is not hot, over 2 years: Another thing is that after setting tf.config.set_visible_devices(), the aforementioned commands still get all GPUs in that machine.