A top-like utility for monitoring CUDA activity on a GPU


Solution 1

I find gpustat very useful. It can be installed with pip install gpustat, and it prints a breakdown of usage by process or user.

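As a quick sketch of typical usage (the -c/-p flags add colors and per-process usage, and newer gpustat releases ship a --watch flag; exact flags may vary by version):

```shell
# Install gpustat (assumes pip and the NVIDIA driver are already present)
pip install gpustat

# One-shot snapshot with colors (-c) and per-process usage (-p)
gpustat -cp

# Continuous refresh on newer releases
gpustat -cp --watch

# On older releases, wrap it in watch while keeping colors
watch -c 'gpustat -cp --color'
```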

Solution 2

To get real-time insight into resource usage, run:

nvidia-smi -l 1

This will loop, refreshing the view every second.

If you do not want to keep past traces of the looped call in the console history, you can also do:

watch -n0.1 nvidia-smi

where 0.1 is the refresh interval, in seconds.


Solution 3

I'm not aware of anything that combines this information, but you can use the nvidia-smi tool to get the raw data, like so (thanks to @jmsu for the tip on -l):

$ nvidia-smi -q -g 0 -d UTILIZATION -l

==============NVSMI LOG==============

Timestamp                       : Tue Nov 22 11:50:05 2011
Driver Version                  : 275.19
Attached GPUs                   : 2

GPU 0:1:0
    Utilization
        Gpu                     : 0 %
        Memory                  : 0 %
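The human-readable log above is awkward to script against. Recent nvidia-smi versions also expose a CSV query mode that is much easier to parse; here is a minimal Python sketch (the sample string at the end is hypothetical, so the snippet runs even without a GPU — drop the sample argument to query the real hardware):

```python
import subprocess

def gpu_utilization(sample=None):
    """Return a list of (gpu%, mem%) tuples, one per GPU.

    If `sample` is None, runs nvidia-smi; otherwise parses the given
    string (useful for testing on a machine without a GPU).
    """
    if sample is None:
        sample = subprocess.check_output(
            ["nvidia-smi",
             "--query-gpu=utilization.gpu,utilization.memory",
             "--format=csv,noheader,nounits"],
            text=True)
    readings = []
    for line in sample.strip().splitlines():
        # One CSV line per GPU: "<gpu util>, <memory util>"
        gpu, mem = (int(field) for field in line.split(","))
        readings.append((gpu, mem))
    return readings

# Hypothetical two-GPU sample, mirroring the log above
print(gpu_utilization(sample="0, 0\n37, 12\n"))
```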

Solution 4

Just use watch nvidia-smi; it will refresh the output every 2 seconds by default.


You can also use watch -n 5 nvidia-smi (-n 5 sets a 5-second interval).

Solution 5

Use the --query-compute-apps argument:

nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

For further help, run:

nvidia-smi --help-query-compute-apps
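The CSV output of this query (one line per process: pid, process name, used memory) is easy to post-process. Below is a minimal Python sketch; the sample line at the end is hypothetical, so the code runs even without a GPU (drop the sample argument to query the real hardware):

```python
import subprocess

def compute_apps(sample=None):
    """Parse `nvidia-smi --query-compute-apps` CSV output into dicts.

    If `sample` is None, queries nvidia-smi; otherwise parses the given
    string (handy for testing on a machine without a GPU).
    """
    if sample is None:
        sample = subprocess.check_output(
            ["nvidia-smi",
             "--query-compute-apps=pid,process_name,used_memory",
             "--format=csv,noheader"],
            text=True)
    apps = []
    for line in sample.strip().splitlines():
        # One CSV line per process: "<pid>, <name>, <used memory>"
        # (naive split: breaks if a process name itself contains a comma)
        pid, name, mem = (field.strip() for field in line.split(","))
        apps.append({"pid": int(pid), "name": name, "used_memory": mem})
    return apps

# Hypothetical sample line in the CSV format nvidia-smi emits
print(compute_apps(sample="12345, python, 1024 MiB\n"))
```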


Author: natorro

Updated on April 15, 2022

Comments

  • natorro
    natorro about 2 years

    I'm trying to monitor a process that uses CUDA and MPI, is there any way I could do this, something like the command "top" but that monitors the GPU too?

    • changqi.xia
      changqi.xia over 5 years
      "nvidia-smi pmon -i 0" can monitor all process running on nvidia GPU 0
  • jmsu
    jmsu over 12 years
    I think if you add a -l to that, you get it to update continuously, effectively monitoring the GPU and memory utilization.
  • natorro
    natorro over 12 years
    What if, when I run it, the GPU utilization just says N/A?
  • jmsu
    jmsu over 12 years
    @natorro Looks like nVidia dropped support for some cards. Check this link forums.nvidia.com/index.php?showtopic=205165
  • ali_m
    ali_m over 8 years
    I prefer watch -n 0.5 nvidia-smi, which avoids filling your terminal with output
  • william_grisaitis
    william_grisaitis about 8 years
    Or you can just do nvidia-smi -l 2. Or to prevent repeated console output, watch -n 2 'nvidia-smi'
  • Lenar Hoyt
    Lenar Hoyt over 7 years
    You can also get the PIDs of compute programs that occupy the GPU of all users without sudo like this: nvidia-smi --query-compute-apps=pid --format=csv,noheader
  • Mick T
    Mick T about 6 years
    Querying the card every 0.1 seconds? Is that going to cause load on the card? Plus, using watch, you're starting a new process every 0.1 seconds.
  • rand
    rand about 6 years
    Sometimes nvidia-smi does not list all processes, so you end up with your memory used by processes not listed there. This is the main way I can track and kill those processes.
  • SebMa
    SebMa about 6 years
    @grisaitis Careful, I don't think the pmem given by ps takes into account the total memory of the GPU but that of the CPU, because ps is not "Nvidia GPU"-aware
  • changqi.xia
    changqi.xia over 5 years
    nvidia-smi pmon -i 0
  • abhimanyuaryan
    abhimanyuaryan almost 5 years
    After you run watch gpustat -cp you can see stats continuously, but the colors are gone. How do you fix that? @Alleo
  • CasualScience
    CasualScience almost 5 years
    @AbhimanyuAryan use watch -c. @Roman Orac, Thank you, this also worked for me on redhat 8 when I was getting some error due to importing _curses in python.
  • Lee Netherton
    Lee Netherton over 4 years
    watch -c gpustat -cp --color
  • Gabriel Romon
    Gabriel Romon over 4 years
    watch -n 0.5 -c gpustat -cp --color
  • Mohammad Javad
    Mohammad Javad over 4 years
    @MickT Is it a big deal? nvidia-smi has this built-in loop! Is the watch command very different from nvidia-smi -l?
  • Mick T
    Mick T over 4 years
    It might be, I've seen lower-end cards have weird lock-ups, and I think it's because too many users were running nvidia-smi on the cards. I think using 'nvidia-smi -l' is a better way to go, as you're not forking a new process every time. Also, checking the card every 0.1 seconds is overkill; I'd do every second when I'm trying to debug an issue, otherwise I do every 5 minutes to monitor performance. I hope that helps! :)
  • TrostAft
    TrostAft over 4 years
    @Gulzar yes, it is.
  • jayelm
    jayelm about 4 years
    gpustat now has a --watch option: gpustat -cp --watch
  • Hossein
    Hossein over 3 years
    Very neat! Thanks a lot! It's also available in the latest Ubuntu (20.04), which was a breeze for me: just sudo apt install nvtop and done!
  • user894319twitter
    user894319twitter over 3 years
    Not quite "filtered on processes that consume your GPUs.". They can just change settings... But I don't know a better alternative...
  • user894319twitter
    user894319twitter over 3 years
    Right now you monitor CPU performance of any process that operates on (actually computes on, changes settings of, or even monitors) GPUs. I guess this is NOT what was asked in the original question. I think the question was just about the "compute" part...
  • user894319twitter
    user894319twitter over 3 years
    nvidia-smi --help-query-compute-app Invalid combination of input arguments. Please run nvidia-smi -h for help.
  • Alexey
    Alexey over 2 years
    use --help-query-compute-apps
  • Pramit
    Pramit over 2 years
    Nice interface, good stuff! Thanks for sharing.
  • n1k31t4
    n1k31t4 over 2 years
    You can run nvidia-smi -lms 500 (every 500 milliseconds) over a long period of time - e.g. a week - without any issues that you might face using watch.
  • Mello
    Mello over 2 years
    I received an error after installing nvitop: _curses.error: curs_set() returned ERR
  • Jacob Waters
    Jacob Waters about 2 years
    Updating every .1s, aka every 100ms, is a long time for a computer. I doubt it would make a difference in performance either way.