A top-like utility for monitoring CUDA activity on a GPU


Solution 1

I find gpustat very useful. It can be installed with pip install gpustat, and it prints a breakdown of usage by process or user.

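As a quick sketch of typical usage (the -c/-p flags add colors and per-process usage, and newer gpustat releases ship a --watch flag; exact flags may vary by version):

```shell
# Install gpustat (assumes pip and the NVIDIA driver are already present)
pip install gpustat

# One-shot snapshot with colors (-c) and per-process usage (-p)
gpustat -cp

# Continuous refresh on newer releases
gpustat -cp --watch

# On older releases, wrap it in watch while keeping colors
watch -c 'gpustat -cp --color'
```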

Solution 2

To get real-time insight into resource usage, run:

nvidia-smi -l 1

This will loop, refreshing the view every second.

If you do not want to keep past traces of the looped call in the console history, you can also do:

watch -n0.1 nvidia-smi

where 0.1 is the refresh interval, in seconds.


Solution 3

I'm not aware of anything that combines this information, but you can use the nvidia-smi tool to get the raw data, like so (thanks to @jmsu for the tip on -l):

$ nvidia-smi -q -g 0 -d UTILIZATION -l

==============NVSMI LOG==============

Timestamp                       : Tue Nov 22 11:50:05 2011
Driver Version                  : 275.19
Attached GPUs                   : 2

GPU 0:1:0
    Utilization
        Gpu                     : 0 %
        Memory                  : 0 %
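The human-readable log above is awkward to script against. Recent nvidia-smi versions also expose a CSV query mode that is much easier to parse; here is a minimal Python sketch (the sample string at the end is hypothetical, so the snippet runs even without a GPU — drop the sample argument to query the real hardware):

```python
import subprocess

def gpu_utilization(sample=None):
    """Return a list of (gpu%, mem%) tuples, one per GPU.

    If `sample` is None, runs nvidia-smi; otherwise parses the given
    string (useful for testing on a machine without a GPU).
    """
    if sample is None:
        sample = subprocess.check_output(
            ["nvidia-smi",
             "--query-gpu=utilization.gpu,utilization.memory",
             "--format=csv,noheader,nounits"],
            text=True)
    readings = []
    for line in sample.strip().splitlines():
        # One CSV line per GPU: "<gpu util>, <memory util>"
        gpu, mem = (int(field) for field in line.split(","))
        readings.append((gpu, mem))
    return readings

# Hypothetical two-GPU sample, mirroring the log above
print(gpu_utilization(sample="0, 0\n37, 12\n"))
```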

Solution 4

Just use watch nvidia-smi; it will refresh the output every 2 seconds by default.


You can also use watch -n 5 nvidia-smi (-n 5 sets a 5-second interval).

Solution 5

Use the --query-compute-apps argument:

nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv

For further help, run:

nvidia-smi --help-query-compute-apps
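The CSV output of this query (one line per process: pid, process name, used memory) is easy to post-process. Below is a minimal Python sketch; the sample line at the end is hypothetical, so the code runs even without a GPU (drop the sample argument to query the real hardware):

```python
import subprocess

def compute_apps(sample=None):
    """Parse `nvidia-smi --query-compute-apps` CSV output into dicts.

    If `sample` is None, queries nvidia-smi; otherwise parses the given
    string (handy for testing on a machine without a GPU).
    """
    if sample is None:
        sample = subprocess.check_output(
            ["nvidia-smi",
             "--query-compute-apps=pid,process_name,used_memory",
             "--format=csv,noheader"],
            text=True)
    apps = []
    for line in sample.strip().splitlines():
        # One CSV line per process: "<pid>, <name>, <used memory>"
        # (naive split: breaks if a process name itself contains a comma)
        pid, name, mem = (field.strip() for field in line.split(","))
        apps.append({"pid": int(pid), "name": name, "used_memory": mem})
    return apps

# Hypothetical sample line in the CSV format nvidia-smi emits
print(compute_apps(sample="12345, python, 1024 MiB\n"))
```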


Author: natorro

Updated on April 15, 2022

Comments

  • natorro
    natorro about 2 years

    I'm trying to monitor a process that uses CUDA and MPI, is there any way I could do this, something like the command "top" but that monitors the GPU too?

    • changqi.xia
      changqi.xia over 5 years
      "nvidia-smi pmon -i 0" can monitor all process running on nvidia GPU 0
  • jmsu
    jmsu over 12 years
    I think if you add a -l to that, you get it to update continuously, effectively monitoring the GPU and memory utilization.
  • natorro
    natorro over 12 years
    What if, when I run it, the GPU utilization just says N/A?
  • jmsu
    jmsu over 12 years
    @natorro Looks like nVidia dropped support for some cards. Check this link forums.nvidia.com/index.php?showtopic=205165
  • ali_m
    ali_m over 8 years
    I prefer watch -n 0.5 nvidia-smi, which avoids filling your terminal with output
  • william_grisaitis
    william_grisaitis about 8 years
    Or you can just do nvidia-smi -l 2. Or to prevent repeated console output, watch -n 2 'nvidia-smi'
  • Lenar Hoyt
    Lenar Hoyt over 7 years
    You can also get the PIDs of compute programs that occupy the GPU of all users without sudo like this: nvidia-smi --query-compute-apps=pid --format=csv,noheader
  • Mick T
    Mick T about 6 years
    Querying the card every 0.1 seconds? Is that going to cause load on the card? Plus, using watch, you're starting a new process every 0.1 seconds.
  • rand
    rand about 6 years
    Sometimes nvidia-smi does not list all processes, so you end up with your memory used by processes not listed there. This is the main way I can track and kill those processes.
  • SebMa
    SebMa about 6 years
    @grisaitis Careful, I don't think the pmem given by ps takes into account the total memory of the GPU but that of the CPU, because ps is not "Nvidia GPU"-aware
  • changqi.xia
    changqi.xia over 5 years
    nvidia-smi pmon -i 0
  • abhimanyuaryan
    abhimanyuaryan almost 5 years
    After you run watch gpustat -cp you can see stats continuously, but the colors are gone. How do you fix that? @Alleo
  • CasualScience
    CasualScience almost 5 years
    @AbhimanyuAryan use watch -c. @Roman Orac, Thank you, this also worked for me on redhat 8 when I was getting some error due to importing _curses in python.
  • Lee Netherton
    Lee Netherton over 4 years
    watch -c gpustat -cp --color
  • Gabriel Romon
    Gabriel Romon over 4 years
    watch -n 0.5 -c gpustat -cp --color
  • Mohammad Javad
    Mohammad Javad over 4 years
    @MickT Is it a big deal? nvidia-smi has this built-in loop! Is the watch command very different from nvidia-smi -l?
  • Mick T
    Mick T over 4 years
    It might be, I've seen lower-end cards have weird lock-ups, and I think it's because too many users were running nvidia-smi on the cards. I think using 'nvidia-smi -l' is a better way to go, as you're not forking a new process every time. Also, checking the card every 0.1 seconds is overkill; I'd do every second when I'm trying to debug an issue, otherwise I do every 5 minutes to monitor performance. I hope that helps! :)
  • TrostAft
    TrostAft over 4 years
    @Gulzar yes, it is.
  • jayelm
    jayelm about 4 years
    gpustat now has a --watch option: gpustat -cp --watch
  • Hossein
    Hossein over 3 years
    Very neat! Thanks a lot! It's also available in the latest Ubuntu (20.04), which was a breeze for me: just sudo apt install nvtop and done!
  • user894319twitter
    user894319twitter over 3 years
    Not quite "filtered on processes that consume your GPUs.". They can just change settings... But I don't know a better alternative...
  • user894319twitter
    user894319twitter over 3 years
    Right now you monitor CPU performance of any process that operates on (actually computes on, changes settings of, or even monitors) GPUs. I guess this is NOT what was asked in the original question. I think the question was just about the "compute" part...
  • user894319twitter
    user894319twitter over 3 years
    nvidia-smi --help-query-compute-app Invalid combination of input arguments. Please run nvidia-smi -h for help.
  • Alexey
    Alexey over 2 years
    use --help-query-compute-apps
  • Pramit
    Pramit over 2 years
    Nice interface, good stuff! Thanks for sharing.
  • n1k31t4
    n1k31t4 over 2 years
    You can run nvidia-smi -lms 500 (every 500 milliseconds) over a long period of time - e.g. a week - without any issues that you might face using watch.
  • Mello
    Mello over 2 years
    I received an error after installing nvitop: _curses.error: curs_set() returned ERR
  • Jacob Waters
    Jacob Waters about 2 years
    Updating every .1s, aka every 100ms, is a long time for a computer. I doubt it would make a difference in performance either way.