Please help configuring NVIDIA-SMI on Ubuntu 20.04 on WSL 2


Solution 1

If nbody works, then everything is configured correctly. The problem is a limitation of the NVIDIA drivers: https://docs.nvidia.com/cuda/wsl-user-guide/index.html#known-limitations

NVIDIA Management Library (NVML) APIs are not supported.

nvidia-smi is built on top of the NVIDIA Management Library (NVML), so it cannot work under WSL 2.
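
Since NVML is unavailable, a quick way to confirm that the GPU itself is reachable from WSL 2 is the nbody sample container mentioned above. A minimal sketch, assuming Docker and nvidia-docker2 are already set up as described in the question's comments:

    # Start the Docker daemon by hand (WSL 2 does not boot with systemd),
    # then run NVIDIA's CUDA nbody benchmark with GPU access.
    sudo dockerd &
    docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
    # If this prints the GPU name and a GFLOP/s figure, CUDA is working
    # even though nvidia-smi itself is not.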

Solution 2

An update to @onoma's answer. From https://docs.nvidia.com/cuda/wsl-user-guide/index.html#known-limitations :

6. nvidia-smi is not yet packaged for CUDA on WSL 2.

Hopefully this will be resolved by NVIDIA in a future release.


Comments

  • Lars Ericson over 1 year

    Following this announcement, and trying as best I could to follow this confusing thread, I:

    • installed Windows Version 10.0.20150 Build 20150
    • installed NVidia Driver version 455.51
    • installed Ubuntu 20.04 LTS from the Windows Store

    I started Ubuntu and tried to run NVIDIA-SMI. It told me it wasn't there but that I could install it with one of these options:

    Command 'nvidia-smi' not found, but can be installed with:
    
    sudo apt install nvidia-340        # version 340.108-0ubuntu2, or
    sudo apt install nvidia-utils-390  # version 390.132-0ubuntu2
    sudo apt install nvidia-utils-435  # version 435.21-0ubuntu7
    sudo apt install nvidia-utils-440  # version 440.82+really.440.64-0ubuntu6
    

    Note that there is no nvidia-utils-450 option corresponding to my 455.51 driver, which the NVidia thread above says is required to make things work. I then ran

    sudo apt install nvidia-utils-440
    nvidia-smi
    

    and it said "No devices found".
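
    Presumably this is because, under WSL 2, the GPU is driven by the Windows driver and the CUDA user-mode libraries are supplied by WSL itself rather than by the Ubuntu nvidia-utils packages. A rough check (the library path is an assumption for the preview driver):

    # If the Windows-side 455.xx driver is installed correctly, the WSL-mounted
    # CUDA libraries should be visible here:
    ls /usr/lib/wsl/lib/
    # expect libcuda.so and related libnvidia-* libraries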

    Then I found this guide. I uninstalled Ubuntu 20.04 and followed the guide, which asked me to:

    • install a vanilla Ubuntu (no release number) instead of 20.04 (this turns out to give 20.04 anyway)
    • install Windows Terminal (I chose the Preview version)
    • check to receive updates for related Windows programs
    • update the kernel to 4.19.121 (a version check is sketched after this list)
    • install NVIDIA CUDA drivers on Windows 10 (I already did 455, have to check the CUDA release)
    • install Docker
    • install NVidia Container Toolkit
    • test
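
    A quick way to confirm the kernel-update step took effect (run inside the Ubuntu shell):

    # The CUDA-on-WSL preview needs WSL 2 kernel 4.19.121 or newer.
    uname -r
    # expect something like 4.19.121-microsoft-standard (or newer)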

    The "install docker" part of that guide seems to be buggy. I couldn't get docker service to start. So I uninstalled my Ubuntu and repeated the steps up to that point, without touching Docker. Then (my version), the steps from the Docker point are (for docker part I am following these instructions to get Docker):

    sudo apt-get update
    sudo apt-get upgrade
    sudo apt update
    sudo apt install apt-transport-https ca-certificates curl software-properties-common
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu focal stable"
    sudo apt update
    apt-cache policy docker-ce
    sudo apt install docker-ce
    sudo systemctl status docker
    

    The last step fails. I get this message:

    $ sudo systemctl status docker
    System has not been booted with systemd as init system (PID 1). Can't operate.
    Failed to connect to bus: Host is down
    

    That led me here, where the 4th (and almost lowest-scored) answer seems to work, except that dockerd needs to be run in the background:

    sudo dockerd &
    sudo usermod -aG docker your-user
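    # (Alternative, assuming docker-ce from the steps above: the package ships a
    # SysV init script, so WSL's Ubuntu can also start the daemon with `service`,
    # even though systemd is not PID 1.)
    sudo service docker start
    # Log out and back in (or run `newgrp docker`) so the docker group change takes effect.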
    

    Then I went back to the guide's post-Docker-install step and resumed with

    docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
    

    and this fails with

    ERRO[2020-06-23T07:28:28.582848400-04:00] 5cd9b9d7011ba20f72971dd27900b23b2c0f6be656b0bd53b9e178944fe4eba6 cleanup: failed to delete container from containerd: no such container
    ERRO[2020-06-23T07:28:28.582946600-04:00] Handler for POST /v1.40/containers/5cd9b9d7011ba20f72971dd27900b23b2c0f6be656b0bd53b9e178944fe4eba6/start returned error: could not select device driver "" with capabilities: [[gpu]]
    docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
    ERRO[0018] error waiting for container: context canceled
    

    The "could not select device driver" error means Docker has no GPU runtime registered, so finally I went back to the NVidia announcement and did these steps to install nvidia-docker2:

    sudo apt-get update
    distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
    curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
    curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list
    sudo apt-get update
    sudo apt-get install -y nvidia-docker2
    sudo dockerd &
    docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
    

    SUCCESS: I got a happy result:

    > Windowed mode
    > Simulation data stored in video memory
    > Single precision floating point simulation
    > 1 Devices used for simulation
    GPU Device 0: "Quadro M500M" with compute capability 5.0
    
    > Compute 5.0 CUDA device: [Quadro M500M]
    3072 bodies, total time for 10 iterations: 3.817 ms
    = 24.724 billion interactions per second
    = 494.487 single-precision GFLOP/s at 20 flops per interaction
    

    HOWEVER, per the answer below, there is no nvidia-smi, due to the known NVIDIA limitations.

    FURTHER NOTE: The Docker container test above works in the Ubuntu shell. It does not work in Windows PowerShell Preview with the Ubuntu tab.

    • Rémi about 3 years
      Note that you can run the Windows version of nvidia-smi from inside WSL.
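
      A minimal sketch of that workaround, assuming the default Windows install location for the driver tools:

      # Call the Windows nvidia-smi from inside WSL 2.
      /mnt/c/Windows/System32/nvidia-smi.exe
      # Or, if Windows PATH interop is enabled in WSL, simply:
      nvidia-smi.exe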