How do I install CUDA on an EC2 Ubuntu 18.04 instance?


Follow the official AWS documentation for installing NVIDIA drivers on EC2. You also need a GPU-enabled instance type: as I understand it, regular instances have no NVIDIA GPU attached.
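One way to confirm the instance type from inside the instance is to ask the instance metadata service and compare the family against known NVIDIA GPU families. The sketch below is illustrative only: the helper name `is_gpu_instance_type` is hypothetical, the family list is an assumption and is not exhaustive, and the metadata URL in the usage comment assumes IMDSv1 is reachable on the instance.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the given EC2 instance type
# belongs to an NVIDIA GPU family. The list below is an assumption and
# is not exhaustive; check the AWS instance-type pages for the current set.
is_gpu_instance_type() {
  case "$1" in
    p2.*|p3.*|p3dn.*|p4d.*|g3.*|g3s.*|g4dn.*|g5.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on the instance (assumes IMDSv1 is enabled):
#   itype=$(curl -s http://169.254.169.254/latest/meta-data/instance-type)
#   is_gpu_instance_type "$itype" && echo "GPU family" || echo "not a GPU family"
```

A match only says the instance type should have a GPU; whether the device is actually visible to the operating system is a separate check.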

Author by

tsujp

Updated on September 18, 2022

Comments

  • tsujp
    tsujp over 1 year

    I've read quite a few guides around the web, gists, and other posts on this exchange, and I cannot find anything that works. Every time I get to nvidia-smi, it reports that it cannot communicate with the driver:

    NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

    Installing CUDA:

    1. sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
    2. sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
    3. sudo apt-get update
    4. sudo apt-get install cuda
    • MrYouMath
      MrYouMath over 5 years
      do you want to use tensorflow-gpu?
  • tsujp
    tsujp over 5 years
    How do I check whether it's GPU-enabled? I'm provisioning EC2 P3 instances, specifically the P3.8, and I have followed their guide, both downloading it manually and downloading the public drivers, and I still get the same error at nvidia-smi.
  • tsujp
    tsujp over 5 years
    I also get this error whenever I try to install the appropriate drivers for V100s on the instances, which makes no sense to me because these are the exact drivers for these exact GPUs: `WARNING: You do not appear to have an NVIDIA GPU supported by the 410.79 NVIDIA Linux graphics driver installed in this system. For further details, please see the appendix SUPPORTED NVIDIA GRAPHICS CHIPS in the README available on the Linux driver download page at www.nvidia.com.`
  • dobey
    dobey over 5 years
    aws.amazon.com/ec2/instance-types suggests P3 instances are GPU enabled, so if you are following the documentation from Amazon, and using that instance type, and it's not working, maybe it would be best to contact AWS support for further help. It sounds like perhaps your instance is supposed to have GPUs and doesn't. Do they show up in lspci output?
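The lspci question above can be checked with a short script. This is a minimal sketch rather than an official procedure: the function name `count_nvidia_devices` is hypothetical, and it assumes that NVIDIA devices show up in `lspci` output with the string "NVIDIA" somewhere on their line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the lines on stdin that mention NVIDIA
# (case-insensitive). Intended to be fed the output of `lspci`.
# `grep -c` exits non-zero when the count is 0, so `|| true` keeps the
# function's exit status at 0 while still printing the count.
count_nvidia_devices() {
  grep -ci 'nvidia' || true
}

# Usage on the instance:
#   lspci | count_nvidia_devices    # GPUs visible on the PCI bus
#   lsmod | grep -i nvidia          # is the NVIDIA kernel module loaded?
```

A count of 0 would suggest the GPU is not visible to the instance at all, which is a question for AWS. A non-zero count while nvidia-smi still fails would point at the driver or its kernel module instead.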