How to increase nvme_core.io_timeout on my c5 EC2 instance

6,302

Based on my own experimentation, we do this while building our AMIs.

    cp /etc/default/grub /tmp/grub
    cat >>/tmp/grub <<'EOF'
    GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX} nvme_core.io_timeout=255"
    EOF
    sudo mv /tmp/grub /etc/default/grub
    sudo update-grub

Then create an AMI from the instance. When you start a new EC2 instance from the AMI, it comes up with the correct setting.

Obviously this can be modified to set any kernel parameter.
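The append-and-regenerate step above can be generalized to any kernel parameter; here is a dry-runnable sketch. The helper name and the scratch-file default are my own, and note that on CentOS 7 the grub config is typically regenerated with `grub2-mkconfig` rather than `update-grub`:

```shell
#!/bin/sh
# Sketch: append a kernel parameter to a grub defaults file unless it is
# already present. GRUB_FILE defaults to a scratch copy so this can be
# dry-run without root; point it at /etc/default/grub for real use.
add_kernel_param() {
  file="$1"; param="$2"
  if grep -q "$param" "$file"; then
    echo "$param already set in $file"
  else
    printf 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX} %s"\n' "$param" >> "$file"
  fi
}

GRUB_FILE="${GRUB_FILE:-$(mktemp)}"
add_kernel_param "$GRUB_FILE" "nvme_core.io_timeout=255"
# Afterwards regenerate the config, e.g.:
#   sudo update-grub                            # Debian/Ubuntu
#   sudo grub2-mkconfig -o /boot/grub2/grub.cfg # CentOS/RHEL
```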

Author: mchawre
Updated on September 18, 2022

Comments

  • mchawre, over 1 year ago

    We have a Mesos cluster running CentOS 7 c5 instances on AWS. The kernel version is the latest, 4.16.1-1.

    The c5 instance type exposes its volumes through NVMe drivers. The NVMe volumes seem to have a behavior, as mentioned here, where if an I/O on a volume times out, the volume is mounted read-only and no further writes can happen. So if there are heavy read-write operations on a device such as the root drive, then after an I/O timeout no further writes can happen, which is dangerous.

    The AWS documentation recommends setting the I/O timeout as high as possible, which appears to be 4294967295 seconds.

    The AWS docs specify that the default I/O timeout is 30 seconds, with a maximum of 255 seconds for kernels prior to 4.15 and 4294967295 seconds for kernel 4.15+. Since we have the latest 4.16.1 kernel, we should be able to set it to the maximum of 4294967295 seconds.
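    The version cutoff described above can be sketched as a small helper. The function name and the version comparison via `sort -V` are my own; the 255 / 4294967295 limits are the ones quoted from the AWS docs:

    ```shell
    # Sketch (my own helper): the maximum nvme_core.io_timeout depends on the
    # kernel version -- kernels before 4.15 cap it at 255, 4.15+ allow 4294967295.
    max_io_timeout() {
      kver="$1"
      # sort -V orders version strings; if 4.15 sorts first, kver is >= 4.15
      if [ "$(printf '%s\n4.15\n' "$kver" | sort -V | head -n1)" = "4.15" ]; then
        echo 4294967295
      else
        echo 255
      fi
    }

    max_io_timeout "$(uname -r | cut -d- -f1)"
    ```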

    But when I try to set the nvme_core.io_timeout parameter to the max value, it doesn't get reflected. I tried this:

    sh-4.2# modprobe nvme_core io_timeout=123457
    sh-4.2# cat /sys/module/nvme_core/parameters/io_timeout
    30
    sh-4.2#
    

    What is the correct way to set nvme_core.io_timeout? I tried a lot of other things, like:

    1. setting it in /etc/default/grub file
    2. sysctl command
    3. Overriding /sys/module/nvme_core/parameters/io_timeout file

    But nothing helped.
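    A likely explanation for the failed attempts: `modprobe` options have no effect when `nvme_core` is already loaded or is built into the kernel, and `sysctl` only manages `/proc/sys` entries, not module parameters. If (and only if) `nvme_core` is built as a loadable module on your kernel, a modprobe.d fragment is an alternative to the grub route; the filename below is my own choice:

    ```
    # /etc/modprobe.d/nvme_core.conf
    # Applied when the nvme_core module is loaded; if the module is loaded
    # from the initramfs, the initramfs must be rebuilt for this to take effect.
    options nvme_core io_timeout=4294967295
    ```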