How to increase nvme_core.io_timeout on my c5 EC2 instance

6,302

Based on my own experimentation, we do this while building our AMIs.

    cp /etc/default/grub /tmp/grub
    cat >>/tmp/grub <<'EOF'
    GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX} nvme_core.io_timeout=255"
    EOF
    sudo mv /tmp/grub /etc/default/grub
    sudo update-grub

Then create an AMI from the instance. When you start a new EC2 instance from the AMI, it comes up with the correct setting.

Obviously this can be modified to set any kernel parameter.
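The append-and-regenerate step above can be generalized to any kernel parameter; here is a dry-runnable sketch. The helper name and the scratch-file default are my own, and note that on CentOS 7 the grub config is typically regenerated with `grub2-mkconfig` rather than `update-grub`:

```shell
#!/bin/sh
# Sketch: append a kernel parameter to a grub defaults file unless it is
# already present. GRUB_FILE defaults to a scratch copy so this can be
# dry-run without root; point it at /etc/default/grub for real use.
add_kernel_param() {
  file="$1"; param="$2"
  if grep -q "$param" "$file"; then
    echo "$param already set in $file"
  else
    printf 'GRUB_CMDLINE_LINUX="${GRUB_CMDLINE_LINUX} %s"\n' "$param" >> "$file"
  fi
}

GRUB_FILE="${GRUB_FILE:-$(mktemp)}"
add_kernel_param "$GRUB_FILE" "nvme_core.io_timeout=255"
# Afterwards regenerate the config, e.g.:
#   sudo update-grub                            # Debian/Ubuntu
#   sudo grub2-mkconfig -o /boot/grub2/grub.cfg # CentOS/RHEL
```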

Author: mchawre
Updated on September 18, 2022

Comments

  • mchawre, over 1 year ago

    We have a Mesos cluster running CentOS 7 c5 instances on AWS. The kernel version is the latest, 4.16.1-1.

    The c5 instance type exposes its volumes through NVMe drivers. The NVMe volumes seem to have a behavior, as mentioned here, where if an I/O on a volume times out, the volume is mounted read-only and no further writes can happen. So if there are heavy read-write operations on a device such as the root drive, then after an I/O timeout no further writes can happen, which is dangerous.

    The AWS documentation recommends setting the I/O timeout as high as possible, which appears to be 4294967295 seconds.

    The AWS docs specify that the default I/O timeout is 30 seconds, with a maximum of 255 seconds for kernels prior to 4.15 and 4294967295 seconds for kernel 4.15+. Since we have the latest 4.16.1 kernel, we should be able to set it to the maximum of 4294967295 seconds.
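    The version cutoff described above can be sketched as a small helper. The function name and the version comparison via `sort -V` are my own; the 255 / 4294967295 limits are the ones quoted from the AWS docs:

    ```shell
    # Sketch (my own helper): the maximum nvme_core.io_timeout depends on the
    # kernel version -- kernels before 4.15 cap it at 255, 4.15+ allow 4294967295.
    max_io_timeout() {
      kver="$1"
      # sort -V orders version strings; if 4.15 sorts first, kver is >= 4.15
      if [ "$(printf '%s\n4.15\n' "$kver" | sort -V | head -n1)" = "4.15" ]; then
        echo 4294967295
      else
        echo 255
      fi
    }

    max_io_timeout "$(uname -r | cut -d- -f1)"
    ```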

    But when I try to set the nvme_core.io_timeout parameter to the max value, it doesn't get reflected. I tried this:

    sh-4.2# modprobe nvme_core io_timeout=123457
    sh-4.2# cat /sys/module/nvme_core/parameters/io_timeout
    30
    sh-4.2#
    

    What is the correct way to set nvme_core.io_timeout? I tried a lot of other things, like:

    1. setting it in /etc/default/grub file
    2. sysctl command
    3. Overriding /sys/module/nvme_core/parameters/io_timeout file

    But nothing helped.
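    A likely explanation for the failed attempts: `modprobe` options have no effect when `nvme_core` is already loaded or is built into the kernel, and `sysctl` only manages `/proc/sys` entries, not module parameters. If (and only if) `nvme_core` is built as a loadable module on your kernel, a modprobe.d fragment is an alternative to the grub route; the filename below is my own choice:

    ```
    # /etc/modprobe.d/nvme_core.conf
    # Applied when the nvme_core module is loaded; if the module is loaded
    # from the initramfs, the initramfs must be rebuilt for this to take effect.
    options nvme_core io_timeout=4294967295
    ```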