What does "task mysqld:xxx blocked for more than 120 seconds" mean?

32,857

Solution 1

echo 0 > /proc/sys/kernel/hung_task_timeout_secs only silences the warning. Besides that it has no effect whatsoever. Any value above zero will cause this message to be issued whenever a task is blocked for that amount of time.

The warning is given to indicate a problem with the system. In my experience it means that the process is blocked in kernel space for at least 120 seconds usually because the process is starved of disk I/O. This can be because of heavy swapping due to too much memory being used, e.g. if you have a heavy webserver load and you've configured too many apache child processes for your system. In your case it may just be that there are too many mysql processes competing for memory and data IO.

It can also happen if the underlying storage system is not performing well, e.g. if you have a SAN which is overloaded, or if there are soft errors on a disk which cause a lot of retries. Whenever a task has to wait long for its IO commands to complete, these warning may be issued.

Solution 2

"hung tasks" are bugs that cause the kernel to leave a task stuck in uninterruptible "D" state. A task/process in that state cannot be killed, neither with kill -9.

Your message in /var/log/messages means that the task 4123, obviously the mysql daemon is in that state since 120 seconds. That means the mysql daemon waits for ressources. Mostly the "D" state is caused due to I/O.

When a process requests a read system call on an opened filehandle, it gets into state "D" as long as is needs to read the file. The process is "blocked".

That value in /proc/sys/kernel/hung_task_timeout_secs is only the definition of when that message you see, should appear in the logs. As your message says, you can disable the message when you set the value to 0. But, that doesn't solve your problem.

You can only solve an uninterrubtible state in 2 ways:

  1. Reboot the system, or...
  2. Find out which ressource the process needs, and make sure the ressource is available.
Share:
32,857
Sreeraj
Author by

Sreeraj

Updated on September 18, 2022

Comments

  • Sreeraj
    Sreeraj almost 2 years

    We are troubleshooting a MySQL issue where some queries are taking a very long time complete and I see many of these entry in /var/log/messages:

    Jan 28 05:52:15 64455-alpha01 kernel: [2529273.616327] INFO: task mysqld:4123 blocked for more than 120 seconds.
    Jan 28 05:52:15 64455-alpha01 kernel: [2529273.616525] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Jan 28 05:52:15 64455-alpha01 kernel: [2529273.616813] mysqld        D  000000000000000d     0  4123   3142 0x00000080
    

    What does it mean? How does it affect that MySQL thread (4123 is the thread id?)

    The value in /proc/sys/kernel/hung_task_timeout_secs when I checked now is:

    $ cat /proc/sys/kernel/hung_task_timeout_secs
    120
    

    I specifically would like to know how does it affect the process?

    I read in a forum that it means it happens when that process is holding up too much memory.

    • Anthon
      Anthon over 9 years
      Are you running the database files of a Btrfs filesystem?
    • Sreeraj
      Sreeraj over 9 years
      @Anthon. No. Its an ext4 FS.
    • Anthon
      Anthon over 9 years
      I have seen this on Btrfs, but I am not sure it was caused by this, that's why I asked. Make sure you run smartctl to check each of your drives, one might just be reaching end-of-life.
    • Rob C
      Rob C almost 9 years
      The error message is supposed to include a stack trace, which may provide some hints. It is not necessarily related to memory pressure. I have seen it happen on a system where only half the RAM was in use.
  • Sreeraj
    Sreeraj over 9 years
    I think this is the case here. A holland-backup was running (for a long time, unexpected) and it looks like that was causing the long query time. This blocked for more than 120 seconds was an after-effect.