df hangs and I have no idea why
Thanks to @StéphaneChazelas I used strace to see where df was hanging which lead to this answer: https://serverfault.com/a/918160/256146
Restarting that and then restarting dovecot restored system operation.
Related videos on Youtube
TheDavidFactor
Updated on September 18, 2022Comments
-
TheDavidFactor almost 2 years
[backstory] I've got a CentOS 7.5.1804 mail server (postfix/dovecot) that seems to be working, however when I added a new email client, it didn't create the mail folder for the user [/backstory]
I ran df -h to see if I was out of space and it hung. I then ran df which also hung. I found this question: How to kill a process which can't be killed without rebooting? and started poking around.
I've only got two physical drives on the server. I can run df on each of those mounts (/ and /mnt/external) and it runs correctly. I ran mount and it shows a ton of entries that I'm not familiar with. I was pretty sure they don't all show up in df, so I compared the output of df with mount on another CentOS box I've got (same version) and df shows a handful of mount points (/, /dev, /dev/shm, /run, /sys/fs/cgroup, /boot, /run/user/1000) so I tried running df against each of these individually on the suspect server and they all completed correctly.
I'm able to access /proc/$PID/* on the hung df instances (one of the answers in the referenced question suggested that) and I can cat /proc/$PID/syscall which isn't changing, but I don't know how to interpret it.
StackExchange, you're my only hope! (Ok, maybe not, but I'm hoping someone can point me in the right direction)
dmesg output:
[4400391.164795] systemd-journald[16672]: File /run/log/journal/b2bcc90256f446df876bea47007f387b/system.journal corrupted or uncleanly shut down, renaming and replacing. [4400392.726853] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs this repeats 556 times [4856400.839491] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs [4856640.666072] INFO: task df:22269 blocked for more than 120 seconds. [4856640.666180] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [4856640.666310] df D ffff88007fc13680 0 22269 21526 0x00000084 [4856640.666316] ffff88004fa57a30 0000000000000082 ffff880079fd4440 ffff88004fa57fd8 [4856640.666320] ffff88004fa57fd8 ffff88004fa57fd8 ffff880079fd4440 ffff88003641cf98 [4856640.666324] ffff88003641cfa0 7fffffffffffffff ffff880079fd4440 ffff880035e44e00 [4856640.666328] Call Trace: [4856640.666338] [<ffffffff816096a9>] schedule+0x29/0x70 [4856640.666342] [<ffffffff816075f9>] schedule_timeout+0x209/0x2d0 [4856640.666347] [<ffffffff810980e6>] ? finish_wait+0x56/0x70 [4856640.666351] [<ffffffff81607a42>] ? mutex_lock+0x12/0x2f [4856640.666355] [<ffffffff8124a8fa>] ? autofs4_wait+0x37a/0x870 [4856640.666359] [<ffffffff81609ba6>] wait_for_completion+0x116/0x170 [4856640.666363] [<ffffffff810a9500>] ? wake_up_state+0x20/0x20 [4856640.666366] [<ffffffff8124ba5b>] autofs4_expire_wait+0x6b/0x110 [4856640.666371] [<ffffffff81270fcd>] ? selinux_inode_setsecurity+0x6d/0x140 [4856640.666374] [<ffffffff81248b52>] do_expire_wait+0x172/0x190 [4856640.666378] [<ffffffff81248d7b>] autofs4_d_manage+0x9b/0x160 [4856640.666383] [<ffffffff811d0b85>] follow_managed+0xb5/0x300 [4856640.666386] [<ffffffff811d149b>] lookup_fast+0x19b/0x2e0 [4856640.666390] [<ffffffff811d4495>] path_lookupat+0x165/0x7a0 [4856640.666394] [<ffffffff811ab5a5>] ? kmem_cache_alloc+0x35/0x1d0 [4856640.666397] [<ffffffff811d6a0f>] ? getname_flags+0x4f/0x1a0 [4856640.666400] [<ffffffff811d4afb>] filename_lookup+0x2b/0xc0 [4856640.666404] [<ffffffff811d7b37>] user_path_at_empty+0x67/0xc0 [4856640.666408] [<ffffffff810f0ad2>] ? from_kgid_munged+0x12/0x20 [4856640.666412] [<ffffffff811cba1f>] ? cp_new_stat+0x14f/0x180 [4856640.666415] [<ffffffff811d7ba1>] user_path_at+0x11/0x20 [4856640.666419] [<ffffffff811cb513>] vfs_fstatat+0x63/0xc0 [4856640.666422] [<ffffffff811cba7e>] SYSC_newstat+0x2e/0x60 [4856640.666426] [<ffffffff810fac96>] ? __audit_syscall_exit+0x1e6/0x280 [4856640.666429] [<ffffffff811cbd5e>] SyS_newstat+0xe/0x10 [4856640.666434] [<ffffffff81614209>] system_call_fastpath+0x16/0x1b
strace df ends with:
stat("/sys/fs/selinux", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 stat("/proc/sys/fs/binfmt_misc",
-
Stéphane Chazelas almost 6 yearsWhat do you see if you do
strace df -h
? -
Stéphane Chazelas almost 6 yearsAny detail from
/proc/self/mountinfo
about those tons of entries from the output ofmount
? -
Stéphane Chazelas almost 6 yearsAnything of interest in the output of
dmesg
? -
TheDavidFactor almost 6 yearsI evidently don't have strace installed, I'm working on that. Looking at the "ton" of entries, the ones that df doesn't list (on my working CentOS box) are the ones under /dev and /sys/fs/cgroup (df lists those directories, but mount lists a bunch of subdirectories)
-
TheDavidFactor almost 6 years@StéphaneChazelas I added info to the question
-
Bob almost 6 yearsWhat is the question? Where namely
df
doesn't work if it works on/
and/mnt/external
? -
TheDavidFactor almost 6 yearsFound the answer using the output of strace here: serverfault.com/questions/911811/centos-7-df-started-to-hang THANK YOU @StéphaneChazelas for the pointers
-