How do I find out what nfsd processes are actually doing?
Solution 1
In this kind of situation I have often found it very useful to capture the NFS traffic (e.g., with tcpdump or Wireshark) and look at it to see whether there is a specific reason for the high load.
For example, you can use something like:
tcpdump -w filename.cap "port 2049"
to save only the NFS traffic (port 2049) to a capture file. You can then open that file in Wireshark on a PC and analyze it in more detail. The last time I had a similar problem, it was a batch of computation jobs from a single user who was over their disk quota, and the clients (18 different machines) kept retrying their writes over and over, which drove the load on the old NFS server very high.
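Once you have a capture, one quick way to see which hosts generate most of the NFS packets is to tally tcpdump's text output by source and destination host. The function below is a minimal sketch, not a tcpdump feature: the name nfs_talkers is hypothetical, and it assumes tcpdump's default one-line IPv4 format (TIME IP SRC.PORT > DST.PORT: ...), like the sample lines quoted in the comments.

```shell
#!/bin/sh
# nfs_talkers  (hypothetical helper)
# Reads tcpdump text output on stdin and prints packet counts per
# "source host -> destination host" pair, busiest pair first.
# Assumes IPv4 lines of the form: TIME IP SRC.PORT > DST.PORT: ...
# (IPv6 "IP6" lines and other formats are ignored.)
nfs_talkers() {
    awk '$2 == "IP" && $4 == ">" {
        src = $3; dst = $5
        sub(/:$/, "", dst)         # drop the colon after the destination
        sub(/\.[^.]*$/, "", src)   # drop the port or service name (e.g. "nfs")
        sub(/\.[^.]*$/, "", dst)
        count[src " -> " dst]++
    }
    END { for (pair in count) print count[pair], pair }' | sort -rn
}
```

For example, tcpdump -r filename.cap | nfs_talkers. A host pair with far more packets than you expect (such as two servers that should not be talking NFS at all) is a good place to start digging in Wireshark.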
Solution 2
A couple of tools for you:
lsof: shows you the open file handles
iotop: shows per-process I/O statistics, in the style of top
nethogs: shows you per-process network traffic
strace: lets you see what a process is doing
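One thing worth knowing before reaching for lsof or strace: on Linux, the nfsd entries of the kernel NFS server are kernel threads, not ordinary processes. They have no executable and no command line (the zero VIRT/RES/SHR columns in top are a hint), which would explain lsof reporting only / and an unknown txt entry, and the kernel refuses ptrace attachment to kernel threads, which would explain strace failing with "Operation not permitted" even as root. The sketch below tells the two apart using a heuristic (assumption: a kernel thread has an empty /proc/PID/cmdline; zombie processes also match); is_kernel_thread is a hypothetical helper name.

```shell
#!/bin/sh
# is_kernel_thread PID  (hypothetical helper, heuristic only)
# Assumption: kernel threads such as nfsd have an empty /proc/PID/cmdline.
# Zombie processes also have an empty cmdline and will be misreported.
# Exit status: 0 = likely kernel thread, 1 = ordinary process, 2 = no such PID.
is_kernel_thread() {
    pid=$1
    [ -r "/proc/$pid/cmdline" ] || return 2
    # cmdline is NUL-separated; delete the NULs and check whether anything remains
    if [ -z "$(tr -d '\0' < "/proc/$pid/cmdline")" ]; then
        return 0
    fi
    return 1
}

# Example: classify every process named nfsd (prints nothing if there are none)
for pid in $(pgrep -x nfsd); do
    if is_kernel_thread "$pid"; then
        echo "$pid: kernel thread (lsof and strace will show little)"
    else
        echo "$pid: ordinary process"
    fi
done
```

If the nfsd entries turn out to be kernel threads, looking at the NFS traffic itself (as in the tcpdump approach) is usually more informative than inspecting the threads.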
Comments
BT643 almost 2 years
When I view top on one of our servers, there are a lot of nfsd processes consuming CPU:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 2769 root      20   0     0    0    0 R   20  0.0   2073:14 nfsd
 2774 root      20   0     0    0    0 S   19  0.0   2058:44 nfsd
 2767 root      20   0     0    0    0 S   18  0.0   2092:54 nfsd
 2768 root      20   0     0    0    0 S   18  0.0   2076:56 nfsd
 2771 root      20   0     0    0    0 S   17  0.0   2094:25 nfsd
 2773 root      20   0     0    0    0 S   14  0.0   2091:34 nfsd
 2772 root      20   0     0    0    0 S   14  0.0   2083:43 nfsd
 2770 root      20   0     0    0    0 S   12  0.0   2077:59 nfsd
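To put these numbers in context, the %CPU values above add up to 20 + 19 + 18 + 18 + 17 + 14 + 14 + 12 = 132, i.e. the nfsd threads together are using roughly 1.3 CPU cores (by default top counts 100% as one core). A small sketch to total this from a batch-mode snapshot, assuming top's default column layout as shown here (%CPU in field 9, COMMAND in field 12); total_cpu_for is a hypothetical helper name:

```shell
#!/bin/sh
# total_cpu_for NAME  (hypothetical helper)
# Reads top batch output on stdin and sums the %CPU column for rows whose
# COMMAND equals NAME. Assumes top's default layout: %CPU in field 9,
# COMMAND in field 12 (other top versions or configurations may differ).
total_cpu_for() {
    awk -v name="$1" '$12 == name { sum += $9 } END { print sum + 0 }'
}
```

For example, top -b -n 1 | total_cpu_for nfsd. Repeating this a few times shows whether the combined load is steady or spiky.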
How do I find out what these are actually doing? Can I see a list of files being accessed by each PID, or any more info?
We're on Ubuntu Server 12.04. I tried nfsstat, but it's not giving me much useful info about what's actually going on.
Edit: additional things I've tried, based on the comments/answers:
Running lsof -p on each of the PIDs shows output like the following (shown for PID 2774):
COMMAND  PID USER  FD    TYPE DEVICE SIZE/OFF NODE NAME
nfsd    2774 root cwd     DIR    8,1     4096    2 /
nfsd    2774 root rtd     DIR    8,1     4096    2 /
nfsd    2774 root txt unknown                      /proc/2774/exe
Does that mean no files are being accessed?
When I try to view a process with strace -f -p 2774, it gives me this error:
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
Running tcpdump | grep nfs shows tons of NFS activity between two of our servers, although as far as I'm aware there shouldn't be any. There are a lot of entries like:
13:56:41.120020 IP 192.168.0.20.nfs > 192.168.0.21.729: Flags [.], ack 4282288820, win 32833, options [nop,nop,TS val 627282027 ecr 263985319,nop,nop,sack 3 {4282317780:4282319228}{4282297508:4282298956}{4282290268:4282291716}], len
Ale over 9 years
In this kind of situation I have often found it very useful to capture the NFS traffic (e.g., with tcpdump or Wireshark) and look at it to see whether there is a specific reason for the high load.
BT643 over 9 years
Interesting... running tcpdump | grep nfs shows tons of NFS activity between two of our servers, although as far as I'm aware there shouldn't be any. There are a lot of entries like:
13:56:41.120020 IP 192.168.0.20.nfs > 192.168.0.21.729: Flags [.], ack 4282288820, win 32833, options [nop,nop,TS val 627282027 ecr 263985319,nop,nop,sack 3 {4282317780:4282319228}{4282297508:4282298956}{4282290268:4282291716}], len
Ale over 9 years
You can use something like
tcpdump -w filename.cap "port 2049"
to save only the NFS traffic (port 2049) to a capture file, then open that file in Wireshark on a PC and analyze it in more detail. The last time I had a similar problem, it was a batch of computation jobs from a single user who was over their disk quota, and the clients (18 different machines) kept retrying their writes over and over, which drove the load on the old NFS server very high.
Ale over 9 years
Answer posted :) I'm glad you solved the problem; NFS can be very tricky to debug, especially when there is a lot of activity but no actual disk access (like my over-quota user).
BT643 over 9 years
For some reason I'm getting
attach: ptrace(PTRACE_ATTACH, ...): Operation not permitted
Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
even though I'm running it as root?
BT643 over 9 years
Sorry, I thought I'd move my comment here before I realised you'd replied :) Thanks! I was able to track down the cause with tcpdump! It was a stuck PHP script that happened to be accessing an NFS share on our second server. I don't think it was actually doing anything, which is why it didn't really show up in top, iotop, etc., but the number of stuck processes on that mount seemed to be causing issues :) Thanks again!
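Processes stuck on an unresponsive NFS mount often (though not always) sit in uninterruptible sleep, shown as state D by ps, while using almost no CPU or disk I/O, which would fit them not standing out in top or iotop. A minimal sketch for listing such processes, assuming ps -eo pid,stat,comm output; stuck_procs is a hypothetical helper name:

```shell
#!/bin/sh
# stuck_procs  (hypothetical helper)
# Reads `ps -eo pid,stat,comm` output on stdin and prints the PID and command
# name of processes whose state starts with D (uninterruptible sleep).
# Assumes the command name contains no spaces (only its first word is printed).
stuck_procs() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}
```

For example, ps -eo pid,stat,comm | stuck_procs. To see every process using files on a particular mount, fuser -vm /path/to/mount (path hypothetical) is another option.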