Weird nfs performance: 1 thread better than 8, 8 better than 2!


When a client request comes in, it gets handed off to one of the threads, and the rest of the threads are asked to do read-ahead operations. The fastest way to read a single file is to have one thread do it sequentially, so for one file this is overkill, and the threads are in essence making work for each other. But what's true for one client reading one file won't necessarily be true when you deploy in the real world, so stick with the formula of basing the number of threads and the number of read-aheads on your bandwidth/CPU specs.
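One way to check whether the thread pool is actually the bottleneck is the "th" line of /proc/net/rpc/nfsd on the server, which reports the thread count and how many times every thread was busy at once. Below is a minimal parser sketch; the sample line is illustrative, and the trailing histogram fields appear only on older kernels (like the Ubuntu 9.04 era one here):

```python
# Sketch: inspect nfsd thread-pool usage via the "th" line of
# /proc/net/rpc/nfsd. Fields: thread count, number of times all
# threads were busy at once, then (on older kernels) a 10-bucket
# busy-time histogram.
def parse_nfsd_threads(stats_text):
    for line in stats_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "th":
            return int(fields[1]), int(fields[2])
    return None

# Illustrative sample; on a real server, read open("/proc/net/rpc/nfsd").read()
sample = "th 8 0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000"
print(parse_nfsd_threads(sample))  # -> (8, 0): 8 threads, never all busy at once
```

If the all-busy counter stays at zero under load, adding threads is unlikely to help.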



Author: Joe

Updated on September 17, 2022

Comments

  • Joe
    Joe almost 2 years

    I'm trying to determine the cause of poor nfs performance between two Xen Virtual Machines (client & server) running on the same host. Specifically, the speed at which I can sequentially read a 1GB file on the client is much lower than what would be expected based on the measured network connection speed between the two VMs and the measured speed of reading the file directly on the server. The VMs are running Ubuntu 9.04 and the server is using the nfs-kernel-server package.

    According to various NFS tuning resources, changing the number of nfsd threads (in my case kernel threads) can affect performance. Usually this advice is framed in terms of increasing the number from the default of 8 on heavily-used servers. What I find in my current configuration:

    • RPCNFSDCOUNT=8 (default): 13.5-30 seconds to cat a 1GB file on the client, i.e. 35-80 MB/s

    • RPCNFSDCOUNT=16: 18 seconds to cat the file, 60 MB/s

    • RPCNFSDCOUNT=1: 8-9 seconds to cat the file (!!?!), 125 MB/s

    • RPCNFSDCOUNT=2: 87 seconds to cat the file, 12 MB/s
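    A quick sanity check of the throughput figures quoted above (taking 1 GB as 1024 MB, so the numbers are approximate):

    ```python
    # Sanity check of the quoted read throughputs for a 1 GB file
    # (1 GB taken as 1024 MB here, hence small rounding differences).
    def mb_per_s(seconds, size_mb=1024):
        """Average throughput in MB/s for size_mb read in `seconds`."""
        return size_mb / seconds

    for count, secs in [(8, 13.5), (8, 30.0), (16, 18.0), (1, 8.5), (2, 87.0)]:
        print(f"RPCNFSDCOUNT={count}: {secs}s -> {mb_per_s(secs):.0f} MB/s")
    ```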

    I should mention that the file I'm exporting is on a RevoDrive SSD mounted on the server using Xen's PCI-passthrough; on the server I can cat the file in under 4 seconds (>250 MB/s). I am dropping caches on the client before each test.

    I don't really want to leave the server configured with just one thread as I'm guessing that won't work so well when there are multiple clients, but I might be misunderstanding how that works. I have repeated the tests a few times (changing the server config in between) and the results are fairly consistent. So my question is: why is the best performance with 1 thread?

    A few other things I have tried changing, to little or no effect:

    • increasing the values of /proc/sys/net/ipv4/ipfrag_low_thresh and /proc/sys/net/ipv4/ipfrag_high_thresh to 512K and 1M from the defaults of 192K and 256K

    • increasing the value of /proc/sys/net/core/rmem_default and /proc/sys/net/core/rmem_max to 1M from the default of 128K

    • mounting with client options rsize=32768, wsize=32768
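    For reference, the tuning attempts above collected in one place. The mount point and file name are placeholders (the export path /export/ssd is from the setup described below); these commands need root, and the sysctl values are lost on reboot unless persisted in /etc/sysctl.conf:

    ```shell
    # raise IP fragment reassembly thresholds to 512K / 1M
    sysctl -w net.ipv4.ipfrag_low_thresh=524288
    sysctl -w net.ipv4.ipfrag_high_thresh=1048576

    # raise default/max socket receive buffers to 1M
    sysctl -w net.core.rmem_default=1048576
    sysctl -w net.core.rmem_max=1048576

    # mount on the client with 32K read/write sizes
    mount -t nfs -o rsize=32768,wsize=32768 server:/export/ssd /mnt/nfs

    # drop client caches before each timed sequential read
    sync && echo 3 > /proc/sys/vm/drop_caches
    time cat /mnt/nfs/bigfile > /dev/null
    ```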

    From the output of sar -d I understand that the actual read sizes going to the underlying device are rather small (<100 bytes), but this doesn't cause a problem when reading the file locally on the server.

    The RevoDrive actually exposes two "SATA" devices /dev/sda and /dev/sdb, then dmraid picks up a fakeRAID-0 striped across them which I have mounted to /mnt/ssd and then bind-mounted to /export/ssd. I've done local tests on my file using both locations and see the good performance mentioned above. If answers/comments ask for more details I will add them.

  • Joe
    Joe over 13 years
    I agree with this logic, however it doesn't really explain why 1 thread is better than 8, and why 2 is so much worse than either -- that behavior suggests a weird interaction/bug involving perhaps kernel thread scheduling, I/O buffer sharing/locking, etc. The read-ahead angle is interesting, but it still doesn't explain the steep performance drop at 2 threads.
  • Admin
    Admin over 13 years
    In your question you have 1 thread being the fastest... to me that makes sense, when you think about what the disk is doing, and the thread overhead/synchronization. As far as 3 vs 6 vs 9, etc... a lot of that is noise. The link above explains it a bit better.
  • Joe
    Joe over 12 years
    pdm, thanks for your answer way back. Although it didn't solve the underlying issue (which I still haven't solved but only happens on Xen so there's a lot of magic to consider), you did answer the actual question (bold in the original question text) quite well.