Why is Ubuntu slow under heavy network and disk I/O?
Solution 1
You could experiment with I/O schedulers. The default I/O scheduler is CFQ, which works pretty well for desktops, but in my experience Deadline tends to work better for file servers. You can change the I/O scheduler on the fly, so it is easy to experiment and see what works best in your situation.
To list the available I/O schedulers, use this command.
cat /sys/block/sdb/queue/scheduler
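This should return something like noop anticipatory deadline [cfq], where the brackets mark the scheduler currently in use.
To change your scheduler to deadline, write the new name to the same file on the appropriate device. Note that the obvious form does not work:
sudo echo "deadline" > /sys/block/sdb/queue/scheduler
The redirect is performed by your own unprivileged shell, not by sudo, so the write is denied. Wrap the redirect inside the elevated command instead:
sudo sh -c 'echo deadline > /sys/block/sdb/queue/scheduler'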
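If the box has several disks, checking each one by hand gets tedious. The following is a minimal sketch, not part of the original answer, that prints the active scheduler for every block device by extracting the bracketed entry. The helper name active_scheduler is made up for illustration.

```shell
#!/bin/sh
# Print the active I/O scheduler (the bracketed entry) for each block device.

# active_scheduler LINE (hypothetical helper): extract the bracketed name,
# e.g. "noop anticipatory deadline [cfq]" -> "cfq".
# Prints nothing if the line contains no bracketed entry.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

for f in /sys/block/*/queue/scheduler; do
    # Skip the unexpanded glob (no matches) and unreadable files.
    [ -r "$f" ] || continue
    dev=${f#/sys/block/}   # strip the leading path
    dev=${dev%%/*}         # keep only the device name, e.g. "sdb"
    printf '%s: %s\n' "$dev" "$(active_scheduler "$(cat "$f")")"
done
```

Reading these files needs no root access; only writing a new scheduler name does.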
Solution 2
Try running iotop. It shows disk read and write rates per process, so it should show you which process is generating the I/O.
Solution 3
Do you see that many interrupts (System - in) and Context Switches (System - cs) during normal operation? I wonder because of your description of even the mouse cursor becoming slow. If there is a problem causing your system to be overwhelmed by interrupts under load this would cause everything to slow down.
And just to take a total shot in the dark, is there anything in /var/log/dmesg about errors or timeouts from your disks or raid devices?
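To put a number on the interrupt and context-switch rates without reading them off vmstat, you can sample the cumulative counters in /proc/stat one second apart. This is only a rough sketch; the helper name stat_counter is an assumption for illustration, and the figures should roughly match the in and cs columns of vmstat.

```shell
#!/bin/sh
# Estimate interrupts/s and context switches/s from /proc/stat.

# stat_counter NAME [FILE] (hypothetical helper): print the first numeric
# field of the line starting with NAME. For "intr" this is the total
# interrupt count since boot; for "ctxt" the total context switches.
stat_counter() {
    awk -v key="$1" '$1 == key { print $2; exit }' "${2:-/proc/stat}"
}

if [ -r /proc/stat ]; then
    i1=$(stat_counter intr); c1=$(stat_counter ctxt)
    sleep 1
    i2=$(stat_counter intr); c2=$(stat_counter ctxt)
    # The counters are cumulative, so the difference is the per-second rate.
    echo "interrupts/s: $((i2 - i1))  context switches/s: $((c2 - c1))"
fi
```

Run it once while the system is idle and once during a large Samba upload, then compare the two readings.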
Edit 1:
I ran across an article this morning that really sounded like the issue that you are seeing on your box. Greg Smith walks through an analysis of a server that seems to freeze disk writes for extended periods of time. His particular investigative method involves running the command:
while [ 1 ]; do cat /proc/meminfo; sleep 1; done
and looking at the "Writeback:" cache size before and during a period where the system seems to hang. If the writeback cache is indeed filling up (roughly >40% full) and causing the system to suspend writes while it flushes, then Greg suggests some OS tuning that might mitigate the problem. Greg's blog entry can be found at http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html
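Dumping all of /proc/meminfo every second makes the interesting line hard to spot. Here is a small sketch of the same idea that prints only the relevant fields with a timestamp. The helper names meminfo_kb and sample_writeback are invented for illustration, and watching Dirty alongside Writeback is my own addition rather than something stated above.

```shell
#!/bin/sh
# Sample the Dirty and Writeback sizes from /proc/meminfo.

# meminfo_kb FIELD [FILE] (hypothetical helper): print the value, in kB,
# of FIELD. The exact match on "FIELD:" avoids picking up WritebackTmp.
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# sample_writeback COUNT [INTERVAL] (hypothetical helper): print COUNT
# timestamped samples, INTERVAL seconds apart (default 1).
sample_writeback() {
    n=$1
    while [ "$n" -gt 0 ]; do
        printf '%s Dirty=%s kB Writeback=%s kB\n' \
            "$(date +%T)" "$(meminfo_kb Dirty)" "$(meminfo_kb Writeback)"
        n=$((n - 1))
        if [ "$n" -gt 0 ]; then
            sleep "${2:-1}"
        fi
    done
}

if [ -r /proc/meminfo ]; then
    sample_writeback 3
fi
```

For a real investigation, raise the sample count and leave it running across one of the stalls, so you have readings from before and during the hang.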
Solution 4
I'm not sure if it happens on Linux, but on Windows, Samba transfers on a high-speed network can outpace disk I/O speed. Because some earlier Windows versions cache network transfers poorly, you can end up with a large amount of data sitting in memory buffers, waiting to be written to disk. This often kills responsiveness on XP and earlier systems (maybe Vista too; I have never used it much).
Solution 5
I want to say that ReiserFS has a single lock, and is not really suitable for a large (many disks) RAID for that reason. But it has been a long time, so I could be wrong.
I suspect changing the scheduler would help quite a bit.
Alex N
Updated on September 17, 2022
Comments
-
Alex N almost 2 years
Not sure where to start on this one, but I am constantly seeing this strange issue on my Ubuntu Hardy.
System is a Core i7-920 with RAID10 disks and 3 GB RAM, though that may be beside the point. It has multiple Samba shares on it. Every time someone uploads something large (multiple gigabytes) to a share, system responsiveness drops noticeably.
Filesystem: ReiserFS (v3)
Both vmstat and top show no significant I/O wait time, very few blocked processes (about 2 on a 4-core system), and occasional writes of ~13000 blocks to disk. Load average is constantly under 0.5 (again, the system is quad core with HT enabled, so it has 8 logical cores).
However, even when I move a mouse cursor it lags badly...
Here is typical vmstat output during heavy incoming network I/O:
vmstat -n 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
 1  0 419268  93724  48052 2071148    0    0     9     3   11    4  1  1 95  2
 1  0 419268  91560  48052 2073292    0    0     0     0 2396 5716  5  1 94  0
 0  0 419268  89636  48056 2075164    0    0     0     0 2173 5537  2  1 97  0
 2  0 419268  87836  48056 2077136    0    0     0     0 2057 5216  1  1 98  0
 1  0 419268  85716  48060 2078812    0    0     0 10104 2108 5261  2  1 97  0
 0  0 419268  91940  48060 2071748    0    0     0     0 2221 6153  2  1 97  0
 2  0 419268  90368  48064 2073640    0    0     0     0 2104 5384  1  1 98  0
 0  0 419268  89000  48064 2075092    0    0     0     0 1781 4700  1  1 98  0
 1  0 419268  87140  48064 2076640    0    0     0     0 2045 5104  1  1 98  0
 1  1 419268  85584  48068 2078240    0    0     0 10112 1945 4343  2  1 91  7
 0  0 419268  92668  48068 2071764    0    0     0    16 2064 5197  2  1 96  1
-
feverzsj over 14 years
What filesystem are you using?
-
Alex N over 14 years
RAID is on ReiserFS
-
tearman over 12 years
Quick thought: are you using the so-called "Green" drives? WD makes their GP edition, which has issues with very large file systems. I had this same issue, where my load average would go through the roof whenever any I/O operations were performed on that drive at all. Not sure if that's relevant to your situation, however.
-
Alex N over 12 years
Interesting. I was not using WD drives in that setup; it was all Seagate enterprise-level disks (I forgot the actual model, this was a while ago :). I have another RAID5 file server that actually does use WD 'green' drives. So far the only issue I have had is one disk reallocating too many blocks.
-
-
Alex N over 14 years
Nothing irregular, as a matter of fact: a few thousand (2-4) CS and a few hundred INs. Looked at dmesg, all is good :) RAID is healthy.
-
Alex N over 14 years
It does show relatively small throughput (~2Mb/s) ;(
-
Alex N over 14 years
That's pretty interesting, thank you Jeff! I'll definitely try doing the same.
-
hdgarrood over 11 years
Why does one of those commands work and not the other?
-
3dinfluence over 11 years
hdgarrood: The problem with using sudo is that the elevated privileges it grants don't carry beyond the command itself, so the privileges end at the redirect, ">". The 2nd command wraps up both the command and the redirect inside the sudo command, where the privileges are elevated, so the redirect has write permission to sysfs.