Why is Ubuntu slow during massive network and disk I/O?

linux raid

6,417

Solution 1

You could experiment with I/O schedulers. The default I/O scheduler is CFQ, which works pretty well for desktops, but in my experience Deadline tends to work better for file servers. You can change the I/O scheduler on the fly, so it's easy to experiment and see what works best in your situation.

To list the available I/O schedulers for a device (sdb in this example), use this command:

cat /sys/block/sdb/queue/scheduler

This should return something like

noop anticipatory deadline [cfq]

where the scheduler in brackets is the one currently in use.

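To change your scheduler to deadline, use the following command on the appropriate device. Note that this first form does not work:

sudo echo "deadline" > /sys/block/sdb/queue/scheduler

because the redirect is performed by your own unprivileged shell; sudo only elevates the echo, so the write fails with "Permission denied". Run the echo and the redirect together under sudo instead:

sudo sh -c 'echo deadline > /sys/block/sdb/queue/scheduler'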
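The two steps can also be wrapped in a couple of shell functions. This is a minimal sketch, assuming a POSIX shell and the sysfs layout shown above; the function names active_scheduler and set_scheduler are hypothetical, and sdb is only an example device. It uses sudo tee for the write, which is another way of keeping the redirect out of the unprivileged shell. On newer kernels the list of names differs (for example mq-deadline, kyber, bfq, none), but the bracket convention is the same, so the parsing still applies.

```shell
#!/bin/sh
# Sketch only: inspect and switch the I/O scheduler of one block device.
# "sdb" in the usage lines is an example; substitute your own device.

# Print the scheduler shown in [brackets], i.e. the one currently active.
# $1 is the contents of /sys/block/<dev>/queue/scheduler.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Switch device $1 to scheduler $2. tee itself runs under sudo, so the
# write to sysfs is privileged; a "sudo echo ... >" redirect would not be.
set_scheduler() {
    echo "$2" | sudo tee "/sys/block/$1/queue/scheduler" > /dev/null
}

# Usage on a real system (needs sudo rights):
#   active_scheduler "$(cat /sys/block/sdb/queue/scheduler)"
#   set_scheduler sdb deadline
#   active_scheduler "$(cat /sys/block/sdb/queue/scheduler)"
```

Running active_scheduler again after the switch is a cheap way to confirm the kernel accepted the new name.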

Solution 2

Try running iotop; it shows disk I/O per process, so it should show you which processes are doing the writing and how fast.
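To make "it should show you something" more concrete: running iotop in batch mode while a large upload is in progress gives a log you can read afterwards. A sketch, assuming iotop is installed and you have sudo rights; the function name log_iotop and the default log file name are hypothetical.

```shell
#!/bin/sh
# Sketch only: record iotop in batch mode while a large upload is running,
# so the output can be read afterwards instead of watched live.
#   -o  show only processes/threads actually doing I/O
#   -b  batch (non-interactive) mode
#   -t  prefix each line with a timestamp
#   -d  delay between samples, in seconds
#   -n  number of samples before exiting
# iotop must be installed and normally needs root.
log_iotop() {
    samples=${1:-30}          # default: 30 one-second samples
    logfile=${2:-iotop.log}   # hypothetical default file name
    sudo iotop -o -b -t -d 1 -n "$samples" > "$logfile"
}

# Usage: start it, then begin the upload to the share.
#   log_iotop 60 upload-iotop.log
```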

Solution 3

Do you see that many interrupts (the "in" column in the system section of vmstat) and context switches ("cs") during normal operation? I ask because you describe even the mouse cursor becoming slow. If something is overwhelming your system with interrupts under load, that would slow everything down.
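One way to answer that is to pull just the in and cs columns out of vmstat and compare an idle period with an upload. A minimal sketch, assuming standard procps vmstat output; irq_cs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: extract the interrupt ("in") and context-switch ("cs")
# columns from vmstat output so idle and loaded periods are easy to compare.
# Columns are located by name in the header row, not by fixed position.
irq_cs() {
    awk '
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }   # header row
        ("in" in col) && $1 ~ /^[0-9]+$/ { print $(col["in"]), $(col["cs"]) }
    '
}

# Usage: ten one-second samples. The first data row vmstat prints is an
# average since boot, not a live sample.
#   vmstat -n 1 10 | irq_cs
```

Finding the columns by header name is a small design choice so the helper keeps working if a vmstat version adds columns such as st.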

And, as a total shot in the dark: is there anything in /var/log/dmesg (or in the current dmesg output) about errors or timeouts from your disks or RAID devices?
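A quick way to scan for that kind of message is to filter the kernel log for error-like keywords. This is a rough sketch; the keyword list is a heuristic of our own choosing rather than a definitive set, and disk_errors is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: filter kernel messages for disk/RAID trouble worth a closer
# look. The keyword list is a rough heuristic and will miss some messages.
disk_errors() {
    grep -iE 'error|fail|timeout|timed out|reset|frozen'
}

# Usage: check both the boot-time log and the live kernel ring buffer.
#   disk_errors < /var/log/dmesg
#   dmesg | disk_errors
# If the array is Linux software RAID (md), its state is in /proc/mdstat:
#   cat /proc/mdstat
```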

Edit 1:

This morning I ran across an article describing something that sounds very much like the issue you are seeing on your box. Greg Smith walks through the analysis of a server that seems to freeze disk writes for extended periods. His investigative method involves running the command:

while [ 1 ]; do cat /proc/meminfo; sleep 1; done
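Greg's loop dumps the whole of /proc/meminfo every second. A narrower sketch that prints only the relevant figures with a timestamp is easier to read while reproducing the hang. Including Dirty (data waiting to be written, which later becomes Writeback) is our own addition, and the function names writeback_line and watch_writeback are hypothetical.

```shell
#!/bin/sh
# Sketch only: print just the Dirty and Writeback figures once per second.

# Print Dirty and Writeback (kB) from a meminfo-format file on one line.
# Matching whole field names keeps "WritebackTmp:" out of the result.
writeback_line() {
    awk '$1 == "Dirty:"     { d = $2 }
         $1 == "Writeback:" { w = $2 }
         END { printf "Dirty: %s kB  Writeback: %s kB\n", d, w }' "${1:-/proc/meminfo}"
}

# Take N samples (default 60), one per second, each with a timestamp.
watch_writeback() {
    n=${1:-60}
    while [ "$n" -gt 0 ]; do
        echo "$(date +%T)  $(writeback_line)"
        n=$((n - 1))
        sleep 1
    done
}

# Usage: start it, then kick off a large upload to the share.
#   watch_writeback 300
```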

and watching the "Writeback:" value before and during a period where the system seems to hang. If the write cache is indeed filling up (roughly >40% full) and the system suspends writes while it flushes, Greg suggests some OS tuning that might mitigate the problem. His blog entry is at http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html
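To put a number on "how full", one rough approach (our assumption, not necessarily Greg's exact method) is to express Dirty plus Writeback as a share of MemTotal and set it beside the kernel's vm.dirty_background_ratio (the point where background flushing starts) and vm.dirty_ratio (the point where processes generating writes must start writing out dirty data themselves). The kernel computes those ratios against available (free plus reclaimable) memory rather than MemTotal, so treat the result as an approximation. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: approximate how much of RAM is Dirty or under Writeback,
# shown next to the kernel's writeback thresholds (both are percentages).

# Integer percentage of MemTotal that is Dirty + Writeback (rounded down).
dirty_percent() {
    awk '$1 == "MemTotal:"  { t = $2 }
         $1 == "Dirty:"     { d = $2 }
         $1 == "Writeback:" { w = $2 }
         END { if (t > 0) printf "%d\n", 100 * (d + w) / t }' "${1:-/proc/meminfo}"
}

show_dirty_status() {
    echo "dirty+writeback: $(dirty_percent)% of MemTotal"
    echo "vm.dirty_background_ratio: $(cat /proc/sys/vm/dirty_background_ratio)%"
    echo "vm.dirty_ratio: $(cat /proc/sys/vm/dirty_ratio)%"
}

# Usage: run it during one of the stalls.
#   show_dirty_status
```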

Solution 4

I'm not sure whether it happens on Linux, but on Windows, SMB file transfers over a high-speed network can outpace disk I/O, and because some earlier Windows versions cache network transfers very unintelligently, you can end up with a large amount of data sitting in memory buffers waiting to be written to disk. This often kills responsiveness on XP and earlier systems (maybe Vista too; I don't know, I've never used it significantly).

Solution 5

I want to say that ReiserFS has a single lock and is not really suitable for a large (many-disk) RAID for that reason, but it has been a long time, so I could be wrong.

I suspect changing the scheduler would help quite a bit.

Alex N

Updated on September 17, 2022

Comments

Alex N almost 2 years
Not sure where to start on this one, but I am constantly seeing this strange issue on my Ubuntu Hardy.

The system is a Core i7-920 with RAID10 disks and 3GB RAM, though that may be beside the point. It has multiple Samba shares on it. Every time someone uploads something large (multiple gigs) to a share, system responsiveness drops noticeably.

Filesystem: ReiserFS (v3)

Both vmstat and top show no significant I/O wait time, very few blocked processes (around 2 for a 4-core system), and occasional writes of ~13000 blocks to disk. The load average is consistently under 0.5 (again, the system is quad-core with HT enabled, so it has 8 logical cores).

However, even the mouse cursor lags badly when I move it...

here is typical vmstat output during heavy incoming network I/O:
```
vmstat -n 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0 419268  93724  48052 2071148    0    0     9     3   11    4  1  1 95  2
 1  0 419268  91560  48052 2073292    0    0     0     0 2396 5716  5  1 94  0
 0  0 419268  89636  48056 2075164    0    0     0     0 2173 5537  2  1 97  0
 2  0 419268  87836  48056 2077136    0    0     0     0 2057 5216  1  1 98  0
 1  0 419268  85716  48060 2078812    0    0     0 10104 2108 5261  2  1 97  0
 0  0 419268  91940  48060 2071748    0    0     0     0 2221 6153  2  1 97  0
 2  0 419268  90368  48064 2073640    0    0     0     0 2104 5384  1  1 98  0
 0  0 419268  89000  48064 2075092    0    0     0     0 1781 4700  1  1 98  0
 1  0 419268  87140  48064 2076640    0    0     0     0 2045 5104  1  1 98  0
 1  1 419268  85584  48068 2078240    0    0     0 10112 1945 4343  2  1 91  7
 0  0 419268  92668  48068 2071764    0    0     0    16 2064 5197  2  1 96  1
```
- feverzsj over 14 years
  
  What filesystem are you using?
- Alex N over 14 years
  
  RAID is on ReiserFS
- tearman over 12 years
  
  Quick thought: are you using the so-called "Green" drives? WD makes a GP edition that has issues with very large file systems. I had this same issue, where my load average would go through the roof whenever any I/O operations were performed on that drive at all. Not sure if that's relevant to your situation, however.
- Alex N over 12 years
  
  Interesting... I was not using WD drives in that setup; it was all Seagate enterprise-level disks (I forget the actual model, this was a while ago :). I have another RAID5 file server that does use WD 'green' drives. So far I've only had an issue with one disk reallocating too many blocks.
Alex N over 14 years

Nothing irregular, as a matter of fact: a few thousand (2-4k) context switches and a few hundred interrupts. I looked at dmesg and all is good :) The RAID is healthy.
Alex N over 14 years

It does show relatively small throughput (~2Mb/s) ;(
Alex N over 14 years

That's pretty interesting, thank you Jeff! I'll definitely try doing the same.
hdgarrood over 11 years

why does one of those commands work and not the other?
3dinfluence over 11 years

hdgarrood: The problem with using sudo that way is that the elevated privileges sudo grants you don't carry beyond the command itself, so they end at the redirect, ">", which is handled by your own unprivileged shell. The 2nd command wraps both the command and the redirect inside the sudo command, where the privileges are elevated, so the redirect has write permission to sysfs.