Disk IO causing high load on Xen/CentOS guest

I would blame it on the old Xen version used on your host (CentOS 5.5). I can really recommend using SLES 10 or 11; SLES 11 ships with Xen 4.x built in.

I have no performance issues with my CentOS 5 DomUs running on SLES 10 Dom0s.


Author: Peter Lindqvist

Updated on September 17, 2022

Comments

  • Peter Lindqvist
    Peter Lindqvist almost 2 years

    Background

    I'm having serious issues with a Xen-based server; the problem is on the guest partition. It's a paravirtualized CentOS 5.5 domU. I'm not sure whether it's hardware- or software-related, or something in between (drivers).

    Basic information

    Updated controller firmware (this was done as the last step)

    Smart Array 6i in Slot 0
       Hardware Revision: Rev B
       Firmware Version: 2.84
    

    Updated kernel

    Linux domU 2.6.18-194.32.1.el5xen #1 SMP Wed Jan 5 19:32:33 EST 2011 i686 i686 i386 GNU/Linux
    

    The problem is in disk write speed.

    Baseline performance is

    • dom0 ~30MB/s
    • domU ~4MB/s (small files)
    • domU ~1.5MB/s (large files)

    The following numbers are taken from top while copying a large file over the network.

    If I copy the file a second time, the speed decreases in proportion to the load average, so the second run is roughly half the speed of the first.

    The system needs some time to cool off after this: the load average slowly decreases until the machine is usable again. In the meantime, even ls / takes about 30 seconds.

    top - 13:26:44 up 13 days, 21:44,  2 users,  load average: 7.03, 5.08, 3.15
    Tasks: 134 total,   2 running, 132 sleeping,   0 stopped,   0 zombie
    Cpu(s):  0.0%us,  0.1%sy,  0.0%ni, 25.3%id, 74.5%wa,  0.0%hi,  0.0%si,  0.1%st
    Mem:   1048752k total,  1041460k used,     7292k free,     3116k buffers
    Swap:  2129912k total,       40k used,  2129872k free,   904740k cached
    
      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
     1506 root      10  -5     0    0    0 S  0.3  0.0   0:03.94 cifsd
        1 root      15   0  2172  644  556 S  0.0  0.1   0:00.08 init
    

    Meanwhile the host stays at a steady load average of ~0.5 over time, with ~50% IO wait.

    The server hardware is a dual Xeon with 3 GB RAM and a 170 GB 10k rpm SCSI-320 disk; it shouldn't have any problem copying files over the network.

    disk = [ "tap:aio:/vm/domU.img,xvda,w" ]
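
    This guest uses the blktap backend (tap:aio). On Xen 3.x-era hosts the choice of disk backend can noticeably affect IO behavior, so switching backends is a cheap experiment. A sketch of the two common alternatives, assuming the same image file (the LVM volume path below is hypothetical):

    # loopback file backend instead of blktap
    disk = [ "file:/vm/domU.img,xvda,w" ]

    # block-device backend, if the guest disk were an LVM volume (hypothetical path)
    disk = [ "phy:/dev/vg0/domU,xvda,w" ]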
    

    I also get these in the log

    INFO: task syslogd:1350 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    syslogd       D 00062E4F  2208  1350      1          1353  1312 (NOTLB)
           c0ef0ed0 00000286 6e71a411 00062e4f c0ef0f18 00000009 c0f20000 6e738bfd
           00062e4f 0001e7ec c0f2010c c181a724 c1abd200 00000000 ffffffff c0ef0ecc
           c041a180 00000000 c0ef0ed8 c03d6a50 00000000 00000000 c03d6a00 00000000
    Call Trace:
     [<c041a180>] __wake_up+0x2a/0x3d
     [<ee06a1ea>] log_wait_commit+0x80/0xc7 [jbd]
     [<c043128b>] autoremove_wake_function+0x0/0x2d
     [<ee065661>] journal_stop+0x195/0x1ba [jbd]
     [<c0490a32>] __writeback_single_inode+0x1a3/0x2af
     [<c04568ea>] do_writepages+0x2b/0x32
     [<c045239b>] __filemap_fdatawrite_range+0x66/0x72
     [<c04910ce>] sync_inode+0x19/0x24
     [<ee09b007>] ext3_sync_file+0xaf/0xc4 [ext3]
     [<c047426f>] do_fsync+0x41/0x83
     [<c04742ce>] __do_fsync+0x1d/0x2b
     [<c0405413>] syscall_call+0x7/0xb
     =======================
    

    I have also tried disabling irqbalance, as suggested elsewhere, but it does not seem to make any difference.
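
    On CentOS 5, disabling irqbalance amounts to stopping the service and removing it from the boot runlevels:

    domU# service irqbalance stop
    domU# chkconfig irqbalance off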

    Updates:

    domU# cat /sys/block/xvda/queue/scheduler
    [noop] anticipatory deadline cfq
    

    Copying files from/to the local disk keeps the load below 4, though subsequent copies cause it to increase. Copying files over the network pushes the load above 4 on the first run, and subsequent copies bring the server almost to a halt, demanding time to cool off. It never goes down completely; give it 10-15 minutes and it comes back. But it's really not viable for a server to behave like that.

    The network traffic itself is not causing any trouble; running iperf, for instance, does not produce any measurable effect, and the reported bandwidth is > 1 Gbit/s.

    Write performance on dom0 is OK

    dom0# dd if=/dev/zero of=./test1024M bs=1024k count=1024 conv=fsync
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB) copied, 34.9725 seconds, 30.7 MB/s
    

    Write performance on domU is sluggish

    domU# dd if=/dev/zero of=./test1024M bs=1024k count=1024 conv=fsync
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB) copied, 622.163 seconds, 1.7 MB/s
    

    That is a ~95% drop in write throughput relative to dom0.
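
    The figure follows directly from the two dd results (30.7 MB/s on dom0 vs 1.7 MB/s on domU):

```shell
# Throughput loss, in percent, going from dom0 (30.7 MB/s) to domU (1.7 MB/s).
awk 'BEGIN { printf "%.1f\n", (1 - 1.7/30.7) * 100 }'   # prints 94.5
```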

    Read performance is OK

    dom0# hdparm -tT /dev/cciss/c0d0p1
    
    /dev/cciss/c0d0p1:
     Timing cached reads:   3352 MB in  2.00 seconds = 1676.70 MB/sec
     Timing buffered disk reads:  100 MB in  2.59 seconds =  38.57 MB/sec
    
    domU# hdparm -tT /dev/xvda
    
    /dev/xvda:
     Timing cached reads:   3144 MB in  2.00 seconds = 1571.51 MB/sec
     Timing buffered disk reads:  120 MB in  3.03 seconds =  39.67 MB/sec
    

    Update:

    So it appears that this was hardware-related after all, although it didn't show up until we started running Xen. The controller's battery was not charging correctly, which caused the write cache to be disabled. That, combined with a VM doing intensive IO, led to the high wait times.

    Immediately after upgrading the firmware nothing had changed, but since the upgrade the battery charges correctly, and once it is fully charged the write speeds are acceptable. For small files they even exceed those of dom0; I have no clue why that happens.

    domU# dd if=/dev/zero of=./test1024M bs=1024k count=1024 conv=fsync
    1024+0 records in
    1024+0 records out
    1073741824 bytes (1.1 GB) copied, 39.1087 seconds, 27.5 MB/s
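
    In hindsight, the controller's cache and battery state can be checked from the OS instead of only at POST. If HP's hpacucli utility is installed (an assumption; it is a separate HP download, not part of the base OS), something along these lines shows the status. The output here is a hypothetical illustration:

    dom0# hpacucli ctrl all show status

    Smart Array 6i in Slot 0
       Controller Status: OK
       Cache Status: OK
       Battery Status: OK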
    
    • MastaJeet
      MastaJeet over 13 years
      What is the OS of the host? On the guest what is the output of cat /sys/block/xvda/queue/scheduler?
    • Peter Lindqvist
      Peter Lindqvist over 13 years
      What scheduler am I supposed to use?
    • MastaJeet
      MastaJeet over 13 years
      You are supposed to use the noop, which you are.
    • Peter Lindqvist
      Peter Lindqvist over 13 years
      But this can't be expected, can it? I mean, the performance is really bad. Could it have something to do with preallocated/sparse files? If I run ls on the file it reports 50 GB, but df only reports the part of the file that's actually in use, say 10 GB.
    • DutchUncle
      DutchUncle over 13 years
      This looks like a bug report for Xen/CentOS; I would take your problem to both of them and see what ideas they might have. Check against their known-bugs databases first.
    • Peter Lindqvist
      Peter Lindqvist over 13 years
      I thought I'd try to narrow the problem down to software before reporting it as a bug. I'll order a new battery for the cache module before reporting it.
    • Peter Lindqvist
      Peter Lindqvist over 13 years
      So this has been resolved, but I'll leave it up here for future reference.
    • DutchUncle
      DutchUncle over 13 years
      So didn't your RAID controller raise some kind of alarm?
    • Peter Lindqvist
      Peter Lindqvist over 13 years
      It did during POST, but who watches that on a server in an off-limits server room? :) The company I work for doesn't have iLO licenses either. I didn't really notice until I started fiddling with the firmware. It was a fun exercise, though.
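
    A side note on the sparse-file question raised above: a sparse image's apparent size (what ls -l reports) can far exceed its allocated size (what du and df account for), which explains the 50 GB vs 10 GB discrepancy. A quick demonstration with a throwaway file:

```shell
# Create a 100 MB sparse file: seek past the end and write nothing.
dd if=/dev/zero of=sparse.img bs=1 count=0 seek=100M 2>/dev/null

# Apparent size in bytes, as ls sees it: the full 100 MB.
ls -l sparse.img | awk '{print $5}'    # prints 104857600

# Allocated size in KB, as du sees it: (near) zero until data is written.
du -k sparse.img | awk '{print $1}'
```

    Whether sparseness hurts write speed is a separate question; in this case the controller cache turned out to be the culprit.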