Slow disk I/O in KVM with LVM and md raid5

Solution 1

I'm not sure my note covers the whole problem, but with this storage configuration you cannot get proper alignment.

Let's see:

  • You can align partition boundaries to the RAID stripe size; that is fine.
  • You can set file-system optimization parameters accordingly; that is also fine.
  • But in order to align LVM properly, you need the RAID stripe to fit evenly into an LVM extent.

The LVM extent size is always a power of 2, so your RAID stripe size needs to be a power of 2 as well. For RAID5 that means the number of disks must be 2^N + 1 = 3, 5, 9, ... (a power-of-2 number of data disks plus one parity disk).

With 4 disks in RAID5 it is impossible.
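
As an illustration only (device names and sizes below are assumptions, not taken from the question): a 3-disk RAID5 built from 64K chunks has a full stripe of 2 x 64K = 128K, which divides evenly into the default 4M LVM extent, and the physical volume can be told about that alignment explicitly:

    # hypothetical 3-disk RAID5: 2 data disks x 64K chunk = 128K full stripe
    mdadm --create /dev/md1 --level=5 --chunk=64 --raid-devices=3 /dev/sdc1 /dev/sdd1 /dev/sde1
    # start the LVM data area on a full-stripe boundary
    pvcreate --dataalignment 128k /dev/md1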

Since software RAID5 has no protected write-back cache, it is significantly vulnerable to the "partial stripe write" penalty.

There may be other causes restricting write performance as well, but the first thing I would do is migrate to RAID10. With RAID10 on all 6 disks you can get read performance and guest storage capacity comparable to your initial setup ... and no alignment headache ;).
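
A minimal sketch of the RAID10 alternative, assuming the four guest-storage disks are sdc-sdf (the partition names are assumptions; add more devices if you move all six disks into the array):

    # 4-disk RAID10 with a modest chunk size for guest storage
    mdadm --create /dev/md1 --level=10 --chunk=64 --raid-devices=4 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1
    pvcreate /dev/md1
    vgcreate vg_guests /dev/md1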

Solution 2

I've done extensive testing on KVM and caching performance (you can read here, here and here), and many of the recommendations that you find on the Internet are obsolete or plain wrong. But let's proceed one step at a time...

  1. RAID5 needs a much smaller chunk size (on the order of 32-64K) than yours (512K, as seen in the pastebin) and a BBU (for writeback caching) to be viable for anything other than sequential read/write patterns.
  2. Avoid RAID5 for any workload that needs IOPS and/or issues a significant portion of random writes. This is EXACTLY what a guest OS needs (IOPS and random write speed), so RAID5 (without a hardware BBU to enable array-wide writeback caching) is the wrong choice. Use RAID10 instead.
  3. Absolutely enable cache=writeback for your guests (see the sketch after this list). cache=none is slightly better only for sequential reads/writes or completely irregular, cache-unfriendly patterns.
  4. Use virtio drivers inside your guests whenever possible.
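
As a rough illustration of points 3 and 4 (the LV path, memory and CPU count below are placeholders, not taken from the question), a raw QEMU invocation would attach the logical volume via virtio with writeback caching; with libvirt the same thing is cache='writeback' on the disk's driver element and bus='virtio' on its target:

    # sketch only: guest LV, memory and CPU count are made-up values
    qemu-system-x86_64 -enable-kvm -m 4096 -smp 4 \
        -drive file=/dev/vg_guests/win2012,format=raw,if=virtio,cache=writeback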

Comments

  • FrontSlash
    FrontSlash over 1 year

    I have been battling with a KVM setup under Debian for two weeks now, specifically with guest disk I/O performance.

    The system:

    -Supermicro 1018d-73mtf (X10SL7-F motherboard)
    -16GB ECC/UB
    -Intel Xeon E3-1240v3
    -6xWD Red 750GB 6Gb/s
    

    On this I am running Debian Wheezy on two of the disks; the other four disks are set up with md as RAID5, with LVM on top for guest storage. Performance directly on the RAID5 (measured by creating an LV, mounting it and running bonnie++ and dd tests) is fine, giving me ~220/170 MB/s read/write, but on guests I get decent reads and only 40-50 MB/s writes, tested on both Windows (Server 2012) and Linux (Debian). I have read up on aligning disks and partitions and recreated the RAID and LVM setup by the book, but haven't received any performance boost.
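
    For reference, a sequential dd/bonnie++ test of the kind described above might look like this (the mount point is a placeholder, not the real path):

        # sequential write/read test on a mounted LV (path is an assumption)
        dd if=/dev/zero of=/mnt/test/ddfile bs=1M count=8192 conv=fdatasync
        dd if=/mnt/test/ddfile of=/dev/null bs=1M
        bonnie++ -u root -d /mnt/test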

    When measuring load with atop during writes directly from the host, I can see that the disks and LVM are under high load, but measuring while a guest is writing shows the disks at ~20-30% and LVM going "red" (100%).

    The usual KVM/host tweaks have been done: setting the scheduler to deadline, setting stripe caches for the RAID, cache=none on the guests, and reflashing the SAS controller card (LSI 2308) to IT mode, and I am out of ideas. Here is a pastebin with relevant information about the setup, in the hope that someone notices something I've done wrong: http://pastebin.com/vuykxeVg.
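
    Roughly speaking, those tweaks correspond to settings along these lines (device and array names below are assumptions; adjust to the real ones):

        # deadline scheduler on the RAID member disks (sdc..sdf assumed)
        for d in sdc sdd sde sdf; do echo deadline > /sys/block/$d/queue/scheduler; done
        # enlarge the md stripe cache (md0 assumed)
        echo 16384 > /sys/block/md0/md/stripe_cache_size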

    If you need anything else I'll paste it.

    Edits:

    This is basically how the drives, md and LVM are set up, with some changes because I am running 3 disks + a spare: http://dennisfleurbaaij.blogspot.se/2013/01/setting-up-linux-mdadm-raid-array-with.html

    Screenshots of atop during host and guest write tests (bonnie++):

    Host: http://i.imgur.com/IsTprqA.png

    Guest: http://i.imgur.com/uVmhFCK.png

    • Admin
      Admin over 10 years
      Can you show me the output of cat /sys/block/sda/queue/scheduler?
    • Admin
      Admin over 10 years
      sda? Sure, but it's not used for the guests: noop deadline [cfq]. For the actual RAID disks it's: noop [deadline] cfq
    • Admin
      Admin over 10 years
      I meant you should replace sda with your internal disk inside the guest.
    • Admin
      Admin over 10 years
      On the Linux guest I have been running noop. I just did another bonnie++ test and got slightly better performance than I did previously on the Linux VM, from 50 to 65 MB/s sequential writes, without changing anything.
    • Admin
      Admin over 10 years
      What is RAID(5?!) doing inside the VM o_O? (Following the screenshot link for the guest)
    • Admin
      Admin over 10 years
      The atop reading was done on the host while writes were being done within the guest; otherwise I wouldn't be able to spot performance problems on the RAID or the LVM in this case (which might be a false positive).
    • Admin
      Admin over 10 years
      Please provide the bonnie++ command line parameters.
    • Admin
      Admin over 10 years
      Is lxc an option for you (instead of using KVM)?
    • Admin
      Admin over 10 years
      @Veniamin, bonnie++ -u root -d /tmp
    • Admin
      Admin over 10 years
      @Nils, no, the main point of this server is to have Windows 2012 running on it; the Linux guest was just to test whether Windows was the culprit.
    • Admin
      Admin over 10 years
      You can see that writes are still misaligned in the guest case: notable read activity on sd[cde] and KiB/w far from the ideal 512K. What I cannot understand at all is that KiB/w for the LVM/MD devices equals 100K. Even if a subsystem is not aware of the stripe size, 128K or some other 2^N would be expected.
    • Admin
      Admin about 9 years
      The first thing I see wrong with your setup is the stripe (chunk) size. In my experience, you won't get much out of a stripe this large. For normal use I'd try something way smaller (8k-16k?) to see what it does.
  • FrontSlash
    FrontSlash over 10 years
    Mind you, only 3 of the disks are active in the RAID5; the fourth is a spare, as seen in the mdstat output.
  • FrontSlash
    FrontSlash over 10 years
    Using the latest virtio; the cache settings have all been tried and cache=none gave the best performance.
  • Veniamin
    Veniamin over 10 years
    OK, did you perform partition alignment and FS optimization inside the VM?
  • FrontSlash
    FrontSlash over 10 years
    Yep, see the pastebin; it has parted info about the Windows partition.
  • David Corsalini
    David Corsalini over 10 years
    That's because people use KVM on distributions that don't do proper testing and integration, like Debian.
  • Veniamin
    Veniamin over 10 years
    Windows is not aware that your stripe is 1MB, nor is the virtio driver. I would try RAID10, at least to finally settle the issue with partial stripe writes.
  • FrontSlash
    FrontSlash over 10 years
    My stripe isn't 1MB. I've done the same tests with a RAID1 mounted in Windows and Debian with no improvement; you might have missed that I have tested this on both Linux and Windows, Linux with properly aligned sectors (mkfs.ext4 -m 0 -b 4096 -E stripe-width=256,stride=128). Stripe-width is based on the fact that there are two data disks, and stride is 128 since it is chunk-size/block-size (512K/4K).
  • Veniamin
    Veniamin over 10 years
    stripe-width=256 means 4k*256 = 1M - exactly. If you want any joy out of RAID5, be sure to increase the "stripe_cache_size" parameter of mdraid; the default value (256) does not work at all with your jumbo stripes.
  • Nils
    Nils over 10 years
    @FrontSlash So "do not use caching" is probably your best path here - if you do not want to abandon KVM and/or Debian.
  • Nils
    Nils over 10 years
    @dyasny are there other distributions that do proper testing and integration?
  • FrontSlash
    FrontSlash over 10 years
    @Nils, your link to "do not use caching" is broken, was it anything other than setting cache=none for the guest?
  • David Corsalini
    David Corsalini over 10 years
    @Nils of course. KVM and QEMU are developed on Fedora and get heavy QA on RHEL; everybody else simply ports them over, sometimes quite poorly.
  • Nils
    Nils over 10 years
    @FrontSlash The link is working perfectly - even in a different browser and on Android. The core of "do not use caching" was to disable caching on the KVM server and enable it within the VM.
  • FrontSlash
    FrontSlash over 10 years
    stripe_cache_size has been tested with several values, none giving much of a result; it is set to 16384 for now.