Disk latency within a virtual machine

Solution 1

Measuring time within a VM can be problematic, as the virtual processors don't execute continuously. If you want a clear view of what's actually happening, use Performance Monitor in the management OS and look for the Hyper-V Virtual Storage Device counter set. You can also correlate that with data from Resource Monitor to see what's contending for access to the disks.

In general, the response time of a particular VHD will have everything to do with what else is happening on the volume hosting that VHD.
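As a rough illustration of comparing host-side and guest-side figures, the sketch below summarizes latency samples from a Performance Monitor style CSV export. The CSV layout (timestamp in the first column, one counter per remaining column, values in seconds), the column names, the 100 ms spike threshold and the sample numbers are all assumptions made up for this example, not data from the question.

```python
import csv
import io
import statistics


def summarize_latency(csv_text, spike_threshold_s=0.1):
    """Summarize each counter column of a PerfMon-style CSV (values in seconds).

    Assumes column 0 is a timestamp and every other column is one counter.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = {name: [] for name in header[1:]}
    for row in reader:
        for name, cell in zip(header[1:], row[1:]):
            try:
                columns[name].append(float(cell))
            except ValueError:
                continue  # blank or non-numeric sample, skip it
    summary = {}
    for name, values in columns.items():
        if not values:
            continue
        summary[name] = {
            "mean_ms": statistics.mean(values) * 1000,
            "max_ms": max(values) * 1000,
            # number of samples above the (assumed) spike threshold
            "spikes": sum(1 for v in values if v > spike_threshold_s),
        }
    return summary


# Made-up sample data: one host counter and one guest counter.
SAMPLE = '''"(PDH-CSV 4.0)","host Avg. Disk sec/Read","guest Avg. Disk sec/Read"
"09/18/2022 10:00:00.000","0.010","0.020"
"09/18/2022 10:00:01.000","0.020","0.300"
"09/18/2022 10:00:02.000","0.015","0.025"
'''

if __name__ == "__main__":
    for name, stats in summarize_latency(SAMPLE).items():
        print(name, stats)
```

If the guest column shows spikes that the host column does not, that points at something between the two layers (or at the guest's timekeeping) rather than at the storage itself.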

Solution 2

Your 'disk latency' on the VM could be CPU latency on the host, since the host has to use CPU cycles to service I/O requests.

Is the host heavily loaded overall? Or is it just running a lot of VMs? I'm not sure what the Hyper-V equivalent is, but the VMware metric is CPU ready time: basically, how often the VM is waiting on the host to run it.
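One way to test this hypothesis, sketched here under assumptions, is to sample host CPU load and guest disk latency at the same instants and check whether they move together. The two series below are invented for illustration, and `pearson` is a hand-written helper, not part of any Hyper-V or Windows tooling. A high correlation only suggests a link; it does not prove the host CPU is the cause.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / math.sqrt(var_x * var_y)


# Made-up samples taken at the same timestamps.
host_cpu_busy_pct = [10, 12, 11, 85, 90, 13]
guest_latency_ms = [20, 22, 25, 280, 300, 24]

if __name__ == "__main__":
    r = pearson(host_cpu_busy_pct, guest_latency_ms)
    print(f"correlation between host CPU and guest latency: {r:.3f}")
```

A value near 1 would support the idea that guest latency spikes coincide with host CPU pressure; a value near 0 would suggest looking elsewhere.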

Author: john

Technology and programming enthusiast with a solid understanding of network communications. I started out moonlighting as a PHP developer and then got hired as a junior technician in a school. Now, 10 years on, I'm working as a Senior Tech in Infrastructure for an Engineering company. I still maintain my web development skills when I can. My main areas of expertise are automation, virtualisation and storage. I don't think there's ever a time when I don't have a PowerShell console open. I maintain a VM infrastructure, a hybrid Office 365 implementation and an AD forest.

Updated on September 18, 2022

Comments

  • john
    john over 1 year

    I've been looking at some performance issues on a clustered virtual machine in our organisation. Actually this problem seems to affect most of the virtual machines I have looked at. Both host and VM are 2008R2 with SP1.

    I believe - from what I have read in various articles and advice I have been given - that I/O latency is the most important metric to be looking at. I've looked at this metric in three different places:

    • LUN latency on the storage appliance
    • Logical disk average sec/write and average sec/read on the Hyper-V host
    • The same as above, but on the virtual machines themselves

    This is in an effort to narrow down the source of any latency that might be happening. Sure enough, this is what I found....

    What I'm seeing is what I would consider to be acceptable latency (3-15ms) on the LUNs, up to 20ms (still acceptable) on the Hyper-V host. When I look at the same metrics on a VM I'm seeing regular spikes of up to 300ms for up to 10 seconds at a time and an average of about 20-30ms.

    This particular VM is a SQL server, but the same applies to non-SQL servers too. The relevant exceptions are added to our AV solution to avoid on-access scanning of DB files. Also, our VHDs are of a fixed size as opposed to dynamically expanding.

    So for my question:

    What are the likely causes of this latency, and/or what other metrics could I be using within the VM (or even on the Host) to narrow this down?

  • john
    john over 10 years
    When you say 'volume', I guess you would be referring to the CSV as opposed to any volume the LUNs might sit in?
  • Jake Oshins
    Jake Oshins over 10 years
    I'm not sure what your terminology is getting at. Yes, CSV is the file system that is typically used for VHDs, and that sits on a volume, which is shared across the host cluster. That volume sits on top of a LUN, which is part of a disk pool, usually (with Server 2008 R2) in a SAN. What, specifically, are you asking?
  • john
    john over 10 years
    Sorry, we have volumes on our storage appliances too. Are you suggesting that the logical disk metrics on the VM are useless?
  • john
    john over 10 years
    The idle time overall is about 80-90% on the host, so the host is not overloaded.
  • john
    john over 10 years
    Additionally, Microsoft assert that the same metrics can be used: technet.microsoft.com/en-us/library/cc768535%28BTS.10%29.aspx
  • Jake Oshins
    Jake Oshins over 10 years
    No, I'm not suggesting they're useless. I'm merely pointing out that time within a VM is virtualized, and thus measurements of anything that tries to look at small time quanta will result in messy data. Data you collect over long periods of time will be generally right. Looking at any small-span time period may not be accurate.
  • john
    john over 10 years
    Ok, that's a very useful clarification. Do you think you could add that to your answer please? Unfortunately, the counter set you alluded to doesn't contain a metric for latency.
  • Nick Westgate
    Nick Westgate almost 9 years
    There's a similar opinion in the comments here.