ZFS - zpool ARC cache plus L2ARC benchmarking


Solution 1

It seems your tests are very sequential, like writing a large file with dd and then reading it back. The ZFS L2ARC is designed to boost performance on random-read workloads, not streaming-like patterns. Also, to get optimal performance you might want to wait longer for the cache to warm up. Another point is to make sure your working set fits on the SSDs. Watching I/O statistics during the tests would help you figure out which devices are being used and how they perform.
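
For example (a minimal sketch; neither fio nor these paths appear in the original question, and the sizes are only chosen to suit the 24GB RAM / 2x60GB SSD setup described below), you could watch per-device activity with zpool iostat while driving a random-read load instead of a sequential dd:

    # Watch per-vdev activity, including the cache SSDs, every 5 seconds
    zpool iostat -v vol0 5

    # In another shell: a random-read load whose working set (4 x 10g) is
    # larger than RAM but small enough to fit on the L2ARC devices
    fio --name=randread --rw=randread --bs=8k --size=10g --numjobs=4 \
        --directory=/vol0/fio-test --ioengine=psync \
        --runtime=600 --time_based --group_reporting

Run it once to populate the files, let the cache warm, and then compare hit rates on later runs.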

Solution 2

Given the state of the answers here, I will provide one.

Instead of answering with a question, or with an answer irrelevant to the question, I will try to give an answer that is relevant.

Sadly I do not know the factual answer as to what should be going on, but I can answer from my own experience.

In my experience, a zvol larger than the ARC (or L2ARC) will not be cached, beyond what is needed to avoid read amplification.

You can run arc_summary on Linux to see the ARC statistics.
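
For instance (a minimal sketch assuming ZFS on Linux; the exact counter names can vary between versions):

    # Summarised ARC/L2ARC statistics
    arc_summary

    # Raw kstat counters; l2_hits/l2_misses show whether the L2ARC is being hit
    grep -E '^(hits|misses|l2_hits|l2_misses|l2_size)' /proc/spl/kstat/zfs/arcstats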

I tested by accessing the same file over and over inside a virtual machine whose drive was hosted on a zvol, which meant the same parts of the zvol should have been accessed over and over, but the I/O was not registering in the ARC at all, as if it were being bypassed.

On the other hand I have another virtual machine hosted on a raw file on a zfs dataset, and that is caching just fine.

To confirm whether the ARC is enabled for the zvol (or dataset), check the primarycache property, and for the L2ARC, check the secondarycache property.
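
For example (the pool name is taken from the question; the zvol name is only illustrative):

    # Check whether ARC/L2ARC caching is enabled for a zvol or dataset
    zfs get primarycache,secondarycache vol0/some-zvol

    # Cache both data and metadata (the default is "all"; "metadata" or "none"
    # would restrict or disable caching)
    zfs set primarycache=all vol0/some-zvol
    zfs set secondarycache=all vol0/some-zvol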

Solution 3

Anyone attempting to benchmark the L2ARC will want to see how "warm" the L2ARC is, and also to check whether their requests are actually hitting it. There is a nice tool and article for doing just that: arcstat.pl updated for L2ARC statistics
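
As a sketch, using the field names mentioned for the newer arcstat.py in the comments below (the exact names may differ between versions):

    # ARC/L2ARC reads, miss rates and sizes, sampled every 2 seconds
    arcstat.py -f read,miss%,l2read,l2miss%,arcsz,c,l2size,l2asize 2

A warm L2ARC shows l2size/l2asize growing toward your working-set size and l2miss% dropping on repeated reads.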

Solution 4

Did you consider the size of the ARC compared to your test? In testing the I/O benefit of SSDs used as L2ARC (pool read cache) and/or ZIL (pool synchronous write cache), you need to consider the size of your ARC in contrast to your test's working set. If the ARC can serve a read, it will, without pulling from the L2ARC. Likewise, if write caching is enabled, writes will be coalesced regardless of the ZIL unless flushes and explicitly synchronous behavior are enforced (i.e. the initiator's write cache is disabled too, etc.)
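
As a hedged sketch (the sync property exists on reasonably recent ZFS releases; the zvol name is illustrative), you can force synchronous semantics for the duration of a test so a log device is actually in the write path, and restore the default afterwards:

    # Force every write to be synchronous so the ZIL/SLOG is exercised
    zfs set sync=always vol0/some-zvol

    # Restore the default behavior after the test
    zfs set sync=standard vol0/some-zvol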

If you want to see the value of SSDs for smaller working sets, consider that a 16-disk RAID 10 will deliver about 1200+ IOPS (SAS/SATA?) for writes and about twice that for reads. Reducing the disk set to two (for testing) and reducing the ARC to its minimum (about 1/8th of main memory) will then let you contrast spindles vs. SSD. You'd otherwise need to get more threads banging on your pool (multiple LUNs) to see the benefit. Oh yes, and get more interfaces working too, so you're not bandwidth-bound by a single 1Gbps interface...
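
As a rough sketch of shrinking the ARC for such a test (the mechanism depends on the platform, and the value shown, a 4GB cap, is only an example):

    # Solaris/Nexenta: cap the ARC in /etc/system and reboot
    #   set zfs:zfs_arc_max = 4294967296

    # ZFS on Linux: cap the ARC via the zfs_arc_max module parameter
    echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max

    # ...or persistently in /etc/modprobe.d/zfs.conf:
    #   options zfs zfs_arc_max=4294967296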



Comments

  • jemmille
    jemmille over 1 year

    I have been doing lots of I/O testing on a ZFS system I will eventually use to serve virtual machines. I thought I would try adding SSDs for use as cache to see how much faster I can get the read speed. I also have 24GB of RAM in the machine that acts as the ARC. vol0 is 6.4TB and the cache disks are 60GB SSDs. The zvol is as follows:

    pool: vol0
     state: ONLINE
     scrub: none requested
    config:
    
            NAME                     STATE     READ WRITE CKSUM
            vol0                     ONLINE       0     0     0
              c1t8d0                 ONLINE       0     0     0
            cache
              c3t5001517958D80533d0  ONLINE       0     0     0
              c3t5001517959092566d0  ONLINE       0     0     0
    

    The issue is I'm not seeing any difference with the SSDs installed. I've tried bonnie++ benchmarks and some simple dd commands to write a file and then read it back. I have run benchmarks before and after adding the SSDs.

    I've ensured the file sizes are at least double my RAM so there is no way it can all get cached locally.

    Am I missing something here? When am I going to see the benefits of having all that cache? Am I simply not going to under these circumstances? Are the benchmark programs not good for testing the effect of cache because of the way (and what) they write and read?

    • 3dinfluence
      3dinfluence over 14 years
      Assuming that you're testing your production configuration here, I have a few things to point out. With ZFS you don't really want a single-device zpool. It's not that it won't work, but you are losing out on some of the data protection that ZFS offers. In this configuration it will only be able to detect CRC errors, not correct them. It also limits the scrubbing feature to just identifying problems rather than fixing them. ZFS mirrors and RAIDZ1/2 configurations also have advantages over hardware RAID solutions, like resilvering only the used space and no write hole with RAIDZ1/2.
    • ewwhite
      ewwhite over 14 years
      Is this for serving via NFS or iSCSI? What are the bonnie++ results like so far?
    • 3dinfluence
      3dinfluence over 14 years
      I should add that you can get some protection from CRC errors by using this command: "zfs set copies=2 vol0". This will cut your usable space in half and double the amount of I/O involved in writes, so it isn't always an ideal solution. But for more info check out blogs.sun.com/relling/entry/zfs_copies_and_data_protection
    • jemmille
      jemmille over 14 years
      Seeing one zvol in my output is a bit deceiving (although technically true). This is really coming from a Promise VTrak array, 16 1TB disks in a RAID 10 configuration (2 spares). The VTrak is attached to a Nexenta head machine, which created the zvol.
    • jemmille
      jemmille over 14 years
      Current results: WRITE 381MB/s (22% CPU), RE-WRITE 202MB/s (14% CPU), READ 469MB/s (11% CPU), RND-SEEKS 791/sec
    • jemmille
      jemmille over 14 years
      oh, and this is iSCSI
    • 3dinfluence
      3dinfluence over 14 years
      Ok, just so we are on the same page: the VTrak is set up as a JBOD, the Nexenta filer then creates a zpool with 7 pairs of mirrors + 2 spares, and this is presented as an iSCSI target to your server? Because if the VTrak is doing the RAID10 then what I said before still holds true.
    • jemmille
      jemmille over 14 years
      The VTrak is not set up as JBOD, so I see what you are saying. I'm pretty new to ZFS and would happily change the setup to something better, I'd just have to convince my boss ;-) Regardless of that, any ideas on the caching?
    • 3dinfluence
      3dinfluence over 14 years
      I haven't had the chance to use L2ARC so I don't have any personal experience with it. But to see the performance from L2ARC the cache has to be warm, and I'm not sure a benchmark is going to do a good job of warming the cache enough to see its effect. In general you're much better off doing real-world tests. Is your server connected to the Nexenta with 10Gbit?
    • jemmille
      jemmille over 14 years
      The Nexenta box has 2 quad-port NICs (plus 4 onboard). The quad NICs run into a switch set up with an 8x LACP bond for an 8Gbit link. The VTrak is connected to the Nexenta machine with a 3Gbit HBA card. That aggregated link has a private IP the servers use to connect. Each server has a 2Gbit bonded link to the storage network. I can explain all the logic behind this, but this thread is becoming quite cumbersome. If you want to know more we should try to connect outside of this. I'm interested if you are, if for no reason other than to exchange ideas.
  • jemmille
    jemmille over 14 years
    I was using iozone in the beginning, but since I was receiving the same results from iozone and bonnie++ I just picked bonnie++. I'll give this command a go though and see what happens.
  • jemmille
    jemmille over 14 years
    I've done benchmarks with and without the caching. The results are almost the same, which is the whole point of my question.
  • chrw
    chrw over 14 years
    Have you played a little bit with the record size? Maybe you'd see an impact with a large number of small files. Or you could ask a consulting company to do a professional benchmark if the result is important for a company decision or something like that. Sorry the iozone benchmark didn't work out for you, but like I said, I'm not a guru on filesystem benchmarking; this kind of setup worked for me to see some impact from using ZFS in general, though with "only" 8GB RAM, no caching devices, and letting ZFS take care of disk management (RAID-Z1).
  • Mark Booth
    Mark Booth over 8 years
    I use the newer arcstat.py and find that arcstat.py -f read,miss%,l2read,l2miss%,dm%,pm%,mm%,arcsz,c,l2size,l2asize 2 gives you a great insight into how your arc and l2arc are performing over time.