ZFS best practices with hardware RAID


Solution 1

The idea with ZFS is to let it know as much as possible about how the disks are behaving. Then, from worst to best:

  • Hardware RAID (ZFS has absolutely no clue about the real hardware);
  • JBOD mode (the remaining issue being any potential expander: less bandwidth);
  • HBA mode, the ideal (ZFS knows everything about the disks).

As ZFS is quite paranoid about hardware, the less hiding there is, the better it can cope with any hardware issues. And as pointed out by Sammitch, RAID controller configurations may be very difficult for ZFS to restore or reconfigure when the controller fails (i.e. on hardware failure).

As for standardized hardware that ships with a hardware RAID controller in it, just be careful that the controller has a real pass-through or JBOD mode.

Solution 2

Q. If one happens to have some server-grade hardware at ones disposal, is it ever advisable to run ZFS on top of a hardware-based RAID1 or some such?

A. It is strongly preferable to run ZFS straight to disk, and not make use of any form of RAID in between. Whether a system that effectively requires you to use the RAID card precludes the use of ZFS has more to do with the OTHER benefits of ZFS than it does with data resiliency. Flat out, if there's an underlying RAID card responsible for providing a single LUN to ZFS, ZFS is not going to improve data resiliency. If your only reason for going with ZFS in the first place was data resiliency improvement, then you've just lost all reason to use it. However, ZFS also provides ARC/L2ARC, compression, snapshots, clones, and various other improvements that you might also want, and in that case, perhaps it is still your filesystem of choice.
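Those secondary features work regardless of what sits beneath the pool. A minimal sketch (the pool and dataset names here are illustrative, not from the answer):

```shell
# Enable inline compression on a dataset (names are examples)
zfs set compression=lz4 tank/data
# Take a cheap point-in-time snapshot, then clone it for testing
zfs snapshot tank/data@before-upgrade
zfs clone tank/data@before-upgrade tank/data-test
```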

Q. Should one turn off the hardware-based RAID, and run ZFS on a mirror or a raidz zpool instead?

A. Yes, if at all possible. Some RAID cards allow a pass-through mode. If yours has it, that is the preferable thing to do.
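With pass-through in place, ZFS can then be handed the whole disks. A sketch, assuming Linux device naming (the by-id paths are illustrative placeholders):

```shell
# Create a ZFS mirror directly on raw disks exposed by the controller.
# Prefer stable /dev/disk/by-id paths over /dev/sdX names.
zpool create tank mirror \
  /dev/disk/by-id/ata-DISK_A_SERIAL \
  /dev/disk/by-id/ata-DISK_B_SERIAL
zpool status tank
```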

Q. With the hardware RAID functionality turned off, are hardware-RAID-based SATA2 and SAS controllers more or less likely to hide read and write errors than non-hardware-RAID controllers would?

A. This is entirely dependent on the RAID card in question. You'll have to pore over the manual or contact the manufacturer/vendor of the RAID card to find out. Some very much do, yes, especially if 'turning off' the RAID functionality doesn't actually completely turn it off.

Q. In terms of non-customisable servers, if one has a situation where a hardware RAID controller is effectively cost-neutral (or even lowers the cost of the pre-built server offering, since its presence improves the likelihood of the hosting company providing complimentary IPMI access), should it be avoided? Or should it even be sought after?

A. This is much the same question as your first one. Again - if your only desire to use ZFS is an improvement in data resiliency, and your chosen hardware platform requires that a RAID card provide a single LUN to ZFS (or multiple LUNs that ZFS stripes across), then you're doing nothing to improve data resiliency, and thus your choice of ZFS may not be appropriate. If, however, you find any of the other ZFS features useful, it may still be.

I do want to add an additional concern - the above answers rely on the idea that the use of a hardware RAID card underneath ZFS does nothing to harm ZFS beyond removing its ability to improve data resiliency. The truth is that's more of a gray area. There are various tunables and assumptions within ZFS that don't necessarily operate as well when handed multi-disk LUNs instead of raw disks. Most of this can be negated with proper tuning, but out of the box, ZFS on top of large RAID LUNs won't be as efficient as ZFS on top of individual spindles.

Further, there's some evidence to suggest that the very different manner in which ZFS talks to LUNs, as opposed to more traditional filesystems, often exercises code paths and workloads in the RAID controller that it is less used to, which can lead to oddities. Most notably, you'll probably be doing yourself a favor by disabling the ZIL functionality entirely on any pool you place on top of a single LUN if you're not also providing a separate log device, though of course I'd highly recommend you DO provide the pool a separate raw log device (that isn't a LUN from the RAID card, if at all possible).
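On current OpenZFS, the per-dataset sync property is the usual knob for switching off synchronous write semantics, and zpool add attaches a separate log device. A sketch with illustrative names; note that sync=disabled risks losing the last few seconds of writes on power failure:

```shell
# Relax synchronous write semantics on a pool backed by a single RAID LUN
zfs set sync=disabled tank
# Better: give the pool a separate raw log device instead
zpool add tank log /dev/disk/by-id/nvme-SLOG_SERIAL
```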

Solution 3

I run ZFS on top of HP ProLiant Smart Array RAID configurations fairly often.

Why?

  • Because I like ZFS for data partitions, not boot partitions.
  • Because Linux and ZFS boot probably isn't foolproof enough for me right now.
  • Because HP RAID controllers don't allow raw device passthrough. Configuring multiple RAID 0 volumes is not the same as having raw disks.
  • Because server backplanes aren't typically flexible enough to dedicate drive bays to a specific controller or split duties between two controllers. These days you see 8 and 16-bay setups most often. Not always enough to segment the way things should be.
  • But I still like the volume management capabilities of ZFS. The zpool allows me to carve things up dynamically and make the most use of the available disk space.
  • Compression, ARC and L2ARC are killer features!
  • A properly-engineered ZFS setup atop hardware RAID still gives good warning and failure alerting, but outperforms the hardware-only solution.

An example:

RAID controller configuration.

[root@Hapco ~]# hpacucli ctrl all show config

Smart Array P410i in Slot 0 (Embedded)    (sn: 50014380233859A0)

   array B (Solid State SATA, Unused Space: 250016  MB)
      logicaldrive 3 (325.0 GB, RAID 1+0, OK)

      physicaldrive 1I:1:3 (port 1I:box 1:bay 3, Solid State SATA, 240.0 GB, OK)
      physicaldrive 1I:1:4 (port 1I:box 1:bay 4, Solid State SATA, 240.0 GB, OK)
      physicaldrive 2I:1:7 (port 2I:box 1:bay 7, Solid State SATA, 240.0 GB, OK)
      physicaldrive 2I:1:8 (port 2I:box 1:bay 8, Solid State SATA, 240.0 GB, OK)
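The geometry above can be sanity-checked with a little shell arithmetic: RAID 1+0 mirrors pairs of disks and stripes across the pairs, so usable space is half the raw total, and the 325 GB logical drive is then carved out of that (a sketch; the figures are taken from the listing above):

```shell
#!/bin/sh
# Usable capacity of a RAID 1+0 array: (disk count / 2) * disk size.
disks=4
disk_gb=240
usable_gb=$((disks / 2 * disk_gb))
echo "raw: $((disks * disk_gb)) GB, usable: ${usable_gb} GB"
```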

block device listing

[root@Hapco ~]# fdisk  -l /dev/sdc

Disk /dev/sdc: 349.0 GB, 348967140864 bytes
256 heads, 63 sectors/track, 42260 cylinders
Units = cylinders of 16128 * 512 = 8257536 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       42261   340788223   ee  GPT

zpool configuration

[root@Hapco ~]# zpool  list
NAME   SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
vol1   324G  84.8G   239G    26%  1.00x  ONLINE  -

zpool detail

  pool: vol1
 state: ONLINE
  scan: scrub repaired 0 in 0h4m with 0 errors on Sun May 19 08:47:46 2013
config:

        NAME                                      STATE     READ WRITE CKSUM
        vol1                                      ONLINE       0     0     0
          wwn-0x600508b1001cc25fb5d48e3e7c918950  ONLINE       0     0     0

zfs filesystem listing

[root@Hapco ~]# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
vol1            84.8G   234G    30K  /vol1
vol1/pprovol    84.5G   234G  84.5G  -

Solution 4

Typically you should never run ZFS on top of disks configured in a RAID array. Note that ZFS does not have to run in RAID mode; you can use individual disks. However, the vast majority of people run ZFS precisely for the RAID portion of it. You could just run your disks in striped mode, but that is a poor use of ZFS. Like other posters have said, ZFS wants to know a lot about the hardware. It should only be connected to a RAID card that can be set to JBOD mode or, preferably, to an HBA. Jump onto the IRC Freenode channel #openindiana; any of the ZFS experts in the channel will tell you the same thing. Ask your hosting provider for JBOD mode if they will not give you an HBA.
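With disks in JBOD or behind an HBA, ZFS's own redundancy replaces the controller's. For example, a double-parity raidz2 pool (a sketch; the device names are illustrative):

```shell
# Six raw disks in a double-parity raidz2 vdev (paths are examples)
zpool create tank raidz2 \
  /dev/disk/by-id/ata-D1 /dev/disk/by-id/ata-D2 /dev/disk/by-id/ata-D3 \
  /dev/disk/by-id/ata-D4 /dev/disk/by-id/ata-D5 /dev/disk/by-id/ata-D6
# Periodic scrubs let ZFS detect and repair silent corruption
zpool scrub tank
```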

Solution 5

Everybody says that ZFS on top of RAID is a bad idea, without even providing a link. Yet Sun Microsystems, the developers of ZFS, recommend running ZFS on top of HW RAID, as well as on ZFS mirrored pools, for Oracle databases.

The main argument against HW RAID is that it can't detect bit rot the way a ZFS mirror can. But that's wrong: T10 PI exists for exactly that. You can use T10 PI-capable controllers (at least all the LSI controllers I have used are), and the majority of enterprise disks are T10 PI-capable. So, if it suits you, you can build a T10 PI-capable array, create a ZFS pool without redundancy on top of it, and just make sure you follow the guidelines for your use case in the article. Though it is written for Solaris, IMHO it is also suitable for other OSes.
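Whether a given disk is actually formatted with protection information can be checked with sg3_utils. A sketch (the device path is illustrative, and the "prot_en=1" token is an assumption about sg_readcap's output format):

```shell
#!/bin/sh
# Report whether a disk is formatted with T10 protection information,
# by parsing `sg_readcap -l` output for the protection-enable bit.
check_t10pi() {
  out=$(sg_readcap -l "$1" 2>/dev/null) || out=""
  case "$out" in
    *prot_en=1*) echo "T10 PI enabled on $1" ;;
    *)           echo "T10 PI not enabled (or device unreadable) on $1" ;;
  esac
}

check_t10pi "${1:-/dev/sda}"
```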

The benefit for me is that replacing a disk is much easier with a HW controller (especially in my case, since I don't use the whole disk for the zpool, for performance reasons): it requires no intervention at all and can be done by the client's staff.

The downside is that you have to make sure the disks you buy are actually formatted to support T10 PI, because some disks, though T10 PI-capable, are sold formatted as regular disks. You can reformat them yourself, but it's not entirely straightforward, and potentially dangerous if you interrupt the process.




Updated on September 18, 2022

Comments

  • Congmin
    Congmin over 1 year

    If one happens to have some server-grade hardware at ones disposal, is it ever advisable to run ZFS on top of a hardware-based RAID1 or some such? Should one turn off the hardware-based RAID, and run ZFS on a mirror or a raidz zpool instead?

    With the hardware RAID functionality turned off, are hardware-RAID-based SATA2 and SAS controllers more or less likely to hide read and write errors than non-hardware-RAID controllers would?

In terms of non-customisable servers, if one has a situation where a hardware RAID controller is effectively cost-neutral (or even lowers the cost of the pre-built server offering, since its presence improves the likelihood of the hosting company providing complimentary IPMI access), should it be avoided? Or should it even be sought after?

    • Congmin
      Congmin over 10 years
      @ShaneMadden, the questions are similar, however, my question already comes from the perspective of hardware raid being bad in terms of zfs, and I'm asking just how bad is it; also, consider that the accepted answer to your linked question doesn't address my question at all; my question is more like a followup question to the question you've linked.
    • Stefan Lasiewski
      Stefan Lasiewski over 10 years
"ZFS on top of Hardware Mirroring, or just mirror in ZFS?" and this question are two different topics. That other topic is narrower in scope than this one.
    • Congmin
      Congmin over 8 years
      @ewwhite, didn't you ask this already?
    • ewwhite
      ewwhite over 8 years
      @cnst Well, there's no marked answer, and people keep downvoting my answer. So it would be nice for there to be some closure to the question posed. (it's the responsible thing to do)
    • Congmin
      Congmin over 8 years
@ewwhite, well, i'm sorry to hear that; i think your answer provides a good perspective, i for one surely didn't downvote it! even though i can see where those people that do are coming from...
  • Congmin
    Congmin over 10 years
    So, in regards to the closed question that you've linked to, is it to say that if I want to use ZFS, I'd better avoid, for example, Dell PERC H200 and HP P410? Do they still not have a way to disable the hardware raid mode, be that RAID0 or RAID1?
  • Congmin
    Congmin over 10 years
    So, it seems like dell.com/learn/us/en/04/campaigns/dell-raid-controllers does claim that H200 "Supports non-RAID", although h18004.www1.hp.com/products/servers/proliantstorage/… is not entirely clear on whether the raid functionality of P410 can or cannot be turned off.
  • Sammitch
    Sammitch over 10 years
    It's also worth noting that if you're using HW RAID and your controller dies [happens more than you'd think] if you can't get a replacement that's either identical or fully compatible, you're hooped. On the other hand if you gave the raw disks to ZFS you can plug those disks back into any controller on any machine and ZFS can reconstruct the array and carry on like nothing happened.
  • ewwhite
    ewwhite over 10 years
    High-end servers typically have onboard RAID controllers. E.g. I've never had to replace a controller on an HP or Dell system.
  • ewwhite
    ewwhite over 10 years
    @cnst You cannot disable the RAID functionality of an HP Smart Array P410.
  • ewwhite
    ewwhite over 10 years
Not necessarily. What if I care more about the volume management flexibility than the optimization around having raw access to physical devices? ZFS works quite well for my use case.
  • poige
    poige over 10 years
@ewwhite, well, someone can walk alongside a bicycle, saying that he likes to walk and loves bicycles in general, but the truth is bicycles are made to be ridden. )
  • Congmin
    Congmin over 10 years
Yeah, I agree. But it's also a matter of what's available in stock with the configuration that fits the bill and the spec. If a server has a great CPU, lots of ECC RAM, great bandwidth, and plenty of it, but has to come with hardware-based RAID, it may not be cost-effective to seek alternatives, which may be several times more expensive due to being in a different category, or due to missing some of the enterprise features like ECC RAM, etc.
  • ceving
    ceving over 8 years
This answer does not answer anything. It merely expresses the biased opinion that the supplier of the server hardware and the ZFS programmers have done a better job than the supplier of the RAID controller and the programmers of the RAID firmware. The FreeNAS community is full of guys who killed their zpools with malfunctioning server memory or inappropriate power supplies. The chance that something big fails is higher than the chance that something small does.
  • Malvineous
    Malvineous over 5 years
    So you're saying that ZFS is so unreliable that if a single bit changes you can lose the whole filesystem? How does SATA NCQ cause data loss when the drive still notifies the host only when the sectors have been written successfully (albeit in perhaps a different order)?
  • sparse
    sparse over 4 years
Is this still correct? Are you saying there is no danger in running ZFS on hardware RAID?
  • ewwhite
    ewwhite over 4 years
    Correct. It’s not dangerous.
  • Akito
    Akito over 4 years
@Sammitch Thank you for this great advice. I am sticking to ZFS now.
  • Akito
    Akito over 4 years
If this answer were correct, I would've already lost several petabytes of data... And so would others that I know...
  • TJ Zimmerman
    TJ Zimmerman about 4 years
Anyone doing this with an H730? I understand the controller can be configured as RAID or Non-RAID, where in the second mode the drives are passed through directly to the OS for management. I understand the only drawback is that SMART data relies on firmware drivers being available from Dell for your drive. Will the H730 do anything weird, like hidden error correction, that might break ZFS when running in RAID mode but configured with Non-RAID devices?
  • TJ Zimmerman
    TJ Zimmerman about 4 years
    @Akito why is it not correct?
  • Alek_A
    Alek_A about 3 years
You probably do not know about T10 PI capable disks (which most enterprise disks are). Just create a T10 PI capable HW RAID array and you are protected from bit rot.
  • shodanshok
    shodanshok about 3 years
While I upvoted you for the reference to T10-ready disks and controllers, please note that the very same Oracle page you linked tells the following: "... Consider using JBOD-mode for storage arrays rather than hardware RAID so that ZFS can manage the storage and the redundancy..." So no, the Oracle docs do not suggest or recommend ZFS over HW RAID; rather, they explain how to use HW RAID with ZFS without too much pain.
  • Alek_A
    Alek_A about 3 years
I disagree. They actually do. The phrase says that you should consider using JBOD for storage arrays in general. Why? Because if you are not confident in your HW RAID redundancy, i.e. if it can't handle silent data corruption (bit rot), it's best to use JBOD. But the next phrase states: "If you are confident in the redundancy of your hardware RAID solution, then consider using ZFS without ZFS redundancy with your hardware RAID array". Please read carefully. The Databases section states that it is recommended, as is a mirrored pool, for Oracle databases. IMHO for databases in general.
  • shodanshok
    shodanshok about 3 years
Just below: "Using ZFS redundancy has many benefits – For production environments, configure ZFS so that it can repair data inconsistencies... If you are confident in the redundancy of your hardware RAID solution, then consider using ZFS without ZFS redundancy with your hardware RAID array". I can't see any recommendation to use HW RAID; rather, the docs explain how to use it without causing too much trouble. Don't get me wrong: sometimes I also use ZFS on HW RAID, but I would not describe it as a "recommended setup". The mirrored pool note is about using a ZFS mirror rather than raidz.
  • Alek_A
    Alek_A about 3 years
People tend to see what they want to see) Sorry, I suppose further conversation is useless. OK, just let people draw their own conclusions from the article.
  • sed_and_done
    sed_and_done about 3 years
I am seeing some crashes in ZoL when using a PERC H740P Mini (embedded) with each disk as a single-disk RAID 0, and I strongly suspect they are related to using ZoL with HW RAID.
  • ewwhite
    ewwhite about 3 years
    @sed_and_done Please avoid using multiple RAID0 arrays from a RAID controller to serve ZFS. Either go with a single LUN hardware RAID or an HBA.
  • Admin
    Admin almost 2 years
This answer looks as if it came directly from Google Translate. Really awful.