Should I use "Raid 5 + spare" or "Raid 6"?

Solution 1

In short:

  • If safety is your main concern then go with RAID6 as it can survive any two drives failing at the same time. If a drive fails in an R5+spare arrangement you are not safe from another failure until the spare has been brought up to speed which could take quite some time with large drives (and it is not unheard of for a drive that has been powered down for ages, such as your spare, to fail to spin up when finally called upon).

  • If performance is king, go with 5+spare, as write performance will be better while the array is not in a degraded state. That said, with a good controller the performance difference between R5 and R6 is significantly smaller than the difference between R5 and other solutions. A good controller here is one that, most of the time, turns a partial block write into "two/three concurrent reads, then parity calc, then two/three concurrent writes" rather than "read-then-read(-then-read)-then-parity-calc-then-write-then-write(-then-write)", which is what some very cheap controllers and software RAID may do.

Edit: I missed a potentially important point first time around:

  • If power consumption is a concern, then R5+spare will have an extra advantage if your controller keeps the spare drive powered down until needed.
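
To make the read/write sequences in the performance bullet concrete, here is a rough sketch that counts disk operations for a single partial-block write. The function name and the step-counting model are my own illustration, not something any particular controller documents. It assumes a plain read-modify-write where RAID 5 touches one data block plus one parity block and RAID 6 touches one data block plus two parity blocks, and it ignores the time spent on the parity calculation itself.

```python
def small_write_cost(parity_blocks, concurrent):
    """Count I/Os and serialized I/O steps for one partial-block write.

    parity_blocks: 1 for RAID 5, 2 for RAID 6.
    concurrent: True if the controller issues all reads together and all
                writes together; False if it issues them one after another.
    Returns (total_ios, sequential_io_steps). Hypothetical model only.
    """
    blocks_touched = 1 + parity_blocks   # the data block plus its parity block(s)
    reads = blocks_touched               # old data and old parity must be read
    writes = blocks_touched              # new data and new parity must be written
    total_ios = reads + writes
    # Concurrent: one read phase and one write phase.
    # Serialized: every single I/O waits for the previous one.
    steps = 2 if concurrent else total_ios
    return total_ios, steps


for name, parity in (("RAID 5", 1), ("RAID 6", 2)):
    for concurrent in (True, False):
        ios, steps = small_write_cost(parity, concurrent)
        mode = "concurrent" if concurrent else "serialized"
        print(f"{name} {mode}: {ios} I/Os in {steps} sequential I/O steps")
```

Under this simplified model, a controller that overlaps its I/O pays the same number of sequential steps for R5 and R6, while a controller that serializes everything pays for every extra parity block, which is the gap described above.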

Solution 2

RAID 5 + hot spare:

  • better performance than RAID 6 on equal controller hardware
  • you can't lose 2 disks at the same time. When you lose a disk, there's a rebuild time (onto the hot spare) during which you have no redundancy. Anything that fails in this time causes a complete loss (short of sending everything to a good data rescue firm and paying really $$$$)

RAID 6:

  • worse performance than RAID 5 (depending on the controller, it can range from very noticeable to virtually no difference)
  • you can lose 2 disks at the same time

For any RAID 5 or 6 you have to be careful to use disks which are not from the same production run. It can happen (I've seen it!) that after a single failure, the next disk(s) fail during the rebuild due to the increased stress. Disks from the same run have the exact same firmware and probably very similar physical properties.

Edit: What to choose

(This also depends on the performance requirements of the server and the tolerable risk.)

If the server's environment is pretty nice for hardware (colo, climate-controlled etc.), you'll be OK with RAID5 + hot spare.

If the environment makes it more likely that more than one disk fails within a short time (vibrations, humidity, dirt), then go for RAID 6.

Always also have an adequate backup and test recovery.

Edit 2: Decent RAID controllers have scrubbing, which periodically verifies all sectors.

Solution 3

RAID5 uses one parity block per stripe. RAID6 has to calculate Reed-Solomon error correction and write two parity blocks per stripe instead of one. RAID5 is used for intense database applications where storage is huge, because of the cost of RAID10. RAID5 leaves 67% to 94% of the raw disk capacity usable, where RAID10 leaves 50% (much higher storage costs). While RAID6 has very slightly lower read latency due to rotational latency, it is between 25 and 31% slower on writes due to the error-correction calculation and the additional parity write.
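
The capacity figures can be checked with a few lines of arithmetic. This is only a sketch: the function name and level labels are mine, it assumes equal-size disks, and it counts a hot spare as one of the disks in the enclosure.

```python
def usable_fraction(level, disks):
    """Fraction of raw capacity left for data, assuming equal-size disks.

    `disks` counts every drive in the enclosure, including any hot spare.
    The level names are labels made up for this example.
    """
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) / disks      # one disk's worth of parity
    if level in ("raid5+spare", "raid6"):
        if disks < 4:
            raise ValueError(f"{level} needs at least 4 disks")
        # RAID 5 + spare: one disk of parity plus one idle spare.
        # RAID 6: two disks' worth of parity.
        return (disks - 2) / disks
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return 0.5                      # every block is mirrored once
    raise ValueError(f"unknown level: {level}")


for n in (3, 16):
    print(f"RAID 5 with {n} disks: {usable_fraction('raid5', n):.0%} usable")
for level in ("raid5+spare", "raid6", "raid10"):
    print(f"{level} with 8 disks: {usable_fraction(level, 8):.0%} usable")
```

With 3 and 16 disks, RAID5 comes out at roughly 67% and 94%, which matches the range quoted above if that is where those figures come from. Note that RAID5 + spare and RAID6 give the same usable space for the same number of disks, so capacity alone does not separate them.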

Using the mean time between failure (MTBF) for the drives, the probability of two drives failing one right after another or at the same time is about (0.1% x 0.1%)*12 or 0.001 x 0.001 * 12; if you have 1000 drives running then you will average losing ~1.2 drives per year. Two drives will fail one right after the other about every 8.3 years. Now because drive failure is not a Poisson distribution due to the heavy loads on the drive during rebuild, a failure of a second drive is more likely to occur during this period, and the distribution is closer to a Gamma distribution with slightly higher values after a failure occurs.
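
For anyone who wants to plug in their own numbers, here is a minimal sketch of the rebuild-window risk for RAID5 + spare. It assumes independent drives with a constant failure rate of 1/MTBF (an exponential model), which the paragraph above already says is too optimistic during a rebuild. The `stress_factor` multiplier is a crude, hypothetical stand-in for that extra load, and every figure in the example call is made up for illustration.

```python
import math


def second_failure_probability(surviving_drives, rebuild_hours, mtbf_hours,
                               stress_factor=1.0):
    """Chance that at least one surviving drive fails before the rebuild ends.

    Assumes independent drives with a constant failure rate of 1/mtbf_hours.
    stress_factor > 1 scales that rate to mimic rebuild load (hypothetical).
    """
    combined_rate = stress_factor * surviving_drives / mtbf_hours
    return 1.0 - math.exp(-combined_rate * rebuild_hours)


# Illustrative numbers only: 7 surviving drives, a 24 hour rebuild,
# and a data-sheet style MTBF of 1,000,000 hours.
baseline = second_failure_probability(7, 24, 1_000_000)
stressed = second_failure_probability(7, 24, 1_000_000, stress_factor=5.0)
print(f"constant-rate model: {baseline:.6f}")
print(f"with 5x rebuild stress: {stressed:.6f}")
```

This only covers the window in which RAID5 + spare has no redundancy. RAID6 would need two further failures in the same window to lose data, so its exposure under the same assumptions is much smaller.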

The bottom line is that RAID5 performs better than RAID6 on writes, and far better for DB applications. For a mostly-read application such as a web server, it makes no difference and you should use RAID6. The cost benefits of using RAID5 over RAID10 are huge for large storage. If you can afford the overhead, use RAID10 for highly disk-intensive applications. RAID10 will always perform better.

The biggest bottom line missed is RAID is NOT backup, but a way to limit downtime by providing redundancy. If the data is critical, you should be backing it up (and testing your recovery process).

If one RAID array of ten 2TB SAS drives fails, recovery will cost thousands of dollars and take weeks, if it can be done at all.

All RAID arrays eventually fail!

Solution 4

Speaking strictly from a data integrity viewpoint, yes. You can safely lose any two drives, although it is a rare occurrence to lose two together short of severe physical trauma to the system.

Financially, not quite as much. The hot spare can be powered down until needed, which means that it doesn't use power and incurs no wear.

And as always, RAID is not a replacement for a proper off-site backup plan.

Solution 5

Have you considered 10? If you have enough disks for raid 6, you've got enough to do a 10 volume. In most cases 10 is both faster and more redundant (at the cost of some disk space).

Author by

Trevor Boyd Smith

Updated on September 17, 2022

Comments

  • Trevor Boyd Smith
    Trevor Boyd Smith almost 2 years

    What is "Raid 5 + Spare" (excerpt from User Manual, Sect 4.17.2, P.54):

    RAID5+Spare: RAID 5+Spare is a RAID 5 array in which one disk is used as spare to rebuild the system as soon as a disk fails (Fig. 79). At least four disks are required. If one physical disk fails, the data remains available because it is read from the parity blocks. Data from a failed disk is rebuilt onto the hot spare disk. When a failed disk is replaced, the replacement becomes the new hot spare. No data is lost in the case of a single disk failure, but if a second disk fails before the system can rebuild data to the hot spare, all data in the array will be lost.


    What is "Raid 6" (excerpt from User Manual, Sect 4.17.2, P.54):

    RAID6: In RAID 6, data is striped across all disks (minimum of four) and a two parity blocks for each data block (p and q in Fig. 80) is written on the same stripe. If one physical disk fails, the data from the failed disk can be rebuilt onto a replacement disk. This Raid mode can support up to two disk failures with no data loss. RAID 6 provides for faster rebuilding of data from a failed disk.


    Both "Raid 5 + spare" and "Raid 6" are SO similar ... I can't tell the difference.

    When would "Raid 5 + Spare" be optimal?

    And when would "Raid 6" be optimal?

    The manual dumbs down the different RAID modes with 5-star ratings. "Raid 5 + Spare" only gets 4 stars but "Raid 6" gets 5 stars. If I were to blindly trust the manual I would conclude that "Raid 6" is always better. Is "Raid 6" always better?

    • sound2man
      sound2man over 13 years
      Whatever you end up doing, only raid with a raid controller, not with the on-board soft controller that comes with your mobo. If your mobo goes out, you are asking for trouble.
    • Trevor Boyd Smith
      Trevor Boyd Smith over 13 years
      The raid is being done by a hardware controller (lol i have heard too many things against software raid controllers).
  • Trevor Boyd Smith
    Trevor Boyd Smith over 13 years
    10 only supports 4 disks. so raid 10 is not an option IMO.
  • Joel Coehoorn
    Joel Coehoorn over 13 years
    @Trevor Raid 10 supports any even number of disks >= 4. If you can do raid 6, you can do raid 10.
  • Trevor Boyd Smith
    Trevor Boyd Smith over 13 years
    Most well written/concise. (States the obvious pros/cons in the first two words of each bullet point... very very good).
  • ganesh
    ganesh almost 11 years
    I disagree. RAID5 has its uses (e.g. when a budget is tight and you really need disk space). And since RAID does not replace a backup, surviving one disk failure is plenty to tide you over till 5 PM, at which point people leave the office and you do emergency maintenance.
  • ChrisInEdmonton
    ChrisInEdmonton over 10 years
    It's not at all clear to me why you say that a RAID10 does not have the same problem with UREs. With a four-drive RAID10 setup, if you lose one drive and suffer a URE on its corresponding mirror, you're equally hosed.
  • user1594322
    user1594322 over 10 years
    If RAID10 has a failed drive, and then has a URE on the surviving drive, you only lose the unreadable sector, not the entire array. Updated the answer.
  • David Yates
    David Yates over 6 years
    I'd be curious to know when, if ever, the power draw of a single extra drive is really going to be a "concern" in comparison to everything else in the data center / server room / etc
  • David Yates
    David Yates over 6 years
    +1 for "have an adequate backup and test recovery". That's the FIRST thing everyone should have before they start worrying about RAID levels.
  • David Spillett
    David Spillett over 6 years
    A single drive in a single machine, probably not. But in colo where you get X-amps-per-rack and pay a lot for any excess (or excess is simply not permitted - sometimes if you go over you go dark), it could be noticeable. Power "consumed" is a double whammy too: it is converted to noise and heat and you end up needing more power to move the heat away. And for a whole cage or larger set of kit the total draw of an extra drive per compute unit soon adds up to something a sufficiently picky accountant might notice.