How does parity work on a RAID-5 array?

52,933

Solution 1

It just XORs the corresponding bits from each data drive and stores the result as parity. If you lose any one drive, you can rebuild the missing data by XORing the bits that survive.

For background:

A B (A XOR B)
0 0    0
1 1    0
0 1    1
1 0    1

Assume that D is the XOR of the other columns. Then, as long as you only lose one drive, you can work out what you lost by XORing the remaining columns.

A B C D
1 0 0 1
0 1 0 1
1 1 0 0
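To make the table concrete, here is a minimal Python sketch (the language and the names `parity` and `recover` are my own choices for illustration, not anything a RAID controller exposes). It computes the D column from A, B and C, then pretends drive B died and rebuilds it from the rest.

```python
from functools import reduce
from operator import xor

# Rows from the table above: columns A, B, C are data; D will be the parity.
rows = [
    (1, 0, 0),
    (0, 1, 0),
    (1, 1, 0),
]

def parity(bits):
    """XOR all bits together: 1 if the number of 1s is odd, else 0."""
    return reduce(xor, bits, 0)

def recover(surviving_bits):
    """Recover the single lost bit from the survivors (parity bit included)."""
    return parity(surviving_bits)

# Compute the D column.
d_column = [parity(row) for row in rows]

# Pretend drive B failed: rebuild each of its bits from A, C and D.
rebuilt_b = [recover((a, c, d)) for (a, _b, c), d in zip(rows, d_column)]

print(d_column)   # [1, 1, 0]
print(rebuilt_b)  # [0, 1, 1]
```

Recovery is the same operation as computing parity, which is why any single column, data or parity, can be rebuilt.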

In practice, RAID-5 spreads the parity across the drives rather than keeping it all on one drive, but the concept is the same.

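So for RAID-5, no matter how many drives you have, you only give up one drive's worth of capacity to parity. Every drive contributes only as much space as the smallest drive in the array, so a drive used to replace a failed one must be at least that big.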

RAID-5 is probably best for personal use, as its computational complexity is much lower than RAID-6's.

RAID-6 is more complicated: it uses Galois field arithmetic to compute parity, which makes the parity computation more expensive. In exchange, you can lose more drives. But if you rebuild your array as soon as you get a single failure, you should be fine sticking with RAID-5.

Solution 2

Here's what I think is a better diagram to show how parity is laid out in RAID4 and RAID5:

RAID4

Disk1  Disk2  Disk3  Disk4
----------------------------
data1  data1  data1  parity1
data2  data2  data2  parity2
data3  data3  data3  parity3
data4  data4  data4  parity4

RAID5

Disk1    Disk2    Disk3    Disk4
----------------------------------
parity1  data1    data1    data1
data2    parity2  data2    data2
data3    data3    parity3  data3
data4    data4    data4    parity4
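As a small sketch of the rotation in the RAID5 diagram, the snippet below generates the same grid. The function name `raid5_layout` is hypothetical, and the rule "stripe i keeps its parity on disk i" is only an assumption chosen to match the diagram; real implementations may rotate parity in a different order.

```python
def raid5_layout(num_disks, num_stripes):
    """Return a grid of labels, one row per stripe, one column per disk."""
    layout = []
    for stripe in range(num_stripes):
        # Assumption: parity moves one disk to the right on each stripe,
        # wrapping around, which reproduces the diagram above.
        parity_disk = stripe % num_disks
        row = [
            f"parity{stripe + 1}" if disk == parity_disk else f"data{stripe + 1}"
            for disk in range(num_disks)
        ]
        layout.append(row)
    return layout

for row in raid5_layout(num_disks=4, num_stripes=4):
    print("  ".join(f"{cell:<7}" for cell in row))
```

Setting `parity_disk = num_disks - 1` for every stripe instead would give the RAID4 picture, where one disk holds all the parity.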

Solution 3

I would recommend reading this Wikipedia article on RAID 5 and RAID 6:

http://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5_parity_handling

RAID 5 writes one parity block in each stripe. So for stripe A of a 4-disk array, it writes the parity block on disk 4, with data on disks 1, 2 and 3.

For stripe B, the parity block is on disk 3, with data on disks 1, 2 and 4, and so on.

If, say, disk 4 fails, the data for stripe B can be recovered, because you know the data on disks 1 and 2 and have the parity block on disk 3.

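As a simplified illustration that uses addition in place of the XOR that RAID 5 really uses: if stripe B had a parity of "2", disk 1 had data of "1" and disk 2 had data of "0", then disk 4 must have had data equal to "1", so the replacement disk is written with data = "1".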

The whole disk can be recreated this way. RAID 6 extends this by having two parity blocks per stripe.
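Here is a rough sketch of that rebuild for stripe B using real XOR on whole blocks. The byte values are made up for illustration and `xor_blocks` is a hypothetical helper, not part of any RAID tool.

```python
def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(result):
            raise ValueError("all blocks in a stripe must be the same length")
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# Stripe B as described above: data on disks 1, 2 and 4, parity on disk 3.
# These two-byte blocks are invented example data.
disk1 = b"\x01\xAA"
disk2 = b"\x00\x0F"
disk4 = b"\x01\x3C"
disk3_parity = xor_blocks(disk1, disk2, disk4)

# Disk 4 fails: XOR the survivors (data plus parity) to rebuild its block.
rebuilt_disk4 = xor_blocks(disk1, disk2, disk3_parity)
assert rebuilt_disk4 == disk4
```

Repeating this for every stripe recreates the failed disk.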

Regarding space: with RAID 5 you only ever lose one disk's worth of space to parity, as it only writes one parity block per stripe. With RAID 6 you lose two disks' worth of space, but the array can survive losing two disks rather than the one it can survive in RAID 5 ;)
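The space arithmetic can be sketched in a few lines. `usable_capacity` is a hypothetical helper; it assumes every drive is treated as being as large as the smallest one in the array and it ignores file system overhead, so real usable space will come out a bit lower.

```python
def usable_capacity(drive_sizes_tb, parity_drives):
    """Smallest drive size times the number of drives not spent on parity."""
    if len(drive_sizes_tb) <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return min(drive_sizes_tb) * (len(drive_sizes_tb) - parity_drives)

four_2tb_drives = [2, 2, 2, 2]
print(usable_capacity(four_2tb_drives, parity_drives=1))  # RAID 5: 6
print(usable_capacity(four_2tb_drives, parity_drives=2))  # RAID 6: 4
```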

The Wikipedia article explains this better!

Solution 4

RAID 5 uses one drive's worth of capacity for parity, regardless of how many data drives there are in the array. This means that it becomes more efficient, in terms of usable space, the more drives that are added.

Parity is achieved by doing an XOR operation across the same block in each drive; the contents of the parity block are set such that all drives XOR to zero. This does mean that every drive in a RAID 5 array contributes only as much capacity as the smallest drive in the array.
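The "XOR to zero" property is easy to check. This is only a sketch with invented block contents, and `stripe_xor` is a hypothetical name; it assumes all blocks in the stripe have the same length.

```python
from functools import reduce

def stripe_xor(blocks):
    """XOR the same-position bytes of every block in a stripe."""
    # zip(*blocks) walks the blocks in lockstep, one byte position at a time.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"\x12\x34", b"\xAB\xCD", b"\x0F\xF0"]  # made-up data blocks
parity = stripe_xor(data)  # chosen so that the whole stripe XORs to zero

assert stripe_xor(data + [parity]) == b"\x00\x00"
```

If the XOR of a full stripe is ever non-zero, the data and parity no longer agree with each other.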

RAID 6 is similar except that two simultaneous drive failures can be tolerated. This is useful because the process of "resilvering" an array after a single drive failure may be stressful enough to cause a second drive to fail.

Author: Naftuli Kay

Updated on September 18, 2022

Comments

  • Naftuli Kay
    Naftuli Kay almost 2 years

    I'm looking to build a nice little RAID array for dedicated backups. I'd like to have about 2-4TB of space available, as I have this nasty little habit of digitizing everything. Thus, I need a lot of storage and a lot of redundancy in case of drive failure. I'll also essentially be backing up 2-3 computers' /home folders using one of the "Time Machine" clones for Linux. This array will be accessible over my local network via SSH.

    I'm having difficulties understanding how RAID-5 achieves parity and how many drives are actually required. One would assume that it needs 5 drives, but I could be wrong. Most of the diagrams I've seen have only yet confused me. It seems that this is how RAID-5 works, please correct me as I'm sure I'm not grasping it properly:

    /---STORAGE---\    /---PARITY----\
    |   DRIVE_1   |    |   DRIVE_4   |
    |   DRIVE_2   |----|     ...     |
    |   DRIVE_3   |    |             |
    \-------------/    \-------------/
    

    It seems that drives 1-3 appear and work as a single, massive drive (capacity * number_of_drives) and the parity drive(s) back up those drives. What seems strange to me is that I usually see 3+ storage drives in a diagram to only 1 or 2 parity drives. Say we're running 4 1TB drives in a RAID-5 array, 3 running storage and 1 running parity, we have 3TB of actual storage, but only have 1TB of parity!?

    I know I'm missing something here, can someone help me out? Also, for my use case, what would be better, RAID-5 or RAID-6? Fault tolerance is the highest priority for me at this point, since it's going to be running over a network for home use only, speed isn't hugely critical.

  • Naftuli Kay
    Naftuli Kay about 13 years
    So that essentially means that I can have 4 2TB drives and have 6TB of effective, redundant storage?
  • Naftuli Kay
    Naftuli Kay about 13 years
    Excellent answer. I was thinking on too large a scale, on an actual complete hard-disk basis, rather than a bit-level. So does RAID-5 use a dedicated drive for parity, or rather all drives for parity? I'm confused on that.
  • Matt
    Matt about 13 years
    I believe the modern approach is to distribute the parity diagonally across all the drives. This has the effect of accelerating the read time to parity bits since multiple IO requests can be sent in parallel to different drives, but don't quote me on that.
  • Matt
    Matt about 13 years
    Yeah, it's the (smallest drive size) * (number of drives in array - 1)
  • sblair
    sblair about 13 years
    @TK Kocheran With RAID 5, yes. Note that the effective storage will be a bit less due to the file system. For example, my NAS with 4 2TB drives in RAID-Z1 (ZFS's version of RAID 5) has a usable space of 5.18TB.
  • Naftuli Kay
    Naftuli Kay about 13 years
    What's the ratio of drives to parity (total storage) for RAID-6? drive_size * (drive_count - 2)?
  • Naftuli Kay
    Naftuli Kay about 13 years
    Well yes, of course :) Always happens that way. Next question is what filesystem to use...
  • camster342
    camster342 about 13 years
    As well as fault tolerance for a second drive going bad before you can replace the first, there is one other situation that it is great for and I've come across more than once: A drive goes bad in a RAID array, and so a new drive is ordered. Some random guy who knows nothing about RAID arrays goes into the server room with new drive in hand, messes up the numbering, and ejects the wrong drive out of the array for replacement. Under RAID5, your array is screwed right there. RAID6 means you can still recover.
  • vjalle
    vjalle over 9 years
    It was easy to "feel" that you already had (drives - 1)/drives of your information even without the parity on a single drive failure, but the explanation here makes the reason obvious. If you have n-1 drives' worth of bits from your XOR equation, comparing an XORing of the n-1 to your parity bit will always tell you if the "lost" bit is switched on or not. Nicely done. (Understanding RAID 6, heaven help me.)
  • Jay Sullivan
    Jay Sullivan over 9 years
    If the parity is just an XOR of the two other disks, how do you know which of the two disks was corrupted? Wouldn't a bit flip on either disk result in a bit flip in the parity?
  • Giuseppe Crinò
    Giuseppe Crinò about 5 years
    Or, have a look at this SVG on Wikipedia en.wikipedia.org/wiki/Standard_RAID_levels#/media/…
  • MarkD
    MarkD over 4 years
    Hi Little confused about situations like line 4 - (1,1,0 = 0) If you have (1,1,?) = 0, ? could be 1 or 0 and the XOR would still be correct. What am I missing?
  • The Guy with The Hat
    The Guy with The Hat over 4 years
    @MarkD Don't think of it as XOR, think of it as "even or odd number of 1s". (1,1,0 = 0), (1,1,1 = 1).
  • Vinny
    Vinny over 4 years
    If you have (1,1,?) = 0, ? could be 1 or 0 and the XOR would still be correct. What am I missing? If you have a XOR b XOR c, you first compute a XOR b, and then compute the result XOR c. Think of it like [ ( a XOR b ) XOR c ].
  • Vinny
    Vinny over 4 years
    So for a=1,b=1,c=0, you have [ ( 1 XOR 1 ) XOR 0 ] = [ ( 0 ) XOR 0 ] = 0 But for a=1,b=1,c=1, you have [ ( 1 XOR 1 ) XOR 1 ] = [ ( 0 ) XOR 1 ] = 1
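A quick way to convince yourself of the two views in the comments above (left-to-right XOR, and "even or odd number of 1s") is to check every 3-bit combination. This is just an illustrative sketch; `xor_all` is a made-up helper name.

```python
from itertools import product

def xor_all(bits):
    """Apply XOR left to right, i.e. ((a XOR b) XOR c)."""
    result = 0
    for bit in bits:
        result ^= bit
    return result

# For every combination of three bits, XOR equals "is the count of 1s odd?".
for bits in product((0, 1), repeat=3):
    assert xor_all(bits) == sum(bits) % 2

print(xor_all((1, 1, 0)))  # 0
print(xor_all((1, 1, 1)))  # 1
```

So with (1, 1, ?) and a parity of 0, the missing bit can only be 0; a 1 there would have produced a parity of 1.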