Upgrading ZFS Pool size with different sized disks


Solution 1

Currently, I have a 1TB, 2TB and 3TB drive, with probably around 5.5TB used, and I'm thinking that I will buy two more 4TB drives and set up a 4TB x 4TB x 3TB RAIDZ array.

With 4 TB drives, you shouldn't be looking at single-redundancy RAIDZ. I would recommend RAIDZ2 because of the additional protection it affords in case one drive breaks or otherwise develops problems.

Remember that consumer drives are usually spec'd to a URE rate of one failed sector per 10^14 bits read. 1 TB (hard disk drive manufacturer terabyte, that is) is 10^12 bytes or close to 10^13 bits, give or take a small amount of change. A full read pass of the array you have in mind is statistically likely to encounter a problem, and in practice, read problems tend to develop in batches.

I'm not sure why you are suggesting RAIDZ2. Is it more likely that I will develop two simultaneous drive failures if I use RAIDZ1 than if I use no RAID? I want some improvement to the fault tolerance of my system. Nothing unrecoverable will exist in only one place, so the RAID array is just a matter of convenience.

RAIDZ1 uses a single disk to provide redundancy in a vdev, whereas RAIDZ2 uses two (and some more complex calculations, but you are unlikely to be throughput limited by RAIDZ calculations anyway). The benefit of a second redundant disk is in case the first fails or otherwise becomes unavailable. With only one disk's worth of redundancy, any additional errors are now critical. With 4+4+3 TB, you have 11 TB of raw storage, initially 6 TB of which may need to be read to reconstruct a lost disk (8 TB once you upgrade that 3 TB drive to a 4 TB one and expand the pool to match). For order-of-magnitude estimates, that rounds nicely to somewhere between 10^13 and 10^14 bits. Statistically, you have something like a 50% to 100% probability of hitting an unrecoverable read error during resilvering when using single redundancy with an array of that order of magnitude size. Sure, you may very well luck out, but it suddenly means that you have next to no protection in case of a drive failure.
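That back-of-the-envelope estimate is easy to check directly. A minimal sketch, assuming the spec-sheet URE rate of one error per 10^14 bits read and independent per-bit errors (both simplifying assumptions; real errors cluster, as noted above):

```python
import math

URE_RATE = 1e-14  # unrecoverable read errors per bit read (consumer drive spec)

def p_at_least_one_ure(terabytes_read: float) -> float:
    """Probability of hitting at least one URE while reading the given
    amount of data (manufacturer TB = 1e12 bytes), assuming independent
    per-bit errors at URE_RATE."""
    bits = terabytes_read * 1e12 * 8
    # (1 - URE_RATE) ** bits, computed stably via log1p
    return 1 - math.exp(bits * math.log1p(-URE_RATE))

# Reading 6 TB to reconstruct a lost disk in the 4+4+3 TB setup:
print(f"{p_at_least_one_ure(6):.0%}")  # roughly 38%
# After upgrading to 4+4+4 TB, a rebuild reads 8 TB:
print(f"{p_at_least_one_ure(8):.0%}")  # roughly 47%
```

The exact figure depends on how much of the pool is actually occupied at rebuild time, but the point stands: with single redundancy at this scale, an error during resilvering is a coin flip, not a remote possibility.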

My understanding is that for heterogeneous disk arrays like this, the pool size will be limited to the size of the smallest disk, which would mean I'd be looking at a 3 x 3 x 3 TB RAIDZ with 6TB usable space and 2TB of non-fault-tolerant space - is this true?

Almost. ZFS will restrict the vdev to the size of the smallest constituent device, so you get the effective capacity of a three-device RAIDZ vdev made up of 3 TB devices, i.e. 6 TB of user-accessible storage (give or take metadata). The remaining 2 TB of raw storage is wasted; it is not available for use even without redundancy. (It will show up in the EXPANDSZ column in zpool list, but it isn't being used.)

Once you replace the 3 TB drive with a 4 TB drive and expand the vdev (both of which are online operations in ZFS), the pool can use the additional storage space.
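As a sketch, the replace-and-expand sequence looks roughly like this (the pool name "tank" and device names are placeholders; substitute your own):

```shell
# Replace the old 3 TB disk (ada0) with the new 4 TB disk (ada3).
# ZFS resilvers onto the new device while the pool stays online.
zpool replace tank ada0 ada3

# Watch the resilver progress; wait until it completes.
zpool status tank

# Tell ZFS to grow the vdev into the new capacity
# (unnecessary if the pool has autoexpand=on).
zpool online -e tank ada3
```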

There are ways around this -- for example, you could partition the drives to present three 3 TB devices and two 1 TB (remainder of the two 4 TB drives) devices to ZFS -- but it's going to seriously complicate your setup and it's unlikely to work the way you plan. I strongly recommend against that.

The 2 TB of non-fault tolerant space would not be backed up by ZFS to the offline disks, sorry if that was not clear. I was suggesting that I would back it up by normal disk syncing operations like rsync.

That implies that ZFS has no knowledge of those two 1 TB remainders, and that you are creating some other file system in that space. Yes, you can do that, but again, it's going to seriously complicate your setup for, quite frankly, what appears to be very little gain.

Assuming #1 is true, when I eventually need more space, if I add a single 4TB or 6TB drive to the array, will it be a simple matter to extend the pool to become a 4 x 4 x 4 TB array, or will I need to find somewhere to stash the 6TB of data while upgrading the array?

As I said above, ZFS vdevs and pools can be grown as an online operation if you do it by gradually replacing devices. (It is not, however, possible to shrink a ZFS pool or vdev.) What you cannot do, however, is add additional devices to an existing vdev (such as the three-device RAIDZ vdev you are envisioning); an entirely new vdev must be added to the pool, and data written afterwards is striped between the two vdevs. Each vdev has its own redundancy requirements, but they can share hot spares.

You also cannot remove devices from a vdev, except in the case of mirrors (where removing a device only reduces the redundancy level of that particular mirror vdev and does not affect the amount of user-accessible storage space), and you cannot remove vdevs from a pool. The only way to do the latter (and, by consequence, the only way to fix some pool configuration mishaps) is to recreate the pool and transfer the data from the old pool, possibly by way of backups, to the new pool.
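To make the distinction concrete, here is a hedged sketch of growing a pool by adding a new vdev (names "tank" and da3/da4/da5 are placeholders):

```shell
# This adds a SECOND three-device RAIDZ vdev alongside the existing one;
# it does NOT widen the existing vdev.
zpool add tank raidz da3 da4 da5

# New writes are striped across both vdevs. Note that there is no way to
# take this vdev out of the pool again short of destroying and
# recreating the pool from backups.
zpool status tank
```

Because the operation is effectively irreversible, double-check the command (zpool add -n performs a dry run) before committing to it.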

The 2TB of non-fault-tolerant space is not that big a deal, because I was planning on setting aside around 2TB for "stuff that needs proper backup" (personal photos, computer snapshots, etc), which I would mirror to the remaining 2TB disk and a 2nd 2TB external drive that I will keep somewhere else.

ZFS redundancy isn't really designed for the mostly-offline offsite-backup-drive use case. I discuss this in some depth in Would a ZFS mirror with one drive mostly offline work?, but the gist of it is that it's better to use zfs send/zfs receive to copy the contents of a ZFS file system (including snapshots and other paraphernalia), or plain rsync if you don't care about snapshots, than to use mirrors in a mostly-offline setup.
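For illustration, a typical send/receive backup cycle looks something like this (dataset and pool names "tank/photos" and "backup" are hypothetical):

```shell
# Initial full copy to the backup pool:
zfs snapshot tank/photos@2016-01
zfs send tank/photos@2016-01 | zfs receive backup/photos

# On later backup runs, send only the changes since the last
# snapshot that both pools have in common (incremental send):
zfs snapshot tank/photos@2016-02
zfs send -i @2016-01 tank/photos@2016-02 | zfs receive backup/photos
```

The incremental form keeps each backup run proportional to what changed, which matters when the backup drive is external and slow.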

If I'm using half my disks for fault tolerance, I might as well just use traditional offline backups.

This admittedly depends a lot on your situation. What are your time to recovery requirements in different situations? RAID is about uptime and time to recovery, not about safeguarding data; you need backups anyway.

Solution 2

My understanding is that for heterogeneous disk arrays like this, the pool size will be limited to the size of the smallest disk, which would mean I'd be looking at a 3 x 3 x 3 TB RAIDZ with 6TB usable space and 2TB of non-fault-tolerant space - is this true?*

Yes.

Assuming #1 is true, when I eventually need more space, if I add a single 4TB or 6TB drive to the array, will it be a simple matter to extend the pool to become a 4 x 4 x 4 TB array, or will I need to find somewhere to stash the 6TB of data while upgrading the array?

Yes, this is possible.

You can replace all the disks inside a Z1/Z2/Z3 vdev one by one; the vdev's capacity will be limited by the smallest disk currently in use in the vdev. (You can also set the property autoexpand to on so the pool grows automatically once the last small disk is replaced, instead of expanding it manually.)
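The property in question can be checked and set like so (pool name "tank" is a placeholder):

```shell
# Check whether automatic expansion is enabled (it is off by default):
zpool get autoexpand tank

# Enable it, so the pool grows on its own once every disk in the
# vdev has been replaced by a larger one:
zpool set autoexpand=on tank
```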

On the other hand, you cannot add (rather than replace) a drive to an existing Z1/Z2/Z3 vdev without completely destroying and recreating the pool (losing all your data in the process). On mirrors and basic vdevs, you can attach more drives to create a two-way, three-way, four-way mirror and so on, but this only increases reliability, not usable size.



Author: Paul

Background is in physical chemistry and NMR instrumentation, now working as a software developer.

Updated on September 18, 2022

Comments

  • Paul
    Paul over 1 year

    I currently have a setup where I am using an old desktop as a media server, but I have no fault tolerance and the amount of media on there is too large for me to reasonably back it all up. I'm not terribly concerned about losing it in case of a drive failure, since it's just movies and TV shows and the like (many of which I still have the DVDs for, packed away somewhere), but I'm currently upgrading my system and I'd like to add in some fault tolerance here.

    Currently, I have a 1TB, 2TB and 3TB drive, with probably around 5.5TB used, and I'm thinking that I will buy two more 4TB drives and set up a 4TB x 4TB x 3TB RAIDZ array.

    My questions are:

    1. My understanding is that for heterogeneous disk arrays like this, the pool size will be limited to the size of the smallest disk, which would mean I'd be looking at a 3 x 3 x 3 TB RAIDZ with 6TB usable space and 2TB of non-fault-tolerant space - is this true?*
    2. Assuming #1 is true, when I eventually need more space, if I add a single 4TB or 6TB drive to the array, will it be a simple matter to extend the pool to become a 4 x 4 x 4 TB array, or will I need to find somewhere to stash the 6TB of data while upgrading the array?

    *The 2TB of non-fault-tolerant space is not that big a deal, because I was planning on setting aside around 2TB for "stuff that needs proper backup" (personal photos, computer snapshots, etc), which I would mirror to the remaining 2TB disk and a 2nd 2TB external drive that I will keep somewhere else.

  • Paul
    Paul over 7 years
    The 2 TB of non-fault tolerant space would not be backed up by ZFS to the offline disks, sorry if that was not clear. I was suggesting that I would back it up by normal disk syncing operations like rsync.
  • Paul
    Paul over 7 years
    I'm not sure why you are suggesting RAIDZ2. Is it more likely that I will develop two simultaneous drive failures if I use RAIDZ1 than if I use no RAID? I want some improvement to the fault tolerance of my system. Nothing unrecoverable will exist in only one place, so the RAID array is just a matter of convenience. If I'm using half my disks for fault tolerance, I might as well just use traditional offline backups.
  • user121391
    user121391 over 7 years
    @Paul: While rebuilding a failed drive, all data from all drives must be read. This increases the stress on the disks (especially if they are mostly idle normally) and therefore the chance of another drive failing. Additionally, while reading all that data you may get a URE from one of the disks with no second disk to compensate, which means files can be damaged/lost. Third, the bigger your disks are, the longer your window of vulnerability becomes, not just for those problems, but for any problems that may occur on the disks or system (power outage etc.).
  • user121391
    user121391 over 7 years
    @Paul Sorry, I've misread your question (my first answer was about reliability of RAIDZ1 in case of one dead drive). To answer it: in this case, you get more reliability with Z1 over basic vdevs while everything works normally, but after one disk has died, your chance for further corruption actually increases.
  • Paul
    Paul over 7 years
    @MichaelKjörling Well, like I said, everything unrecoverable will be backed up. This is for a home media server if I didn't mention this, so uptime is not a major issue. I will ask a separate question to resolve this RAIDZ1 vs RAIDZ2 issue.
  • user
    user over 7 years
    @Paul Please do link to it from here in a comment as well; that sounds useful. You may want to review What are the different widely used RAID levels and when should I consider them? and Hot spare or extra parity drive in RAID array? and to a lesser extent Is bit rot on hard drives a real problem? What can be done about it?, all three on Server Fault because this is typically an enterprise, not home, consideration.
  • Paul
    Paul over 7 years
    Here is the new question about RAIDZ1 vs. nothing.