ZFS vs XFS
Solution 1
I've found XFS better suited to extremely large filesystems, possibly with many large files. I've had a functioning 3.6TB XFS filesystem for over 2 years now with no problems. It definitely works better than ext3, etc. at that size (especially when dealing with many large files and lots of I/O).
What you get with ZFS is device pooling, striping and other advanced features built into the filesystem itself. I can't speak to specifics (I'll let others comment), but from what I can tell, you'd want to use Solaris to get the most benefit here. It's also unclear to me how much ZFS helps if you're already using hardware RAID (as I am).
Solution 2
ZFS will give you advantages beyond software RAID. The command structure is very thoughtfully laid out, and intuitive. It's also got compression, snapshots, cloning, filesystem send/receive, and cache devices (those fancy new SSD drives) to speed up indexing meta-data.
Compression:
# zfs set compression=on filesystem/home
It supports simple-to-create copy-on-write snapshots that can be live-mounted:
# zfs snapshot filesystem/home/user@tuesday
# cd /filesystem/home/user/.zfs/snapshot/tuesday
Filesystem cloning:
# zfs clone filesystem/home/user@tuesday filesystem/home/user2
Filesystem send/receive:
# zfs send filesystem/home/user@tuesday | ssh otherserver "zfs receive -v filesystem/home/user"
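Incremental send/receive (sends only the changes between two snapshots; @wednesday stands for any later snapshot and is a hypothetical name):
# zfs send -i filesystem/home/user@tuesday filesystem/home/user@wednesday | ssh otherserver "zfs receive -v filesystem/home/user"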
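The incremental form takes two snapshots: the one the receiving side already has, and the newer one to bring it up to. Here is a minimal dry-run sketch of that, assuming hypothetical snapshot names; the helper name `incremental_send_cmd` is made up for illustration, and it only prints the pipeline so you can review it before running anything:

```shell
#!/bin/sh
# Dry-run helper (hypothetical name): prints an incremental send pipeline
# instead of executing it, so it can be inspected first.
# Arguments: dataset, older snapshot, newer snapshot, remote host.
incremental_send_cmd() {
    dataset=$1; from=$2; to=$3; host=$4
    printf 'zfs send -i %s@%s %s@%s | ssh %s "zfs receive -v %s"\n' \
        "$dataset" "$from" "$dataset" "$to" "$host" "$dataset"
}

# Example: send only what changed between @tuesday and @wednesday.
incremental_send_cmd filesystem/home/user tuesday wednesday otherserver
```

The receiving dataset must already hold the older snapshot (from an earlier full send), otherwise the incremental stream has nothing to apply to.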
Caching devices:
# zpool add filesystem cache ssddev
This is all just the tip of the iceberg. I would highly recommend getting your hands on an install of OpenSolaris and trying this out.
http://www.opensolaris.org/os/TryOpenSolaris/
Edit: This is very old. OpenSolaris has been discontinued; the best way to use ZFS now is probably on Linux or FreeBSD.
Full disclosure: I used to be a Sun storage architect, but I haven't worked for them in over a year. I'm just excited about this product.
Solution 3
Using LVM snapshots and XFS on live filesystems is a recipe for disaster, especially when using very large filesystems.
I've been running exclusively on LVM2 and XFS for the last 6 years on my servers (even at home, since zfs-fuse is just plain too slow)...
However, I can no longer count the different failure modes I encountered when using snapshots. I've stopped using them altogether - it's just too dangerous.
The only exception I'll make now is my own personal mailserver/webserver backup, where I do overnight backups using an ephemeral snapshot that is always equal to the size of the source fs and gets deleted right afterwards.
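That overnight routine could look roughly like the sketch below. It is a dry run that only prints the commands; the volume group, volume, size, and paths are all hypothetical, and the helper name `snapshot_backup_plan` is made up for illustration:

```shell
#!/bin/sh
# Dry-run sketch: prints the steps of a backup taken from a short-lived
# LVM snapshot. Nothing is executed; every name below is hypothetical.
# Arguments: volume group, logical volume, snapshot size, mount point, destination.
snapshot_backup_plan() {
    vg=$1; lv=$2; size=$3; mnt=$4; dest=$5
    snap="${lv}_snap"
    # Snapshot sized like the origin, so it cannot fill up during the backup.
    echo "lvcreate --snapshot --size $size --name $snap /dev/$vg/$lv"
    # XFS refuses to mount a second filesystem with the same UUID
    # unless the nouuid option is given.
    echo "mount -o ro,nouuid /dev/$vg/$snap $mnt"
    echo "rsync -a $mnt/ $dest/"
    echo "umount $mnt"
    # Remove the snapshot right away so it does not linger.
    echo "lvremove -f /dev/$vg/$snap"
}

snapshot_backup_plan vg0 mail 200G /mnt/snap /backup/mail
```

Printing the plan first makes it easy to check the snapshot size against the origin volume before anything touches the volume group.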
Most important aspects to keep in mind:
- if you have a big(ish) filesystem that has a snapshot, write performance is horribly degraded
- if you have a big(ish) filesystem that has a snapshot, boot time will be delayed by literally tens of minutes while the disk churns and churns during import of the volume group. No messages will be displayed. This effect is especially horrid if root is on LVM2 (because waiting for the root device will time out and the system won't boot)
- if you have a snapshot it is very easy to run out of space. Once you run out of space, the snapshot is corrupt and cannot be repaired.
- Snapshots cannot be rolled back/merged at the moment (see http://kerneltrap.org/Linux/LVM_Snapshot_Merging). This means the only way to restore data from a snapshot is to actually copy (rsync?) it over. DANGER DANGER: you do not want to do this if the snapshot capacity is not at least the size of the source fs; otherwise you'll soon hit the brick wall and end up with both the source fs and the snapshot corrupted. (I've been there!)
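Since a snapshot that runs out of space is lost, it helps to watch how full it is. A small sketch of such a check follows; the function name and the 80% threshold are assumptions for illustration, and the percentage is passed in as a plain number rather than read from a live system:

```shell
#!/bin/sh
# Sketch: decide whether a snapshot's copy-on-write area is getting too full.
# In practice the percentage could come from something like
#   lvs --noheadings -o data_percent vg0/mail_snap
# (volume names are hypothetical); here it is just an argument.
snapshot_too_full() {
    used_percent=$1; limit_percent=$2
    # awk handles fractional percentages; exit status 0 means "too full".
    awk -v u="$used_percent" -v l="$limit_percent" \
        'BEGIN { exit !(u + 0 >= l + 0) }'
}

if snapshot_too_full 85.30 80; then
    echo "snapshot above 80%: extend it (lvextend) or remove it before it overflows"
fi
```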
Solution 4
A couple additional things to think about.
If a drive dies in a hardware RAID array, all the blocks on the device have to be rebuilt, regardless of the filesystem on top of it, even the ones that didn't hold any data. ZFS, on the other hand, is the volume manager and the filesystem, and it manages data redundancy and striping, so it can intelligently rebuild only the blocks that contained data. This results in faster rebuild times except when the volume is 100% full.
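To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes rebuild time is simply bytes copied divided by a constant rate, which real rebuilds do not follow exactly; the drive size, throughput, and fill level are made-up figures:

```shell
#!/bin/sh
# Rough estimate only: assumes a constant copy rate (hypothetical figures).
# Usage: rebuild_seconds BYTES_TO_COPY BYTES_PER_SECOND
rebuild_seconds() {
    echo $(( $1 / $2 ))
}

disk_bytes=1000000000000   # 1 TB drive (hypothetical)
rate=100000000             # 100 MB/s sustained (hypothetical)
used_percent=40            # pool is 40% full (hypothetical)

# Hardware RAID copies every block of the replaced drive.
full=$(rebuild_seconds "$disk_bytes" "$rate")
# ZFS only copies the blocks that are actually allocated.
allocated=$(rebuild_seconds $(( $disk_bytes * $used_percent / 100 )) "$rate")

echo "whole-disk rebuild: ${full}s, allocated-only rebuild: ${allocated}s"
```

Under these assumptions the gap shrinks as the pool fills and disappears at 100% usage.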
ZFS has background scrubbing which makes sure that your data stays consistent on disk and repairs any issues it finds before it results in data loss.
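The idea behind a scrub can be sketched with ordinary tools: keep a checksum next to the data, re-read the data later, and compare. The example below uses `sha256sum` (so it assumes GNU coreutils) as a stand-in for ZFS's per-block checksums. It only shows detection, not how ZFS stores checksums or repairs a bad block from a redundant copy. On a real pool a scrub is started with `zpool scrub <pool>` and its progress shows up in `zpool status`.

```shell
#!/bin/sh
# Sketch of checksum-based corruption detection (not ZFS internals).
workdir=$(mktemp -d)
block="$workdir/block.bin"

printf 'important data' > "$block"
stored_sum=$(sha256sum "$block" | cut -d' ' -f1)

verify_block() {
    # Succeeds only if the file still matches the checksum recorded earlier.
    [ "$(sha256sum "$1" | cut -d' ' -f1)" = "$2" ]
}

verify_block "$block" "$stored_sum" && echo "before corruption: OK"

# Simulate silent corruption: overwrite a single byte in place.
printf 'X' | dd of="$block" bs=1 seek=3 conv=notrunc 2>/dev/null

verify_block "$block" "$stored_sum" || echo "after corruption: checksum mismatch detected"

rm -rf "$workdir"
```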
ZFS file systems are always in a consistent state so there is no need for fsck.
ZFS also offers more flexibility and features with its snapshots and clones compared to the snapshots offered by LVM.
Having run large storage pools for large-format video production on a Linux, LVM, XFS stack, my experience has been that it's easy to fall into micro-managing your storage. This can result in large amounts of unused allocated space and in time lost to managing your logical volumes. This may not be a big deal if you have a full-time storage administrator whose job is to micro-manage the storage, but I've found that ZFS's pooled storage approach removes these management issues.
Solution 5
ZFS is absolutely amazing. I am using it on my home file server with 5 x 1 TB hard drives, and am also using it in production with almost 32 TB of hard drive space. It is fast, easy to use, and offers some of the best protection against data corruption.
We are using OpenSolaris on this server in particular because we wanted to have access to newer features and because it provided the new package management system and way of upgrading.
Tamas Czinege
Updated on September 17, 2022
Comments
-
Tamas Czinege almost 2 years
We're considering building a ~16TB storage server. At the moment, we're considering both ZFS and XFS as the filesystem. What are the advantages and disadvantages? What do we have to look for? Is there a third, better option?
-
Sam Go over 13 years
Don't even compare them. ZFS is a modern enterprise-level file system like jfs2, wafl. XFS was good 10 years ago but today it's just a stone age fs.
-
Mei over 12 years
In some ways, you can't compare them: XFS is a filesystem; ZFS is a filesystem and so much more: it replaces the filesystem, the volume manager (like LVM), and RAID besides. JFS is no longer maintained, if memory serves; XFS, however, is active, maintained, and robust. Either way - ZFS or XFS - you can't go wrong in my opinion.
-
SvennD over 7 years
I still think this question is relevant, so I'll write our experience here: XFS is simple. You install it, you run it, it's quick, it works (with HW RAID below). ZFS is safe and has compression, but it is a lot of work to tune it to run as fast as XFS. So it also depends on the situation you expect the server to run in (backend of a cluster, user storage, archive, ...).
-
skan about 7 years
There is also Hammer2: dragonflybsd.org/hammer
-
-
Brian Gianforcaro about 15 years
FreeBSD has a mature native port of ZFS
-
Kjetil Limkjær about 15 years
wiki.freebsd.org/ZFSKnownProblems I think your definition of mature might be different from mine :-) Maybe I'd consider it after 8.0 is released.
-
Avery Payne about 15 years
The key feature of ZFS that you (usually) don't get elsewhere is block-level CRC, which is supposed to detect (and hopefully prevent) silent data corruption. Most filesystems assume that if a write completed successfully, then the data was indeed written to disk. That isn't always the case, especially if a sector is starting to go "marginal". ZFS detects this by checking the CRC against the resulting write.
-
Avery Payne about 15 years
And yes, I do like XFS a lot. :) The only gotcha that you have to keep in mind is the propensity to zero out sectors that were "bad" during a journal recovery. In some (rare) cases, you can end up with some data loss... Found this paper with the Google search term "xfs zeros out sectors upon recovery": pages.cs.wisc.edu/~vshree/xfs.pdf
-
iSee about 15 years
One of the things I like about XFS is the xfs_fsr "defragmentation" program.
-
Angiosperm almost 15 years
There is also the option to use Nexenta: A Linux (Ubuntu) based distribution which uses the OpenSolaris kernel. It was created for (file) servers.
-
Walter over 14 years
FreeBSD 7.2 after 20090601 has rendered most of the ZFSKnownProblems moot. If you are running the AMD64 version of the OS, it is now stable. In 8.0, FreeBSD has marked ZFS as stable enough for production.
-
sehe over 14 years
As it happens, just today someone confirmed that the "VG with snapshot, unable to boot Linux" bug is still current: bugs.launchpad.net/lvm2/+bug/360237
-
sehe over 12 years
Revisiting this bug, they still think that the abysmal boot problems with snapshots are "normal behaviour for lvm": bugs.launchpad.net/lvm2/+bug/360237/comments/7 (on 2012-01-07)
-
James Moore almost 11 years
ZFS on Linux is available now (zfsonlinux.org)
-
aggregate1166877 about 8 years
That link didn't work for me with www. Use
http://opensolaris.org/os/TryOpenSolaris/
-
Fox about 8 years
I'd actually say that the best bet for ZFS is still FreeBSD. It's been a part of the system for quite a few years, so my guess is that it has the least potential for nasty surprises. Though that's just my $0.02.
-
sehe almost 8 years
Update: Same state. Only now it's been 7 more years.
-
Jody Bruchon over 7 years
The utility of ZFS block-level CRCs is questionable. Hard drives and SSDs use Hamming code ECC to correct single-bit errors and report two-bit errors. If the ECC can't transparently correct the physical read error, the data is lost anyway and a read failure will be reported to the OS. CRCs don't correct errors. This feature is pushed as a major benefit of ZFS but the truth is it's redundant and has no value. As for the XFS zero-after-power-fail bug, that was corrected a long time ago and isn't relevant today.
-
shodanshok almost 5 years
@JodyLeeBruchon what you wrote is incorrect: while it is true that storage devices already have parity code attached to data, it does not mean they are capable of end-to-end data protection. To achieve this goal without a checksumming filesystem, you need either a) a SAS T10/DIF/DIX storage stack or b) devicemapper dm-integrity.
-
Jody Bruchon almost 5 years
@shodanshok No, what I wrote is not incorrect. What you are saying is different from what I am saying. If you are going to "correct" me, at least read what I wrote and understand what it says first.
-
shodanshok almost 5 years
@JodyLeeBruchon you are free to think what you want, but a CRC/ECC which lives near the original data is not the same as an end-to-end data checksum. If it were, both the DIF/DIX specs and the dm-integrity target would be wasted work. I recommend you read the original CERN research paper about data corruption, and how end-to-end data checksums can be used to avoid these problems.
-
Jody Bruchon almost 5 years
@shodanshok Again, you have failed to read and comprehend what I said. You are reading what you want to read, not what I actually said.
-
Admin about 2 years
It would be interesting to compare using ZFS in the same scenarios (snapshotting a live system running the same software).
-
Admin about 2 years
@saulius2 I think there's no comparison, certainly not since ZfsOnLinux matured and became the default, or supported for root, in some Linux distros. I think snapshots in LVM2 have simply been superseded by other volume management, as in btrfs/zfs.
-
Admin about 2 years
I am not sure LVM2 has been superseded yet, but yes, I would like that and can see it coming (albeit slowly). What I am not sure about is whether ZFS snapshots give fewer failures than LVM snapshots. My guess is that they don't: live snapshots of any FS should be quite an unreliable thing.
-
Admin about 2 years
@saulius2 Have you ever tried it? I've been using ZFS for 10 years, and I have automatic live snapshotting in the background without even noticing. The point is that ZFS/btrfs do snapshotting at the dataset level, not just at the block level.