ReFS but for Linux

linux filesystems storage corruption badblocks

6,787

Solution 1

If you're looking for advanced filesystems for general-purpose computers in the Linux world, there are two candidates: ZFS and BTRFS. ZFS is older and more mature, but it's originally from Solaris and the port to Linux isn't seamless. BTRFS is still under heavy development, and not all features are ready for prime time yet.

Both filesystems checksum the data they store (per block rather than per file), so you will know if a file is corrupted. This is more of a security protection than a protection against failing hardware: failing hardware tends to make a file unreadable, and because the hardware has its own checksums, reading wrong data is extremely unlikely. (If a disk read returns wrong data and you're sure it's not an application error, blame your RAM, not your disk.)
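To make the checksumming idea concrete, here is a toy sketch in Python. It is not how ZFS or BTRFS are implemented: the 4096-byte block size, the use of SHA-256, and the function names are all assumptions chosen for illustration. It only shows the principle that a checksum stored at write time lets a later read detect silently altered data.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for this sketch


def checksum(block: bytes) -> bytes:
    # SHA-256 is a stand-in; real filesystems use their own checksum algorithms.
    return hashlib.sha256(block).digest()


def write_blocks(data: bytes):
    """Split data into fixed-size blocks and record one checksum per block."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    sums = [checksum(b) for b in blocks]
    return blocks, sums


def find_corrupt_blocks(blocks, sums):
    """Return the indexes of blocks whose content no longer matches its checksum."""
    return [i for i, (b, s) in enumerate(zip(blocks, sums)) if checksum(b) != s]


data = bytes(range(256)) * 64          # 16384 bytes, i.e. 4 blocks
blocks, sums = write_blocks(data)

# Simulate silent corruption: invert the first byte of block 2.
blocks[2] = bytes([blocks[2][0] ^ 0xFF]) + blocks[2][1:]

print(find_corrupt_blocks(blocks, sums))
```

Note that detection alone does not recover anything: with a single copy, the filesystem can only report that the block is bad.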

If you want resilience, by far the best thing to do is RAID-1 (i.e. mirroring) over two disks. When a disk starts failing, it's rare that only a few sectors are affected; usually, more sectors follow quickly, if the disk hasn't stopped working altogether. So replicating data over the same disk doesn't help very often. Replicating data over two disks doesn't need any filesystem support. The only reason you might want to replicate data on the same disk is if you have a laptop which can only accommodate one disk, but even then the benefits are very small.

Remember that no matter how much replication you have, you still need to have offline backups, to protect against massive hardware failures (power surge, fire, …) and against software-level problems (e.g. accidental file deletion or overwrite).

Solution 2

Btrfs can do "RAID1" on a single HDD, meaning it stores each file twice on the same disk. It also stores checksums of the data, so if one copy becomes corrupted it can give you the other copy.

Check out their wiki.
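The combination of duplicate copies and checksums can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Btrfs code: the block contents, the SHA-256 checksum, and the `read_and_repair` function are hypothetical. It shows why checksums matter for duplication: without them, the reader cannot tell which of two differing copies is the good one.

```python
import hashlib


def checksum(block: bytes) -> bytes:
    # SHA-256 is a stand-in for whatever checksum the filesystem uses.
    return hashlib.sha256(block).digest()


def read_and_repair(copy_a, copy_b, sums):
    """Read every block from whichever copy passes its checksum.

    A bad copy of a block is overwritten with the good one. If both
    copies of the same block fail verification, the data is lost.
    """
    result = []
    for i, expected in enumerate(sums):
        a_ok = checksum(copy_a[i]) == expected
        b_ok = checksum(copy_b[i]) == expected
        if a_ok:
            good = copy_a[i]
        elif b_ok:
            good = copy_b[i]
        else:
            raise OSError(f"block {i}: both copies are corrupt")
        if not a_ok:
            copy_a[i] = good   # heal the first copy
        if not b_ok:
            copy_b[i] = good   # heal the second copy
        result.append(good)
    return b"".join(result)


original = [b"alpha", b"bravo", b"charlie"]
sums = [checksum(b) for b in original]
copy_a = list(original)
copy_b = list(original)

# Damage a different block in each copy.
copy_a[0] = b"XXXXX"
copy_b[2] = b"YYYYYYY"

recovered = read_and_repair(copy_a, copy_b, sums)
print(recovered)
```

Because verification happens per block in this sketch, damage to different blocks in the two copies is still recoverable; only the same block failing in both copies is fatal.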

Solution 3

By default, ZFS keeps multiple copies of every metadata block. You can enable this feature for data blocks as well (via the `copies` property), which gives some protection against localized, non-massive disk errors.

http://blogs.oracle.com/bill/entry/ditto_blocks_the_amazing_tape

Automatic ZFS snapshots are also a popular way to protect files against accidental deletion or corruption.


Camilo Martin


Updated on September 18, 2022

Comments

Camilo Martin almost 2 years

Microsoft is going to bring a "Resilient FileSystem" with Windows 8, but only for servers. I'd like that on a Linux desktop, but my search reveals no contender. There are so many filesystems for Linux, that maybe I've just missed it.

What I expect from such a filesystem is that a bad block won't screw up either files or the journal. I'm no FS geek, so please explain if such error resilience is unfit for a desktop, is CPU- or memory-intensive, lowers the HDD's lifespan, or is already present in some FS like Ext4.

Is there something like this available for Linux?
- elika kohen about 9 years
  
  Since others have already pointed to ZFS, I'll just address your other points: (A.) Linux support is ongoing, and gparted supports it. (B.) ReFS and ZFS are more expensive than ext4, NTFS, etc., but they serve different purposes. (C.) Though ReFS will be extended in the future, it is not, and cannot be, a replacement for NTFS for now: blogs.technet.com/b/askpfeplat/archive/2013/01/02/…; (D.) I use ReFS to store very large directory structures of source code and large trees of EncFS-encrypted files with long file names.
- Warren P about 6 years
  
  ZFS on Linux is a terrible idea. If you want a storage appliance and you want ZFS, choose a BSD and use it.
Camilo Martin over 12 years

Oh, that's nice. So let's say one of the two copies goes kaput: will the other one automatically overwrite the corrupted one? What if both copies contain errors, but in different blocks?
Shadur over 12 years

Then you're SOL. As usual, Microsoft promises revolutionary new features that, once you look into what they actually do, really fail to impress...
Camilo Martin over 12 years

@Shadur I couldn't agree more. Otherwise I wouldn't be asking on Unix SE :) But this FS really looks cool: auto-defrag, compression, and error resilience via RAID1 on the same disk look very good. Besides, I believe two bad blocks for the same file on two different parts of an HDD is the definition of SOL.
Camilo Martin over 12 years

So what you're saying is that if I have any bad blocks, it's a sign the disk will soon go belly-up, so there's no point in a filesystem trying to work around them? (I thought bad blocks were something that could happen occasionally and the disk would still be good for the most part.)
Gilles 'SO- stop being evil' over 12 years

I don't know what the exact statistics are, but partial failures aren't the dominant failure mode for disks. It can happen that a few blocks go bad and that's it, but it's about as common that a few blocks go bad and then more and more follow, or that the disk simply won't start, so you need to be prepared for that anyway.
Camilo Martin over 12 years

Interesting read about ZFS; I hadn't thought about filesystems as trees of blocks before. Snapshots also seem like a very handy (and new to me) feature, but more for protecting files against human errors than against software or hardware errors.
jlliagre over 12 years

@Shadur: I don't know how btrfs actually handles the situation you describe, but ZFS automatically recovers from two bad blocks in a locally mirrored file.
jlliagre over 12 years

@Camilo Martin: Of course, that depends on many factors but I expect software and human errors to be more common than hardware ones ...
Camilo Martin over 12 years

Thank you. I plan to give both a spin in the near (or not so near) future, and in either case backups are something I'll try to worry more about. I'll accept your answer since it's the most complete one, and because I've learnt what I needed to. Thank you all.
elika kohen about 9 years

@Gilles Your answer is misleading: (A.) RAID 1 does not address the corruption issues that are relevant here (the link below sheds light on this). (B.) RAID is not at all analogous to software solutions, as they offer different features. (C.) There are plenty of articles about this, but starting here will help shed light: networkcomputing.com/storage/…
Gilles 'SO- stop being evil' about 9 years

@e.s.kohen (A) RAID 1 doesn't address corruption, but that's not what this question is about. It's about the common disk failure mode where a part of the disk becomes unreadable. (B) I don't understand the dichotomy between “RAID” and “software solutions”; RAID vs other types of volumes and hardware vs software are on different axes. (C) I'm sure there are other features of ReFS that my answer doesn't address; I'm only addressing the features mentioned in the question.