Linux file system corruption due to improper shutdown (fs ext4)?


ext4 should be resilient against even pulling the plug. However, in order to be so, it requires the storage subsystem to not lose committed writes.

First, confirm that you're not mounting with barrier=0 or nobarrier. Disabling barriers often improves performance, but it risks corruption whenever the machine goes down without a proper shutdown. Also check your kernel logs to make sure ext4 isn't disabling barriers on its own because something in the storage stack doesn't support them.
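As a rough sketch of the first check, the Python function below scans mount-table text (for example the contents of /proc/mounts) for ext4 entries that carry either of the two options named above. The function name `ext4_mounts_without_barriers` is made up for illustration, and the code assumes the option appears in the mount table spelled exactly as `nobarrier` or `barrier=0`; it does not inspect kernel logs.

```python
def ext4_mounts_without_barriers(mounts_text):
    """Return (device, mountpoint) pairs for ext4 mounts with barriers disabled.

    mounts_text is expected to use the /proc/mounts layout:
    device, mountpoint, fstype, comma-separated options, dump, pass.
    """
    risky = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not ext4.
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        options = fields[3].split(",")
        # Assumption: a disabled barrier shows up as one of these two spellings.
        if "nobarrier" in options or "barrier=0" in options:
            risky.append((fields[0], fields[1]))
    return risky
```

On a live system you would pass it the text read from /proc/mounts; an empty result means none of the listed ext4 mounts carries either option.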

The next thing to try, at least on magnetic (non-SSD) disks, is to disable the disk write cache. Some disks report a write as complete before the data has actually reached the platters; this improves performance, but those writes are lost if the power goes out. Usually you can disable the cache with hdparm -W0 (for IDE/SATA) or sdparm --clear=WCE (for SCSI/SAS). You may need to add these commands to your boot scripts, because the setting can revert to the default after a power cycle, especially on SATA.
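If you want to run this from a boot-time script, a minimal Python sketch might look like the following. It only wraps the two commands mentioned above; the function names, the `bus` parameter, and its values "ata" and "scsi" are invented for this example, and the device path you pass in is up to you. Running the command requires root and the relevant tool to be installed.

```python
import subprocess


def write_cache_off_command(device, bus):
    """Build the command that turns off the write cache for one disk.

    bus is a hypothetical selector: "ata" for IDE/SATA disks (hdparm),
    "scsi" for SCSI/SAS disks (sdparm).
    """
    if bus == "ata":
        return ["hdparm", "-W0", device]
    if bus == "scsi":
        return ["sdparm", "--clear=WCE", device]
    raise ValueError(f"unknown bus type: {bus!r}")


def disable_write_cache(device, bus):
    # Needs root; raises CalledProcessError if the tool exits with an error.
    subprocess.run(write_cache_off_command(device, bus), check=True)
```

Keeping the command construction separate from the call makes it easy to log or review the exact command before anything touches the disk.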

There is a (rather old) script to confirm write caching isn't losing data; see Brad Fitzpatrick's diskchecker.pl blog post for the script and how to use it.

If you're on SSDs and are seeing the problem you may, unfortunately, just need to find different disks.


Author: Mani

Updated on September 18, 2022

Comments

  • Mani
    Mani over 1 year

    I have been managing many Linux servers, and I find Linux servers easier to work with than any other OS. But sometimes I run into one problem with Linux: file system corruption. This problem does not happen on Windows servers.

    I searched the Internet in detail for a solution. These are the suggestions most people give:

    1. Keep a backup and restore

    My Comments ==> Agreed 100%, but I am looking for a solution where I don't need to struggle with restoring a crashed OS.

    2. Run fsck

    My Comments ==> In my experience, it sometimes introduces additional problems.

    3. Do a proper shutdown/reboot

    My Comments ==> Everyone wants to shut down or reboot properly. I am talking about a rare scenario where the server is not responding, or I am not able to shut down or reboot properly.

    4. Btrfs

    My Comments ==> Not stable enough for production.

    5. Upgrade to ext4

    My Comments ==> Already using ext4.

    6. Upgrade your hard disk

    My Comments ==> We do not encounter the problem because of disk failure; it is mainly due to improper shutdown.

    My problem with fsck:

    1. fsck sometimes corrupts the filesystem when we run it with the -y option

    2. fsck takes around 1 or 2 days to fix the system, which is not acceptable for me in a production environment

    My question is: until btrfs becomes stable, is there any workaround to solve this problem?

    For example, syncing the file system every few minutes, or writing a script to sync all file system changes before rebooting.

    I am looking for a solution for this problem rather than suggestions.