Linux file system corruption due to improper shutdown (fs ext4)?
ext4 should be resilient against even pulling the plug. However, in order to be so, it requires the storage subsystem to not lose committed writes.
First, confirm that you're not mounting with barrier=0/nobarrier. That often improves performance, at the cost of corruption if a proper shutdown isn't performed. Also check your kernel logs to make sure barriers aren't being disabled by ext4 because something in the stack doesn't support them.
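One way to check the mount options on a running system is to scan /proc/mounts. The following is only a minimal sketch: the function names ext4_mounts_without_barriers and check_system are made up for illustration, and it assumes the usual "device mountpoint fstype options dump pass" line layout.

```python
def ext4_mounts_without_barriers(mounts_text):
    """Return (device, mountpoint) pairs for ext4 mounts with barriers disabled.

    mounts_text is expected to look like the contents of /proc/mounts:
    "device mountpoint fstype options dump pass", one mount per line.
    """
    risky = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Skip malformed lines and anything that is not ext4.
        if len(fields) < 4 or fields[2] != "ext4":
            continue
        options = fields[3].split(",")
        if "nobarrier" in options or "barrier=0" in options:
            risky.append((fields[0], fields[1]))
    return risky


def check_system(path="/proc/mounts"):
    """Print every ext4 mount on this machine that has barriers disabled."""
    with open(path) as f:
        for device, mountpoint in ext4_mounts_without_barriers(f.read()):
            print(f"{device} on {mountpoint}: barriers disabled")
```

Calling check_system() on the server prints nothing if no ext4 mount has barriers turned off. It does not cover the second case above (barriers disabled by the kernel at runtime); for that you still need to read the kernel logs.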
The next thing to try, at least on magnetic (non-SSD) disks, is to disable the disk write cache. Sometimes disks lie about when they've actually written data to the platters; it can improve performance (as long as the power doesn't go out). Usually you can do this with hdparm -W0 (for IDE/SATA) or sdparm --clear=WCE (for SCSI/SAS). These may need to be added to your boot scripts because, especially with SATA, the setting may be reset to its default by a power cycle.
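If you want to run this from a boot script, a small wrapper could look like the sketch below. It is an illustration only: the function names, the "sata"/"scsi" labels, and the /dev/sda device path in the usage note are assumptions, and the commands need root to actually run.

```python
import subprocess


def write_cache_off_command(device, bus):
    """Build the command that turns off the write cache for one disk.

    bus is a label chosen for this sketch: "sata" (IDE/SATA, uses hdparm)
    or "scsi" (SCSI/SAS, uses sdparm).
    """
    if bus == "sata":
        return ["hdparm", "-W0", device]
    if bus == "scsi":
        return ["sdparm", "--clear=WCE", device]
    raise ValueError(f"unknown bus type: {bus!r}")


def disable_write_cache(device, bus):
    """Run the command; raises CalledProcessError if the tool reports failure."""
    subprocess.run(write_cache_off_command(device, bus), check=True)
```

For example, disable_write_cache("/dev/sda", "sata") runs hdparm -W0 /dev/sda. Substitute your own device names, and call it once per disk on every boot, since the setting may not survive a power cycle.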
There is a (rather old) script to confirm write caching isn't losing data; see Brad Fitzpatrick's diskchecker.pl blog post for the script and how to use it.
If you're on SSDs and are seeing the problem you may, unfortunately, just need to find different disks.
Mani
Updated on September 18, 2022
Comments
-
Mani over 1 year
I have been managing many Linux servers, and I find them easier to work with than any other OS. But sometimes I encounter one problem with Linux: file system corruption. This problem does not happen on Windows servers.
I searched the Internet in detail for a solution. These are the suggestions given most often:
- Keep a backup & restore
My Comments ==> Agreed 100%, but I am looking for a solution where I don't need to struggle to restore a crashed OS.
- Run fsck
My Comments ==> In my experience, it sometimes introduces additional problems.
- Do a proper shutdown/reboot.
My Comments ==> Everyone wants to shut down/reboot properly. I am talking about a rare scenario where the server is not responding, or I am not able to shut down or reboot properly.
- Btrfs
My Comments ==> Not stable enough for production.
- Upgrade to Ext4
My Comments ==> Already using ext4.
- Upgrade your hard disk
My Comments ==> We encounter the problem not because of disk failure; it is mainly due to improper shutdown.
My problem with fsck:
fsck sometimes corrupts the filesystem when we run it with the -y option.
fsck takes around 1 or 2 days to fix the system, which is not acceptable for me in a production environment.
My question is: until btrfs becomes stable, is there any workaround to solve this problem?
For example, syncing the file system once every few minutes, or writing a script to sync all the file system changes before rebooting.
I am looking for a solution for this problem rather than suggestions.
-
Rui F Ribeiro about 8 years
bodhi.zazen is probably right. I would also like to draw attention to other problems: unix.stackexchange.com/questions/248037/…