how to know if need to run e2fsck in order to fix corrupted blocks?

5,820

RHEL boot scripts run fsck on each boot for every filesystem with a nonzero pass number (the sixth field) in /etc/fstab. Rebooting for this purpose accomplishes the same thing as your procedure of switching to single-user mode.

You will know the file system is not clean if booting stops because fsck is waiting for input. It reports that the file system is not clean and prompts you for each repair it wants to make.

You can force a check at the next boot by running touch /forcefsck and then rebooting.
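Which filesystems the boot-time check covers is decided by /etc/fstab. Entries whose sixth field (fs_passno) is 0 or missing are not checked at boot. The following is a minimal sketch for listing the entries that are checked. It assumes a standard whitespace-separated fstab. boot_fsck_entries is a hypothetical helper name, and its optional file argument exists only so it can be tried on a sample file:

```shell
#!/bin/sh
# boot_fsck_entries [FSTAB]: hypothetical helper that prints the fstab
# entries with a nonzero sixth field (fs_passno), i.e. the filesystems
# the boot-time fsck is asked to check. FSTAB defaults to /etc/fstab.
boot_fsck_entries() {
    # Skip comment lines and entries with fewer than six fields
    # (a missing fs_passno counts as 0, meaning "do not check").
    awk '$1 !~ /^#/ && NF >= 6 && $6 != 0 { print $1, $2, "pass", $6 }' "${1:-/etc/fstab}"
}
```

For example, a line such as /dev/sdc /data ext4 defaults 1 2 is printed as /dev/sdc /data pass 2, while a line ending in 0 0 is left out.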

Edit: You have to unmount a file system to fsck it properly, as the e2fsck man page explains (quoted below). You also need to omit the -n ("no") option when you want to repair the file systems.

Note that in general it is not safe to run e2fsck on mounted filesystems. The only exception is if the -n option is specified, and -c, -l, or -L options are not specified. However, even if it is safe to do so, the results printed by e2fsck are not valid if the filesystem is mounted. If e2fsck asks whether or not you should check a filesystem which is mounted, the only correct answer is 'no'.
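Following the man page excerpt above, a repair run should only happen on an unmounted filesystem. The following is a minimal sketch of that guard. It assumes that the services using the filesystem are already stopped and that the device appears in /proc/mounts under the same path you pass in. A device mounted under a different name (for example, an LVM volume that shows up as a /dev/mapper/... path) would not be matched. is_mounted and repair_unmounted are hypothetical helpers:

```shell
#!/bin/sh
# is_mounted DEV [MOUNTS_FILE]: hypothetical helper; succeeds if DEV is the
# mount source (first field) of any line in MOUNTS_FILE (default /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# repair_unmounted DEV: hypothetical helper; unmounts DEV if it is mounted,
# then runs a forced (-f) repair that answers yes (-y) to every question.
repair_unmounted() {
    dev=$1
    if is_mounted "$dev"; then
        # umount fails if something still has the filesystem open.
        umount "$dev" || { echo "cannot unmount $dev" >&2; return 1; }
    fi
    e2fsck -f -y "$dev"
}
```

For example, running repair_unmounted /dev/sdc as root returns without running e2fsck if /dev/sdc is still busy, because umount fails first.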

You don't have to reboot to do this; it is just convenient. If the downtime for your services is a problem, bring a copy of the data online on a different storage system while you repair this one. In other words, put your business continuity plan into action.

To see a real unclean fsck on a test system:

  1. Backup any data you care about.
  2. Start a write workload, such as a storage benchmark with fio.
  3. Crash the system hard. On Linux, try echo 'c' > /proc/sysrq-trigger as root.
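The steps above can be scripted roughly as follows. This is a sketch for a disposable test machine only. It assumes fio is installed and that the directory you pass in (a path without spaces) lies on the filesystem you want to leave unclean. crash_test is a hypothetical helper, and the fio sizes and runtime are arbitrary. By default it only prints the commands. It runs them only when given --really-crash as a second argument:

```shell
#!/bin/sh
# DANGER: for a disposable test machine only, after backing up any data.
# crash_test DIR [--really-crash]: hypothetical helper. Without the second
# argument it only prints the commands it would run.
crash_test() {
    dir=$1   # directory on the filesystem under test (assumption)
    load="fio --name=crashtest --directory=$dir --rw=randwrite --bs=4k --size=1G --numjobs=4 --time_based --runtime=600"
    if [ "${2:-}" != "--really-crash" ]; then
        printf '%s &\nsleep 30\necho c > /proc/sysrq-trigger\n' "$load"
        return 0
    fi
    $load &                          # random-write workload in the background
    sleep 30                         # let dirty data and journal updates build up
    echo c > /proc/sysrq-trigger     # immediate kernel crash, no clean unmount
}
```

After the crash, the machine usually has to be reset unless kdump or a panic timeout reboots it. The next boot's fsck should then at least have to recover the journal.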
Author: shalom

Updated on September 18, 2022

Comments

  • shalom
    shalom almost 2 years

    We want to check the filesystems on disks /dev/sdc ... /dev/sdg on each Red Hat Linux machine.

    The goal is to find which disks need a repair with e2fsck (for example, e2fsck -y /dev/sdb).

    According to the man page:

    -n
    Open the filesystem read-only, and assume an answer of 'no' to all questions. Allows e2fsck to be used non-interactively. This option may not be specified at the same time as the -p or -y options.

    When we run, for example,

     e2fsck -n /dev/sdXX
    

    we get

    e2fsck 1.42.9 (28-Dec-2013)
    Warning!  /dev/sdc is mounted.
    Warning: skipping journal recovery because doing a read-only filesystem check.
    /dev/sdc: clean, 94/1310720 files, 156685/5242880 blocks
    

    So what in the e2fsck -n output tells us that we need to run e2fsck (without -n)?

    Our e2fsck procedure:

    init 1
    umount /dev/sdXX
    e2fsck -f -y /dev/sdXX   # -f forces a full check; add -C 0 for a progress bar or -v for verbose output
    init 3
    
    • Admin
      Admin over 6 years
      What problem are you trying to solve?
    • Admin
      Admin over 6 years
      Why not try this on a test system, as described in John Mahowald's answer? You need to account for multiple failure modes, so you should really test those scenarios by reproducing them in a test environment. You could also do some research into ext* filesystems to answer your own questions. e2fsck has a large amount of readily available documentation that covers the scenarios you describe and will equip you to act on the information it provides. If you're so concerned about online detection of block-level corruption and actually want the ability to repair it, look at BTRFS.
    • Admin
      Admin over 6 years
      You should use the database function for detecting corruption.
  • John Mahowald
    John Mahowald over 6 years
    Technically, no. You can unmount and fsck file systems without rebooting, but that requires shutting down the services that use them too, so you might as well reboot. See my edit.
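On the question in shalom's comment about what to capture from e2fsck -n: one approach is to rely on e2fsck's exit status rather than its text, and to run it on unmounted devices. The EXIT CODE section of the e2fsck man page defines the status as a sum of bits: 1 means errors corrected, 2 means errors corrected and a reboot is needed, 4 means errors left uncorrected, 8 means an operational error, and so on. With -n nothing is corrected, so bit 4 signals that a real repair run is needed. The sketch adds -f because, without it, e2fsck only consults the clean flag, which is why the output in the question just says "clean". describe_e2fsck_status, needs_repair and check_devices are hypothetical helper names, and the device list in the usage note is only an example:

```shell
#!/bin/sh
# Hypothetical helpers; bit values follow the EXIT CODE section of e2fsck(8).

# describe_e2fsck_status STATUS: print one line per bit set in STATUS.
describe_e2fsck_status() {
    s=$1
    if [ "$s" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((s & 1)) -ne 0 ] && echo "file system errors corrected"
    [ $((s & 2)) -ne 0 ] && echo "file system errors corrected, system should be rebooted"
    [ $((s & 4)) -ne 0 ] && echo "file system errors left uncorrected"
    [ $((s & 8)) -ne 0 ] && echo "operational error"
    [ $((s & 16)) -ne 0 ] && echo "usage or syntax error"
    [ $((s & 32)) -ne 0 ] && echo "e2fsck canceled by user request"
    [ $((s & 128)) -ne 0 ] && echo "shared library error"
    return 0
}

# needs_repair STATUS: succeed when a read-only (-n) run left errors
# uncorrected, i.e. when a repair run without -n is needed.
needs_repair() {
    [ $(( $1 & 4 )) -ne 0 ]
}

# check_devices DEV...: forced, read-only check of each device. Mounted
# devices are skipped because e2fsck results are not valid for them; the
# /proc/mounts match is by exact device path only (an assumption).
check_devices() {
    for dev in "$@"; do
        if awk -v d="$dev" '$1 == d { f = 1 } END { exit !f }' /proc/mounts; then
            echo "$dev: mounted, skipped"
            continue
        fi
        e2fsck -n -f "$dev" >/dev/null 2>&1
        rc=$?
        if needs_repair "$rc"; then
            echo "$dev: needs repair (status $rc)"
        else
            echo "$dev: $(describe_e2fsck_status "$rc" | paste -s -d ';' -)"
        fi
    done
}
```

For example, after unmounting the filesystems in single-user mode, running check_devices /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg as root lists the devices whose status has bit 4 set. Those are the candidates for e2fsck -f -y.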