How do I make my disk unmap pending unreadable sectors?

Solution 1

A pending unreadable sector is one that returned a read error and which the drive has marked for remapping at the first possible opportunity. However, it can't do the remapping until one of two things happens:

  1. The sector is reread successfully
  2. The sector is rewritten

Until then, the sector remains pending. So you have two corresponding ways to deal with this:

  1. Keep trying to reread the sector until you succeed
  2. Overwrite that sector with new data

Obviously, (1) is non-destructive, so you should probably try it first, although keep in mind that if the drive is starting to fail in a serious way then continual reading from a bad area is likely to make it fail much more quickly. If you have a lot of pending sectors and other errors, and you care about the data on the drive, I recommend taking it out of service and using the excellent tool ddrescue to recover as much data as possible. Then discard the drive.

If the sector in question contains data you don't care about, or can restore from a backup, then overwriting it is probably the quickest and simplest solution. You can then view the reallocated and pending counts for the drive to make sure the sector was taken care of.
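Here is a minimal sketch for checking those two counts, assuming smartmontools is installed and the drive reports the usual ATA attribute names (Current_Pending_Sector and Reallocated_Sector_Ct; names vary by vendor, so treat them as assumptions). The function names are my own, and /dev/sdc in the usage comment is only an example device.

```shell
# Print the raw value (10th column) of one attribute from `smartctl -A` output on stdin.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Print the pending and reallocated counts for a device (usually needs root).
# smartctl's exit status is a bit mask, so it is deliberately not checked here.
show_sector_counts() {
    dev=$1
    out=$(smartctl -A "$dev")
    printf 'pending:     %s\n' "$(printf '%s\n' "$out" | smart_raw_value Current_Pending_Sector)"
    printf 'reallocated: %s\n' "$(printf '%s\n' "$out" | smart_raw_value Reallocated_Sector_Ct)"
}

# Usage (example device name):
#   show_sector_counts /dev/sdc
```

Run it before and after rewriting the sector and compare the two numbers.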

How do you find out what the sector corresponds to in the filesystem? There is an excellent article on the smartmontools web site, although it's fairly technical and is specific to ext2/3/4 and reiser file systems.

A simpler approach, which I used on one of my own (Mac) drives, is to use find / -xdev -type f -print0 | xargs -0 ... to read every file on the system. Make a note of the pending count before running this. If the sector is inside a file, you will get an error message from the tool you used to read the files (eg md5sum) showing you the path to it. You can then focus your attentions on re-reading just this file until it reads successfully. Often this will solve the problem, if it's an infrequently-used file which just needed to be reread a few times. If the error goes away, or you don't encounter any errors in reading all the files, check the pending count to see if it's decreased. If it has, the problem was solved by reading.
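The read-everything pass and the retry loop could look roughly like this. It is a sketch, not the exact commands I ran: the function names are made up, cksum stands in for md5sum because it exists on both Linux and Mac, and the default of 20 attempts is just the figure mentioned in this answer.

```shell
# Read every regular file under a directory without crossing into other
# filesystems. Checksums are discarded; read errors show up on stderr with the path.
read_all_files() {
    find "$1" -xdev -type f -print0 | xargs -0 cksum > /dev/null
}

# Re-read one file up to N times (default 20) and stop at the first clean read.
reread_file() {
    file=$1
    tries=${2:-20}
    i=1
    while [ "$i" -le "$tries" ]; do
        if cat "$file" > /dev/null 2>&1; then
            echo "read OK on attempt $i"
            return 0
        fi
        i=$((i + 1))
    done
    echo "still unreadable after $tries attempts" >&2
    return 1
}

# Usage (the second path is a placeholder):
#   read_all_files /
#   reread_file /path/to/suspect-file
```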

If the file cannot be read successfully after multiple tries (e.g. 20) then you need to overwrite the file, or the block within the file, to allow the drive to reallocate the sector. You can use ddrescue on the file (rather than the partition) to overwrite just the one sector, by copying to a temporary file and then copying back again. Note that just removing the file at this point is a bad idea, because the bad sector will go into the free list where it will be harder to find. Replacing the file with a freshly written copy is bad too, because the old blocks, including the bad one, are again released to the free list. You need to rewrite the existing blocks in place. The notrunc option of dd is one way to do this.
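As an illustration of the notrunc approach, here is a sketch that zeroes one block of an existing file in place. The function name, the block index, and the 4096-byte default block size are assumptions; pick the block size to match your filesystem, since a smaller write may force a read of the bad area first. The zeros replace whatever was in that block, and on copy-on-write filesystems the write may land on different physical blocks, in which case this will not help.

```shell
# Overwrite one block of an existing file with zeros, in place.
# conv=notrunc stops dd from truncating the file, so the file keeps its
# length and (on a non-copy-on-write filesystem) its existing blocks.
zero_block_in_file() {
    file=$1
    block=$2          # zero-based block index within the file
    bs=${3:-4096}     # block size in bytes (assumed default)
    dd if=/dev/zero of="$file" bs="$bs" seek="$block" count=1 conv=notrunc 2>/dev/null &&
        sync          # push the write out to the drive
}

# Usage (path and block index are placeholders):
#   zero_block_in_file /path/to/damaged-file 37
```

Afterwards, restore the file from a backup if the zeroed block matters.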

If you encounter no errors, and the pending count did not decrease, then the sector must be in the free list or in part of the filesystem infrastructure (e.g. an inode table). You can try filling up all the free space with cat /dev/zero >tempfile, and then check the pending count (delete tempfile afterwards). If it goes down, the problem was in the free list and has now gone away.
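A sketch of the free-space fill, with cleanup included. The function name, the temporary file name, and the optional size cap (handy for a trial run) are my own additions. Note that blocks reserved for root, or a quota, may keep an ordinary user from filling every free block.

```shell
# Fill the free space of the filesystem holding "dir" with zeros, then
# remove the fill file again. With a second argument, write only that many MiB.
fill_free_space() {
    dir=$1
    limit_mb=${2:-}
    tmp="$dir/zero-fill.tmp"
    if [ -n "$limit_mb" ]; then
        dd if=/dev/zero of="$tmp" bs=1048576 count="$limit_mb" 2>/dev/null
    else
        # Expected to stop with "No space left on device"; that is the goal.
        cat /dev/zero > "$tmp" 2>/dev/null
    fi
    sync
    rm -f "$tmp"
}

# Usage (mount point is a placeholder):
#   fill_free_space /mnt/data
```

Other programs on that filesystem can fail while the disk is full, so run this on a quiet machine.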

If the sector is in the infrastructure, you have a more serious problem, and you will probably encounter errors just walking the directory tree. In this situation, I think the only sensible solution is to reformat the drive, optionally using ddrescue to recover data if necessary.

Keep a very close eye on the drive. Sector reallocation is a very good canary in the coal mine, potentially giving you early warning of a drive that is failing. By taking early action you can prevent a later catastrophic and very painful landslide. I'm not suggesting that a few sector reallocations are an indication that you should discard the drive. All modern drives need to do some reallocation. However, if the drive isn't very old (< 1 yr) or you are getting frequent new reallocations (> 1/month) then I recommend you replace it asap.

I don't have empirical evidence to prove it, but my experience suggests that disk problems can be reduced by reading the whole disk once in a while, either by a dd of the raw disk or by reading every file using find. Almost all the disk problems I've experienced in the past several years have cropped up first in rarely-used files, or on machines that are not used much. This makes sense heuristically, too, in that if a sector is being reread frequently the drive has a chance to reallocate it when it first detects a minor problem with that sector rather than waiting until the sector is completely unreadable. The drive is powerless to do anything with a sector unless the host accesses it somehow, either by reading or writing it or by conducting one of the SMART tests.

I'd like to experiment with the idea of a nightly or weekly cron job that reads the whole disk. Currently I'm using a "poor man's RAID" in which I have a second hard drive in the machine and I back up the main disk to it every night. In some ways, this is actually better than RAID mirroring, because if I goof and delete a file by mistake I can get yesterday's version immediately from the backup disk. On the other hand, I believe a hardware RAID controller does a lot of good work in the background to monitor, report and fix disk problems as they emerge. My current backup script uses rsync to avoid copying data that hasn't changed, but in view of the need to reread all sectors maybe it would be better to copy everything, or to have a separate script that reads the entire raw disk every week.
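The separate whole-disk read could be as small as the sketch below. The function name, the script path, and the schedule in the comment are hypothetical, and I have not run this as a long-term job.

```shell
# Read a whole device (or file) sequentially and throw the data away.
# Without conv=noerror, dd stops at the first read error and returns
# non-zero, so the exit status tells you whether the pass was clean.
read_whole_device() {
    dd if="$1" of=/dev/null bs=1048576 2>/dev/null
}

# Usage as root (example device name):
#   read_whole_device /dev/sda || echo "read error on /dev/sda" >&2
#
# Hypothetical crontab entry, Sundays at 03:00, for a script wrapping the call above:
#   0 3 * * 0  /usr/local/sbin/read-disk.sh
```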

Solution 2

  1. Back up your data
  2. Remove this device from the LVM volume group
  3. dd if=/dev/zero of=/dev/sdc bs=4k -- this will erase all data on /dev/sdc
  4. Add it back to the LVM volume group
  5. Restore your backup
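For steps 2 to 4, one plausible command sequence is sketched below. The function only prints the commands so you can review them first. The volume group name is a placeholder, and pvmove is my assumption: it is needed only if the physical volume still holds extents that you want moved to other disks in the group. If you instead remove the logical volumes and restore from backup, skip it.

```shell
# Print (do not run) a possible sequence for emptying a physical volume,
# zeroing the disk, and adding it back to its volume group.
print_lvm_rewrite_plan() {
    vg=$1
    dev=$2
    cat <<EOF
pvmove $dev
vgreduce $vg $dev
dd if=/dev/zero of=$dev bs=4k
pvcreate $dev
vgextend $vg $dev
EOF
}

# Usage (myvg is a placeholder volume group name):
#   print_lvm_rewrite_plan myvg /dev/sdc
```

The dd step ends with a "No space left on device" message once the whole disk has been written, which is normal. pvcreate is needed afterwards because zeroing the disk also wipes the LVM label.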

Solution 3

Use the Data Lifeguard Diagnostic for DOS (bootable CD) software, available from the Western Digital site.

Author: dkagedal

Updated on September 17, 2022

Comments

  • dkagedal
    dkagedal over 1 year

    I have a disk with some pending unreadable sectors, according to smartd. What would be the easiest way to make the disk remap them and stop smartd from complaining?

    Today, I get two of these every hour:

    Sep 10 23:15:35 hylton smartd[3353]: Device: /dev/sdc, 1 Currently unreadable (pending) sectors
    

    The system is an x86 system running Ubuntu Linux 9.10 (jaunty). The disk is part of an LVM group. This is how smartctl identifies the disk:

    Model Family:     Western Digital Caviar Second Generation Serial ATA family
    Device Model:     WDC WD5000AAKS-00TMA0
    Serial Number:    WD-WCAPW4207483
    Firmware Version: 12.01C01
    User Capacity:    500,107,862,016 bytes
    
    • dkagedal
      dkagedal over 13 years
      This problem solved itself; the disk started complaining more loudly, so I replaced it.
  • Steven D
    Steven D over 13 years
    0. Have a backup. :-)
  • dkagedal
    dkagedal over 13 years
    But this is a pending read error, so shouldn't it be enough to just read all sectors?
  • maxschlepzig
    maxschlepzig over 13 years
    @dkagedal: No, the drive's firmware has already detected that it can't read this one sector. It has no way to recover it on its own, other than retrying and getting lucky at some point (and hopefully not returning corrupted data when it does), so it raises this SMART error. But if the firmware sees a write to that specific sector, it maps the sector away (and stops using it) and maps a spare, working sector to that address instead.
  • Balakrishnan
    Balakrishnan over 13 years
    @dkagedal: Sometimes just one or two additional reads will bring the sector back. Other times, nothing will bring it back. Also, the drive decides internally whether to remap the sector or to reuse it, based on the severity of the original error, and whether it can read it back successfully after writing to it. The only way you can tell is by looking at the reallocated count for the drive. I believe that drives use fairly extensive checksumming to ensure that when data is read it is not corrupted, so you can be reasonably confident about a sector that wasn't reallocated.
  • maxschlepzig
    maxschlepzig over 13 years
    If you do backups (rsyncing to an internal disk does not count ;)) then all your data is (re-)read at regular intervals (depending on your full/incremental backup schedule). RAID or rsync are not backup substitutes. And btw, I 'believe' that you have too much faith in hardware RAID vendors. ;)
  • Balakrishnan
    Balakrishnan over 13 years
    @maxschlepzig: You are right. I do have a separate backup regime as well. However, my experience has been that the probability of data loss due to a drive failing far outweighs all other risks put together (theft, fire, etc.). Modern hard drives have such poor reliability that I'm completely paranoid about them nowadays. So my second internal drive is a major part of my strategy.
  • dmansfield
    dmansfield over 9 years
    I have read and re-read the contents of the disk using dd if=/dev/sda ... and sectors are still pending, any idea why?