smartctl retest bad sectors

hard-disk smartctl

11,871

Solution 1

I can only second the answer from vonbrand. I've seen at least two HDDs die in the past month after going to pre-fail in SMART.

However, your best bet is probably not SMART itself but instead the badblocks utility.

You can let badblocks read and rewrite the whole disk (its non-destructive read-write mode, badblocks -n), thereby forcing your HDD to reallocate pending sectors. This usually works quite well.

If you don't have the time to run badblocks (it can take days on bigger disks), you can try reading the SMART error log (smartctl -x /dev/<hdd>) to get a list of broken sectors.

You can then use hdparm to read the sector:

hdparm --read-sector <sector> /dev/<hdd>

If that fails, you can force a remap by overwriting the sector (this destroys whatever data it still held):

hdparm --yes-i-know-what-i-am-doing --write-sector <sector> /dev/<hdd>

This works quite well (at least for WD Green drives; I can't tell you anything about other drives).

If you have dmesg log messages for failed sectors it's even easier.

sectors=$(dmesg | grep <hdd> | grep sector | awk '{print $8}')

for s in $sectors; do <hdparm stuff>; done
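As a rough sketch of what the loop above could look like, the following helper functions pull the sector numbers out of kernel log text and print the matching hdparm commands instead of running them, so the list can be reviewed first. The function names and the device name sdX are made up for illustration, and the sketch assumes the kernel messages contain the text "sector <number>"; the exact message format differs between kernel versions.

```shell
#!/bin/sh
# Sketch only: helper names and the device name "sdX" are hypothetical.

# Read kernel log text on stdin and print the unique sector numbers
# reported for the given device. Assumes lines contain "sector <number>".
extract_sectors() {
    dev="$1"
    grep "$dev" | grep -o 'sector [0-9][0-9]*' | awk '{print $2}' | sort -un
}

# Read sector numbers on stdin and PRINT the hdparm commands for each one.
# Nothing is executed here: --write-sector overwrites the sector with zeros,
# so only run the second command for sectors where the read really fails.
print_remap_plan() {
    dev="$1"
    while read -r s; do
        echo "hdparm --read-sector $s /dev/$dev"
        echo "hdparm --yes-i-know-what-i-am-doing --write-sector $s /dev/$dev"
    done
}

# Usage:
#   dmesg | extract_sectors sdX | print_remap_plan sdX
```

Matching on the word "sector" rather than on a fixed awk column keeps the extraction working when the log lines carry timestamps or other extra fields.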

Before mounting the volume again do a forced fsck

fsck -f -y /dev/<hdd>

And act as if the drive were due to die on you yesterday!

Good Luck :)

Solution 2

Turn off the machine, and get a replacement disk now! Bad sectors on hard disks tend to grow exponentially; massive data loss is imminent.

Solution 3

I don't believe it's possible to re-test sectors which the drive has already marked as bad and re-mapped. That would be "send it back for warranty" territory. (E.g. theoretically the vendor may have tools that can validate and reset such a drive).


Clinton

Updated on September 18, 2022

Comments

Clinton over 1 year

I got a notification today that my drive was going to fail in 24 hours. The 'Reallocated_Sector_Ct' was already around 3000 and has since risen to 4004 over the last few hours. However, I turned my case on its side a few weeks ago, so I tried turning it back upright. Since then, the 'Reallocated_Sector_Ct' hasn't risen, even though there's currently a lot of disk activity as I'm tar/zipping my important data to another drive.

I know a hard disk not being able to read on its side is a concern, but if placing the hard drive upright seems to fix the issue for the moment, at least I don't have to panic as much.

Is there a way I can run a retest on those "bad sectors", and remark them as good if they pass the test? I'd like to see how many "really" bad sectors there are after a retest with the box upright (of course I'll do this after my backup is complete).

I'm using Debian if that makes any difference.
Clinton over 11 years

The number of bad sectors hasn't grown past 4004 since I re-orientated the disk. I have a feeling this was a temporary issue, as there's been no growth over the last 24 hours.
vonbrand over 11 years

OK, it's your data...
Clinton about 11 years

Vonbrand: You were right, the disk failed. I did save the data though.
soger over 6 years

After 10953 hours of uptime I had my first reallocated sector. Should I start worrying? Also, the command with dmesg shows that I had the problem with sector 306986 but smartctl -l selftest after short self test says LBA_of_first_error is 343292. So which one is correct and is there a way to find out which file was damaged? I am using XFS.
Alexis Wilke over 3 years

@soger The discrepancy is because you got a block number in a partition and a drive block number.
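
For illustration, converting between the two numbering schemes is a single addition: drive sector = partition start + sector within the partition. The sketch below uses a made-up helper name and made-up numbers (not the values from the comment above), and assumes both figures are counted in the same sector size.

```shell
#!/bin/sh
# Sketch only: the function name and the example values are hypothetical.

# Convert a partition-relative sector number to a whole-drive sector number.
#   $1 = start sector of the partition on the drive
#   $2 = sector number relative to the start of the partition
to_drive_sector() {
    echo $(( $1 + $2 ))
}

# On Linux the start sector of a partition can be read from sysfs
# (sdX1 is a placeholder partition name):
#   start=$(cat /sys/class/block/sdX1/start)
#   to_drive_sector "$start" 1000
```
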
Alexis Wilke over 3 years

To test the entire drive (each block), you probably want to use fsck -f -c -c -y /dev/<hdd>. Without at least one -c (read-only test) or two, -c -c (non-destructive read-write test), the blocks aren't going to be tested. All fsck does in that case is verify that the existing data is mostly valid (the inodes make sense).
soger over 3 years

@AlexisWilke yeah, it's a good thing I started worrying because that hard disk has failed not long after my comment. But I got off with minimal data loss, nothing important.