How are SMART selftests related to badblocks?


Solution 1

I have to disagree with voretaq7: SMART is not magic. When one of a drive's sectors goes bad, you will no longer be able to read data from it, so it is perfectly possible to have an unreadable file on a modern disk drive. SMART marks such an unreadable sector as "Current Pending" and "Offline Uncorrectable" the first time it is accessed after the failure.

But when that sector is written to again, it is remapped to spare space, the marks are cleared, and the Reallocated_Sector_Ct counter increases. The whole drive then becomes readable again.

smartctl -t long is useful: it tests the whole drive surface for unreadable sectors, and it logs the first bad sector it encounters and marks it as "Current Pending" and "Offline Uncorrectable". I configure my servers to run this long test once a week on every drive. It does not noticeably affect normal drive operation, as OS requests always have priority over SMART scans.
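
A minimal sequence for this, assuming smartmontools is installed and using a hypothetical device /dev/sdX:

# start the long (extended) self-test; it runs in the background on the drive
sudo smartctl -t long /dev/sdX
# later, read the self-test log; a failed test reports the LBA of the first bad sector
sudo smartctl -l selftest /dev/sdX
# check the attributes discussed above
sudo smartctl -A /dev/sdX | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'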

On servers I always run disks in RAID1 mirrors, so when a long test finds a bad sector I can rewrite its contents using the data from the other drive in the mirror, forcing a reallocation.
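
With Linux software RAID (an assumption; hardware controllers have their own tools), that rewrite can be triggered by an MD "repair" scrub on a hypothetical array md0, which reads every stripe and rewrites anything unreadable from the healthy copy:

# request a repair scrub; read errors are fixed by rewriting from the good mirror
echo repair | sudo tee /sys/block/md0/md/sync_action
# watch progress
cat /proc/mdstat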

badblocks is also useful sometimes: it tests the whole drive and won't stop at the first error, and it can test a single partition or any other part of a drive. You can also use it to quickly check whether a bad block was successfully reallocated.
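
For example, in read-only mode (the device and partition names, and the LBA, are hypothetical):

# read-only scan of a single partition, verbose and with progress
sudo badblocks -sv /dev/sdX2
# re-read one 512-byte sector by LBA to confirm a reallocation took
sudo dd if=/dev/sdX of=/dev/null bs=512 skip=123456 count=1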

Solution 2

Like I pointed out in my other answer, every modern hard drive has remapping space available (because especially at today's disk densities, no drive platter will be perfect - there will always be a few defects that the drive has to remap around, even on brand-new-never-been-used-came-off-the-assembly-line-into-my-hands drives).

Because of this, theoretically you should have a SMART failure reported before something like badblocks notices (end-user-visible) bad sectors on a drive.
On modern hard disks any end-user-visible bad sectors (as might be reported by badblocks or automatically detected by the OS) are a final gasp and shudder of a dying disk.


Ultimately SMART and badblocks test two different, but related, things:

SMART is a self-monitoring tool:

The hard drive knows some information about its operating parameters, and has some meta-knowledge as to what is "normal" for some, and "acceptable" for others.
If the drive senses that certain parameters are "abnormal" or "unacceptable" it will report a pre-failure condition -- in other words the drive is still functional, but might fail soon.

For example: The spindle motor normally draws 0.10 amps, but now it's drawing 0.50 amps -- an abnormally high draw that may indicate the shaft is binding or the permanent lubricant on the bearings is gone. Eventually the motor will be unable to overcome the resistance and the drive will seize.

Another example: The drive has 1000 "remap" blocks to deal with bad sectors. It has used 750 of them, and the engineers that built the drive determined that number of remaps indicates something internally wrong (bad platter, old-age failure, damaged head) - the drive will report a pre-failure condition allowing you time to get your data off before the remap space runs out and bad sectors become visible.

SMART is looking for more than bad sectors - it's a more comprehensive assessment of the drive's health. You could have a SMART pre-failure warning on a drive with no bad sectors and no read/write errors (for example, the spindle motor issue I described above).
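
A quick way to see this overall assessment, again assuming smartmontools and a hypothetical /dev/sdX:

# overall health self-assessment (PASSED or FAILED)
sudo smartctl -H /dev/sdX
# full attribute table; the TYPE column distinguishes Pre-fail from Old_age attributes
sudo smartctl -A /dev/sdX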


badblocks is a tool with a specific (outdated) purpose: Find bad sectors.

badblocks comes from a time before SMART and bad-sector remapping. Back then we knew drives had imperfections, but the only way to map them out to prevent accidentally storing data there was to stress-test the disk, cause a failure, and then remember not to put data there ever again.

The reason I say it is outdated is because the electronics on modern drives already do what badblocks does, internally and a few thousand times faster. badblocks basically allows ancient drives that lack sophisticated electronics to re-map (or skip over) sectors that have failed, but modern hard drives already detect failed sectors and remap them for you.

Theoretically you could use badblocks data to have the OS remap (visible) failures as if your modern disk was an ancient Winchester disk, but that's ultimately counterproductive -- Like I said previously ANY bad sectors detected with badblocks on a modern drive are a cause to discard the entire drive as defective (or about to fail).
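
For illustration only, this is the old-school flow just described, on an ext2/3/4 filesystem (device names hypothetical; the partition must be unmounted):

# have e2fsck run badblocks itself and record any hits in the filesystem's bad-block list
sudo e2fsck -c /dev/sdX1
# or feed it a separately produced list (the -b block size must match the filesystem's)
sudo badblocks -b 4096 -sv -o bad-blocks.txt /dev/sdX1
sudo e2fsck -l bad-blocks.txt /dev/sdX1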

Visible bad sectors indicate that the drive is out of remapping space, which is relatively rare for modern disks unless they're old (nearing end of functional life) or defective (bad platters/heads from the factory).


So basically, if running badblocks on a disk before you deploy it in production makes you feel better, go ahead and do it, but if your disk was manufactured in this century and it shows a visible bad sector, you should chuck it in the trash (or call in its warranty). For my money, SMART status and defense in depth are a better use of my time than manually checking disks.

Solution 3

Good answers to this question are

https://superuser.com/a/693065

https://superuser.com/a/693064

Contrary to other answers, I find badblocks not outdated but a very useful tool. Once I upgraded my PC with a new hard drive and it started running unstably. It took me quite a while to realize, thanks to badblocks, that the disk surface had defects. Since then I run a full write-mode (destructive!) badblocks pass on every new hard drive before I start using it, and I have never had that problem again. I highly recommend a

time sudo badblocks -swvo sdX.log /dev/sdX

for every new hard drive. It writes and reads back every single bit of the disk several times, and so can avoid a lot of trouble later.

During this test, bad blocks will be mapped out by the drive. So the Reallocated_Sector_Ct SMART attribute should be noted before and after the test and compared with the vendor's threshold, since it says something about the health of the drive.
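
A sketch of that before-and-after check, assuming smartmontools and a hypothetical /dev/sdX:

# note the raw value before the test...
sudo smartctl -A /dev/sdX | grep Reallocated_Sector_Ct
time sudo badblocks -swvo sdX.log /dev/sdX
# ...and compare afterwards; a big jump means the drive is burning through its spare sectors
sudo smartctl -A /dev/sdX | grep Reallocated_Sector_Ct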

Solution 4

badblocks is a relic from old times and is not strictly useful. It can find a currently unreadable sector, but the right thing to do with a bad sector is to recover the data from backup. If the data wasn't critical to you, you can instead delete the associated file and write anything to that location; this lets the disk reallocate the sector if it decides it needs to, and continue working.

The disk self-test will also go over the entire media, testing it for various defects. It is supposed to use lower thresholds than in normal operation to see whether the disk has many weak spots, and based on vendor logic it can decide that the disk is past its useful life and declare the test failed. At that point you should take all your data off, or recover it from backup, and replace the disk.

If a disk access (by badblocks or in normal operation) hits an unrecoverable read error, the disk will automatically update its pending-reallocation counter, and when the reallocation is performed it will update both the pending and the reallocated counters. A simple dd will make that happen as well.
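
For instance, a plain sequential read (hypothetical device; GNU dd assumed for status=progress) will surface unreadable sectors and bump the pending counter, and overwriting an affected sector lets the drive reallocate it:

# read the whole disk; unreadable sectors are skipped and counted by the drive
sudo dd if=/dev/sdX of=/dev/null bs=1M conv=noerror status=progress
# destructive: zero one 512-byte sector at a hypothetical LBA to force reallocation
sudo dd if=/dev/zero of=/dev/sdX bs=512 seek=123456 count=1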

If you need to choose between the two, use smartctl -t long, as it gives a better analysis of the disk.

I can also suggest my diskscan utility (https://github.com/baruch/diskscan). It works more like badblocks but tries to assess whether there are sectors that are going bad, sort of like a hard-of-hearing sector that takes much longer to read. That is indicative of a developing media problem, and future versions may also offer an automatic attempt to help the disk fix it.
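
Basic usage is a read scan of a whole device (invocation shown as a sketch; check the project README for current options):

# scan a hypothetical device, reporting slow and unreadable regions
sudo diskscan /dev/sdX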


Comments

  • Hongli Lai
    Hongli Lai over 1 year

The smartctl tool allows initiating a long self-test (smartctl -t long /dev/sda). However, there's also badblocks that I can run on a drive. How are the two related? If badblocks detects bad blocks, does the drive automatically update its SMART values (e.g. by updating its reallocated sector count)? Can badblocks replace smartctl -t long, or vice versa?

  • voretaq7
    voretaq7 about 11 years
    That is a decision only you can make, however SMART is not designed to require manual intervention (it "Just Works" and you generally shouldn't be messing with it). Trust your hardware (at least to this limited extent), because if you can't trust your hardware you may as well pack up and go home.
  • endolith
    endolith over 8 years
    "modern drives already do what badblocks does, internally and a few thousand times faster" badblocks reads every byte from the drive, overwrites them with random patterns, and then puts the original data back. SMART self-tests don't do this. This should clear any "pending" sectors, if I understand correctly.
  • endolith
    endolith over 8 years
    Stress-testing a new drive while it's still under warranty is a great idea.
  • endolith
    endolith over 8 years
    "Also, badblocks is essentially obsolete in this day and age since the disks themselves will reallocate the data and there is no real need to map the bad blocks in the filesystem level anymore." Doesn't badblocks stress the drive in ways that SMART self-tests don't?
  • voretaq7
    voretaq7 over 8 years
    @endolith The tests are not direct equivalents, but the purpose they serve is equivalent (discover and allow the remapping of bad sectors). My main point was the last paragraph: if you want to run badblocks as a disk exerciser (to see if you can provoke SMART errors because it's finding a bunch of bad blocks) go for it, but if you're running badblocks today with the intention of then loading the bad-block list to avoid using those sectors (as we did in the stone age) you're Doing It Wrong: Visible bad sectors mean you should chuck the drive in the nearest trash bin.
  • Baruch Even
    Baruch Even over 8 years
    No. badblocks will do a sequential scan of the disk. It is no better than dd and will be the same as what self-test does too.
  • endolith
    endolith over 8 years
badblocks reads every byte off the disk, replaces it with test patterns, then writes the original data back in place. Are SMART self-tests similarly read-write tests?
  • Baruch Even
    Baruch Even over 8 years
    No. The smart test is read-only due to the fear that a power outage would result in corrupted data. It should be noted that if you rewrite the data you may actually fix some small issues instead of exposing a problem.
  • endolith
    endolith over 8 years
    Well badblocks' rewriting the data will clear any "Pending sectors" and make the drive either mark them as good again or reallocate them, correct?
  • Baruch Even
    Baruch Even over 8 years
    Yes. It might be what you want in such a case.
  • endolith
    endolith about 6 years
    @Hashim Yes I believe it's useful to use badblocks because it will clear pending sectors.
  • problemofficer - n.f. Monica
    problemofficer - n.f. Monica over 4 years
Does a hard drive verify what it has just written, or only when it reads it back later? In the latter case badblocks would make sense, to force a write and a read. This part is still unclear to me: I don't see how dd is the same, and I also worry that if I write something to the HDD and read it two years later, it will only notice the error then. So during normal operation, is a block that was just written immediately read back to check that the write went fine?
  • yar
    yar over 2 years
    Don't throw hard drives in the trash! Take it to an e-waste collector!