Hard-drive errors
Solution 1
From the smartctl man page:
The Attribute table printed out by smartctl also shows the "TYPE" of the Attribute. Attributes are one of two possible types: Pre-failure or Old age. Pre-failure Attributes are ones which, if less than or equal to their threshold values, indicate pending disk failure. Old age, or usage Attributes, are ones which indicate end-of-product life from old-age or normal aging and wearout, if the Attribute value is less than or equal to the threshold. Please note: the fact that an Attribute is of type 'Pre-fail' does not mean that your disk is about to fail! It only has this meaning if the Attribute´s current Normalized value is less than or equal to the threshold value.
If the Attribute´s current Normalized value is less than or equal to the threshold value, then the "WHEN_FAILED" column will display "FAILING_NOW". If not, but the worst recorded value is less than or equal to the threshold value, then this column will display "In_the_past". If the "WHEN_FAILED" column has no entry (indicated by a dash: ´-´) then this Attribute is OK now (not failing) and has also never failed in the past.
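The rule quoted above can be written down in a few lines. This is only a sketch of the logic as the man page words it, not smartctl's actual code; the function name `when_failed` is made up for illustration.

```python
def when_failed(value: int, worst: int, thresh: int) -> str:
    """Mimic the WHEN_FAILED column as described in the man page excerpt.

    value  -- current normalized value (VALUE column)
    worst  -- worst recorded normalized value (WORST column)
    thresh -- threshold (THRESH column)
    """
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"  # OK now and never failed in the past


# Normalized values taken from the table posted in the question.
print(when_failed(200, 200, 51))   # Raw_Read_Error_Rate
print(when_failed(200, 200, 140))  # Reallocated_Sector_Ct
```

Both rows from the question come out as "-", which matches the WHEN_FAILED column in the posted output.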
So according to the smartctl output section you have posted, your drive actually looks to be in good shape. However, that doesn't necessarily mean that there isn't another problem.
Unfortunately, the Unhandled sense code message does mean that something went wrong, but the kernel doesn't know what. You could try looking at the rest of the smartctl output to see if there is anything wrong. There should be a part that summarises the drive's overall health. You can get it on its own with the -H option.
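If you want to check that summary from a script, something like the following might do. It is a sketch that assumes the usual wording smartctl prints for ATA drives ("SMART overall-health self-assessment test result: ..."); the sample string is illustrative and is not output from the drive in the question, and `overall_health` is a hypothetical helper name.

```python
import re
from typing import Optional


def overall_health(report: str) -> Optional[str]:
    """Return the overall-health verdict found in smartctl text output.

    Assumes the ATA-style line; returns None if no such line is present.
    """
    match = re.search(
        r"overall-health self-assessment test result:\s*(\S+)", report
    )
    return match.group(1) if match else None


# Illustrative sample, not taken from the poster's drive.
sample = "SMART overall-health self-assessment test result: PASSED\n"
print(overall_health(sample))
```

The text would normally come from running smartctl with -H (as root) and capturing its output.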
If the drive supports self testing, you can start one with:
smartctl -t long /dev/sda
This starts the test in the background, so you will have to keep checking for results. If the drive is not mounted, you can add the -C option to enable captive mode, which should take less time. A short test is also possible, but less thorough.
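Since you have to keep checking, it can help to pick the failed runs out of the self-test log automatically. The sketch below parses log rows in the layout smartctl normally prints; the regular expression and the name `failed_self_tests` are my own assumptions, and the sample rows are the ones from the question.

```python
import re

# One row of the self-test log: "# N  <type>  <status>  NN%  <hours>  <LBA or ->"
ROW = re.compile(
    r"#\s*(?P<num>\d+)\s+"
    r"(?P<desc>\w+ \w+)\s+"
    r"(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\d+|-)\s*$"
)


def failed_self_tests(log: str):
    """Return (test number, LBA of first error) for every failed test."""
    failures = []
    for line in log.splitlines():
        match = ROW.match(line.strip())
        if match and "failure" in match.group("status"):
            lba = match.group("lba")
            failures.append(
                (int(match.group("num")), int(lba) if lba != "-" else None)
            )
    return failures


SAMPLE_LOG = """\
# 1  Extended offline    Completed: read failure       90%     13680         229857912
# 2  Extended offline    Completed without error       00%      9661         -
# 3  Extended offline    Completed: read failure       90%      9654         96004576
# 4  Extended offline    Completed: read failure       90%      9653         96004576
"""

print(failed_self_tests(SAMPLE_LOG))
```

For the log in the question this reports tests 1, 3 and 4 as failed, with test 1 failing at a different LBA than the two older ones.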
It is also a good idea to check the physical connectors etc. to make sure nothing has come loose. It's an easy fix if it has.
Update
Wikipedia has a good reference for SMART attributes. Note that the 'Better' column refers to the raw values in the rightmost column of the output and not the normalised value at the start. Here is the part on 'Current Pending Sector' mentioned by frostschutz:
Count of "unstable" sectors (waiting to be remapped, because of unrecoverable read errors). If an unstable sector is subsequently read successfully, the sector is remapped and this value is decreased. Read errors on a sector will not remap the sector immediately (since the correct value cannot be read and so the value to remap is not known, and also it might become readable later); instead, the drive firmware remembers that the sector needs to be remapped, and will remap it the next time it's written. However some drives will not immediately remap such sectors when written; instead the drive will first attempt to write to the problem sector and if the write operation is successful then the sector will be marked good (in this case, the "Reallocation Event Count" (0xC4) will not be increased). This is a serious shortcoming, for if such a drive contains marginal sectors that consistently fail only after some time has passed following a successful write operation, then the drive will never remap these problem sectors.
Solution 2
Your drive has 1 Current Pending Sector, which means the sector could not be read correctly. Usually this is a hardware issue, and it results in a failed read during a SMART self-test. If you write to this sector, it may either "fix" the issue or turn into a Reallocated Sector.
Since technically the drive already lost data at this point, I would no longer trust it for important stuff.
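To keep an eye on this, you could scan the attribute table for sector-related counters whose raw value is not zero. This is a rough sketch: the set of watched IDs (5, 196, 197, 198) is my own choice for illustration, and the parsing assumes the standard ten-column layout shown in the question.

```python
# Attribute IDs assumed to be worth watching (illustrative selection):
# 5 Reallocated_Sector_Ct, 196 Reallocated_Event_Count,
# 197 Current_Pending_Sector, 198 Offline_Uncorrectable
WATCHED = {5, 196, 197, 198}


def nonzero_sector_counters(table: str) -> dict:
    """Return {attribute name: raw value} for watched attributes with raw > 0."""
    found = {}
    for line in table.splitlines():
        fields = line.split()
        # A data row has at least 10 columns and starts with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in WATCHED and raw.isdigit() and int(raw) > 0:
            found[name] = int(raw)
    return found


# Rows copied from the smartctl output in the question.
SAMPLE_TABLE = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
"""

print(nonzero_sector_counters(SAMPLE_TABLE))
```

On the posted values only Current_Pending_Sector shows up, with a raw value of 1.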
UVV
Updated on September 18, 2022
Comments
-
UVV over 1 year
My /home file system is JFS. It went into read-only mode several times already, so I had to reboot/remount it. I saw this in /var/log/messages:
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925711] ata2.00: configured for UDMA/133
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925755] sd 1:0:0:0: [sda] Unhandled sense code
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925759] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925763] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925770] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925778] 0e 5a b2 b8
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925782] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925785] sd 1:0:0:0: [sda] CDB:
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925815] sd 1:0:0:0: [sda] Unhandled sense code
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925817] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925820] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925825] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925833] 00 00 00 00
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925836] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925839] sd 1:0:0:0: [sda] CDB:
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925863] sd 1:0:0:0: [sda] Unhandled sense code
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925865] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925868] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925872] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925879] 00 00 00 00
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925882] sd 1:0:0:0: [sda]
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925885] sd 1:0:0:0: [sda] CDB:
Dec 31 10:12:49 uvv-laptop-y570 kernel: [ 983.925908] ata2: EH complete
And smartctl -a /dev/sda gave me this:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   179   174   021    Pre-fail  Always       -       2008
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1005
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   082   082   000    Old_age   Always       -       13675
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       998
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       37
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       810861
194 Temperature_Celsius     0x0022   106   091   000    Old_age   Always       -       41
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
Hard-drive model:
Model Family:     Western Digital Scorpio Blue Serial ATA (Adv. Format)
Device Model:     WDC WD7500BPVT-24HXZT3
Serial Number:    WD-WX91A91R4010
LU WWN Device Id: 5 0014ee 601b831c9
Firmware Version: 03.01A03
Upd: I started another self-test (the first one I did several months ago) and got some updates:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     13680         229857912
# 2  Extended offline    Completed without error       00%      9661         -
# 3  Extended offline    Completed: read failure       90%      9654         96004576
# 4  Extended offline    Completed: read failure       90%      9653         96004576
Lines #2 to #4 were already there before. I followed these guides: Badblock HOWTO and Debug the Filesystem. It seems the block is no longer reported as erroneous, but the reallocated-sector counts have not increased either. The only thing that has increased is Raw_Read_Error_Rate, after I wrote zeros to the bad block.
The question is: should I consider ordering a new hard drive?
-
psusi over 9 years
Is this SATA or USB?
-
UVV over 9 years
@psusi it's SATA
-
psusi over 9 years
Those messages don't make sense then. There should be lower level messages from libata explaining what happened, and they should be translated to a scsi sense code that is handled. What kernel version is this?
-
UVV over 9 years
@psusi Below/above I have same messages. It's 3.10.17 x86_64. I updated the drive's name in original post.
-
Graeme over 9 years
At the very least it looks like there is a firmware bug somewhere around the reporting/remapping of bad blocks if not something more serious. I would definitely be looking at new ones now. It may be ok for a bit now, but it probably won't be long until it plays up again, so I would only use it for something non critical/low use.
-
psusi over 9 years
The log just looks odd but I don't think it is any cause for alarm. The drive should be fine if you repaired the bad sector. On the other hand, with over 13,000 power on hours, it is getting a bit up there in age. Also the load cycle count is absurdly high and approaching the design limit of the drive, so perhaps it would be a good idea to replace it.
-
UVV over 9 years
Actually the thing you refer to I had several months back. I wrote 0x00 to it and treated it relocated already. Agree, I backed up important data already.