How to repair a corrupted HFS+ partition from a damaged hard-disk?

Solution 1

It seems I found this thread too late, but for future readers:

When doing data rescue, your first step should be to make a full image of the drive with dd or something similar (Clonezilla is a popular choice).

In other words, take an imprint of the drive as-is, so that you do no further damage to the volume while trying to rescue the data.
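
A minimal sketch of that first step, assuming the failing drive shows up as /dev/sdb and a larger destination drive is mounted at /mnt/rescue (all device names and paths here are placeholders, adjust to your setup):

# straight copy of the whole drive to an image file; conv=noerror,sync
# keeps going past unreadable sectors instead of aborting
# (on badly failing media, GNU ddrescue is the better choice)
dd if=/dev/sdb of=/mnt/rescue/disk.img bs=64K conv=noerror,sync

# from then on work only on the image; losetup -P exposes its partitions
losetup -rfP --show /mnt/rescue/disk.img    # prints e.g. /dev/loop0
mount -t hfsplus -o ro /dev/loop0p2 /mnt/hfs

Any recovery tool (fsck.hfsplus, testdisk, photorec, ...) can then be pointed at the image or the loop device instead of the original drive.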

Solution 2

Based on the first SMART log, you probably have 124 bad sectors -- this part:

196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       124

You should perform a full scan to be sure of the actual number. Some files are damaged for sure; that's why you have Error Count: 1210 (each time a damaged sector is read, the counter goes up by one). You have to write zeros to these sectors if you ever want the disk to work again, because a sector can only be reallocated when it is written to. I can't tell you how to operate on an HFS filesystem because I've never used one. If you had ext4, you could use fsck with the following options:

 -c                   Check for bad blocks and add them to the badblock list
 -l bad_blocks_file   Add to badblocks list

So you could get the list of all bad blocks, learn their locations, and from that estimate whether they hit regular files or something else; I was using debugfs for that (a full sketch of this workflow follows after the hdparm command below). Then you could write zeros to these sectors via:

# hdparm --yes-i-know-what-i-am-doing --write-sector 104284160 /dev/sdb

and delete the damaged files.
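
Putting that ext4 workflow together as a rough sketch (the block, inode and sector numbers below are hypothetical placeholders, and this only applies if the filesystem really were ext4, which is not the case here):

# read-only surface scan of the partition, using the filesystem's
# block size (4096 here); the bad block numbers land in bad_blocks.txt
badblocks -sv -b 4096 -o bad_blocks.txt /dev/sdb1

# record those blocks in the filesystem's bad-block list
e2fsck -l bad_blocks.txt /dev/sdb1

# map a bad filesystem block to an inode, then the inode to a path
debugfs -R "icheck 1234567" /dev/sdb1
debugfs -R "ncheck 654321" /dev/sdb1

# finally overwrite the failing sector so the drive can reallocate it;
# note that hdparm wants an absolute 512-byte LBA on the whole disk, so
# the filesystem block number must be converted (partition start sector
# plus block number times 8 for 4096-byte blocks)
hdparm --yes-i-know-what-i-am-doing --write-sector 104284160 /dev/sdb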

From the log you provided:

ls: cannot access .hotfiles.btree: Input/output error
ls: cannot access .journal: Input/output error
ls: cannot access .journal_info_block: Input/output error
ls: cannot access .Spotlight-V100: Input/output error
ls: cannot access .Trashes: Input/output error
ls: cannot access home: Input/output error
ls: cannot access libpeerconnection.log: Input/output error
ls: cannot access net: Input/output error
ls: reading directory .: Input/output error

You know which files are damaged.

The next thing is the value of the following parameter:

193 Load_Cycle_Count        0x0032   037   037   000    Old_age   Always       -       635340

This is killing your disk, or has already killed it. My disk, for instance, is rated for a lifetime of 500k load cycles; I found out about this when it had already reached 350k... Ultimately, I disabled this feature in the disk firmware, and it has been working so far.
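
How to actually disable the aggressive head parking depends on the drive; a sketch of the two usual approaches (the device name is a placeholder, and not every drive honours these settings):

# raise the APM level so the drive stops parking its heads constantly
# (254 = maximum performance without spin-down, 255 disables APM entirely;
# on most drives this setting does not survive a power cycle)
hdparm -B 254 /dev/sdb

# on WD drives the idle3 timer is the usual culprit; idle3-tools can
# disable it (takes effect after the drive is power-cycled)
idle3ctl -d /dev/sdb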

The next thing is the filesystem superblock. You should have made a copy of that block for exactly this kind of situation; after clearing the bad blocks, you could then restore the superblock from it. If the superblock got damaged and you don't have a backup, or you don't remember the location of the backup copies, you won't be able to recover the data. Try to use:

# file -s /dev/sdb1
# file -s /dev/sdb

But since you can still read the filesystem (the ls command works), it's not that bad, and I think that after clearing the bad blocks and deleting some files everything should be fine.
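
As a quick check of whether the HFS+ volume headers are even readable, one could dump them directly (a sketch, using the sector numbers that fsck.hfsplus -d reported in the question; an intact header starts with the 'H+' signature):

# primary volume header: 1024 bytes into the partition (sector 2)
dd if=/dev/sdb2 bs=512 skip=2 count=1 2>/dev/null | xxd | head -4

# alternate volume header: 1024 bytes before the end of the partition
# (block 487725342 according to the fsck.hfsplus -d output)
dd if=/dev/sdb2 bs=512 skip=487725342 count=1 2>/dev/null | xxd | head -4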

Comments

  • landroni
    landroni over 1 year

    I have a friend's Mac OS X disk that comes with an HFS+ partition. I am supposed to recover the personal data from this disk, and I'm not yet sure whether the filesystem is corrupted or the disk is dying.


    Background: The full symptoms are as follows. The drive gets recognized by Linux and even gets automounted (using Xfce here):

    liv@liv-HP-Compaq-dc7900:~$ cat /etc/mtab | grep -i hfs
    /dev/sdb2 /media/Macintosh\040HD hfsplus ro,nosuid,nodev,uhelper=udisks 0 0
    

    The kernel reports the following:

    [ 4382.681310] usb 2-5: USB disconnect, device number 2
    [ 4390.104044] usb 2-5: new high-speed USB device number 3 using ehci_hcd
    [ 4390.259178] Initializing USB Mass Storage driver...
    [ 4390.259983] scsi6 : usb-storage 2-5:1.0
    [ 4390.260077] usbcore: registered new interface driver usb-storage
    [ 4390.260079] USB Mass Storage support registered.
    [ 4391.260684] scsi 6:0:0:0: Direct-Access     ASMT     2105             0    PQ: 0 ANSI: 6
    [ 4391.261346] sd 6:0:0:0: Attached scsi generic sg2 type 0
    [ 4391.494924] sd 6:0:0:0: [sdb] 488397168 512-byte logical blocks: (250 GB/232 GiB)
    [ 4391.495668] sd 6:0:0:0: [sdb] Write Protect is off
    [ 4391.495672] sd 6:0:0:0: [sdb] Mode Sense: 43 00 00 00
    [ 4391.496551] sd 6:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
    [ 4391.560091]  sdb: sdb1 sdb2
    [ 4391.565039] sd 6:0:0:0: [sdb] Attached SCSI disk
    [..]
    [10376.614742] hfs: Filesystem was not cleanly unmounted, running fsck.hfsplus is recommended.  mounting read-only.
    [10380.531230] sd 6:0:0:0: [sdb] Unhandled sense code
    [10380.531234] sd 6:0:0:0: [sdb]  Result: hostbyte=invalid driverbyte=DRIVER_SENSE
    [10380.531239] sd 6:0:0:0: [sdb]  Sense Key : Medium Error [current] 
    [10380.531243] sd 6:0:0:0: [sdb]  Add. Sense: Unrecovered read error
    [10380.531253] sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 1e 22 e8 00 00 08 00
    [10380.531259] end_request: critical target error, dev sdb, sector 1975016
    [10380.531264] Buffer I/O error on device sdb2, logical block 195672
    [10384.353981] sd 6:0:0:0: [sdb] Unhandled sense code
    [10384.353985] sd 6:0:0:0: [sdb]  Result: hostbyte=invalid driverbyte=DRIVER_SENSE
    [10384.353990] sd 6:0:0:0: [sdb]  Sense Key : Medium Error [current] 
    [10384.353995] sd 6:0:0:0: [sdb]  Add. Sense: Unrecovered read error
    [10384.354004] sd 6:0:0:0: [sdb] CDB: Read(10): 28 00 00 1e 22 e8 00 00 08 00
    [10384.354011] end_request: critical target error, dev sdb, sector 1975016
    [10384.354015] Buffer I/O error on device sdb2, logical block 195672
    

    Here's relevant output from lshw:

     *-scsi
          physical id: 3
          bus info: usb@2:5
          logical name: scsi7
          capabilities: emulated scsi-host
          configuration: driver=usb-storage
        *-disk
             description: SCSI Disk
             product: 2105
             vendor: ASMT
             physical id: 0.0.0
             bus info: scsi@7:0.0.0
             logical name: /dev/sdb
             version: 0
             serial: 00000000000000000000
             size: 232GiB (250GB)
             capabilities: gpt-1.00 partitioned partitioned:gpt
             configuration: ansiversion=6 guid=6b43402b-9887-4a33-a329-9801b59ccdc7
           *-volume:0
                description: Windows FAT volume
                vendor: BSD  4.4
                physical id: 1
                bus info: scsi@7:0.0.0,1
                logical name: /dev/sdb1
                version: FAT32
                serial: 70d6-1701
                size: 199MiB
                capacity: 199MiB
                capabilities: boot fat initialized
                configuration: FATs=2 filesystem=fat label=EFI name=EFI System Partition
           *-volume:1
                description: Apple HFS partition
                vendor: Mac OS X (fsck)
                physical id: 2
                bus info: scsi@7:0.0.0,2
                logical name: /dev/sdb2
                version: 4
                serial: d9a741cc-8313-cc78-0000-000000800000
                size: 232GiB
                capabilities: journaled bootable osx hfsplus initialized
                configuration: boot=osx checked=2009-09-24 02:29:07 created=2009-09-23 17:29:07 filesystem=hfsplus lastmountedby=fsck modified=2013-11-03 01:02:00 name=Customer state=unclean
    

    When I open the drive in Thunar, I get the following error message: "Failed to open directory "Macintosh HD". Error when getting information for file '/media/Macintosh HD/.journal': Input/output error." (I can access the mount point and some subdirs, though, if I use emelFM2.)


    If I try ls on the mount point, I get a bunch of I/O errors:

    liv@liv-HP-Compaq-dc7900:/media/Macintosh HD$ ls -lha
    ls: cannot access .hotfiles.btree: Input/output error
    ls: cannot access .journal: Input/output error
    ls: cannot access .journal_info_block: Input/output error
    ls: cannot access .Spotlight-V100: Input/output error
    ls: cannot access .Trashes: Input/output error
    ls: cannot access home: Input/output error
    ls: cannot access libpeerconnection.log: Input/output error
    ls: cannot access net: Input/output error
    ls: reading directory .: Input/output error
    total 20M
    drwxrwxr-t 1 root   80   35 Oct 13 22:56 .
    drwxr-xr-x 3 root root 4.0K Jan 16 21:09 ..
    drwxrwxr-x 1 root   80   53 Oct 18 22:07 Applications
    drwxr-xr-x 1 root root   39 Sep 26 00:51 bin
    drwxrwxr-t 1 root   80    2 Jul  9  2009 cores
    dr-xr-xr-x 1 root root    2 Jul  9  2009 dev
    -rw-rw-r-- 1  501   80  16K Sep  8 14:19 .DS_Store
    lrwxr-xr-x 1 root root   11 Sep 24  2009 etc -> private/etc
    ---------- 1 root   80    0 Jul  9  2009 .file
    drwx------ 1   99   99  246 Nov  3 00:29 .fseventsd
    lrwxr-xr-x 1 root   80   60 Mar 20  2010 Guides de l’utilisateur et informations -> /Library/Documentation/User Guides and Information.localized
    dr-xr-xr-t 1 root root    2 Sep 24  2009 .HFS+ Private Directory Data?
    d????????? ? ?    ?       ?            ? home
    -????????? ? ?    ?       ?            ? .hotfiles.btree
    -????????? ? ?    ?       ?            ? .journal
    -????????? ? ?    ?       ?            ? .journal_info_block
    -????????? ? ?    ?       ?            ? libpeerconnection.log
    drwxrwxr-t 1 root   80   58 Mar 27  2013 Library
    drwxrwxrwt 1 root root    4 Sep 18  2012 lost+found
    -rw-r--r-- 1 root root  20M Jun  8  2011 mach_kernel
    d????????? ? ?    ?       ?            ? net
    drwxr-xr-x 1 root root    2 Jul  9  2009 Network
    drwxr-xr-x 1  501   80    3 Oct 26  2010 opt
    drwxr-xr-x 1 root root    6 Sep 24  2009 private
    drwxr-xr-x 1 root root   67 Sep 26 00:52 sbin
    d????????? ? ?    ?       ?            ? .Spotlight-V100
    drwxr-xr-x 1 root root    4 Jul  3  2011 System
    lrwxr-xr-x 1 root root   11 Sep 24  2009 tmp -> private/tmp
    d????????? ? ?    ?       ?            ? .Trashes
    drwxr-xr-x 1 root root    2 May 18  2009 .vol
    -rw-r--r-- 1  501   80  70K Jun 26  2013 .VolumeIcon.icns
    

    Lastly, I already tried to install hfsprogs and run fsck.hfsplus, but without much luck:

    root@liv-HP-Compaq-dc7900:/home/liv# fsck.hfsplus -q /dev/sdb2
    ** /dev/sdb2
    QUICKCHECK ONLY; FILESYSTEM DIRTY
    
    root@liv-HP-Compaq-dc7900:/home/liv# fsck.hfsplus -d /dev/sdb2
    ** /dev/sdb2
        Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K.
    ** Checking HFS Plus volume.
       Invalid B-tree node size
    (8, 0)
    ** Volume check failed.
    volume check failed with error 7 
        volume type is pure HFS+ 
        primary MDB is at block 0 0x00 
        alternate MDB is at block 0 0x00 
        primary VHB is at block 2 0x02 
        alternate VHB is at block 487725342 0x1d12191e 
        sector size = 512 0x200 
        VolumeObject flags = 0x07 
        total sectors for volume = 487725344 0x1d121920 
        total sectors for embedded volume = 0 0x00 
    

    Question: From the error messages above, is the filesystem corrupted or the drive failing? How can I fix the corrupted filesystem? And if that's not the issue, how can I recover the user data from a partially failing disk?


    UPDATE1:

    Given the useful input that I got from Which 'smartctl -d' option should I use on this hard-disk: 'scsi' or 'ata'?, I now managed to successfully run smartctl on the hard-drive:

    root@liv-HP-Compaq-dc7900:/home/liv# smartctl -d sat -H -i -c -A -l error -l selftest -l selective '/dev/sdb'
    smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-57-generic] (local build)
    Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
    
    === START OF INFORMATION SECTION ===
    Device Model:     TOSHIBA MK2555GSXF
    Serial Number:    10J9SA69S
    LU WWN Device Id: 5 000039 245a067fd
    Firmware Version: FH205B
    User Capacity:    250,059,350,016 bytes [250 GB]
    Sector Size:      512 bytes logical/physical
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   8
    ATA Standard is:  Exact ATA specification draft version not indicated
    Local Time is:    Fri Jan 17 18:02:43 2014 CET
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    [..]
    
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
      2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
      3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       1031
      4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       16237
      5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       18
      7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
      9 Power_On_Hours          0x0032   081   081   000    Old_age   Always       -       7987
     10 Spin_Retry_Count        0x0033   253   100   030    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       5274
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       1119
    192 Power-Off_Retract_Count 0x0032   084   084   000    Old_age   Always       -       8196
    193 Load_Cycle_Count        0x0032   037   037   000    Old_age   Always       -       635340
    194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       25 (Min/Max 7/49)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3
    197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       124
    198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
    220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       57
    222 Loaded_Hours            0x0032   087   087   000    Old_age   Always       -       5415
    223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
    224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
    226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       346
    240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
    254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       8107
    
    SMART Error Log Version: 1
    ATA Error Count: 1210 (device log contains only the most recent five errors)
    [..]
    
    Error 1210 occurred at disk power-on lifetime: 7984 hours (332 days + 16 hours)
      When the command that caused the error occurred, the device was active or idle.
    
      After command completion occurred, registers were:
      ER ST SC SN CL CH DH
      -- -- -- -- -- -- --
      40 51 08 e8 22 1e 40  Error: UNC 8 sectors at LBA = 0x001e22e8 = 1975016
    
      Commands leading to the command that caused the error were:
      CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
      -- -- -- -- -- -- -- --  ----------------  --------------------
      25 da 08 e8 22 1e 40 00      00:08:36.484  READ DMA EXT
      25 da 08 e8 22 1e 40 00      00:08:32.637  READ DMA EXT
      25 da 08 00 66 22 40 00      00:08:32.637  READ DMA EXT
      25 da 08 f8 65 22 40 00      00:08:32.625  READ DMA EXT
      25 da 08 50 c3 28 40 00      00:08:32.625  READ DMA EXT
    
    [..]
    
    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]
    
    
    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    

    I'm not sure how to parse this output, but two things jump out at me:

    • SMART overall-health self-assessment test result: PASSED
    • ATA Error Count: 1210 (device log contains only the most recent five errors)

    So how bad is it? And how should I proceed?
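
    (For reference, the self-test that the log says was never run can be started with smartctl itself; a sketch, reusing the -d sat option that worked above:)

    # start a long (surface) self-test in the background; the output above
    # suggests roughly 90 minutes for this drive
    smartctl -d sat -t long /dev/sdb

    # check progress and the result afterwards
    smartctl -d sat -l selftest /dev/sdb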


    UPDATE2:

    Following the suggestions in the comments, I used a Mac OS X to run diskutil verifyVolume:

    mac:~ admin$ diskutil list
    [..]
    /dev/disk1
    #:                       TYPE NAME                    SIZE       IDENTIFIER
    0:      GUID_partition_scheme                        *250.1 GB   disk1
    1:                        EFI                         209.7 MB   disk1s1
    2:                  Apple_HFS Macintosh HD            249.7 GB   disk1s2
    
    mac:~ admin$ diskutil verifyVolume /dev/disk1s2
    Started filesystem verification on disk1s2 Macintosh HD
    Checking Journaled HFS Plus volume
    Invalid B-tree node size
    The volume Macintosh HD could not be verified completely
    Error: -9957: Filesystem verify or repair failed
    Underlying error: 8: POSIX reports: Exec format error
    

    And fsck:

    mac:~ admin$ fsck -d /dev/disk1s2 
    ** /dev/rdisk1s2
    BAD SUPER BLOCK: MAGIC NUMBER WRONG
    
    LOOK FOR ALTERNATE SUPERBLOCKS? [yn] y
    
    SEARCH FOR ALTERNATE SUPER-BLOCK FAILED. YOU MUST USE THE
    -b OPTION TO FSCK TO SPECIFY THE LOCATION OF AN ALTERNATE
    SUPER-BLOCK TO SUPPLY NEEDED INFORMATION; SEE fsck(8).
    

    So, how bad are these error messages? Is the drive toast?


    UPDATE3:

    I played a bit more with smartctl and it seems to me (but please confirm!!) that the drive is definitely toast:

    # 'smartctl' -d sat,16 -H -i -c -A -l error -l selftest -l selective '/dev/sdb'
    
    smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-57-generic] (local build)
    Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
    
    === START OF INFORMATION SECTION ===
    Device Model:     TOSHIBA MK2555GSXF
    Serial Number:    10J9SA69S
    LU WWN Device Id: 5 000039 245a067fd
    Firmware Version: FH205B
    User Capacity:    250,059,350,016 bytes [250 GB]
    Sector Size:      512 bytes logical/physical
    Device is:        Not in smartctl database [for details use: -P showall]
    ATA Version is:   8
    ATA Standard is:  Exact ATA specification draft version not indicated
    Local Time is:    Mon Jan 27 15:20:57 2014 CET
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: FAILED!
    Drive failure expected in less than 24 hours. SAVE ALL DATA.
    See vendor-specific Attribute list for failed Attributes.
    
    General SMART Values:
    Offline data collection status:  (0x00) Offline data collection activity
                        was never started.
                        Auto Offline Data Collection: Disabled.
    Self-test execution status:      (  88) The previous self-test completed having
                        the electrical element of the test
                        failed.
    Total time to complete Offline 
    data collection:        (  120) seconds.
    Offline data collection
    capabilities:            (0x5b) SMART execute Offline immediate.
                        Auto Offline data collection on/off support.
                        Suspend Offline collection upon new
                        command.
                        Offline surface scan supported.
                        Self-test supported.
                        No Conveyance Self-test supported.
                        Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                        power-saving mode.
                        Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                        General Purpose Logging supported.
    Short self-test routine 
    recommended polling time:    (   2) minutes.
    Extended self-test routine
    recommended polling time:    (  90) minutes.
    SCT capabilities:          (0x0039) SCT Status supported.
                        SCT Error Recovery Control supported.
                        SCT Feature Control supported.
                        SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
      2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
      3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       1025
      4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       1
      5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
      9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       0
     10 Spin_Retry_Count        0x0033   100   100   030    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1
    191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
    193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       3
    194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       27 (Min/Max 26/30)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
    220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       57
    222 Loaded_Hours            0x0032   100   100   000    Old_age   Always       -       0
    223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
    224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
    226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       353
    240 Head_Flying_Hours       0x0001   001   001   001    Pre-fail  Offline  FAILING_NOW 3
    254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0
    
    Error SMART Error Log Read failed: scsi error badly formed scsi parameters
    Smartctl: SMART Error Log Read Failed
    Error SMART Error Self-Test Log Read failed: scsi error badly formed scsi parameters
    Smartctl: SMART Self Test Log Read Failed
    Error SMART Read Selective Self-Test Log failed: scsi error badly formed scsi parameters
    Smartctl: SMART Selective Self Test Log Read Failed
    

    I could single out:

    • SMART overall-health self-assessment test result: FAILED! Drive failure expected in less than 24 hours. SAVE ALL DATA.
    • 240 Head_Flying_Hours 0x0001 001 001 001 Pre-fail Offline FAILING_NOW 3

    I suspect that running tools like testdisk or photorec on the drive itself is pretty much out of the question right now. So my only hope of rescuing any data would be to get a bigger hard-disk, make a bit-for-bit copy of the failing drive using dd or ddrescue, and then play with photorec on the resulting image. Any other ideas are most welcome!
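
    Something like the following is what I have in mind (a sketch; /dev/sdb and the /mnt/big paths are placeholders for the failing disk and the larger destination):

    # first pass: grab everything that reads easily; -n skips the slow
    # scraping phase, the map file allows stopping and resuming
    ddrescue -d -n /dev/sdb /mnt/big/disk.img /mnt/big/disk.map

    # second pass: go back and retry the failed areas, up to 3 times
    ddrescue -d -r3 /dev/sdb /mnt/big/disk.img /mnt/big/disk.map

    # then carve files out of the image instead of the dying drive
    photorec /mnt/big/disk.img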


    UPDATE4:

    As requested in Recovering data from a damaged hard-drive: the "freezer trick", I am posting the output of smartctl -H /dev/yourdisk and smartctl -A /dev/yourdisk:

    [Output was misleading so I removed that. See UPDATE5.]
    

    Does this help identify the type of failure?


    UPDATE5:

    About a week ago I foolishly ran testdisk on the disk for a night (after a couple of native Mac OS X fsck attempts), and the damage has likely become worse than it was when the owner simply dropped the drive. By the end of the testdisk session I could clearly hear a clicking noise ("click of death"?), and the drive was incapable of any further reads (every read resulted in an error). Initially I assumed this was due to overheating, but now I tend to believe that the damage simply propagated and the drive is in very bad shape.

    When I try to run a smartctl short self-test on the drive, the test reports Completed with electrical failure, and the smartctl output is the same as in UPDATE3, including the 240 Head_Flying_Hours 0x0001 001 001 001 Pre-fail Offline FAILING_NOW 3 error.


    I also attempted a ddrescue session, which ended with a grand total of 0 bytes rescued.

    root@xubuntu:/mnt/ram# ddrescue -f -n /dev/sdc /dev/sda /mnt/ram/ddrescue.log
    
    Press Ctrl-C to interrupt
    Initial status (read from logfile)
    rescued:         0 B,  errsize:       0 B,  errors:       0
    Current status
    rescued:         0 B,  errsize:    250 GB,  current rate:        0 B/s
       ipos:     65024 B,   errors:       1,    average rate:        0 B/s
       opos:     65024 B,     time from last successful read:     3.5 m
    Finished                   
    

    At each read attempt the kernel was complaining in dmesg of Buffer I/O error on device:

    [ 3706.642819] sd 9:0:0:0: [sdc]  Sense Key : Medium Error [current]
    [ 3706.642824] sd 9:0:0:0: [sdc]  Add. Sense: Unrecovered read error
    [ 3706.642834] sd 9:0:0:0: [sdc] CDB: Read(10): 28 00 00 00 00 18 00 00 08 00
    [ 3706.642842] end_request: critical target error, dev sdc, sector 24
    [ 3706.642845] Buffer I/O error on device sdc, logical block 3
    [ 3710.910060] sd 9:0:0:0: [sdc] Unhandled sense code
    [ 3710.910064] sd 9:0:0:0: [sdc]  Result: hostbyte=invalid driverbyte=DRIVER_SENSE
    

    So all this definitely points to hardware damage. But what is the exact type of damage? (In part, I would like to check if the "freezer trick" is in any way appropriate.)

    As suggested in a related question, I checked How to Recover Data When Your Hard Drive Goes Belly Up and it seems to me, given the symptoms that I've noticed, that it's either:

    • Your Drive Is Spinning Up and Making Clicking Noises, or
    • Your Drive Spins Up and Is Detected by Your Computer, But Hangs When You Try to Access It

    So given all the additional information posted here, is it possible to identify the type of failure that the disk is experiencing? And would the "freezer trick" be appropriate in this case?

    (It was suggested to me that "when the read/write heads touch the surface of the disk, they get tossed and twisted, so no read is possible anymore", and this does sound like a realistic explanation, but I'm not sure how to confirm it.)

    • Admin
      Admin over 10 years
      These kernel messages are saying you have a bad sector. You could use smartctl to check whether it's just a bad sector or a few, or whether the disk thinks the end is near.
    • Admin
      Admin over 10 years
      @derobert Now I managed to run smartctl on the drive (see UPDATE1 in the OP). Any insights?
    • Admin
      Admin over 10 years
      @bahamat At least in Debian, hfsprogs upstream is... Apple. Though the Debian package is outdated. Finding the latest version upstream is, well, challenging... It's probably opensource.apple.com/source/hfs/hfs-226.1.1 but that has a lower version, so who knows!
    • Admin
      Admin over 10 years
      @derobert: Well...Debian's version is 332.25-11. According to the changelog 332.25-1 was a new upstream release matching Mac OS X 10.4 in 2009. Prior to that the minor version was 332.23. So it looks like Debian's version is tracking Tiger. If it were me, I'd be nervous running Tiger's fsck for anything modern.
    • Admin
      Admin over 10 years
      @bahamat I tried native fsck and diskutil verifyVolume, both ending up in errors (see UPDATE2). Please advise.
  • landroni
    landroni over 10 years
    A couple of questions. "Load_Cycle_Count": what does it mean, and why does it kill the disk? How can you disable it in the firmware? "The next thing is the filesystem superblock. You should have made a copy of that block [..]" I should have made a copy of the whole thing as soon as I got it!! But anyway, now the superblock is likely fried, as lately when I connected the disk the kernel was complaining of a corrupted superblock in dmesg.
  • VinoPravin
    VinoPravin over 10 years
    Read about it here: community.wd.com/t5/My-Book-Live/… -- I have a WD disk, and I can use idle3-tools -- idle3-tools.sourceforge.net
  • landroni
    landroni over 10 years
    Thanks. One more question: what does the 240 Head_Flying_Hours 0x0001 001 001 001 Pre-fail Offline FAILING_NOW 3 line indicated in UPDATE3 mean?
  • landroni
    landroni over 10 years
    Thanks. I do expect the disk to fail soon, but let's see how much I can ddrescue before that. When I got that reading, the disk was quite hot after a much too long testdisk session.
  • landroni
    landroni over 10 years
    I did some more debugging; could you take a look at UPDATE5?
  • Pro Backup
    Pro Backup over 10 years
    From the same community.wd.com link: "Load_cycle_count" means the drive will endure at least that number. I have desktop drives that have well over 1 million LCs and they still run fine (they're rated for only 300,000). Load cycle count is a guaranteed minimum, not a maximum value.
  • VinoPravin
    VinoPravin over 10 years
    @landroni -- I'm not sure freezing a disk is a good idea, but if your disk is dead or almost dead and you simply can't make it work, you can't do much more harm; if you have no other options, just try it, though I don't think it will help.
  • landroni
    landroni over 10 years
    @MikhailMorfikov Yup, the disk is almost dead, so I may as well try it. Thanks.
  • user
    user almost 10 years
    What is different from normal about DiskWarrior's "scavenge" mode? Why are these tools particularly well-suited to resolve the OP's situation?
  • Pro Backup
    Pro Backup almost 10 years
    @MichaelKjörling DiskWarrior's "scavenge" is optimized for recovering files after another disk repair utility has already been used to recover files. The OP's corruption might be caused by logical or physical issues. DiskWarrior specializes in eliminating "logical" directory errors, SpinRite specializes in recovering data caused by "physical" hard drive issues.
  • user
    user almost 10 years
    You should edit your post to include additional information, then possibly use an @-comment to notify the original commenter. Information in comments only has a tendency to easily get lost to the noise.
  • landroni
    landroni over 9 years
    Yup, exactly! That was my initial error all along. All that the testdisk, FS-checking, etc. tools did was aggravate the hardware issues. It is probably best to clone the drive to an image using a tool like ddrescue, since it intelligently skips over problematic sectors (whereas dd will simply fail at the first sign of hardware damage).
  • Remy A.
    Remy A. almost 9 years
    I have to recommend AGAINST SpinRite. When I tried salvaging data from a hard disk it only wasted precious time on the dying disk (>24 hours for less than 1 MB, until I gave up and tried something else). Finally I managed to get back about 80% of the data by using Linux (ddrescue and dd_rescue). I have had good experiences with DiskWarrior though, though not in scavenge mode.
  • Remy A.
    Remy A. almost 9 years
    Even better, use dd_rescue instead of ddrescue. It actively expects and deals with runs of damaged sectors, jumping to sane zones to get as much as possible of the easily readable data before returning to retry the slow failing zones. Genius and free.
  • forgotstackxpassword
    forgotstackxpassword about 8 years
    I've had a very similar situation a few times, and I'm troubleshooting a new one right now; I have to second @hmijail on that: ddrescue, testdisk, and other free software tools have proven most helpful so far. jpgrecover on stock Debian works wonderfully. As a counterpoint, an annoying OS X tool called EaseUS file recovery wizard works well too, but NOT if the disk is messed up like in the OP's question here.
  • David Balažic
    David Balažic almost 8 years
    "You have to write zeros to these sectors if you ever want to make the disk work because the sectors can be relocated only when they are being written." Why would this only work with zeroes and not other values?
  • VinoPravin
    VinoPravin almost 8 years
    To be honest, I don't know. I asked that question here: unix.stackexchange.com/questions/304962