zpool status reports error ... what next?

Solution 1

Run zpool clear raid2 to clear the error counters, then start a scrub with zpool scrub raid2.

If the errors persist following that, replace the disk.
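The clear-then-scrub sequence can be sketched as a small shell function. This is a minimal sketch, not a definitive procedure: the function name clear_and_scrub and the ZPOOL override variable are hypothetical conveniences added here so the sequence can be dry-run; only zpool clear, zpool scrub, and zpool status -v are real commands.

```shell
#!/bin/sh
# ZPOOL is a hypothetical override so the sequence can be dry-run
# (for example ZPOOL=echo); by default the real zpool binary is used.
ZPOOL="${ZPOOL:-zpool}"

# Clear the error counters, start a scrub, and show the pool state.
# Stops at the first command that fails.
clear_and_scrub() {
    pool="$1"
    "$ZPOOL" clear "$pool" || return 1
    "$ZPOOL" scrub "$pool" || return 1
    "$ZPOOL" status -v "$pool"
}
```

Calling clear_and_scrub raid2 runs the three steps in order. The scrub runs in the background, so check zpool status raid2 again once it has finished to see whether the counters stayed at zero.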

More details about the hardware would help; as it stands, this is generic advice. My recommendations for a bunch of consumer disks connected to a PC motherboard are different from what I'd do with enterprise-level gear.

Solution 2

The tool tells you what you need to do: "Determine if the device needs to be replaced".

The tools are only so intelligent and need you, as the human administrator, to figure some things out. The steps required are specific to your hardware and your setup, so you will need to make some decisions based on your knowledge of the system.

Take a look at the output from the command. It looks like device gptid/5fe33556-3ff2-11e2-9437-f46d049aaeca is experiencing WRITE errors. 1.13M is a very high error count, and I suspect the problem has been occurring for a while without you noticing. See if you can figure out why, and then replace the disk.
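On a pool with many disks, picking the bad row out by eye gets tedious. As a rough sketch, the filter below reads zpool status output on stdin and prints only the rows whose READ, WRITE, or CKSUM counter is non-zero. The function name devices_with_errors is made up for illustration, and the parsing assumes the column layout shown in the question (NAME, STATE, READ, WRITE, CKSUM).

```shell
#!/bin/sh
# Reads `zpool status` output on stdin and prints every row of the config
# table whose READ, WRITE, or CKSUM counter is not "0".
devices_with_errors() {
    awk '
        # The header row marks the start of the config table.
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        # The "errors:" line marks the end of the table.
        $1 == "errors:" { in_config = 0 }
        # Rows with counters have at least five fields.
        in_config && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}
```

Typical use would be zpool status raid2 | devices_with_errors. Against the output in the question it would print only the gptid/5fe33556-... row.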

If you have a hardware controller, that controller might have additional tools to help you determine the nature of the failure.

ZFS can deal with corrupt sectors, so there is no need to panic. But don't ignore the problem either.

As a preventative measure, you should also run a ZFS scrub regularly. See http://doc.freenas.org/index.php/ZFS_Scrubs . This will alert you when ZFS first encounters a problem, well before you hit the 1.13M mark.

Solution 3

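Use the following command to get a drive's serial number, substituting your own device for /dev/adaX:

[blackout@freenas ~]# smartctl -a /dev/ada0 | grep "Serial"
Serial Number: WD-WCC4EXXXXXXXX

Another helpful command is glabel status, which lists each label (including the gptid/... names shown by zpool status) next to its device:

[blackout@freenas ~]# glabel status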
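The two commands can be chained to go from a gptid label to a serial number. The sketch below assumes the usual three-column glabel status layout (Name, Status, Components) and GPT partition names such as ada1p2; the function name label_to_disk is hypothetical. A disk that has dropped off the bus entirely will not be listed by glabel, in which case this prints nothing.

```shell
#!/bin/sh
# Reads `glabel status` output on stdin and prints the whole-disk device
# behind the given label (first argument).
label_to_disk() {
    awk -v label="$1" '
        $1 == label {
            dev = $3
            sub(/p[0-9]+$/, "", dev)   # drop the partition suffix (ada1p2 -> ada1)
            print dev
        }
    '
}
```

For example, disk="$(glabel status | label_to_disk gptid/5fe33556-3ff2-11e2-9437-f46d049aaeca)" followed by smartctl -a "/dev/$disk" | grep "Serial" should print the serial number to look for on the drive's sticker.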

Solution 4

Although the question is old, it might be looked at by other people.

If so, remember that the output of zpool status and zpool status -v covers all errors experienced. That includes errors caused by your motherboard SATA ports (if used), the HBA card (if used), and the SATA cables themselves, not just the disks.

Three quick diagnostic tests: check the disk with smartctl, check that the card is correctly seated and not loose, and try a different port or SATA cable (the cable is a common cause of read/write errors).
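For the smartctl check, a few attributes usually help separate a failing disk from a bad cable: reallocated or pending sectors point at the disk, while a growing UDMA CRC error count typically points at the cable or connector. The sketch below pulls those raw values out of smartctl -A output. The function name smart_summary is made up, and the attribute names are the common ATA ones; some vendors name them differently, so treat an empty result as "check the full output by hand".

```shell
#!/bin/sh
# Reads `smartctl -A` output on stdin and prints name=raw_value for three
# attributes that commonly separate disk faults from cabling faults.
smart_summary() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "UDMA_CRC_Error_Count" { print $2 "=" $10 }
    '
}
```

Typical use would be smartctl -A /dev/ada0 | smart_summary, with /dev/ada0 swapped for the suspect drive.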

Author: Dan

Updated on September 18, 2022

Comments

  • Dan
    Dan almost 2 years

    On our FreeNAS server, zpool status gives me:

      pool: raid2
     state: ONLINE
    status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
       see: http://www.sun.com/msg/ZFS-8000-9P
     scrub: none requested
    config:
    
        NAME                                            STATE     READ WRITE CKSUM
        raid2                                           ONLINE       0     0     0
          raidz1                                        ONLINE       0     0     0
            gptid/5f3c0517-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0
            gptid/5fe33556-3ff2-11e2-9437-f46d049aaeca  ONLINE       3 1.13M     0
            gptid/60570005-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0
            gptid/60ebeaa5-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0
            gptid/61925b86-3ff2-11e2-9437-f46d049aaeca  ONLINE       0     0     0
    
    errors: No known data errors
    

    What should I do? scrub the pool?

  • Dan
    Dan about 10 years
    uh oh ... after zpool clear raid2, zpool status gave DEGRADED and that disk is UNAVAIL. No point in scrubbing now, right? Need to replace disk? But ... not sure how to identify it. Is there a way to get serial number for gptid/5fe33556-3ff2-11e2-9437-f46d049aaeca?
  • ewwhite
    ewwhite about 10 years
    +1. ZFS is hard.
  • Andreas Mattisson
    Andreas Mattisson almost 10 years
    zdb raid2 will give the GUID for the disk, but I don't think it will give the serial number.