How concerned should I be about checksum errors in a ZFS mirror?

Probable causes

These errors can have several sources. Common ones include:

  • Flaky connections or damaged cables
  • Sudden power loss or forceful removal of devices
  • Memory errors if using non-ECC RAM that is faulty

You can clear the errors with zpool clear: there were only a few of them (ZFS did not flag the devices for too many errors), and they did not affect the same blocks on both devices, so the mirror could repair them from the good copy. I suspect the FreeNAS upgrade also resets the counters, which would explain the pattern you described.
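To judge whether the counters keep creeping up after a clear, it can help to pull them out of the status report automatically. The following is a minimal Python sketch, not part of FreeNAS: the function name `checksum_errors` and the `SAMPLE` text are illustrative assumptions, the sample rows are copied from the report in the question, and the pattern only handles plain integer counters (abbreviated values such as `1.2K` would be skipped).

```python
import re

# One row of the "config:" table: name, state, READ, WRITE, CKSUM.
# Assumption: counters are plain integers, as in the report from the question.
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\d+)\s+(\d+)\s+(\d+)"
)


def checksum_errors(status_text):
    """Return {device_name: cksum_count} for rows with a non-zero CKSUM column."""
    errors = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if match and int(match.group(5)) > 0:
            errors[match.group(1)] = int(match.group(5))
    return errors


# Device table taken from the status report quoted in the question.
SAMPLE = """\
    NAME                                            STATE     READ WRITE CKSUM
    freenas-boot                                    ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        gptid/253f05a3-be26-11e5-8d9c-d05099c04470  ONLINE       0     0     1
        gptid/879584e0-bfb0-11e5-ba20-d05099c04470  ONLINE       0     0     9
"""

if __name__ == "__main__":
    for device, count in checksum_errors(SAMPLE).items():
        print(f"{device}: {count} checksum error(s)")
```

On a live system the text could come from running `zpool status freenas-boot` (for example via `subprocess.run` with `capture_output=True, text=True`) instead of the hard-coded sample. Comparing the result from night to night shows whether one stick accumulates errors faster than the other.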

Possible solutions

I would do the following:

  • Run memtest86+ overnight to rule out faults in the memory
  • Disconnect and reconnect all cables, check for loose connections, and replace cables if needed (they are cheap, so simply replacing them is quicker than troubleshooting)
  • Check whether running on a UPS makes a difference if you live in an area with an unstable power supply (brown-outs are a thing in some parts of the USA, I've read)
  • Test whether the USB port/controller is at fault by using a different one, or a PCIe adapter card (quite cheap)
  • Try another type of USB stick; I've had good experiences with SLC-based USB3 sticks (the only checksum errors occurred on removal or blackout)

Author: James Thorpe

I'm a professional software developer with commercial experience back to 2004. I work daily with .NET, JavaScript, HTML, CSS, and MySQL, and know my way around PHP and MSSQL.

Updated on September 18, 2022

Comments

  • James Thorpe
    James Thorpe almost 2 years

    I have a self-built FreeNAS system, which uses 4 HDD in one ZFS pool purely for storage, and 2 mirrored 16GB USB memory sticks in a ZFS mirror for booting from. It is stable and has been running for around 8 months. However, the overnight emails are usually along these lines:

    Checking status of zfs pools:
     NAME           SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
     freenas-boot  14.5G  1.67G  12.8G         -      -    11%  1.00x  ONLINE  -
     volume1       7.25T   457G  6.80T         -     2%     6%  1.00x  ONLINE  /mnt
    
    

      pool: freenas-boot
     state: ONLINE
    status: One or more devices has experienced an unrecoverable error. An
            attempt was made to correct the error. Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool clear' or replace the device with 'zpool replace'.
       see: http://illumos.org/msg/ZFS-8000-9P
      scan: scrub repaired 232K in 0h6m with 0 errors on Sun Oct 23 03:51:22 2016
    config:

        NAME                                            STATE     READ WRITE CKSUM
        freenas-boot                                    ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/253f05a3-be26-11e5-8d9c-d05099c04470  ONLINE       0     0     1
            gptid/879584e0-bfb0-11e5-ba20-d05099c04470  ONLINE       0     0     9

    errors: No known data errors

    The checksum errors will occasionally increase, then when I perform a normal FreeNAS update, will all reset back to 0 again, whereupon the process will repeat, with the checksum errors creeping up now and again.

    I've seen nothing to indicate that the HDDs themselves are having any faults, and I download the FreeNAS config to an external backup location when I make changes, so I should be able to reinstall on a new flash drive if necessary.

    All that said, how concerned should I be about these checksum errors?

    • GGGuest
      GGGuest about 5 years
      Well, based on personal experience, I would say very concerned. Thanks to checksum errors, my pool was lost because I was unable to import it.
    • James Thorpe
      James Thorpe about 5 years
      @GGGuest I’d completely forgotten about this question. After asking it, I tried a different USB memory stick, only to quickly have the same issue. I chose the enclosure and motherboard with future expansion in mind, so I had plenty of free space for new internal drives - chucked in a couple of old ones to mirror the USBs, and haven’t looked back since.