How concerned should I be about checksum errors in a ZFS mirror?
Probable causes
These errors can have several sources. Common ones include:
- Flaky connections or damaged cables
- Sudden power loss or forceful removal of devices
- Memory errors from faulty RAM, which go undetected if the RAM is non-ECC
You can clear the errors with zpool clear, because the errors were few (the pool is not reporting "too many errors") and they did not affect the same blocks on each device. I suspect the FreeNAS upgrade also clears them, which would explain the situation you described.
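Before clearing anything, it helps to see which devices actually carry non-zero counters. The following is only a minimal sketch in Python: the function name `devices_with_errors` is made up for illustration, the sample text is the config table from the report in the question, and it assumes the counters are plain integers (large counts may be abbreviated by zpool, which this sketch does not handle).

```python
import re

# Sample text taken from the nightly report quoted in the question.
SAMPLE_STATUS = """\
  pool: freenas-boot
 state: ONLINE
config:

        NAME                                            STATE     READ WRITE CKSUM
        freenas-boot                                    ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/253f05a3-be26-11e5-8d9c-d05099c04470  ONLINE       0     0     1
            gptid/879584e0-bfb0-11e5-ba20-d05099c04470  ONLINE       0     0     9

errors: No known data errors
"""

# One row of the config table: name, state, then three integer counters.
# The header row does not match because READ/WRITE/CKSUM are not digits.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for rows with any non-zero counter."""
    found = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        name, _state, read, write, cksum = match.groups()
        counters = (int(read), int(write), int(cksum))
        if any(counters):
            found[name] = counters
    return found


if __name__ == "__main__":
    for name, (read, write, cksum) in devices_with_errors(SAMPLE_STATUS).items():
        print(f"{name}: read={read} write={write} cksum={cksum}")
```

With the sample above, only the two gptid devices are reported; the pool and mirror rows have all-zero counters and are skipped.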
Possible solutions
I would do the following:
- Run memtest86+ overnight to rule out faults in the memory
- Disconnect and reconnect all cables and check for loose connections; replace cables if needed (they are cheap, so simply replacing them is quicker than troubleshooting)
- Check whether running on a UPS makes a difference if you live in an area with an unstable power supply (brown-outs are a thing in some parts of the USA, I've read)
- Test whether the USB port/controller is at fault by using a different one, or a PCIe adapter card (quite cheap)
- Try another type of USB stick; I've had good experiences with SLC-based USB3 sticks (the only checksum errors occurred on removal or blackout)
James Thorpe
Updated on September 18, 2022
Comments
-
James Thorpe almost 2 years
I have a self-built FreeNAS system, which uses 4 HDDs in one ZFS pool purely for storage, and two 16GB USB memory sticks in a ZFS mirror for booting. It is stable and has been running for around 8 months. However, the overnight emails are usually along these lines:
Checking status of zfs pools:
NAME           SIZE  ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
freenas-boot  14.5G  1.67G  12.8G         -     -  11%  1.00x  ONLINE  -
volume1       7.25T   457G  6.80T         -    2%   6%  1.00x  ONLINE  /mnt

  pool: freenas-boot
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 232K in 0h6m with 0 errors on Sun Oct 23 03:51:22 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        freenas-boot                                    ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/253f05a3-be26-11e5-8d9c-d05099c04470  ONLINE       0     0     1
            gptid/879584e0-bfb0-11e5-ba20-d05099c04470  ONLINE       0     0     9

errors: No known data errors
The checksum errors occasionally increase; then, when I perform a normal FreeNAS update, they all reset to 0, whereupon the process repeats, with the checksum errors creeping up now and again.
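To make the "creeping up, then reset" pattern visible, the counters from two consecutive reports can be compared. This is only a rough sketch in Python: `compare_counts` is a hypothetical helper, the device names are shortened, and the "previous" values are invented for illustration; only the current values of 1 and 9 come from the report above.

```python
def compare_counts(previous, current):
    """Classify how each device's checksum counter changed between two reports."""
    changes = {}
    for device, now in current.items():
        before = previous.get(device)
        if before is None:
            changes[device] = "new"        # device not present in the earlier report
        elif now > before:
            changes[device] = "increased"  # errors are still accumulating
        elif now < before:
            changes[device] = "reset"      # counters were cleared, e.g. by an update
        else:
            changes[device] = "unchanged"
    return changes


if __name__ == "__main__":
    # Hypothetical earlier report; not taken from the question.
    previous = {"gptid/253f05a3": 1, "gptid/879584e0": 4}
    # Checksum counters from the report quoted above (names shortened).
    current = {"gptid/253f05a3": 1, "gptid/879584e0": 9}
    for device, change in compare_counts(previous, current).items():
        print(f"{device}: {change}")
```

A device that keeps showing "increased" between resets is the one worth swapping or moving to another port first.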
I've seen nothing to indicate that the HDDs themselves are having any faults, and I download the FreeNAS config to an external backup location when I make changes, so I should be able to reinstall on a new flash drive if necessary.
All that said, how concerned should I be about these checksum errors?
-
GGGuest about 5 years
Well, based on personal experience, I would say very concerned. Thanks to checksum errors, my pool was lost as I was unable to import it.
-
James Thorpe about 5 years
@GGGuest I'd completely forgotten about this question. After asking it, I tried a different USB memory stick, only to quickly have the same issue. I chose the enclosure and motherboard with future expansion in mind, so had plenty of free space for new internal drives - chucked in a couple of old ones to mirror the USBs, and haven't looked back since.