ZFS and green drives/TLER


Solution 1

WDC Green drives have the "deep recovery" problem. You'll need Red or RE drives to avoid it.

I have a ZFS RAIDZ of Green drives at home. They've lasted almost three years of power-on hours without a single error. That may just be luck, but errors generally don't happen all that often. So you have to ask whether the cost difference is worth it: take the value of the uptime at risk, multiply it by the likelihood of the failure, and that is roughly the amount of additional money worth spending on hardware that mitigates the failure mode. In most business situations the answer is going to be a clear yes, as the cost difference is fairly small.
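
To make that back-of-the-envelope calculation concrete, here is a minimal sketch; every number in it (downtime value, failure probability, price premium) is an assumed placeholder, not a figure from this answer:

```
# Break-even check: all three inputs are assumptions for illustration only.
awk 'BEGIN {
    downtime_value = 25000   # value of the uptime at risk, in dollars (assumed)
    p_failure      = 0.02    # estimated chance of hitting the failure mode (assumed)
    premium        = 6 * 30  # extra cost of Red/RE over Green for a 6-drive array (assumed)
    justified = downtime_value * p_failure
    printf "justified extra spend: $%.2f  (actual premium: $%d)\n", justified, premium
}'
```

With those made-up inputs the justified spend comfortably exceeds the premium, which is the usual outcome in a business setting.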

Solution 2

WD Green drives are fine as long as you know how to modify their settings. I believe WD makes a habit of standardizing its firmware across several product lines and differentiating those lines with slight changes to the firmware settings.

For instance, the Green drives are not advertised as supporting TLER and have 'cool n' quiet' as one of their marketing buzzwords. The Reds are the opposite: they have TLER and don't spin down (or the spin-down timeout defaults to a much longer interval). Both models, however, have firmware that supports turning either of these features on or off.

I've modified Green drives' firmware settings using the WDTLER and WDIDLE3 utilities: TLER is now enabled, and the drives no longer spin down automatically.

I am not using ZFS, but from what I understand TLER is what helps you avoid 'deep recovery' stalls with ZFS and hardware RAID controllers. (It's also called ERC in Seagate-speak.) (Update: I AM using ZFS now, and I have subsequently turned TLER off.)
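
For what it's worth, on Linux or FreeBSD the same setting can often be inspected, and on drives whose firmware still exposes it, changed at runtime with smartctl rather than the DOS-based WDTLER tool. A minimal sketch, with /dev/ada0 as a placeholder device name (many later Green firmwares simply refuse the command):

```
# Report the current SCT Error Recovery Control (TLER/ERC) timeouts, if supported
smartctl -l scterc /dev/ada0

# Set the read and write recovery limits to 7.0 seconds (values are in tenths of a second)
smartctl -l scterc,70,70 /dev/ada0
```

Note that ERC set this way generally does not survive a power cycle, so it is typically reapplied from a boot script.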

I have not had any problems enabling TLER on any of the three WD Green drive models I have configured: WD20EZRX, WD20EARX, and WD20EARS. I can personally vouch for these models, but not for any others.

The six WD 2TB Green drives I've purchased over the years are currently in a RAID-10 array passed through via PCI passthrough to a Synology DiskStation VM in ESXi 6.5a; previously they were an MDADM/LVM array on bare-metal Ubuntu 12.04. The controller is an Intel Cougar Point HBA on a Supermicro X9SCL-F-O. Update: a year later I still have these same drives, now in a FreeBSD 11.1-RELEASE VM using ZFS with three striped mirror vdevs.
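
For anyone wondering what a 3-vdev striped-mirror layout like that looks like, a minimal sketch follows; the pool name and the adaX device names are placeholders, not the actual devices in this box:

```
# Create a pool striped across three mirrored vdevs (the ZFS analogue of RAID-10)
zpool create tank mirror ada1 ada2 mirror ada3 ada4 mirror ada5 ada6

# Verify the layout and health
zpool status tank
```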

The drives have been running nearly 24/7 since 2010 for the EARS, since 2012 for the EARX, and since 2015 for the EZRX. They seem perfectly acceptable for RAID use once the aforementioned utilities have been used to modify the firmware settings. Using them stock in a NAS without modification, however, is not recommended. The older drives have extremely high load cycle counts from being run 24/7 with idle/spindown left at the default, since I didn't know about turning it off until I had already owned them for several years.
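
You can check those counters yourself with smartctl; a quick sketch, again with a placeholder device name:

```
# Attribute 193 (Load_Cycle_Count) is the one the 8-second idle timer inflates;
# attribute 12 (Power_Cycle_Count) only increments on actual power-ons.
smartctl -A /dev/ada0 | egrep 'Load_Cycle_Count|Power_Cycle_Count'
```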

WD Reds have other advertised features, such as shock absorption, which is supposed to help them in arrays of various sizes. However, I haven't had any problems with the Green drives in a six-drive array (I use two iStarUSA 2x5.25" to 3x3.5" hot-swap bay converters). That experience makes me suspect these 'features' are little more than marketing buzzwords used to inflate the price of the Red drives.

I'd like to add a reference for anyone interested in using WDIDLE3:

The source of the problem is Western Digital's attempt to make the drive "more green", i.e. use less electricity. One way to accomplish this is to park the heads on a plastic pad after eight seconds of no read/write requests instead of allowing them to float over the spinning platters. This can add up to 10,800 cycles each day, and the repeated parking gradually wears out the heads. According to some literature, 250,000 to 1,250,000 cycles will result in damage that leads to read/write errors. If you do the math, data corruption can begin within 23.148 to 115.741 days if you are using the hard drive in a heavily loaded server. Regular consumers will not notice read/write problems until later. Some WD drives have reported 3,000 to 5,000 cycles per day; at that rate, the first instances of data corruption can begin within 83.33 to 250 days.
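
The arithmetic behind those figures is straightforward to reproduce:

```
awk 'BEGIN {
    per_day = 86400 / 8                        # worst case: one park every 8 seconds, all day
    printf "cycles per day: %d\n", per_day                          # 10800
    printf "days to 250,000 cycles:   %.1f\n", 250000  / per_day    # ~23
    printf "days to 1,250,000 cycles: %.1f\n", 1250000 / per_day    # ~116
}'
```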

From my experience, early data loss will not be noticed by the average user. There are no signs of trouble as long as the affected files are not accessed, edited, or saved. With continued use, lost sectors appear on the hard drive and indexes become corrupted; then the damage becomes apparent. During bootup, Windows will begin running Check Disk (chkdsk /f) to repair errors. Chunks of bad data get deleted and corrupted indexes are rebuilt during the process. Eventually 50% to 60% of the drive gets wiped out before the user realizes there is a problem: he goes to access a file, and it is not there. Further examination with a file manager reveals other missing data. This degradation takes time, from months to a year depending on computer usage.

Nevertheless, six years of complaints have forced the manufacturer to do something: provide a firmware fix. The WDIDLE3.EXE utility can be used to raise the parking timeout to as much as five minutes. For normal users, this change brings the parking cycle down to about 133 per day, which is within the industry average; most drives experience 10 to 200 per day and are rated for around 600,000 cycles. WDIDLE3.EXE can also turn off head parking entirely. Unfortunately, this is not recommended: users have reported that drive speed slowed to a crawl or that the drive exhibited read/write problems. This solution is a masterpiece of public relations. Instead of deactivating or eliminating the eight-second head-parking timer on newly manufactured drives, WD forces the user to make the firmware change after the sale. The process is not easy, and the company's website does not explain or provide any information; it provides just the software. The procedure requires unplugging all other devices connected to SATA ports and numerous trips into the BIOS. The computer must boot into DOS via a CD or USB 2.0 thumb drive, and the required commands must be typed by hand. Just finding the necessary software to create the boot device is a pain.

As a result, non-technical consumers will do nothing and let their hard drives malfunction. For the "techies," it will take hours of research, internet searches, and trial and error; hopefully they too will be discouraged. In one stroke, the company has placated the critics while still maintaining high sales volumes.

I have already done the necessary work, so here is the easiest procedure using a bootable USB 2.0 drive.

DOWNLOAD THE FOLLOWING PROGRAMS:

  - HP USB Disk Storage Format Tool
  - 7-Zip
  - wdidle3.exe
  - FreeDOS (fd11src.iso)

DO THE FOLLOWING IN THIS ORDER TO CREATE A BOOTABLE USB 2.0 FLASH DRIVE:

  1. Install 7-Zip.

  2. Use 7-Zip to extract the HP USB Disk Storage Format Tool and the FreeDOS ISO.

  3. Install the HP software.

  4. Insert a USB 2.0 flash drive into one of the computer's USB 2.0 ports, then:
     - Right-click the HP icon.
     - Go to COMPATIBILITY / PRIVILEGE LEVEL.
     - Check RUN THIS PROGRAM AS AN ADMINISTRATOR.
     - Exit the properties dialog.

  5. Start the HP program by clicking its icon, then:
     - Select FAT for FILE SYSTEM.
     - Place a check mark on CREATE DOS STARTUP DISK.
     - Under USING DOS SYSTEM FILES LOCATED AT, point to the subdirectory containing the FreeDOS files: \FREEDOS\SETUP\ODIN.

  6. Format the USB 2.0 flash drive. Depending on its size, this will take some time.

  7. Use Windows Explorer to copy WDIDLE3.EXE to the formatted USB 2.0 flash drive.

SHUT OFF YOUR COMPUTER.

  1. Disconnect every device attached to your SATA ports by unplugging both of its cables (power and data). You do not want WDIDLE3.EXE to alter their firmware settings.

  2. Connect the Western Digital drive you want to modify.

RESTART YOUR COMPUTER.

  1. Go into your PC's BIOS settings.

  2. Turn AHCI off. This will enable your flash drive to be recognized.

  3. Set the thumb drive as the first boot device.

  4. Save your BIOS settings and exit.

RESTART YOUR COMPUTER. Your thumb drive should boot the computer into DOS.

  1. Type "wdidle3.exe" without the quotes and press ENTER to start the program.

  2. Type "wdidle3.exe /r" without the quotes and press ENTER. This shows the current timeout; the factory default is eight seconds.

  3. Type "wdidle3.exe /s300" without the quotes and press ENTER. This changes the autopark timer to 300 seconds (five minutes), the maximum allowed.

  4. Type "wdidle3.exe /r" without the quotes and press ENTER to confirm that the hard drive has accepted the change.

  5. Shut off your PC.
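
For reference, the WDIDLE3 switches used or mentioned in this guide, collected in one place (/d is the utility's documented switch for turning head parking off entirely, which the quoted text above advises against):

```
wdidle3 /r       Report the current idle3 (head-parking) timer.
wdidle3 /s300    Set the timer to 300 seconds, the maximum this guide uses.
wdidle3 /d       Disable the timer entirely.
```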

IF YOU NEED TO PROCESS ANOTHER HARD DRIVE, pull out the two connecting cables, attach them to the next Western Digital drive, and repeat the above process.

ONCE FINISHED, TURN OFF YOUR COMPUTER AND PLUG YOUR SATA DEVICES BACK IN.

  1. Turn on your PC.

  2. Go back into your PC's BIOS settings.

  3. Turn AHCI back on.

  4. Change your boot order back.

  5. Save your settings and exit.

Note: I did not write this guide. As far as I remember, I had all the drives plugged in when I performed the firmware modifications (you can select the particular drive within the software), was using AHCI, and did not experience any problems. YMMV.

Comments

  • phatmanace
    phatmanace almost 2 years

    I'm setting up a NAS box based around freenas and ZFS.

    I've read lots of posts (like this one) about "deep recovery" and green drives when using RAID-5.

    Does ZFS (vs Raid-5) mean that this problem goes away, or should I still be looking at Red or Black drives to put into my NAS?

    • Michael Hampton
      Michael Hampton over 11 years
      How important is your data?
    • MDMarra
      MDMarra over 11 years
      @phatmanace, this is a great example of why you don't accept the first answer that gets posted if it's not something that you go out and verify yourself. It's usually a good idea to wait a day or two to let the community vote on things before you accept an incorrect answer :)
    • phatmanace
      phatmanace over 11 years
      It was the second answer chronologically. Actually, I found what he had to say pretty useful, especially the links. I accepted it because it, and the links in it, answered my question.
    • Michael Hampton
      Michael Hampton over 11 years
      I suppose even a wrong answer is an answer...
    • phatmanace
      phatmanace over 11 years
      potentially - yes - if it leads you to more information that helps answer the question from your perspective.
  • phatmanace
    phatmanace over 11 years
    Great answer - thanks. A follow-up question, I guess: if ZFS does drop the drive out of the array, and some time later the drive's firmware has repaired/corrected the error, can you manually add the disk back into the pool without rebuilding it?
  • osij2is
    osij2is over 11 years
    I'd say yes (to rebuilding) if you use RAIDZ/Z2/Z3 as parity is calculated. I'd bet doing a mirror (RAID1) or 1+0, there would be little to no rebuilding as there's no parity calculation involved, but if a rebuild needed to happen, I would not be shocked to see the time to rebuild dramatically reduced for a mirror in comparison to a RAIDZ configuration.
  • Philip
    Philip over 11 years
    -1 Sorry. The problem with those forums is that way too many people who don't know what they're talking about weigh in with opinions. Drives dropping out of arrays is annoying, but at least the array continues to function. When a drive sits in "deep recovery" for 30 seconds, the array stops working for those 30 seconds, including with ZFS. Worse, the OS may retry the same sector multiple times. It's usually not just one sector that goes bad, either; it's quite a few. Every time one of these sectors is read, the array stops for 30 seconds - bad news in a production business environment.
  • osij2is
    osij2is over 11 years
    I totally agree with your comment on production business environment, however, the OPs question was asking if the TLER issue goes away in ZFS RAIDZ. I personally haven't experienced the issue and since no one has definitively stated "yes" or "no", we're left with whatever info (preferably reputable) we can find. I posted from relatively reputable boards (FreeNAS, FreeBSD) so I tend to trust their information a little more than others especially regarding the subject matter. Is the information I posted inaccurate or misleading? Please feel free to correct my answer.
  • Philip
    Philip over 11 years
    The parts that you filled in yourself mostly just raise more questions. Then you "answer" with quotes from people who are mostly wrong. TLER doesn't matter to ZFS so much as to the user, who without it will be waiting 30 seconds on every HD error. While ZFS is happy to pass the delay through to the user, it's likely preferable to the user that major delays do drop a disk (and thus alleviate the delay). TLER is certainly not useful only to RAID controllers. Also, the Black drives don't have TLER; the Red and RE do.
  • osij2is
    osij2is over 11 years
    Ok then. Just delete my answer.
  • phatmanace
    phatmanace over 11 years
    I've unaccepted - but it feels harsh to downvote as I did find the linked information useful.
  • T. Giri
    T. Giri almost 7 years
    Enabling TLER in a parity ZFS RAID can be dangerous: serverfault.com/a/838419/101323