SSD 2 million hour MTBF? How is this proven?


Solution 1

MTBF is defined as the predicted elapsed time between inherent failures of a system during operation.

It literally stands for "Mean Time Between Failure". Additionally...

As you can see, MTBF refers to the failure rate of a drive over its expected lifetime. This doesn't mean a 1.2 million hour MTBF drive will last 1.2 million hours, and a 1.5 million hour MTBF drive will last 1.5 million hours (that’s 136 to 171 years by the way)
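
The hour-to-year conversion behind the quoted "136 to 171 years" can be reproduced in a few lines. The sketch below also turns an MTBF into a rough annualized failure rate. That second step assumes a constant failure rate (an exponential failure model) and 24/7 operation, which are assumptions of this example rather than anything the article states, and the function names are made up for illustration.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def mtbf_in_years(mtbf_hours):
    """Express an MTBF figure in years of continuous (24/7) operation."""
    return mtbf_hours / HOURS_PER_YEAR


def annualized_failure_rate(mtbf_hours):
    """Chance that a single drive fails within one year of 24/7 use.

    Assumes a constant failure rate (exponential model). This is an
    assumption of this sketch, not a manufacturer's published method.
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


for hours in (1_200_000, 1_500_000, 2_000_000):
    print(f"{hours:>9,} h = {mtbf_in_years(hours):6.1f} years, "
          f"AFR ~ {annualized_failure_rate(hours):.2%}")
```

Under those assumptions, a 2,000,000 hour MTBF works out to roughly a 0.44% chance of failure per drive per year, which is a more useful way to read the number than "228 years".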

So What Does SSD MTBF Actually Mean for Me?

unfortunately, most manufacturers don’t share this information freely.

What does 2,000,000 hour MTBF Mean For Me?

To make the article's example specific to a drive with a 2,000,000 hour MTBF, the following math was performed. It shows that a population of 1,000 such drives, each used 8 hours a day, would be expected to have one failure every 250 days.

2,000,000 hours / 8 hours a day = 250,000 days; 250,000 days / 1,000 drives = 250 days.

The article originally stated that a population of 1,000 drives with a 1.2 million hour MTBF would see one failure every 150 days:

if the drive is used at an average of 8 hours a day, a population of 1000 SSDs would be expected to have one failure every 150 days ...
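
The population arithmetic above can be written as a small function. This is only a sketch of the article's simplification (every drive runs the same hours per day and failures are spread evenly over time); the function name and default values are chosen for this example.

```python
def days_between_failures(mtbf_hours, hours_per_day=8, population=1000):
    """Expected days between failures across a whole population of drives.

    Follows the simplification used above: every drive is powered on for
    the same number of hours per day, and failures are spread evenly.
    """
    days_per_drive = mtbf_hours / hours_per_day  # powered-on days for one drive
    return days_per_drive / population           # shared across the population


print(days_between_failures(2_000_000))  # 250.0
print(days_between_failures(1_200_000))  # 150.0
```

With these inputs, 150 days corresponds to a 1.2 million hour MTBF; a 1.5 million hour MTBF would give 187.5 days.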

The article goes on to say that MTBF is not a great way to judge how reliable a drive will be.

A better way to get an idea of how long an SSD will actually last for you would be to consider the Total Bytes Written spec, or TBW. Although this is another ‘overall expectation’ figure and doesn’t directly tell you the lifespan of a drive, it will give you an idea of how one drive compares to another. Unfortunately, not all manufacturers give out this spec either.

The article also explains how MTBF is normally determined.

The JEDEC JESD218A standard defines the method for testing the read/write endurance of an SSD (free registration required to view) which is the leading cause of SSD failure, but manufacturers may choose to supplement this with some additional failure tests.

Another thing to consider is what workload is used to specify the MTBF. For instance, Intel qualifies their SSDs using a workload of 20 GB of writes per day for 5 years. With this workload, along with the supplemental failure tests, the Intel 335 has an MTBF of 1.2 million hours. However if the workload was reduced to 10 GB a day, the MTBF would be 2.5 million hours. At 5 GB per day, it becomes 4 million hours.
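
A rated workload like this can be converted into a total-bytes-written figure, which makes it easier to compare against a TBW spec. The sketch below is rough arithmetic only, assuming decimal units (1 TB = 1,000 GB) and a 365-day year. It does not reproduce Intel's MTBF figures, since the quoted text does not give the relationship between workload and MTBF. The function names and the 75 TB rating are illustrative.

```python
def total_written_tb(gb_per_day, years):
    """Total data written over the period, in decimal terabytes."""
    return gb_per_day * 365 * years / 1000


def years_to_reach(tbw_tb, gb_per_day):
    """Years of writing gb_per_day before a TBW rating is reached."""
    return tbw_tb * 1000 / (gb_per_day * 365)


# The qualification workload quoted above: 20 GB per day for 5 years
print(total_written_tb(20, 5))  # 36.5

# A hypothetical 75 TB rating at the same 20 GB per day
print(f"{years_to_reach(75, 20):.1f} years")
```

So 20 GB a day for 5 years amounts to 36.5 TB written, and a drive rated for 75 TB would take a little over 10 years to reach its rating at that pace.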

References

  1. Understanding MTBF in SSD – What Does an SSD’s MTBF Mean for You? - Hardcoreware.com, Carl Nelson, January 6, 2013

Solution 2

Drives don't all fail at exactly the MTBF time: rather, the times at which they fail obey a particular statistical distribution with the given mean. You don't necessarily need to test for as long as the mean to get bounds on the mean, since testing for a shorter time can still give you a lot of information about the shape of the distribution.

For example, suppose you want to demonstrate that the MTBF is greater than one month. If the MTBF were only a month, you'd expect a few drives to fail very quickly. So if you tested a bunch of drives for a week and none of them failed in that time, you have reasonable grounds for believing that the MTBF is quite a lot more than one week. If you test enough drives for time T, you can argue that the MTBF must be at least some larger value.
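
One way to make this argument concrete is to assume that drive lifetimes follow an exponential distribution (a constant failure rate). That is an assumption of this sketch, not something a manufacturer is known to use, and the function names are hypothetical. Under it, the chance of seeing no failures in a test, and the MTBF a failure-free test supports, can be computed directly:

```python
import math


def prob_no_failures(n_drives, test_time, mtbf):
    """Probability that none of n_drives fails during test_time.

    Assumes independent drives with exponentially distributed lifetimes.
    test_time and mtbf must be in the same unit.
    """
    return math.exp(-n_drives * test_time / mtbf)


def mtbf_lower_bound(n_drives, test_time, confidence=0.90):
    """Lower bound on MTBF supported by a test with zero failures.

    Any smaller MTBF would have produced at least one failure with
    probability >= confidence, under the same exponential assumption.
    """
    return n_drives * test_time / math.log(1.0 / (1.0 - confidence))


# 20 drives all survive one week: how plausible is an MTBF of 30 days?
print(f"{prob_no_failures(20, 7, 30):.4f}")

# 1,000 drives run for 1,000 hours each with no failures
print(f"{mtbf_lower_bound(1000, 1000):,.0f} hours")
```

With these made-up numbers, an MTBF of one month would leave less than a 1% chance of seeing no failures in the week, and a million drive-hours without a failure supports an MTBF of roughly 434,000 hours at 90% confidence. Nobody had to wait anywhere near that long to make the claim.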

Also, they may be using an argument along the lines of "We tested the drive by reading and writing 24/7 for a month. In reality, most users only access the drive for 1% of the time that the computer is running, so most users will experience one hundred times the MTBF we found in our tests."

Another technique that may be used is to test in harsher conditions than real use. I don't know if this is used for hardware but it is used for shelf-life of foods. First, you do experiments that show, for example, that your canned whatevers degrade three times as fast when stored at 40C as they do at 20C. Then, if they're still good to eat after four months in storage at 40C, they should be good to eat after a year at 20C.
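
Both the duty-cycle argument and the harsher-conditions argument come down to multiplying a test result by a scaling factor. A minimal sketch, assuming wear scales linearly with that factor (a strong assumption that real accelerated tests would have to justify); the function name is made up for this example:

```python
def scaled_lifetime(test_duration, factor):
    """Scale a duration observed under test to expected real-world use.

    factor is how many times faster wear accumulates in the test than in
    real use: an acceleration factor, or 1 / duty cycle.
    """
    return test_duration * factor


# Duty cycle: tested 24/7 for 1 month, real users active 1% of the time
print(scaled_lifetime(1, 1 / 0.01), "months")

# Temperature: degrades 3x as fast at 40C, still good after 4 months
print(scaled_lifetime(4, 3), "months")  # 12 months
```

The second call reproduces the canned-food example: four months at 40C with a factor of three corresponds to a year at 20C.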

Author by

Lolorz12

Updated on September 18, 2022

Comments

  • Lolorz12
    Lolorz12 over 1 year

    How can this SSD (Corsair) have a mean time before failure of 2,000,000 hours? Last time I checked, that was in the hundreds of years...

    From experience, even in computers that don't receive constant use, SSDs seem to always fail much sooner than drives with platters.

    So, if their claims are actually true, what evidence backs them?

    • James P
      James P almost 9 years
      Normally it's just a number for marketing, not something that has been certified independently, and they don't have to back it up with anything. Plus it's quite possible there could be a firmware change or something that causes thousands of units to fail within a few hours. But depending on the manufacturer, SSDs sometimes have a maximum total bytes written value specified (e.g. 75TB), and this is more relevant because exceeding it can effectively void the warranty. Typically though it would be very difficult to write that much data in the warranty period.
  • Lolorz12
    Lolorz12 almost 9 years
    Interesting. It'd be nice to know a little more from the companies listing this stuff about what sort of tests they use to come up with some of these numbers. Anyway, regarding my platter comment: the place I work has ~100 computers, and only 8 desktops use SSDs while all use platters. We've had a higher failure ratio among SSDs than platters (3-4 SSDs in the past couple of years vs ~7 mechanical).
  • qasdfdsaq
    qasdfdsaq almost 9 years
    Indeed, the biggest flaw in MTBF numbers is they are almost always estimated from samples of drives tested under unrealistic, accelerated ageing conditions - high temperatures, high loads, excess power cycles, etc.
  • Ramhound
    Ramhound almost 9 years
    I have removed my comments directed to @qasdfdsaq since I have addressed his concerns. I am only trying to clean up the comment section for this answer. I am not trying to hide anything by removing those comments, they are just not relevant, if I have addressed his concerns about the original source material.
  • Ramhound
    Ramhound almost 9 years
    @qasdfdsaq - I still don't know what your problem with my answer is. There is not a single word that is quoted that I changed.
  • qasdfdsaq
    qasdfdsaq almost 9 years
    I have highlighted in bold the incorrect terms you've used and pointed it out multiple times. Once again, if you cannot understand the difference then you have no understanding of MTBF.
  • Thalys
    Thalys almost 9 years
    Just to note - The answer looks fine to me, but the comments look heated. I'm cleaning up both sides of this. I'd note the best way to deal with an answer you consider wrong is to post a better answer.