How to check the life left in an SSD or the medium's wear level?

Solution 1

In your first example, what I think you are referring to is the "Media Wearout Indicator" on Intel drives, which is attribute 233. Yes, it has a range of 0-100, with 100 being a brand-new, unused drive and 0 being completely worn out. According to your output, this attribute doesn't seem to exist on your drive.

In your second example, please read the official docs about SSD_Life_Left. Per that page:

The RAW value of this attribute is always 0 and has no meaning. Check the normalized VALUE instead. It starts at 100 and indicates the approximate percentage of SSD life left. It typically decreases when Flash blocks are marked as bad; see the RAW value of Retired_Block_Count.
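In practice that means reading the VALUE column for SSD_Life_Left and the RAW_VALUE column for Retired_Block_Count. A minimal sketch, assuming the attribute names appear exactly as in your second output:

# Print the normalized SSD_Life_Left value and the raw Retired_Block_Count.
sudo smartctl -A /dev/sda | awk '
    $2 == "SSD_Life_Left"       { print "SSD_Life_Left (normalized, 100 = new):", $4 }
    $2 == "Retired_Block_Count" { print "Retired_Block_Count (raw):", $10 }'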

It's really important that you fully understand what smartctl(8) is saying, and not make assumptions. Unfortunately, the S.M.A.R.T. tools aren't always up to date with the latest SSDs and their attributes, so there isn't always a clean way to tell how many times the chips have been written to. The best you can do is look at "Power_On_Hours" (6568 in your case), determine your average disk utilization, and average it out.

You should be able to look up your drive's specs and determine the process used to make the chips. 32 nm process chips will have a longer write endurance than 24 nm process chips. However, it seems that "on average" you could probably expect about 3,000 to 4,000 write/erase cycles per cell, with a minimum of 1,000 and a maximum of 6,000. So, if you have a 64 GB SSD, you should expect somewhere in the neighborhood of 192 TB to 256 TB of total writes to the SSD, assuming wear leveling.

As an example, if you're sustaining a write load of, say, 11 KB/s to your drive, you could expect to see about 40 MB written per hour. At 6568 powered-on hours, you've written roughly 260 GB to disk. Knowing that you could probably sustain about 200 TB of total writes before failure, you have about 600 years before the chips wear out. Your disk will more likely fail first from worn-out capacitors or voltage regulators.
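If the drive reports Total_LBAs_Written (attribute 241, as in the asker's first output), you can also estimate consumed endurance directly from that counter instead of guessing the utilization. A rough sketch, assuming 512-byte logical sectors (check smartctl -i) and the ~200 TB budget discussed above; both figures are assumptions that vary by drive:

# Read the raw Total_LBAs_Written counter and convert it to GiB,
# assuming 512-byte sectors and a ~200 TiB total write budget.
LBAS=$(sudo smartctl -A /dev/sda | awk '/Total_LBAs_Written/ { print $10 }')
awk -v lbas="$LBAS" 'BEGIN {
    gib = lbas * 512 / 1024^3
    printf "Written so far: %.0f GiB (%.2f%% of a 200 TiB budget)\n", gib, gib * 100 / (200 * 1024)
}'

With the raw value of 742687258 from that first output, this works out to roughly 354 GiB written, a tiny fraction of the estimated budget.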

Solution 2

For Samsung SSDs, check SMART attribute 177 (Wear Leveling Count).

ID # 177 Wear Leveling Count

This attribute represents the number of media program and erase operations (the number of times a block has been erased). This value is directly related to the lifetime of the SSD. The raw value of this attribute shows the total count of P/E Cycles.

Source: http://www.samsung.com/global/business/semiconductor/minisite/SSD/M2M/download/07_Communicating_With_Your_SSD.pdf

The wear level indicator starts at 100 and, from what I can tell, decreases linearly down to 1. At 1 the drive will have exceeded all of its rated P/E cycles, but in reality the drive's total endurance can significantly exceed that value.

Source: http://www.anandtech.com/show/7173/samsung-ssd-840-evo-review-120gb-250gb-500gb-750gb-1tb-models-tested/3

I would suggest you take that last statement about exceeding that value with a grain of salt.
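If you want to pull just that attribute out of the output, something like the following works. This is only a sketch: it assumes the drive shows up as /dev/sda, and note that the wear indicator is the normalized VALUE column, while RAW_VALUE holds the P/E cycle count:

# Show both the normalized wear indicator and the raw P/E cycle count
# for attribute 177 on a Samsung SSD.
sudo smartctl -A /dev/sda | awk '$1 == 177 {
    print "Normalized value (100 = new):", $4
    print "Raw P/E cycle count:", $10
}'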

Solution 3

If you don't have an Intel-brand SSD: be careful! I have a Samsung SSD, and I was totally misled by erroneous attribute labeling in smartmontools/smartctl. If you have anything except Intel, you may find my story of (inane) pain at https://askubuntu.com/a/460463/65722 helpful.

May your ratio of information-quality to time-spent-digging be better than mine!

Solution 4

I have a server with an LSI RAID card and 7 Samsung SSDs installed.

The setup is as follows:

  • /dev/sda is my operating-system SSD, marked as JBOD by the RAID controller.
  • The other 7 SSDs show up only as /dev/sdb because they are in RAID 0 (or some other RAID level).

To get information about the disks behind a RAID controller, the trick is to run

smartctl --scan

The output is:
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/bus/0 -d megaraid,8 # /dev/bus/0 [megaraid_disk_08], SCSI device
/dev/bus/0 -d megaraid,9 # /dev/bus/0 [megaraid_disk_09], SCSI device
/dev/bus/0 -d megaraid,10 # /dev/bus/0 [megaraid_disk_10], SCSI device
/dev/bus/0 -d megaraid,11 # /dev/bus/0 [megaraid_disk_11], SCSI device
/dev/bus/0 -d megaraid,12 # /dev/bus/0 [megaraid_disk_12], SCSI device
/dev/bus/0 -d megaraid,13 # /dev/bus/0 [megaraid_disk_13], SCSI device
/dev/bus/0 -d megaraid,14 # /dev/bus/0 [megaraid_disk_14], SCSI device
/dev/bus/0 -d megaraid,15 # /dev/bus/0 [megaraid_disk_15], SCSI device

Then, to get the smartctl info such as

  • Wear_Leveling_Count
  • Power_On_Hours
  • Temperature_Celsius and all the other good stuff

run the following for each disk:

smartctl -d megaraid,8 --all /dev/bus/0
smartctl -d megaraid,9 --all /dev/bus/0
smartctl -d megaraid,10 --all /dev/bus/0
{down to}
smartctl -d megaraid,15 --all /dev/bus/0

The general syntax is smartctl [options] <device>.

This is how you get at the disks behind a RAID card when multiple disks do not show up as separate devices such as /dev/sdb, /dev/sdc, /dev/sdd, and so on.
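If you do not want to type the eight commands by hand, a small loop does the same thing. This is just a sketch using the disk numbers from the --scan output above; adjust them for your controller:

# Query every disk behind the MegaRAID controller and keep only the
# attributes of interest.
for i in $(seq 8 15); do
    echo "=== megaraid disk $i ==="
    smartctl -d megaraid,$i --all /dev/bus/0 | grep -E 'Wear_Leveling_Count|Power_On_Hours|Temperature_Celsius'
done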

Comments

  • Tankman六四
    Tankman六四 almost 2 years

    We all know that SSDs have a limited predetermined life span. How do I check in Linux what the current health status of an SSD is?

    Most Google search results would ask you to look up S.M.A.R.T. information for a percentage field called Media_Wearout_Indicator, or other jargon indicators like Longterm Data Endurance, which don't exist. Yes, I did check two SSDs; both lack these fields. I could go on to find a third SSD, but I suspect the fields are simply not standardized.

    To demonstrate the problem, here are two examples.


    With the first SSD, it is not clear which field indicates the wear-out level. However, there is only one Unknown_Attribute whose RAW_VALUE is between 1 and 100, so I can only assume that is what we are looking for:

        $ sudo smartctl -A /dev/sda                                             
        smartctl 6.2 2013-04-20 r3812 [x86_64-linux-3.11.0-14-generic] (local build)
        Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
    
        === START OF READ SMART DATA SECTION ===                                 
        SMART Attributes Data Structure revision number: 1                       
        Vendor Specific SMART Attributes with Thresholds:                        
        ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
          5 Reallocated_Sector_Ct   0x0002   100   100   000    Old_age   Always       -       0
          9 Power_On_Hours          0x0002   100   100   000    Old_age   Always       -       6568
         12 Power_Cycle_Count       0x0002   100   100   000    Old_age   Always       -       1555
        171 Unknown_Attribute       0x0002   100   100   000    Old_age   Always       -       0
        172 Unknown_Attribute       0x0002   100   100   000    Old_age   Always       -       0
        173 Unknown_Attribute       0x0002   100   100   000    Old_age   Always       -       57
        174 Unknown_Attribute       0x0002   100   100   000    Old_age   Always       -       296
        187 Reported_Uncorrect      0x0002   100   100   000    Old_age   Always       -       0
        230 Unknown_SSD_Attribute   0x0002   100   100   000    Old_age   Always       -       190
        232 Available_Reservd_Space 0x0003   100   100   005    Pre-fail  Always       -       0
        234 Unknown_Attribute       0x0002   100   100   000    Old_age   Always       -       350
        241 Total_LBAs_Written      0x0002   100   100   000    Old_age   Always       -       742687258
        242 Total_LBAs_Read         0x0002   100   100   000    Old_age   Always       -       1240775277
    

    So this SSD has used 57% of its rewrite life span; is that correct?


    With the other disk, the SSD_Life_Left attribute stands out, but its raw value of 0 is puzzling. If it indicates 0% life left, that is unlikely for an apparently healthy SSD unless it happens to be in peril (we will see in a few days); if it means 0% of the life has been used, that is equally implausible for a drive that has been in use for more than a year.

        > sudo /usr/sbin/smartctl -A /dev/sda
        smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.6-4-desktop] (SUSE RPM)
        Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
    
        === START OF READ SMART DATA SECTION ===
        SMART Attributes Data Structure revision number: 10
        Vendor Specific SMART Attributes with Thresholds:
        ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
          1 Raw_Read_Error_Rate     0x000f   104   100   050    Pre-fail  Always       -       0/8415644
          5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
          9 Power_On_Hours_and_Msec 0x0032   100   100   000    Old_age   Always       -       4757h+02m+17.130s
         12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1371
        171 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
        172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
        174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       52
        177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       2
        181 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
        182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
        187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
        194 Temperature_Celsius     0x0022   030   030   000    Old_age   Always       -       30 (Min/Max 30/30)
        195 ECC_Uncorr_Error_Count  0x001c   104   100   000    Old_age   Offline      -       0/8415644
        196 Reallocated_Event_Count 0x0033   100   100   000    Pre-fail  Always       -       0
        231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
        233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       3712
        234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       1152
        241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       1152
        242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       3072
    
    • Simon Gates
      Simon Gates over 10 years
      With SMART attributes, lower values are worse because the drive always alerts if a value is lower than (or equal to? Not sure) the threshold value. That having been said, it's very nice to have a wear indicator, but I hope you're not trusting precious data to any one storage device. You should be running multiple storage devices in a RAID arrangement.
    • Tankman六四
      Tankman六四 over 10 years
      How do you know my data is 'precious'? It is just an offline copy of the company's knowledge base on my laptop. I comment only to make the point that people too often assume a sysop scenario. Thanks for your comments anyway.
    • Simon Gates
      Simon Gates over 10 years
      All data is precious. :) We start on that principle, then move on to data that is more precious (a photographer's digital photos, for instance) and less precious (the OS — easy to replace, but downtime and a loss of time/revenue if you have to replace it).
    • bwDraco
      bwDraco over 7 years
      Both drives are well within endurance limits. The first drive has only about 350 GiB on it, while the second drive has 1.1 TiB on it. I'm not sure what's going on here...
    • Joachim Wagner
      Joachim Wagner about 3 years
      @bwDraco "has X GB on it" is misleading as many readers will think it is how much space is used. Data may have been written to the same LBA location multiple times. The values are not unusual for small SSDs, e.g. 60 GB, that were never more than half full.
  • Tankman六四
    Tankman六四 over 10 years
    So clear, thank you. This knowledge would be best made into a GUI tool using smartctl or its API. After all, calculating with a calculator, with the computer as an input device and the human sitting in front of it as the processor, is against the spirit in which computers were invented!
  • Calculus Knight
    Calculus Knight almost 7 years
    The link is dead by now.
  • John Eikenberry
    John Eikenberry over 4 years
    I think they have the order for Wear_Leveling_Count backwards. I have 2 Samsung SSDs: the one that is ~4 years old has a RAW_VALUE of 42 and the one that is ~1 month old has a RAW_VALUE of 0. It seems that it starts at 0 and increments upward.