How to check the life left in an SSD or the medium's wear level?
Solution 1
In your first example, I think you are referring to the "Media Wearout Indicator" on Intel drives, which is attribute 233. Yes, it has a range of 0-100, with 100 being a brand new, unused drive and 0 being completely worn out. According to your output, this field doesn't seem to exist.
In your second example, please read the official docs about SSD_Life_Left. Per that page:
The RAW value of this attribute is always 0 and has no meaning. Check the normalized VALUE instead. It starts at 100 and indicates the approximate percentage of SSD life left. It typically decreases when Flash blocks are marked as bad, see the RAW value of Retired_Block_Count.
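To make "check the normalized VALUE" concrete, here is a minimal Python sketch that pulls the VALUE column for one attribute ID out of `smartctl -A` text. The function name `normalized_value` is my own invention, and the sample rows are copied from the question's second drive; it assumes the usual ten-column attribute table layout.

```python
def normalized_value(smartctl_output: str, attr_id: int):
    """Return the normalized VALUE column for one attribute ID, or None if absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows look like:
        # ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0] == str(attr_id):
            return int(fields[3])  # VALUE is the fourth column
    return None


# Two rows taken from the question's second drive (hypothetical input for this sketch).
sample = """\
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
"""

life_left = normalized_value(sample, 231)
print(f"SSD_Life_Left normalized VALUE: {life_left}")
```

On a real system you would feed it the text printed by `smartctl -A /dev/sdX` instead of the sample string.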
It's really important that you fully understand what smartctl(8) is saying, rather than making assumptions. Unfortunately, the S.M.A.R.T. tools aren't always up to date with the latest SSDs and their attributes. As such, there isn't always a clean way to tell how many times the chips have been written to. The best you can do is look at "Power_On_Hours", which in your case is "6568", determine your average disk utilization, and estimate from there.
You should be able to look up your drive specs and determine the process used to make the chips. 32nm process chips will have a longer write endurance than 24nm process chips. However, it seems that "on average" you could probably expect about 3,000 to 4,000 write cycles, with a minimum of 1,000 and a maximum of 6,000. So, if you have a 64GB SSD, you should expect somewhere in the neighborhood of 192TB to 256TB of total writes to the SSD, assuming wear leveling.
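The arithmetic behind those figures is just capacity times write cycles. A small Python sketch, assuming decimal units (1 TB = 1000 GB), perfect wear leveling, and no write amplification; `total_write_endurance_tb` is a name made up for illustration:

```python
def total_write_endurance_tb(capacity_gb: float, write_cycles: int) -> float:
    """Rough total writes (decimal TB) before wear-out: capacity x cycles.

    Assumes perfect wear leveling and ignores write amplification.
    """
    return capacity_gb * write_cycles / 1000


# The 64 GB example, across the cycle counts mentioned above.
for cycles in (1000, 3000, 4000, 6000):
    print(f"{cycles} cycles -> {total_write_endurance_tb(64, cycles):.0f} TB")
```

Real drives will land below this ideal figure because the controller writes more to flash than the host requests.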
As an example, if you're sustaining a utilization of, say, 11 KBps to your drive, then you could expect to see about 40 MB written per hour. At 6568 powered-on hours, you've written roughly 260 GB to disk. Knowing that you could probably sustain about 200 TB of total writes before failure, you have about 600 years before the chips wear out. Your disk is far more likely to fail first due to worn out capacitors or voltage regulation.
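Here is the same back-of-the-envelope estimate as a Python sketch. It assumes decimal units (1 KB = 1000 bytes), a constant write rate, and a drive that is powered on continuously; the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365


def written_gb(rate_kb_per_s: float, hours: float) -> float:
    """Data written (decimal GB) at a constant rate over a number of hours."""
    return rate_kb_per_s * 3600 * hours / 1e6


def years_until_worn(endurance_tb: float, rate_kb_per_s: float) -> float:
    """Years of continuous writing at a constant rate to reach the endurance limit."""
    gb_per_hour = rate_kb_per_s * 3600 / 1e6
    hours = endurance_tb * 1000 / gb_per_hour
    return hours / HOURS_PER_YEAR


print(f"written so far: {written_gb(11, 6568):.0f} GB")
print(f"years to reach 200 TB: {years_until_worn(200, 11):.0f}")
```

With the numbers from the example this comes out at several hundred years, the same order of magnitude as the rough figure above.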
Solution 2
For Samsung SSDs, check SMART attribute 177 (Wear Leveling Count).
ID # 177 Wear Leveling Count
This attribute represents the number of media program and erase operations (the number of times a block has been erased). This value is directly related to the lifetime of the SSD. The raw value of this attribute shows the total count of P/E Cycles.
The wear level indicator starts at 100 and decreases linearly down to 1 from what I can tell. At 1 the drive will have exceeded all of its rated P/E cycles, but in reality the drive's total endurance can significantly exceed that value.
I would suggest you take that last statement about exceeding that value with a grain of salt.
Solution 3
If you don't have an Intel-brand SSD: Be careful!! I have a Samsung SSD, and I was totally misled by erroneous attribute labeling by smartmontools/smartctl. If you have anything except Intel, you may find my story of (inane) pain at https://askubuntu.com/a/460463/65722 helpful.
May your ratio of information-quality to time-spent-digging be better than mine!
Solution 4
I have a server with an LSI RAID card and 7 Samsung SSDs installed.
The setup is:
- /dev/sda is my operating system SSD, marked as JBOD by the RAID controller.
- The other 7 SSDs show up only as /dev/sdb because they are RAID 0 (or RAID-?).
To get info on disks behind a RAID controller, the trick is to run:
smartctl --scan
The output is:
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/sdb -d scsi # /dev/sdb, SCSI device
/dev/bus/0 -d megaraid,8 # /dev/bus/0 [megaraid_disk_08], SCSI device
/dev/bus/0 -d megaraid,9 # /dev/bus/0 [megaraid_disk_09], SCSI device
/dev/bus/0 -d megaraid,10 # /dev/bus/0 [megaraid_disk_10], SCSI device
/dev/bus/0 -d megaraid,11 # /dev/bus/0 [megaraid_disk_11], SCSI device
/dev/bus/0 -d megaraid,12 # /dev/bus/0 [megaraid_disk_12], SCSI device
/dev/bus/0 -d megaraid,13 # /dev/bus/0 [megaraid_disk_13], SCSI device
/dev/bus/0 -d megaraid,14 # /dev/bus/0 [megaraid_disk_14], SCSI device
/dev/bus/0 -d megaraid,15 # /dev/bus/0 [megaraid_disk_15], SCSI device
Then, to get the smartctl info such as
- Wear_Leveling_Count
- Power_On_Hours
- Temperature_Celsius and all that other good stuff
for each disk, run:
smartctl -d megaraid,8 --all /dev/bus/0
smartctl -d megaraid,9 --all /dev/bus/0
smartctl -d megaraid,10 --all /dev/bus/0
{down to}
smartctl -d megaraid,15 --all /dev/bus/0
The syntax of smartctl is: smartctl [options] <device>
This is how you get in and through a RAID card when multiple disks do not show up as multiple devices such as /dev/sdb, /dev/sdc, /dev/sdd, and so on.
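If you have many disks, the per-disk commands can be generated from the scan output instead of typed by hand. Below is a minimal Python sketch; `megaraid_commands` is a hypothetical helper, the sample text is a shortened copy of the scan output above, and it assumes every scan line has the shape `<device> -d <type> # comment`.

```python
def megaraid_commands(scan_output: str):
    """Build one smartctl argument list per megaraid entry in `smartctl --scan` output."""
    commands = []
    for line in scan_output.splitlines():
        # Drop the trailing "# ..." comment, then split "<device> -d <type>".
        entry = line.split("#", 1)[0].split()
        if len(entry) == 3 and entry[1] == "-d" and entry[2].startswith("megaraid,"):
            device, _, dev_type = entry
            commands.append(["smartctl", "-d", dev_type, "--all", device])
    return commands


# Shortened copy of the scan output shown above.
scan = """\
/dev/sda -d scsi # /dev/sda, SCSI device
/dev/bus/0 -d megaraid,8 # /dev/bus/0 [megaraid_disk_08], SCSI device
/dev/bus/0 -d megaraid,9 # /dev/bus/0 [megaraid_disk_09], SCSI device
"""

for cmd in megaraid_commands(scan):
    print(" ".join(cmd))
```

Each list could then be passed to `subprocess.run` as root; that step is left out here because it needs real hardware.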
Tankman六四
Updated on September 18, 2022
Comments
-
Tankman六四 almost 2 years
We all know that SSDs have a limited predetermined life span. How do I check in Linux what the current health status of an SSD is?
Most Google search results tell you to look up S.M.A.R.T. information for a percentage field called Media_Wearout_Indicator, or other jargon indicators like Longterm Data Endurance, which don't exist. Yes, I did check two SSDs, and both lack these fields. I could go on to find a third SSD, but I feel the fields are not standardized.
To demonstrate the problem, here are the two examples.
With the first SSD, it is not clear which field indicates the wearout level. However, there is only one Unknown_Attribute whose RAW_VALUE is between 1 and 100, so I can only assume that is what we are looking for:
$ sudo smartctl -A /dev/sda
smartctl 6.2 2013-04-20 r3812 [x86_64-linux-3.11.0-14-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0002   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0002   100   100   000    Old_age   Always       -       6568
 12 Power_Cycle_Count       0x0002   100   100   000    Old_age   Always       -       1555
171 Unknown_Attribute       0x0002   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0002   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0002   100   100   000    Old_age   Always       -       57
174 Unknown_Attribute       0x0002   100   100   000    Old_age   Always       -       296
187 Reported_Uncorrect      0x0002   100   100   000    Old_age   Always       -       0
230 Unknown_SSD_Attribute   0x0002   100   100   000    Old_age   Always       -       190
232 Available_Reservd_Space 0x0003   100   100   005    Pre-fail  Always       -       0
234 Unknown_Attribute       0x0002   100   100   000    Old_age   Always       -       350
241 Total_LBAs_Written      0x0002   100   100   000    Old_age   Always       -       742687258
242 Total_LBAs_Read         0x0002   100   100   000    Old_age   Always       -       1240775277
So this SSD has used 57% of its rewrite life span. Is that correct?
With the other disk, the SSD_Life_Left attribute stands out, but its raw value is 0. If that means 0% life left, it is unlikely for an apparently healthy SSD unless it happens to be in peril (we will see in a few days); if it means "0% of life has been used", that is also impossible for a worn disk (worn = used for more than a year).
> sudo /usr/sbin/smartctl -A /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.11.6-4-desktop] (SUSE RPM)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   104   100   050    Pre-fail  Always       -       0/8415644
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   100   100   000    Old_age   Always       -       4757h+02m+17.130s
 12 Power_Cycle_Count       0x0032   099   099   000    Old_age   Always       -       1371
171 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       52
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       2
181 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   030   030   000    Old_age   Always       -       30 (Min/Max 30/30)
195 ECC_Uncorr_Error_Count  0x001c   104   100   000    Old_age   Offline      -       0/8415644
196 Reallocated_Event_Count 0x0033   100   100   000    Pre-fail  Always       -       0
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       3712
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       1152
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       1152
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       3072
-
Simon Gates over 10 years
With SMART attributes, lower values are worse because the drive always alerts if a value is lower than (or equal to? Not sure) the threshold value. That having been said, it's very nice to have a wear indicator, but I hope you're not trusting precious data to any one storage device. You should be running multiple storage devices in a RAID arrangement.
-
Tankman六四 over 10 years
How do you know my data is 'precious'? It is just an offline copy of the company's knowledge base on my laptop. I comment to make the point that people too often assume a sysop scenario. Thanks for your comments anyway.
-
Simon Gates over 10 years
All data is precious. :) We start on that principle, then move on to data that is more precious (a photographer's digital photos, for instance) and less precious (the OS: easy to replace, but downtime and a loss of time/revenue if you have to replace it).
-
bwDraco over 7 years
Both drives are well within endurance limits. The first drive has only about 350 GiB on it, while the second drive has 1.1 TiB on it. I'm not sure what's going on here...
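For anyone wondering where a figure like "about 350 GiB" can come from: here is a small Python sketch converting the first drive's Total_LBAs_Written raw value to GiB. It assumes the counter is in 512-byte units, which is common but not guaranteed; some drives report this attribute in other units. `lbas_to_gib` is a made-up helper name.

```python
def lbas_to_gib(lba_count: int, sector_bytes: int = 512) -> float:
    """Convert an LBA count to GiB, assuming each LBA is `sector_bytes` bytes."""
    return lba_count * sector_bytes / 2**30


# Raw value of attribute 241 (Total_LBAs_Written) on the first drive.
written = lbas_to_gib(742687258)
print(f"Total written: {written:.1f} GiB")
```

Under that assumption the result is in the mid-350s GiB, close to the raw value 350 reported by attribute 234 on the same drive.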
-
Joachim Wagner about 3 years
@bwDraco "has X GB on it" is misleading as many readers will think it is how much space is used. Data may have been written to the same LBA location multiple times. The values are not unusual for small SSDs, e.g. 60 GB, that were never more than half full.
-
-
Tankman六四 over 10 years
So clear, thank you. This knowledge is best made into a GUI tool utilizing smartctl or its API. After all, calculating with a calculator, using the computer as an input device and the human sitting in front of it as the processor, is against the spirit in which computers were invented!
-
Calculus Knight almost 7 years
Link is dead by now.
-
John Eikenberry over 4 years
I think they have the order for Wear_Leveling_Count backwards. I have 2 Samsung SSDs, and the one that is ~4 years old has a RAW_VALUE of 42 while the one that is ~1 month old has a RAW_VALUE of 0. It seems that it starts at 0 and increments upward.