Verify TRIM support with BtrFS on SSD
Solution 1
After many days working on this, I was able to demonstrate that BtrFS does use TRIM. I could not get TRIM to work on the server that we will be deploying these SSDs to. However, the same tests succeed when the same drive is plugged into a laptop.
Hardware used for all of this testing:
- Crucial m4 SSD 512GB
- HP DL160se G6
- LSI LSISAS9200-8e HBA
- generic SAS enclosure
- Dell XPS m1210 laptop
After many failed attempts at verifying BtrFS on the server, I decided to try this same test using an old laptop (to remove the RAID card layer). The initial attempts at this test using both Ext4 and BtrFS on the laptop failed (data not TRIM'd).
I then upgraded the SSD drive firmware from version 0001 (as shipped out of the box) to version 0009. The tests were repeated with Ext4 and BtrFS and both filesystems successfully TRIM'd the data.
To ensure the TRIM command had time to run, I ran rm /mnt/testfile && sync && sleep 120 before performing validation.
One thing to note if you're attempting this same test: SSDs have erase blocks that they operate on (I don't know the size of the Crucial m4 erase blocks). When the file system sends the TRIM command to the drive, the drive will only erase a complete block; if the TRIM command is specified for a portion of a block, that block will not be TRIM'd due to the remaining valid data within the erase block.
To demonstrate what I'm talking about, here is the output of the sectors.pl script from the question, first with the test file on the SSD. Periods are sectors that contain only zeros. Pluses are sectors with one or more non-zero bytes.
Test file on drive:
24600 .......................................+++++++++++
24650 ++++++++++++++++++++++++++++++++++++++++++++++++++
24700 ++++++++++++++++++++++++++++++++++++++++++++++++++
-- cut --
34750 ++++++++++++++++++++++++++++++++++++++++++++++++++
34800 ++++++++++++++++++++++++++++++++++++++++++++++++++
34850 +++++++++++++++++++++++++++++.....................
Test file deleted from drive (after a sync && sleep 120):
24600 .......................................+..........
24650 ..................................................
24700 ..................................................
-- cut --
34750 ..................................................
34800 ..................................................
34850 ......................+++++++.....................
It appears that the first and last sectors of the file are within different erase blocks from the rest of the file. Therefore some sectors were left untouched.
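The partial-block effect can be sketched with a little arithmetic. This is only an illustration: the function name aligned_interior and the granule value are assumptions, since the real erase-block size of the Crucial m4 is not known here. Given a start sector, a sector count, and an assumed granule size, it prints the first and last sector of the region made up of whole granules, which is the part a drive behaving as described could discard.

```shell
# aligned_interior START COUNT GRANULE
# Prints "FIRST LAST" for the sectors covered by whole granules, or "none".
# GRANULE is a hypothetical block size in sectors, not a documented value.
aligned_interior() {
    local start="$1" count="$2" granule="$3"
    local end first last
    end=$(( start + count ))                                  # one past the last sector
    first=$(( (start + granule - 1) / granule * granule ))    # round start up
    last=$(( end / granule * granule ))                       # round end down (exclusive)
    if [ "$last" -gt "$first" ]; then
        echo "$first $(( last - 1 ))"
    else
        echo "none"
    fi
}

# Illustrative numbers: 10240 sectors starting at 24639, granule of 8 sectors.
aligned_interior 24639 10240 8
```

With these illustrative inputs the call prints 24640 34871, so one sector at the head and seven at the tail fall outside any whole granule and would be left untouched.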
A takeaway from this: some Ext4 TRIM testing instructions ask the user to verify only that the first sector of the file was TRIM'd. The tester should view a larger portion of the test file to really see whether the TRIM was successful.
Now to figure out why manually issued TRIM commands sent to the SSD through the RAID card work but automatic TRIM commands do not...
Solution 2
Based on what I've read, there may be a flaw in your methodology.
You are assuming that TRIM will result in your SSD zeroing the blocks which have been deleted. However this is often not the case.
Zeroing only happens if the SSD implements TRIM that way. You can check whether the device at least knows enough to report discard_zeroes_data:
cat /sys/block/sda/queue/discard_zeroes_data
Also, even if the SSD does zero, it may take some time -- well after the discard has completed -- for the SSD to actually zero the blocks (this is true of some lesser quality SSDs).
http://www.redhat.com/archives/linux-lvm/2011-April/msg00048.html
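As a rough sketch, the check above can be widened to a few related queue attributes at once. The function name show_discard_info and the optional second argument (an alternate sysfs root, handy for trying it without a real device) are assumptions for illustration. discard_granularity and discard_max_bytes are standard block queue attributes; discard_zeroes_data may be absent or always 0 on newer kernels, so a missing file is reported rather than treated as an error.

```shell
# show_discard_info DEVICE [SYSFS_ROOT]
# Prints the discard-related queue attributes the kernel exposes for DEVICE.
show_discard_info() {
    local dev="$1" root="${2:-/sys/block}"
    local attr path
    for attr in discard_granularity discard_max_bytes discard_zeroes_data; do
        path="$root/$dev/queue/$attr"
        if [ -r "$path" ]; then
            echo "$attr=$(cat "$path")"
        else
            echo "$attr=unavailable"
        fi
    done
}

# Example: show_discard_info sda
```

A discard_max_bytes of 0 generally means the kernel does not consider the device discard-capable at all, which would be worth ruling out before looking at the file system.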
BTW I was looking for a reliable way to verify TRIM and haven't found one yet. I'd love to know if anyone finds a way.
Solution 3
Here is testing methodology for 10.10 and EXT4. Maybe it'll help.
https://askubuntu.com/questions/18903/how-to-enable-trim
Oh, and I think you do need the discard option on the fstab mount entry. I'm not sure the ssd option is needed, as I think the file system should auto-detect an SSD.
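One way to confirm that the option actually took effect is to look at the live mount table rather than fstab. The sketch below assumes the usual /proc/mounts layout (device, mount point, type, options); the function name has_discard and the optional second argument for an alternate mounts file are made up for illustration.

```shell
# has_discard MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is mounted with a discard option
# ("discard" or "discard=..."), fails otherwise.
has_discard() {
    awk -v m="$1" '
        $2 == m {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^discard(=|$)/) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}

# Example: has_discard /mnt && echo "discard is active on /mnt"
```

This only shows that the file system was asked to issue discards; it says nothing about whether the controller or drive honours them.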
Solution 4
Virtually all SSDs with a SATA interface run some sort of log-structured filesystem that is completely hidden from you. The SATA 'trim' command tells the device that the block is no longer in use and that the underlying log-structured filesystem can flash it /if/ the corresponding erase block (which might be substantially larger) /only/ contains blocks marked with trim.
I have not read the standard docs, which are here: http://t13.org/Documents/MinutesDefault.aspx?keyword=trim, but I'm not sure if there is any standard-level guarantee that you'd be able to see the results of a trim command. If you can see something change, like the first few bytes being zeroed out at the start of an erase block, I don't think there's any guarantee this is applicable across different devices or perhaps even firmware versions.
If you think about the way the abstraction might be implemented, it should be possible to make the result of the trim command completely invisible to the one just reading/writing blocks. Furthermore it might be hard to tell which blocks are in the same erase block, since only the flash translation layer has to know that and might have reordered them logically.
Perhaps there is a SATA command (an OEM command, perhaps?) for fetching metadata related to the SSD's flash translation layer?
Solution 5
Some things to think about (to help answer your "am I missing something?" question):
- What exactly is /dev/sda? A single SSD, or a (hardware?) RAID array of SSDs?
- If the latter, what kind of RAID controller?
- Does your RAID controller support TRIM?
- Finally, does your testing method give you the results you expect if you format /dev/sda1 with something other than btrfs?
Comments
-
Shane Meyers over 1 year
We are looking into using BtrFS on an array of SSD disks and I have been asked to verify that BtrFS does in fact perform TRIM operations upon deleting a file. So far I have been unable to verify that the TRIM command is sent to the disks.
I know BtrFS is not considered production ready, but we like the bleeding edge, therefore I'm testing it. The server is Ubuntu 11.04 server 64-bit release (mkfs.btrfs version 0.19). I have installed the Linux 3.0.0 kernel as the BtrFS changelog states that bulk TRIM is not available in the kernel shipped with Ubuntu 11.04 (2.6.38).
Here's my testing methodology (initially adopted from http://andyduffell.com/techblog/?p=852, with modifications to work with BtrFS):
- Manually TRIM the disks before starting:
for i in {0..10} ; do let A="$i * 65536" ; hdparm --trim-sector-ranges $A:65535 --please-destroy-my-drive /dev/sda ; done
- Verify the drive was TRIM'd:
./sectors.pl |grep + | tee sectors-$(date +%s)
- Partition the drive:
fdisk /dev/sda
- Make the file system:
mkfs.btrfs /dev/sda1
- Mount:
sudo mount -t btrfs -o ssd /dev/sda1 /mnt
- Create a file:
dd if=/dev/urandom of=/mnt/testfile bs=1k count=50000 oflag=direct
- Verify the file is on the disk:
./sectors.pl | tee sectors-$(date +%s)
- Delete the test file:
rm /mnt/testfile
- See that the test file is TRIM'd from the disk:
./sectors.pl | tee sectors-$(date +%s)
- Verify the TRIM'd blocks:
diff the two most recent sectors-* files
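A side note on the first step: the loop advances 65536 sectors per iteration but trims 65535 sectors each time, so one sector in every chunk is skipped. As far as I know, 65535 is the most a single TRIM range entry can describe, so a gap-free pass has to step by the count actually trimmed. The sketch below only prints start:count pairs and never touches a drive; the function name trim_ranges is hypothetical.

```shell
# trim_ranges START TOTAL
# Prints contiguous "start:count" pairs covering TOTAL sectors from START,
# with each count capped at 65535 (assumed per-range maximum).
trim_ranges() {
    local start="$1" remaining="$2" max=65535 n
    while [ "$remaining" -gt 0 ]; do
        n=$max
        if [ "$remaining" -lt "$n" ]; then
            n=$remaining
        fi
        echo "$start:$n"
        start=$(( start + n ))
        remaining=$(( remaining - n ))
    done
}

# Dry run covering the first 131072 sectors:
trim_ranges 0 131072
```

The dry run prints 0:65535, 65535:65535 and 131070:2. Feeding such pairs to hdparm --trim-sector-ranges is destructive, so the output is worth reviewing before it goes anywhere near a real device.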
At this point, the pre-delete and post-delete verifications still show the same disk blocks in use. I should instead see a reduction in the number of in-use blocks. Waiting an hour after the test file is deleted (in case it takes a while for the TRIM command to be issued) still shows the same blocks in use.
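To put a number on "a reduction in the number of in-use blocks", the '+' markers in two saved snapshots can be counted and subtracted. This is a minimal sketch that assumes the snapshot files contain unmodified sectors.pl output; the function names count_nonzero and nonzero_reduction are made up for illustration.

```shell
# count_nonzero FILE
# Counts the '+' markers (sectors with non-zero bytes) in one snapshot.
count_nonzero() {
    tr -cd '+' < "$1" | wc -c | tr -d ' '
}

# nonzero_reduction BEFORE AFTER
# Prints how many fewer non-zero sectors AFTER has than BEFORE.
nonzero_reduction() {
    echo $(( $(count_nonzero "$1") - $(count_nonzero "$2") ))
}

# Example: nonzero_reduction sectors-1315000000 sectors-1315000200
```

A result of 0 matches the symptom described above. A positive result only shows that sectors now read back as zeros, which is meaningful only on drives that return zeros for trimmed sectors.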
I have also tried mounting with the -o ssd,discard options, but that doesn't seem to help at all.
Partition that was created from fdisk above (I keep the partition small so the verification can go faster):
root@ubuntu:~# fdisk -l -u /dev/sda

Disk /dev/sda: 512.1 GB, 512110190592 bytes
255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x6bb7542b

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1              63      546209      273073+  83  Linux
My sectors.pl script (I know this is inefficient, but it gets the job done):
#!/usr/bin/perl -w
use strict;

my $device = '/dev/sda';
my $start = 0;
my $limit = 655360;

foreach ($start..$limit) {
    printf "\n%6d ", $_ if !($_ % 50);
    my @sector = `/sbin/hdparm --read-sector $_ $device`;
    my $status = '.';
    foreach my $line (@sector) {
        chomp $line;
        next if $line eq '';
        next if $line =~ /$device/;
        next if $line =~ /^reading sector/;
        if ($line !~ /0000 0000 0000 0000 0000 0000 0000 0000/) {
            $status = '+';
        }
    }
    print $status;
}
print "\n";
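For comparison, the per-sector check can be sketched with dd instead of hdparm. This is an alternative illustration, not the script above: the function name sector_status is hypothetical, it assumes 512-byte sectors, and unlike hdparm --read-sector a plain dd read may be served from the page cache, so results on a live device could lag behind what the drive would actually return.

```shell
# sector_status FILE_OR_DEVICE SECTOR
# Prints "." if the 512-byte sector holds only zero bytes, "+" otherwise.
# A sector past the end of the input reads as empty and also prints ".".
sector_status() {
    local src="$1" sector="$2" nonzero
    # Drop every NUL byte, then count what is left.
    nonzero=$(dd if="$src" bs=512 skip="$sector" count=1 2>/dev/null \
        | LC_ALL=C tr -d '\000' | wc -c | tr -d ' ')
    if [ "$nonzero" -eq 0 ]; then
        echo "."
    else
        echo "+"
    fi
}

# Example (needs read access to the device): sector_status /dev/sda 24639
```

It works on an image file as well as a block device, which makes it easy to try out without risking a real drive.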
Is my testing methodology flawed? Am I missing something here?
Thanks for the help.
-
Matt Simmons over 12 years
I wholly support testing bleeding edge things, but just so you know, as of right now, btrfs doesn't have an fsck that actually, you know, fixes things: btrfs.wiki.kernel.org/index.php/Main_Page - so just watch out for that.
-
Shane Meyers over 12 years
@Matt - Good point about the missing fsck. My understanding is that the first version of an fsck should ship within the next few weeks, so we should be covered by the time we move this to production. Additionally, we'll have multiple copies of our data, so if we lose one copy, we have at least two more copies to restore from. But I fully agree that this is not the file system for people with irreplaceable data for now.
-
zebediah49 over 12 years
Probably won't change anything, but you might as well try running a sync after rmming the file.
-
Shane Meyers over 12 years
I want to say that I tried running a sync after removing the file and the results were still the same. I will double check that though when I'm back in the office after the weekend is over.
-
cas over 12 years
If you don't mind bleeding edge, have you considered zfsonlinux.org? It's native (i.e. in kernel, not fuse) ZFS for Linux. They're close to an official "release", and have RCs available (including a PPA for Ubuntu - easy enough to rebuild for Debian too).
-
Shane Meyers over 12 years
I have attempted to follow Ext4 SSD verification instructions, but they don't work due to differences in how BtrFS works compared to other file systems. Hence the workflow I came up with. I used the ssd mount option to ensure that BtrFS knew to use its SSD-specific code even though it should auto detect. I also tried using discard (as noted above) and it didn't help.
-
Dave Veffer over 12 years
Oh well. Worth a shot :)
-
Shane Meyers over 12 years
As I mentioned above, I tried my testing with both the discard option and the ssd option. The BtrFS docs mention the ssd option a lot, so I focused my testing there, but neither option resulted in the outcome I expected. Most webpages that show how to test TRIM are for Ext4 and the like. BtrFS cannot be tested using those methodologies due to differences in the design of the file system.
-
Paweł Brodacki over 12 years
hdparm --fibmap is FS agnostic. A block at a given LBA address is either zeroed out or not, whether it's extN, btrfs, xfs, jfs... The ssd option is irrelevant for trim, see e.g. this discussion on the btrfs mailing list: mail-archive.com/[email protected]/msg10932.html.
-
Shane Meyers over 12 years
I tried using hdparm --fibmap but it doesn't work on BtrFS. If you look at the wiper.sh README (distributed alongside hdparm), they explicitly state that "FIEMAP/FIBMAP ioctl() calls are completely unsafe when used on a btrfs filesystem." So hdparm is out, which is too bad as this would make testing go a lot easier. I didn't know that the ssd option had nothing to do with TRIM, as the docs aren't very clear on the usefulness of the option.
-
Paweł Brodacki over 12 years
Thank you for the extra information about ioctls, I didn't know it. I think the best place to ask for extra information could be the btrfs mailing list. You'll get first-hand information from there.
-
Ronald Pottol over 12 years
I thought all HW RAID ate trim commands, nice to see that things are slowly changing. On the other hand, with good modern drives, TRIM matters less and less.