Fastest Linux Filesystem on Shingled Disks

Solution 1

Intuitively, copy-on-write and log-structured filesystems might give better performance on shingled disks by reducing random writes. The benchmarks somewhat support this; however, the differences in performance are not specific to shingled disks. They also occur on an unshingled disk used as a control. Thus switching to a shingled disk might not have much relevance to your choice of filesystem.

The nilfs2 filesystem gave quite good performance on the SMR disk. However, this was because I allocated the whole 8TB partition and the benchmark only wrote ~0.5TB, so the nilfs cleaner did not have to run. When I limited the partition to 200GB the nilfs benchmarks did not even complete successfully. Nilfs2 may be a good choice performance-wise if you really use the "archive" disk as an archive disk where you keep all the data and snapshots written to the disk forever, as then the nilfs cleaner does not have to run.


I understand that the 8TB Seagate ST8000AS0002-1NA17Z drive I used for the test has a ~20GB cache area. I changed the default filebench fileserver settings so that the benchmark data set would be ~125GB, larger than the unshingled cache area:

# Overrides for the stock fileserver personality:
# ~1.25MB mean file size x 100,000 files gives a ~125GB data set;
# run for 36000 seconds (10 hours).
set $meanfilesize=1310720
set $nfiles=100000
run 36000
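
For reference, each run was invoked roughly like this (a sketch rather than the exact command; the workload file name and output name are placeholders, assuming the overrides above are appended to a copy of the stock fileserver personality that ships with filebench, with $dir pointing at the mount under test):

$ filebench -f fileserver_smr.f > SMR.ext4.0.out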

Now for the actual data. The number of ops measures the "overall" fileserver performance, while ms/op measures the latency of the random append and can be used as a rough guide to random write performance.

$ grep rand *0.out | sed s/.0.out:/\ / |sed 's/ - /-/g' |  column -t
SMR8TB.nilfs   appendfilerand1  292176ops  8ops/s   0.1mb/s  1575.7ms/op  95884us/op-cpu  [0ms-7169ms]
SMR.btrfs      appendfilerand1  214418ops  6ops/s   0.0mb/s  1780.7ms/op  47361us/op-cpu  [0ms-20242ms]
SMR.ext4       appendfilerand1  172668ops  5ops/s   0.0mb/s  1328.6ms/op  25836us/op-cpu  [0ms-31373ms]
SMR.xfs        appendfilerand1  149254ops  4ops/s   0.0mb/s  669.9ms/op   19367us/op-cpu  [0ms-19994ms]
Toshiba.btrfs  appendfilerand1  634755ops  18ops/s  0.1mb/s  652.5ms/op   62758us/op-cpu  [0ms-5219ms]
Toshiba.ext4   appendfilerand1  466044ops  13ops/s  0.1mb/s  270.6ms/op   23689us/op-cpu  [0ms-4239ms]
Toshiba.xfs    appendfilerand1  368670ops  10ops/s  0.1mb/s  195.6ms/op   19084us/op-cpu  [0ms-2994ms]

Since the Seagate is 5980RPM, one might naively expect the Toshiba to be 20% faster. These benchmarks show it as being roughly 3 times (200%) faster, so these benchmarks are hitting the shingled performance penalty. We see that the shingled (SMR) disk still can't match the performance of ext4 on an unshingled (PMR) disk. The best performance was with nilfs2 on an 8TB partition (so the cleaner didn't need to run), but even then it was significantly slower than the Toshiba with ext4.

To make the benchmarks above clearer, it might help to normalise them relative to the performance of ext4 on each disk:

                ops     randappend
SMR.btrfs:      1.24    0.74
SMR.ext4:       1       1
SMR.xfs:        0.86    1.98
Toshiba.btrfs:  1.36    0.41
Toshiba.ext4:   1       1
Toshiba.xfs:    0.79    1.38
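
Here each figure is the filesystem's result divided by the ext4 result on the same disk, with the randappend latencies inverted so that higher is better in both columns. For example, SMR.btrfs ops = 214418/172668 ≈ 1.24, and SMR.btrfs randappend = 1328.6/1780.7 ≈ 0.74.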

We see that on the SMR disk btrfs retains most of the advantage in overall ops that it has over ext4 on the unshingled disk, but its penalty on random appends is not as dramatic as a ratio (0.74 versus 0.41 on the Toshiba). This might lead one to move to btrfs on the SMR disk. On the other hand, if you need low-latency random appends, this benchmark suggests you want xfs, especially on SMR. We see that while SMR/PMR might influence your choice of filesystem, considering the workload you are optimising for seems more important.

I also ran an attic-based benchmark. The durations of the attic runs (on the 8TB SMR full-disk partitions) were:

ext4:  1 days 1 hours 19 minutes 54.69 seconds
btrfs: 1 days 40 minutes 8.93 seconds
nilfs: 22 hours 12 minutes 26.89 seconds

In each case the attic repositories had the following stats:

                       Original size      Compressed size    Deduplicated size
This archive:                1.00 TB            639.69 GB            515.84 GB
All archives:              901.92 GB            639.69 GB            515.84 GB
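
For context, each run was essentially of the following shape (a sketch with placeholder paths, not the exact commands used; attic's init and create subcommands print a stats table like the one above when given --stats):

$ attic init /mnt/smr/backup.attic
$ time attic create --stats /mnt/smr/backup.attic::run1 /path/to/data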

Adding a second copy of the same 1 TB disk to attic took 4.5 hours on each of these three filesystems. A raw dump of the benchmarks and smartctl information is at http://pastebin.com/tYK2Uj76 and https://github.com/gmatht/joshell/tree/master/benchmarks/SMR

Solution 2

If you rsync from an SMR drive, make sure the filesystem is mounted read-only or with the noatime option.

Otherwise the SMR drive will need to write a timestamp for each file rsync reads, resulting in a significant performance degradation (from around 80 mb/s down to 3-5 mb/s here) and head wear / clicking noise.

If you already have an rsync job running with poor performance, there is no need to stop it; you can remount the source filesystem by doing:

sudo mount -o remount,ro  /path/to/source/fs
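
You can check that the new options took effect with, for example (findmnt is part of util-linux; the path is the same placeholder as above):

findmnt -o TARGET,OPTIONS /path/to/source/fs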

The effect will not be seen immediately; be patient and wait 10 to 20 minutes until the drive has finished writing out all the data still in its buffers. This advice is tried and tested.


This might also apply when rsyncing to an SMR drive, i.e. if the filesystem tries to update the timestamp after each file has been fully written to disk. This jitters the sequential workload, and huge bands of data are continuously rewritten, contributing to drive wear. The following may help:

sudo mount -t fs_type -o rw,noatime device /path/to/dest/fs

This has to be done before rsync is run; other factors may render this option insignificant, e.g. unbuffered FAT/MFT updating, or parallelized writes if the filesystem is optimized primarily for SSDs.
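
A transfer to the SMR destination might then look like the following (an illustrative sketch; both paths are placeholders):

rsync -a --info=progress2 /path/to/source/fs/ /path/to/dest/fs/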


Try to use dd bs=32M and then resize the filesystem on the SMR target, if you want to back up full filesystems anyway (no need to have it mounted and run rsync to transport each and every file in this case).
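
For an ext4 source, that could look like the following sketch (device names are placeholders; e2fsck must pass before resize2fs will run on an unmounted filesystem):

sudo dd if=/dev/sdX1 of=/dev/sdY1 bs=32M status=progress   # raw copy of the source filesystem
sudo e2fsck -f /dev/sdY1                                   # check the copy before resizing
sudo resize2fs /dev/sdY1                                   # grow it to fill the target partition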


The actual hardware in use was a Seagate drive-managed SMR 8TB consumer drive. Your mileage may vary with other hardware.


Comments

  • gmatht
    gmatht over 1 year

    There is considerable interest in shingled drives. These put data tracks so close together that you can't write to one track without clobbering the next. This may increase capacity by 20% or so, but results in write amplification problems. There is work underway on filesystems optimised for Shingled drives, for example see: https://lwn.net/Articles/591782/

    Some shingled disks such as the Seagate 8TB Archive have a cache area for random writes, allowing decent performance on generic filesystems. The disk can even be quite fast on some common workloads, up to around 200MB/sec writes. However, it is to be expected that if the random write cache overflows, the performance may suffer. Presumably some filesystems are better than others at avoiding random writes in general, or at avoiding patterns of random writes likely to overflow the write cache found in such drives.

    Is any mainstream filesystem in the Linux kernel better than ext4 at avoiding the performance penalty of shingled disks?

    • R J
      R J over 8 years
      There are 2 types of shingled disks in the market right now. Those that need a supported OS like the HGST 10TB disks vs those that do not need specific OS support like the Seagate 8TB Archive. Which are you referring to?
    • gmatht
      gmatht over 8 years
      Given that I am limiting the FS to mainstream ones, it would probably have to be a Seagate style?
    • qasdfdsaq
      qasdfdsaq over 8 years
      SMR as implemented in current drives does not result in "write amplification problems like SSDs". They only operate in very few ways vaguely like SSDs.
    • gmatht
      gmatht about 5 years
      @qasdfdsaq I meant "as with SSDs".
  • R J
    R J over 8 years
    Are you sure these differences are specific to SMR vs PMR?
  • gmatht
    gmatht over 8 years
    Not really. I will add more benchmarks as I do them to answer such questions, but someone with more benchmark experience could probably do a better job than me. Hopefully this is enough to give a rough idea whether it might be worth considering switching from ext4 on a SMR disk.
  • qasdfdsaq
    qasdfdsaq over 8 years
    Shingled disks do not use copy on write. They use read-modify-write just like partial writes to RAID-5 arrays. Random writes do not slow down SMR disks, in fact it speeds them up. 6000RPM SMR drives are 10x faster at random writes than 15000 RPM non-SMR drives as long as it fits in cache, which is actually 30GB.
  • gmatht
    gmatht over 8 years
    @qasdfdsaq Thanks, I removed reference to CoW. I understand that at the level of the platter shingled drives are much slower for random writes than PMR, but that the SMR can emulate faster writes due to the cache; a PMR drive + cache would presumably be faster again. Do you have a reference for the 30GB figure? There doesn't seem to be an official number, e.g. on the Seagate technical specifications. Also, optimizing for shingled drives might be a similar problem to optimising RAID 5 arrays?
  • qasdfdsaq
    qasdfdsaq over 8 years
    There's no official documentation, 30GB is what was deduced by researchers studying the behaviour of SMR disks. There's still little consensus for how one "optimizes" for SMR disks, and while it is similar in principle to partial-stripe writes on RAID-5 arrays, SMR disks do not expose the underlying block size which may vary, on the order of 50MB+.
  • gmatht
    gmatht over 8 years
    BTW, I am doing more thorough benchmarks. Should be ready in a month.
  • gmatht
    gmatht over 8 years
    @b70568b5 I've added more benchmarks. It seems that the differences are indeed primarily not due to SMR. Thanks.
  • Moh.Maher
    Moh.Maher almost 8 years
    I was doing some random search on the topic and came across a blog post on f2fs: blog.schmorp.de/2015-10-08-smr-archive-drives-fast-now.html
  • frauke
    frauke over 7 years
    Can you share more details about the nilfs2 cleaner failure you encountered? I'm having good luck with 4.7.9-era nilfs2 on the Seagate 8TB SMR disk so far, but would like to know what to watch out for.
  • Giacomo1968
    Giacomo1968 over 6 years
    This is a good answer, but not to this question since it has utterly nothing to do with what the original poster has posted. I would encourage you to create a self-answered question for this answer. Such as “I am attempting to Rsync from a shingled drive and performance is bad. What can I do to improve it?”