Poor I/O performance - PCIe NVMe Samsung 950 Pro


Solution 1

Thank you for your question; it has been incredibly helpful for me.

I have had a very similar experience with a different hardware setup (I am using an Intel NVMe SSD), but I am also running Ubuntu 16.04. Given your evidence, and a similar result found in this article, I was convinced that the issue was with how Ubuntu was setting up the NVMe drives.

I was determined to solve the issue without giving up completely on Ubuntu. But no matter what I did, I was not able to get speeds above 2000 MB/sec when testing with hdparm exactly as you described.

So I did some digging and found a guide provided by Intel. I tried everything it suggests and found that one part was different: near the bottom it discusses aligning the drive partitions correctly, and this was the one part that didn't match my installation. My partition's starting offset was not divisible by 4096 bytes; the drive was using a 512-byte sector size instead of a 4K sector size.

Sure enough, I repartitioned the disk so the partition starts at an offset divisible by 4096, and FINALLY I was able to break 2000 MB/s.
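
If you want to double-check this on your own system, here is a minimal sketch (assuming the drive shows up as /dev/nvme0n1): sysfs reports each partition's start in 512-byte sectors, so a start sector divisible by 8 means the partition begins on a 4096-byte boundary.

    DISK=nvme0n1
    for p in /sys/block/$DISK/${DISK}p*/start; do
        part=$(basename "$(dirname "$p")")
        start=$(cat "$p")
        # start is in 512-byte sectors; divisible by 8 => 4096-byte aligned
        if [ $(( start % 8 )) -eq 0 ]; then
            echo "$part: start sector $start (4096-byte aligned)"
        else
            echo "$part: start sector $start (NOT 4096-byte aligned)"
        fi
    done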

Right now it is averaging 2.3 GB/s, when I would expect it to be a bit higher. I blame this on the fact that when I run sudo fdisk -l the NVMe drive is still shown with a physical sector size of 512 bytes. I plan to continue investigating, but I hope this helps you!
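
As for the 512-byte sector size itself: if you want to see whether the drive can present a native 4K logical sector, nvme-cli can list the LBA formats the controller advertises. This is only an illustrative sketch; not every consumer drive offers a 4K format, and nvme format destroys all data on the namespace.

    # List supported LBA formats; look for one with "Data Size: 4096 bytes"
    sudo nvme id-ns /dev/nvme0n1 -H | grep "LBA Format"

    # If a 4K format exists (e.g. index 1), it can be selected with a
    # destructive low-level format -- this ERASES the whole drive:
    # sudo nvme format /dev/nvme0n1 --lbaf=1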

Solution 2

Caution: This answer is old. As of Linux 4.19, blk_mq is the default scheduler. It is most likely that the problem of your PCIe NVMe SSD running slowly stems from elsewhere.

Original answer:

Please add

scsi_mod.use_blk_mq=1

to your kernel boot parameters; otherwise, I don't think you will see the benefit of NVMe's increased number of command queues and commands per queue.
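
On Ubuntu the usual way to do this is through GRUB; a minimal sketch, assuming the stock /etc/default/grub:

    # /etc/default/grub -- append the parameter to the existing line, e.g.:
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash scsi_mod.use_blk_mq=1"

    # then regenerate the GRUB configuration and reboot
    sudo update-grub
    sudo reboot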

Note: I know it's for Arch, but you might also want to take a look at the Arch Wiki for more info about tuning I/O.

Solution 3

This thread is one year old (October 2016). One of the highest upvoted answers recommends an Intel NVMe driver that is two years old (2015).

In February 2017, though, Samsung released a firmware update that uses a Linux-based bootable ISO installer. At the same link there are drivers you can install for Windows 7/8/10. I'll be installing both soon on my new Samsung 960 Pro and brand-new Dell-based i7-6700 laptop, along with flashing the BIOS and updating other Dell drivers.

I think it's important to revisit these old threads and provide new users with current (as of October 11, 2017, anyway) links so they have all options open.

Many Google searches return reports of the Samsung 960 Pro running under Linux at half its Windows speed, so I encourage everyone to search out as many options as possible.


After adding the scsi_mod.use_blk_mq=1 kernel parameter:

$ systemd-analyze
Startup finished in 7.052s (firmware) + 6.644s (loader) + 2.427s (kernel) + 8.440s (userspace) = 24.565s

Removing the kernel parameter and rebooting:

$ systemd-analyze
Startup finished in 7.060s (firmware) + 6.045s (loader) + 2.712s (kernel) + 8.168s (userspace) = 23.986s

So it would now appear that scsi_mod.use_blk_mq=1 makes the system slower, not faster. At one time it may have been beneficial, though.
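
If you want to confirm whether the flag is actually in effect, the module parameter is exposed under /sys on kernels built with SCSI multiqueue support (the NVMe driver itself always uses blk-mq, so this mostly matters for SATA/SCSI disks); a quick check:

    # prints Y when SCSI multiqueue is enabled, N otherwise
    cat /sys/module/scsi_mod/parameters/use_blk_mq

    # the NVMe queue's scheduler can be inspected directly as well
    cat /sys/block/nvme0n1/queue/scheduler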

Solution 4

Here's some interesting information: on Windows, the drive doesn't perform according to review benchmarks until cache flushing is disabled. Usually this isn't done directly; instead, the vendor's driver (in this case, the Samsung NVMe driver) is installed.

If you benchmark with the vendor's driver, and then disable cache flushing in Windows, you get the same numbers. This would be unlikely if the vendor's driver were not ignoring cache flushing.

Translated to Linux-land, that means that to get the big benchmark numbers you see in all the reviews, Windows is effectively running with fsync disabled, with all that means for reliability (no fsync - or, more specifically, no write barrier - means that a power loss at the wrong time could break the whole filesystem, depending on the implementation; reordered writes create "impossible" situations).

Samsung's "data center" SSDs come with capacitors to ensure cached data is flushed correctly. This is not the case with their consumer drives.

I've just worked this out from first principles, having added a 1TB NVMe to my new build yesterday. I'm not particularly happy, and I've initiated contact with Samsung support to see what they say - but I doubt I'll hear back.


Comments

  • kross
    kross over 1 year

    I just finished a hardware build expecting a big gain from the new NVMe drive. My prior performance was lower than expected (~3gb transferred), so I've replaced the motherboard/CPU/memory/HDD. While performance is double what it was, it is still half of what I get on my 3-year-old MacBook Pro with a SATA6 drive.

    • CPU: i7-5820k 6core
    • Mobo: MSI X99A MPOWER
    • Memory: 32GB
    • Drive: Samsung 950 pro NVMe PCIe

    Ubuntu (also confirmed with 16.04.1 LTS):

    Release:    15.10
    Codename:   wily
    
    4.2.0-16-generic
    
    $ sudo blkid
    [sudo] password for kross: 
    /dev/nvme0n1p4: UUID="2997749f-1895-4581-abd3-6ccac79d4575" TYPE="swap"
    /dev/nvme0n1p1: LABEL="SYSTEM" UUID="C221-7CA5" TYPE="vfat"
    /dev/nvme0n1p3: UUID="c7dc0813-3d18-421c-9c91-25ce21892b9d" TYPE="ext4"
    

    Here are my test results:

    sysbench --test=fileio --file-total-size=128G prepare
    sysbench --test=fileio --file-total-size=128G --file-test-mode=rndrw --max-time=300 --max-requests=0 run
    sysbench --test=fileio --file-total-size=128G cleanup
    
    
    Operations performed:  228000 Read, 152000 Write, 486274 Other = 866274 Total
    Read 3.479Gb  Written 2.3193Gb  Total transferred 5.7983Gb  (19.791Mb/sec)
     1266.65 Requests/sec executed
    
    Test execution summary:
        total time:                          300.0037s
        total number of events:              380000
        total time taken by event execution: 23.6549
        per-request statistics:
             min:                                  0.01ms
             avg:                                  0.06ms
             max:                                  4.29ms
             approx.  95 percentile:               0.13ms
    
    Threads fairness:
        events (avg/stddev):           380000.0000/0.00
        execution time (avg/stddev):   23.6549/0.00
    

    The scheduler is set to none:

    # cat /sys/block/nvme0n1/queue/scheduler
    none
    

    Here is the lspci information:

    # lspci -vv -s 02:00.0
    02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a802 (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Samsung Electronics Co Ltd Device a801
        Physical Slot: 2-1
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 45
        Region 0: Memory at fb610000 (64-bit, non-prefetchable) [size=16K]
        Region 2: I/O ports at e000 [size=256]
        Expansion ROM at fb600000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
            Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
            Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
            Address: 0000000000000000  Data: 0000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
            DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset+
            DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ FLReset-
                MaxPayload 128 bytes, MaxReadReq 512 bytes
            DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr+ TransPend-
            LnkCap: Port #0, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L0s <4us, L1 <64us
                ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
            LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
            LnkSta: Speed 8GT/s, Width x4, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
            DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR+, OBFF Not Supported
            DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
            LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                 Compliance De-emphasis: -6dB
            LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                 EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [b0] MSI-X: Enable+ Count=9 Masked-
            Vector table: BAR=0 offset=00003000
            PBA: BAR=0 offset=00002000
        Capabilities: [100 v2] Advanced Error Reporting
            UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
            UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
            UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
            CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
            CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
            AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
        Capabilities: [148 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Capabilities: [158 v1] Power Budgeting <?>
        Capabilities: [168 v1] #19
        Capabilities: [188 v1] Latency Tolerance Reporting
            Max snoop latency: 0ns
            Max no snoop latency: 0ns
        Capabilities: [190 v1] L1 PM Substates
            L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
                  PortCommonModeRestoreTime=10us PortTPowerOnTime=10us
        Kernel driver in use: nvme
    

    hdparm:

    $ sudo hdparm -tT --direct /dev/nvme0n1
    
    /dev/nvme0n1:
     Timing O_DIRECT cached reads:   2328 MB in  2.00 seconds = 1163.98 MB/sec
     Timing O_DIRECT disk reads: 5250 MB in  3.00 seconds = 1749.28 MB/sec
    

    hdparm -v:

     sudo hdparm -v /dev/nvme0n1
    
    /dev/nvme0n1:
    SG_IO: questionable sense data, results may be incorrect
     multcount     =  0 (off)
     readonly      =  0 (off)
     readahead     = 256 (on)
     geometry      = 488386/64/32, sectors = 1000215216, start = 0
    

    fstab

    UUID=453cf71b-38ca-49a7-90ba-1aaa858f4806 /               ext4    noatime,nodiratime,errors=remount-ro 0       1
    # /boot/efi was on /dev/sda1 during installation
    #UUID=C221-7CA5  /boot/efi       vfat    defaults        0       1
    # swap was on /dev/sda4 during installation
    UUID=8f716653-e696-44b1-8510-28a1c53f0e8d none            swap    sw              0       0
    UUID=C221-7CA5  /boot/efi       vfat    defaults        0       1
    

    fio

    This has some comparable benchmarks, and mine are way off. When I tested with fio and disabled sync, it is a different story:

    sync=1
    1 job  - write: io=145712KB, bw=2428.5KB/s, iops=607, runt= 60002msec
    7 jobs - write: io=245888KB, bw=4097.9KB/s, iops=1024, runt= 60005msec
    
    sync=0
    1 job  - write: io=8157.9MB, bw=139225KB/s, iops=34806, runt= 60001msec
    7 jobs - write: io=32668MB, bw=557496KB/s, iops=139373, runt= 60004msec
    

    Here's the full sync results for one job and 7 jobs:

    $ sudo fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
    journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
    fio-2.1.11
    Starting 1 process
    Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/2368KB/0KB /s] [0/592/0 iops] [eta 00m:00s]
    journal-test: (groupid=0, jobs=1): err= 0: pid=18009: Wed Nov 18 18:14:03 2015
      write: io=145712KB, bw=2428.5KB/s, iops=607, runt= 60002msec
        clat (usec): min=1442, max=12836, avg=1643.09, stdev=546.22
         lat (usec): min=1442, max=12836, avg=1643.67, stdev=546.23
        clat percentiles (usec):
         |  1.00th=[ 1480],  5.00th=[ 1496], 10.00th=[ 1512], 20.00th=[ 1528],
         | 30.00th=[ 1576], 40.00th=[ 1592], 50.00th=[ 1608], 60.00th=[ 1608],
         | 70.00th=[ 1608], 80.00th=[ 1624], 90.00th=[ 1640], 95.00th=[ 1672],
         | 99.00th=[ 2192], 99.50th=[ 6944], 99.90th=[ 7328], 99.95th=[ 7328],
         | 99.99th=[ 7520]
        bw (KB  /s): min= 2272, max= 2528, per=100.00%, avg=2430.76, stdev=61.45
        lat (msec) : 2=98.44%, 4=0.58%, 10=0.98%, 20=0.01%
      cpu          : usr=0.39%, sys=3.11%, ctx=109285, majf=0, minf=8
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued    : total=r=0/w=36428/d=0, short=r=0/w=0/d=0
         latency   : target=0, window=0, percentile=100.00%, depth=1
    
    Run status group 0 (all jobs):
      WRITE: io=145712KB, aggrb=2428KB/s, minb=2428KB/s, maxb=2428KB/s, mint=60002msec, maxt=60002msec
    
    Disk stats (read/write):
      nvme0n1: ios=69/72775, merge=0/0, ticks=0/57772, in_queue=57744, util=96.25%
    
    $ sudo fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k --numjobs=7 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test
    journal-test: (g=0): rw=write, bs=4K-4K/4K-4K/4K-4K, ioengine=sync, iodepth=1
    ...
    fio-2.1.11
    Starting 7 processes
    Jobs: 6 (f=6): [W(2),_(1),W(4)] [50.4% done] [0KB/4164KB/0KB /s] [0/1041/0 iops] [eta 01m:00s]
    journal-test: (groupid=0, jobs=7): err= 0: pid=18025: Wed Nov 18 18:15:10 2015
      write: io=245888KB, bw=4097.9KB/s, iops=1024, runt= 60005msec
        clat (usec): min=0, max=107499, avg=6828.48, stdev=3056.21
         lat (usec): min=0, max=107499, avg=6829.10, stdev=3056.16
        clat percentiles (usec):
         |  1.00th=[    0],  5.00th=[ 2992], 10.00th=[ 4512], 20.00th=[ 4704],
         | 30.00th=[ 5088], 40.00th=[ 6176], 50.00th=[ 6304], 60.00th=[ 7520],
         | 70.00th=[ 7776], 80.00th=[ 9024], 90.00th=[10048], 95.00th=[12480],
         | 99.00th=[15936], 99.50th=[18048], 99.90th=[22400], 99.95th=[23936],
         | 99.99th=[27008]
        bw (KB  /s): min=  495, max=  675, per=14.29%, avg=585.60, stdev=28.07
        lat (usec) : 2=4.41%
        lat (msec) : 2=0.57%, 4=4.54%, 10=80.32%, 20=9.92%, 50=0.24%
        lat (msec) : 250=0.01%
      cpu          : usr=0.14%, sys=0.72%, ctx=173735, majf=0, minf=63
      IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
         submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
         issued    : total=r=0/w=61472/d=0, short=r=0/w=0/d=0
         latency   : target=0, window=0, percentile=100.00%, depth=1
    
    Run status group 0 (all jobs):
      WRITE: io=245888KB, aggrb=4097KB/s, minb=4097KB/s, maxb=4097KB/s, mint=60005msec, maxt=60005msec
    
    Disk stats (read/write):
      nvme0n1: ios=21/122801, merge=0/0, ticks=0/414660, in_queue=414736, util=99.90%
    

    Alignment

    I have checked the alignment with parted, as well as did the math based on http://www.intel.com/content/dam/www/public/us/en/documents/technology-briefs/ssd-partition-alignment-tech-brief.pdf

    kross@camacho:~$ sudo parted
    GNU Parted 3.2
    Using /dev/nvme0n1
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) unit s                                                           
    (parted) print all                                                        
    Model: Unknown (unknown)
    Disk /dev/nvme0n1: 1000215216s
    Sector size (logical/physical): 512B/512B
    Partition Table: gpt
    Disk Flags: 
    
    Number  Start       End          Size        File system     Name                                 Flags
     1      2048s       206847s      204800s     fat32           EFI system partition                 boot, esp
     2      206848s     486957055s   486750208s  ntfs                                                 msftdata
     3      486957056s  487878655s   921600s     ntfs                                                 hidden, diag
     4      590608384s  966787071s   376178688s  ext4
     5      966787072s  1000214527s  33427456s   linux-swap(v1)
    
    kross@camacho:~$ sudo parted /dev/nvme0n1
    GNU Parted 3.2
    Using /dev/nvme0n1
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) align-check opt 1                                                
    1 aligned
    (parted) align-check opt 2
    2 aligned
    (parted) align-check opt 3
    3 aligned
    (parted) align-check opt 4
    4 aligned
    (parted) align-check opt 5
    5 aligned
    

    TLDR;

    I feel like I have something fundamentally set incorrectly, though my research hasn't turned up anything. I'm expecting throughput ~4x my 3-year-old MacBook Pro w/ SATA6, and I'm getting half of that with NVMe. I added noatime,nodiratime, which gave me a very small improvement, but nothing like the 4x I'm expecting. I have re-partitioned/re-installed a fresh 15.10 server just to be sure I didn't have anything lingering, and had the same results.

    Are my fio results above of sync/no sync indicative of a problem?

    So I have a clean slate and can try anything. What can I try to get my performance up to par? Any references are welcome.

    • Fabby
      Fabby over 8 years
      What's the output of smartctl --scan and then a smartctl --all /dev/xxx where xxx is whatever came up in the first command???
    • kross
      kross over 8 years
      @fabby apt-get install smartmontools fails with grub-probe: error: cannot find a GRUB drive for /dev/nvme0n1p3. Check your device.map.. It appears (based on my endeavors) that update-grub doesn't work well due to a grub-probe error. smartctl -i /dev/nvme0n1 returns /dev/nvme0n1: Unable to detect device type. Please specify device type with the -d option. NVMe does not appear in the smartctl -h as a device type.
    • Fabby
      Fabby over 8 years
      what's the output of uname --kernel-release&&lsb_release --code --short???
    • kross
      kross over 8 years
      4.2.0-16-generic wily
    • wawa
      wawa over 8 years
      I could be completely wrong and I can't find the source currently, but as I have it in mind, you need a Skylake processor to run those SSD's at full speed...
    • Fabby
      Fabby over 8 years
      OK, could you update your answer with the above and the output to sudo blkid? (I'll be responding 2morrow as I'm keeling over from lack of sleep right now-)
    • kross
      kross over 8 years
      answer updated - no device.map, I had a real struggle with that documented in this issue: askubuntu.com/questions/697446/…
    • kross
      kross over 8 years
      @wawa I researched and picked this X99 board and this Haswell-e i7 processor to be sure I could run it at full speed. Please correct me though as I didn't verify it definitively!
    • wawa
      wawa over 8 years
      @kross Unfortunately I only found a German article: computerbild.de/artikel/… Under "Neue Hauptplatinen" (which translates to "new motherboards") they write something like: you also need a new motherboard, which brings the advantage of running M.2-sized SSDs at full speed, since additional PCI Express 3.0 connections become available; with earlier ones, the M.2 SSD had to use the classical PCI Express connection (to the southbridge, not the northbridge). Now I'm not sure if it's mainboard- or CPU-related, but could that be it?
    • wawa
      wawa over 8 years
      @kross Hmm, I think I was mistaken; it isn't about the CPU but about the motherboard. From en.wikipedia.org/wiki/M.2: M.2 sockets keyed to support SATA or two PCI Express lanes (PCIe ×2) are referred to as "socket 2 configuration" or "socket 2", while the sockets keyed for four PCI Express lanes (PCIe ×4) are referred to as "socket 3 configuration". So I'm guessing your motherboard only supports socket 2, which doesn't perform as well as socket 3. Is that possible?
    • wawa
      wawa over 8 years
      A further reading could be this article: pcworld.com/article/2977024/storage/…
    • kross
      kross over 8 years
      @wawa I linked the brand new motherboard, it has Turbo M.2: delivering next generation M.2 Gen3 x4 performance with transfer speeds up to 32 Gb/s so I think I'm set due to the X4 pci lanes.
    • wawa
      wawa over 8 years
      strange. Then I don't see any reason why it shouldn't work.
    • Jan-Marten Spit
      Jan-Marten Spit over 8 years
      what does iostat -xm (averages since boot) report for your nvme disk? did you check /sys/block/device/queue/max_hw_sectors_kb and /sys/block/device/queue/max_sectors_kb? not a misaligned partition? during the tests, what does /proc/pid/wchan give as kernel-wait-channel most often when the pid status is in D state?
    • zloster
      zloster over 8 years
      Servethehome.com seems to have a similar problem. Check out this and their forum post. They do not have a solution yet.
    • rm-vanda
      rm-vanda about 8 years
      Damn, I just got my Samsung 950, and I have the same problem on an x99s - - -
    • mplappert
      mplappert about 8 years
      I'm also facing the same problem. Have you ever found a viable solution?
    • kross
      kross over 7 years
      I have no solution yet, and have confirmed the same with 16.04.1 LTS
    • WinEunuuchs2Unix
      WinEunuuchs2Unix over 5 years
      I'm curious if you've worked out these speed issues yet?
    • kross
      kross over 5 years
      No, I abandoned Linux on this hardware and just use it for PC gaming now.
    • ikwyl6
      ikwyl6 over 3 years
      Has anyone had any luck with a newer ubuntu install to see if these results change?
  • kross
    kross over 8 years
    I'm using it in an identical way to the mbpro, and it is 1/2 the performance, which is the thing that doesn't make sense.
  • kross
    kross over 8 years
    I just added a fio test with 1 and 7 threads, and a reference to a bunch of benchmarks using it as a basis.
  • kross
    kross over 8 years
    Thanks, but I already referenced that article above under the fio heading, and you can see from the benchmarks there that my SSD is underperforming the Intel 750 NVMe 400GB (261 MB/s with 1 job, 884 MB/s with 5 jobs) by a large margin with sync, and even underperforming the previous-generation Samsung XP941 256GB (2.5 MB/s with 1 job, 5 MB/s with 7 jobs). So while it may be well known, it is still less than it should be.
  • kross
    kross over 7 years
    Thank you for adding this, I tried it on Ubuntu 16.04.1 LTS and saw no difference. I was quite hopeful, but unfortunately this didn't change anything.
  • kross
    kross over 7 years
    Thanks, I will check my alignment again. I know I investigated this at one point, but it is definitely worth taking a fresh look with this information.
  • kross
    kross over 7 years
    I updated the question with my alignment. parted says it is aligned, based on the 512 block size, but it isn't divisible by 4096. So I just want to confirm: your sector size remains at 512 and the only thing you did is start the partition at a location divisible by 4096, correct?
  • kross
    kross over 7 years
    Good explanation: blog.kihltech.com/2014/02/…
  • kross
    kross over 7 years
    Ugh, now what to do with my existing disk...try and resize/move, or dd, hmmm, not sure. Indeed this seems to be the root cause though.
  • kross
    kross over 7 years
    Based on ^^^, my ext4 partition is aligned: 590608384s * 512 / 4096 == (whole number)
  • cwoodwar6
    cwoodwar6 over 7 years
    Correct, my sector size is still showing as 512 and I started the partition at a number divisible by 4096. What I am not understanding is why does parted still show 512? You mention root cause, did you see an increase in performance after making any tweaks?
  • kross
    kross over 7 years
    I haven't made any tweaks. My early comments were based on math calculated on incorrect units from parted/fdisk. The Alignment section is updated above, and the math based on the intel pdf confirms that my partitions are aligned (as well as the fact that parted states they are aligned). So, I've not tried to move the partitions because...they already seem to be aligned and I don't want to randomly move them. Do you see misalignment based on the data I added?
  • cwoodwar6
    cwoodwar6 over 7 years
    Sorry, wasn't following your comments. Your math checks out, the partition is aligned. This just leaves me even more confused, I expected the sector size to be 4096.
  • wordsforthewise
    wordsforthewise over 6 years
    Same for me, no noticeable difference in performance from hdparm benchmarks.
  • Csaba Toth
    Csaba Toth about 6 years
    Did they say anything?
  • WinEunuuchs2Unix
    WinEunuuchs2Unix over 5 years
    Same for me. I've updated my answer below showing a 1 second decrease in boot speed.
  • Anon
    Anon over 5 years
    Just an FYI: at one point enabling SCSI multiqueue did indeed slow down certain devices but various issues have been fixed. From the v4.19 kernel onwards Linux enables scsi-mq by default. Note: it is unclear to me whether this option would impact NVMe drives (as opposed to SCSI/SATA drives).