Why is dd with oflag=direct slower writing to disk than to a file?

13,776

Solution 1

This difference undoubtedly comes down to one thing: caching.

It will be really difficult to pin down where, especially from userland, but all Linux kernels buffer (cache) filesystem writes unless you take special steps to get synchronous writes. That is, the kernel saves the data dd sends to a file somewhere in kernel memory, probably via the file system code. Some time in the future, the kernel will schedule a disk block to go out to the disk. That happens "asynchronously", sometime after the kernel has already told dd that the write finished.
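You can see this buffering from the command line by comparing dd's reported rate with and without a forced flush. A rough sketch (file names and sizes here are illustrative, not from the original benchmark):

```shell
# Buffered write: dd returns as soon as the data is in the page cache,
# so the reported rate can be far higher than the disk's real speed.
dd if=/dev/zero of=buffered.img bs=256k count=16 2>&1 | tail -n 1

# conv=fsync: same buffered writes, plus one fsync() before dd exits,
# so the reported rate includes the flush to disk.
dd if=/dev/zero of=synced.img bs=256k count=16 conv=fsync 2>&1 | tail -n 1

rm -f buffered.img synced.img
```

On a machine with spare RAM, the first rate is typically much higher, because nothing has necessarily reached the platters when dd exits.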

The reason for this is that moving bytes over a bus and into a disk drive, and then onto the disk platters, is much slower than even copying from user to kernel memory. Ordinarily, programs don't care too much that the data they just "wrote" won't make it to the disk for a while. Hardware reliability is high enough that the data almost always makes it to the platter.

That's the simple answer, but once reads/writes/deletes are all buffered up in the kernel, the file system code can take advantage of short file lifetimes by never writing out the data of files that get deleted before they reach the disk. It can also group several small writes that fit within a single disk block and consolidate them into one write. There are tons of optimizations like these in most file systems.

Solution 2

Disk caching is what makes copy programs quicker than dd, I would assume.

If this is the only disk intensive app running, wait for the program to exit and then run:

$ sync; sync 

This will flush the cache immediately. If it takes a while to return to the prompt you know it was hitting the cache.

I do this before pulling my USB drives, and it often takes quite a long time between the copy finishing and the cache actually hitting the disk.
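On Linux you can also watch the cache drain directly: the kernel reports pending writeback in /proc/meminfo. A minimal sketch (Linux-specific counter names):

```shell
# How much dirty (not-yet-written) data the kernel is holding:
grep -E '^(Dirty|Writeback):' /proc/meminfo

sync   # force the flush

# Dirty should now be at or near zero:
grep -E '^(Dirty|Writeback):' /proc/meminfo
```

If the first Dirty figure is large and sync takes a long time to return, you know the earlier "fast" copy was really hitting the cache.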

Jan Folfas

Updated on September 18, 2022

Comments

  • Jan Folfas
    Jan Folfas over 1 year

    I am trying to compare aggregate write rates when writing to a file in a GPFS file system, as compared to writing directly to a disk on a system with Red Hat Enterprise Linux Server release 6.4 (Santiago). For my application I need to measure the raw rate, i.e. without taking advantage of cache. I do not understand the impact of the direct option used with dd to bypass cache. When writing directly to a block device, I get a drastically lower rate when I use oflag=direct, as compared with writing to a file in the GPFS file system. Why does this happen?

    To measure aggregate rates I create p processes running dd that writes concurrently to the block device or file. I then sum the p rates obtained to get the aggregate write rate.

        #!/bin/bash
        directdiskrate=~/scratch/rate5
        syncdiskrate=~/scratch/rate4
        filerate=~/scratch/rate3
        numruns=1
        numthreads=30

        # to disk, using both conv=fsync and oflag=direct
        writetodiskdirect="dd if=/dev/zero of=/dev/sdac bs=256k count=4096 conv=fsync oflag=direct iflag=fullblock"
        for p in $(seq $numthreads)
        do
            # parse dd output: the rate is the last comma-separated field of the last line
            $writetodiskdirect 2>&1 | tail -n 1 | awk 'BEGIN { FS = "," } ; { print $3 }' | sed -e 's/MB\/s//g' >> $directdiskrate &
        done
        wait

        # to disk, using only conv=fsync
        writetodisksync="dd if=/dev/zero of=/dev/sdac bs=256k count=4096 conv=fsync iflag=fullblock"
        for p in $(seq $numthreads)
        do
            $writetodisksync 2>&1 | tail -n 1 | awk 'BEGIN { FS = "," } ; { print $3 }' | sed -e 's/MB\/s//g' >> $syncdiskrate &
        done
        wait

        # to file, using both conv=fsync and oflag=direct
        for p in $(seq $numthreads)
        do
            writetofile="dd if=/dev/zero of=/gpfs1/fileset6/file$p bs=256k count=4096 conv=fsync oflag=direct"
            $writetofile 2>&1 | tail -n 1 | awk 'BEGIN { FS = "," } ; { print $3 }' | sed -e 's/MB\/s//g' >> $filerate &
        done
        wait

    Results: The write rate of each of 30 processes is as follows:

    1. Writing to disk using conv=fsync, each process gets a write rate of ~180 MB/s
    2. Writing to disk using both conv=fsync and oflag=direct, each process gets a write rate of ~9 MB/s
    3. Writing to a file in the GPFS file system, using both conv=fsync and oflag=direct, each process gets a write rate of ~80 MB/s
    • Admin
      Admin almost 9 years
      Did you manage to find a combination of flags that causes writing to a file (through filesystem layer) to be consistently slower than writing directly to the block device? fsync seems to work on some filesystem types, but when using FUSE exfat it seems to still be caching.
  • Jan Folfas
    Jan Folfas over 10 years
    Does this mean that despite using the oflag=direct option, a dd write to a file still goes via the cache, but oflag=direct truly takes effect when writing directly to a block device?
  • Admin
    Admin over 10 years
    @user3216949 - I think that's true, but I don't have direct knowledge. The block device is at the end of the user -> kernel -> file system -> disk drive layers. As I understand it, using the block device eliminates the file system layer(s) in the stack.
  • felix
    felix over 10 years
    @Bruce is there a way (like a tool) to verify whether the writes are going through the cache or not?
  • phemmer
    phemmer over 10 years
    @user3216949 If you want to make sure the data is actually written out to disk, use one of: conv=fdatasync conv=fsync oflag=dsync oflag=sync.
  • Jan Folfas
    Jan Folfas over 10 years
    Thanks. Using your suggestion, I realized that even with fsync alone the data ends up written to disk, but perhaps more efficiently, as one bulk write.
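The dd flags phemmer lists differ mainly in how often they force data to the device, which is why fsync can look like "one bulk write". A rough sketch (the target file is illustrative; absolute rates depend on your disk):

```shell
# conv=fsync: buffered writes, then a single fsync() at the end,
# so the flush cost is paid once.
dd if=/dev/zero of=demo.img bs=256k count=16 conv=fsync 2>&1 | tail -n 1

# oflag=dsync: each write() waits for the data to reach the device,
# so there are many flushes and the rate is typically much lower.
dd if=/dev/zero of=demo.img bs=256k count=16 oflag=dsync 2>&1 | tail -n 1

rm -f demo.img
```

Both variants guarantee the data is on disk when dd exits; oflag=dsync just pays the synchronization cost per write instead of once at the end.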