What is the fastest way to copy a sparse file? What method results in the smallest file?

From the benchmarking detailed in the question below, it looks like using dd on our target hardware with a block size of 64K gives the best overall result, balancing copy time against bloat:

dd if=srcFile of=dstFile iflag=direct oflag=direct bs=64K conv=sparse
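
To sanity-check the result, the allocated size of the copy can be compared against the source; a quick check with GNU coreutils:

du -h srcFile dstFile                  # allocated (on-disk) size
du -h --apparent-size srcFile dstFile  # logical size (201G in this case)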

Author: Steve Amerige

Updated on September 18, 2022

Comments

  • Steve Amerige, almost 2 years ago

    BACKGROUND: I'm copying a sparse qcow2 VM image that is 200GB in size but has only 16GB of allocated blocks. I've tried various methods to copy this sparse file within the same server and have some preliminary results. The environment is RHEL 6.6 or CentOS 6.6 x64. In the ls -lhs output below, the leading 16G is the allocated size and 201G is the apparent size:

    ls -lhs srcFile 
    16G -rw-r--r-- 1 qemu qemu 201G Feb  4 11:50 srcFile
    

    Via cp - best speed

    cp --sparse=always srcFile dstFile
    Performance Notes:
        Copied 200GB max/16GB actual VM as 200GB max/26GB actual, bloat: 10GB
        Copy time: 1:02 (mm:ss) 
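
    The effect of --sparse=always can be demonstrated on a throwaway file (illustrative only, not part of the benchmark): cp detects runs of zero bytes in the input and punches corresponding holes in the output.

    truncate -s 1G demo.img                      # 1GB apparent size, ~0 allocated
    cp --sparse=always demo.img demo-copy.img
    du -h demo.img demo-copy.img                 # both should report ~0 allocated
    rm demo.img demo-copy.img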
    

    Via dd - best overall performer

    dd if=srcFile of=dstFile iflag=direct oflag=direct bs=4M conv=sparse
    Performance Notes:
        Copied 200GB max/16GB actual VM as 200GB max/21GB actual, bloat: 5GB
        Copy time: 2:02 (mm:ss)
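
    For context on the flags: iflag=direct and oflag=direct open the files with O_DIRECT, bypassing the page cache, and conv=sparse makes dd seek over output blocks that read as all zeros rather than writing them. The bloat comes from any bs-sized block that mixes zeros with data, since such a block is written in full; the block-size sweep below measures exactly this trade-off.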
    

    Via cpio

    mkdir tmp$$
    echo srcFile | cpio -p --sparse tmp$$
    mv tmp$$/srcFile dstFile
    rmdir tmp$$
    Performance Notes:
        Copied 200GB max/16GB actual VM as 200GB max/26GB actual, bloat: 10GB
        Copy time: 9:26 (mm:ss)
    

    Via rsync

    rsync --ignore-existing -aS srcFile dstFile
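    # note: -S (within -aS) is short for --sparse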
    Performance Notes:
        Copied 200GB max/16GB actual VM as 200GB max/26GB actual, bloat: 10GB
        Copy time: 24:49 (mm:ss)
    

    Via virt-sparsify - best size

    virt-sparsify srcFile dstFile
    Performance Notes:
        Copied 200GB max/16GB actual VM as 200GB max/16GB actual, bloat: 0GB
        Copy time: 17:37 (mm:ss)
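
    Since the source is a qcow2 image, qemu-img convert is another candidate worth timing (not benchmarked here; sketch only):

    qemu-img convert -O qcow2 srcFile dstFile    # rewrites the image, skipping zero/unallocated clusters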
    

    Varying Blocksize

    I was concerned about the 'bloat' during the dd copy (the increase in allocated size over the original), so I varied the block size. I used 'time' to also capture the total time and CPU%. The original file in this case is a 200GB sparse file with 7.3GB allocated; each row below lists block size: copy time, CPU%, and the resulting allocated size (a sketch of the sweep loop follows the table):

    4K:   5:54.64, 56%, 7.3GB
    8K:   3:43.25, 58%, 7.3GB
    16K:  2:23.20, 59%, 7.3GB
    32K:  1:49.25, 62%, 7.3GB
    64K:  1:33.62, 64%, 7.3GB
    128K: 1:40.83, 55%, 7.4GB
    256K: 1:22.73, 64%, 7.5GB
    512K: 1:44.84, 74%, 7.6GB
    1M:   1:16.59, 70%, 7.9GB
    2M:   1:21.58, 66%, 8.4GB
    4M:   1:17.52, 69%, 9.5GB
    8M:   1:10.92, 76%, 12GB
    16M:  1:17.09, 78%, 16GB
    32M:  2:54.10, 90%, 22GB
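
    The sweep can be driven with a small loop; this is a sketch of the approach rather than the exact script I ran:

    #!/bin/bash
    # Sketch: time a sparse dd copy at each block size and report the
    # allocated size of the result (GNU time: %E = elapsed, %P = CPU%).
    for bs in 4K 8K 16K 32K 64K 128K 256K 512K 1M 2M 4M 8M 16M 32M; do
        rm -f dstFile
        echo "== bs=$bs =="
        /usr/bin/time -f "time: %E, cpu: %P" \
            dd if=srcFile of=dstFile iflag=direct oflag=direct bs="$bs" conv=sparse
        du -h dstFile | cut -f1    # allocated size of the copy
    done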
    

    QUESTION: Can you verify that I've identified the best methods for copying a sparse file for the best overall performance? Any suggestions on how to do this better are welcome, as are any concerns you might have with the methods I'm using.

    • mpez0, over 9 years ago
      The only other one I'd try, given your commendable efforts, is rsync with the --sparse option. It's also possible that a different block size in dd would improve its speed or reduce the bloat.
    • Olivier Dulac, over 9 years ago
      tar is a good one to try too
    • Steve Amerige, over 9 years ago
      @OlivierDulac I tried tar, but it performed so poorly that I didn't even include it; I should have. The above are local copies. I'll add network copying performance data for a 10Gbps network, copying within the same subnet. Once I have that, I think I'll have enough data to draw a workable conclusion about the best overall performer.
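
      A sparse-aware tar copy would look something like this with GNU tar (sketch only; -S/--sparse detects holes when archiving, and extraction recreates them):

      mkdir tmp$$
      tar -cSf - srcFile | tar -xf - -C tmp$$
      mv tmp$$/srcFile dstFile && rmdir tmp$$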
    • Travis Campbell, over 9 years ago
      rsync also supports --sparse for this style of copy; you might want to try that. It also has the added benefit of being measurable both locally and over the network.
    • Steve Amerige, over 9 years ago
      @TravisCampbell I added data for rsync. It was the worst performer of all the tests I ran.
  • bummi, over 9 years ago
    Hi Steve, please split this into a question and an answer; that's how Stack Exchange sites work.