What is the fastest way to copy a sparse file? What method results in the smallest file?
Based on the benchmarks in the question, dd with a block size of 64K gives the best overall result on our target hardware when both copy time and bloat are considered:
dd if=srcFile of=dstFile iflag=direct oflag=direct bs=64K conv=sparse
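A minimal sketch of how that command might be wrapped for reuse. The function name `sparse_copy` and the fallback are assumptions, not part of the original answer: direct I/O (`iflag=direct`, `oflag=direct`) is not supported on every filesystem, so the sketch retries without those flags if the first attempt fails, then checks the copy with `cmp`. It assumes a GNU dd that supports `conv=sparse`.

```shell
#!/bin/sh
# Hypothetical wrapper around the dd command above (GNU coreutils assumed).
sparse_copy() {
    src=$1
    dst=$2
    # First try the command as benchmarked, with direct I/O on both sides.
    if ! dd if="$src" of="$dst" iflag=direct oflag=direct bs=64K conv=sparse 2>/dev/null; then
        # Assumption: a failure here means O_DIRECT is unavailable, so retry buffered.
        dd if="$src" of="$dst" bs=64K conv=sparse 2>/dev/null || return 1
    fi
    # Byte-for-byte comparison; holes read back as zeros, so they are covered too.
    cmp -s "$src" "$dst"
}

# Small demonstration on a throwaway file: 4 bytes, a hole, 4 bytes (8 MiB total).
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
printf 'head' > "$workdir/src.img"
printf 'tail' | dd of="$workdir/src.img" bs=1 seek=8388604 conv=notrunc 2>/dev/null
sparse_copy "$workdir/src.img" "$workdir/dst.img" && echo "copy verified"
```

For a real image, the call would be `sparse_copy srcFile dstFile`; the comparison pass reads both files in full, so it adds noticeably to the total time on a 200GB image.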
Steve Amerige
Updated on September 18, 2022
Comments
-
Steve Amerige almost 2 years
BACKGROUND: I'm copying a sparse qcow2 VM image that is 200GB in size, but has 16GB of allocated blocks. I've tried various methods to copy this sparse file within the same server and have some preliminary results. Environment is RHEL 6.6 or CentOS 6.6 x64.
ls -lhs srcFile
16G -rw-r--r-- 1 qemu qemu 201G Feb 4 11:50 srcFile
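The two sizes in that listing (16G allocated, 201G apparent) can also be read programmatically. A minimal sketch, assuming GNU `stat`; the helper name `sparse_report` and the demo file are illustrative only:

```shell
#!/bin/sh
# Hypothetical helper: print apparent vs. allocated size in bytes (GNU stat assumed).
sparse_report() {
    apparent=$(stat -c '%s' "$1")    # logical length, the size ls -l shows
    blocks=$(stat -c '%b' "$1")      # number of allocated blocks
    unit=$(stat -c '%B' "$1")        # size in bytes of each block counted by %b
    echo "apparent=$apparent allocated=$((blocks * unit))"
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
truncate -s 8M "$workdir/demo.img"   # 8 MiB long, no data written yet
printf 'data' | dd of="$workdir/demo.img" bs=1 seek=1048576 conv=notrunc 2>/dev/null
sparse_report "$workdir/demo.img"
```

On a filesystem that supports holes, the allocated figure for this demo file should come out far smaller than the apparent one.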
Via cp - best speed
cp --sparse=always srcFile dstFile
Performance Notes: Copied 200GB max/16GB actual VM as 200GB max/26GB actual, bloat: 10GB
Copy time: 1:02 (mm:ss)
Via dd - best overall performer
dd if=srcFile of=dstFile iflag=direct oflag=direct bs=4M conv=sparse
Performance Notes: Copied 200GB max/16GB actual VM as 200GB max/21GB actual, bloat: 5GB
Copy time: 2:02 (mm:ss)
Via cpio
mkdir tmp$$
echo srcFile | cpio -p --sparse tmp$$; mv tmp$$/srcFile dstFile
rmdir tmp$$
Performance Notes: Copied 200GB max/16GB actual VM as 200GB max/26GB actual, bloat: 10GB
Copy time: 9:26 (mm:ss)
Via rsync
rsync --ignore-existing -aS srcFile dstFile
Performance Notes: Copied 200GB max/16GB actual VM as 200GB max/26GB actual, bloat: 10GB
Copy time: 24:49 (mm:ss)
Via virt-sparsify - best size
virt-sparsify srcFile dstFile
Performance Notes: Copied 200GB max/16GB actual VM as 200GB max/16GB actual, bloat: 0
Copy time: 17:37 (mm:ss)
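The 'bloat' figures in these notes are the difference between the allocated size of the copy and that of the original. A sketch of that arithmetic, using `cp --sparse=always` as the copy method; the helper names `allocated_kb` and `bloat_kb` are hypothetical, and `du -k` is used to report allocated space in KiB:

```shell
#!/bin/sh
# Hypothetical helpers for measuring copy bloat (names are illustrative).
allocated_kb() {
    du -k "$1" | cut -f1             # allocated space in KiB, not apparent size
}
bloat_kb() {
    # Allocated size of the copy ($2) minus allocated size of the original ($1).
    echo $(( $(allocated_kb "$2") - $(allocated_kb "$1") ))
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
# Throwaway sparse file: 4 bytes, a hole, 4 bytes (8 MiB apparent size).
printf 'head' > "$workdir/src.img"
printf 'tail' | dd of="$workdir/src.img" bs=1 seek=8388604 conv=notrunc 2>/dev/null

cp --sparse=always "$workdir/src.img" "$workdir/dst.img"
cmp -s "$workdir/src.img" "$workdir/dst.img" && echo "contents identical"
echo "bloat: $(bloat_kb "$workdir/src.img" "$workdir/dst.img") KiB"
```

The result depends on the filesystem and can be zero or even negative on a small demo file; the point is only to show where the numbers come from.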
Varying Blocksize
I was concerned about the 'bloat' during dd copying (the increase in allocated size over the original), so I varied the block size. I used 'time' to also get the total elapsed time and CPU%. The original file in this case is a sparse file with an apparent size of 200GB and 7.3GB actually allocated:
Block size  Elapsed (m:ss)  CPU  Size of copy
4K          5:54.64         56%  7.3GB
8K          3:43.25         58%  7.3GB
16K         2:23.20         59%  7.3GB
32K         1:49.25         62%  7.3GB
64K         1:33.62         64%  7.3GB
128K        1:40.83         55%  7.4GB
256K        1:22.73         64%  7.5GB
512K        1:44.84         74%  7.6GB
1M          1:16.59         70%  7.9GB
2M          1:21.58         66%  8.4GB
4M          1:17.52         69%  9.5GB
8M          1:10.92         76%  12GB
16M         1:17.09         78%  16GB
32M         2:54.10         90%  22GB
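With `conv=sparse`, GNU dd seeks instead of writing only when an entire output block is NUL, so a larger `bs` leaves more blocks that contain at least some data and must be written in full; that would explain why the copy grows with the block size. A sketch of how such a sweep might be scripted follows. The function name `sweep_blocksizes` is hypothetical, the direct I/O flags are left out because not every filesystem supports them, and elapsed time is measured in whole seconds only (wrap the dd call in `time` for CPU figures):

```shell
#!/bin/sh
# Hypothetical sweep: copy one file with several dd block sizes and report
# elapsed seconds and allocated KiB for each copy (GNU dd and date assumed).
sweep_blocksizes() {
    src=$1
    outdir=$2
    shift 2
    for bs in "$@"; do
        dst="$outdir/copy.$bs"
        start=$(date +%s)
        dd if="$src" of="$dst" bs="$bs" conv=sparse 2>/dev/null || return 1
        elapsed=$(( $(date +%s) - start ))
        echo "$bs: ${elapsed}s, $(du -k "$dst" | cut -f1) KiB allocated"
    done
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
# Throwaway sparse file: 4 bytes, a hole, 4 bytes (8 MiB apparent size).
printf 'head' > "$workdir/src.img"
printf 'tail' | dd of="$workdir/src.img" bs=1 seek=8388604 conv=notrunc 2>/dev/null
sweep_blocksizes "$workdir/src.img" "$workdir" 4K 64K 1M 4M
```

On a file this small the timings are meaningless; the sketch only shows the shape of the loop. For a real measurement, point it at the VM image and a scratch directory with enough free space for one copy per block size.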
QUESTION: Can you verify that I've identified the best methods for copying a sparse file to get best overall performance? Any suggestions on how to do this better are welcomed as are any concerns you might have with the methods I'm using.
-
mpez0 over 9 years
The only other one I'd try, given your commendable efforts, is rsync with the --sparse option. It's also possible that a different block size in dd would improve its speed or bloat.
-
Olivier Dulac over 9 years
tar is a good one to try too
-
Steve Amerige over 9 years
@OlivierDulac I tried tar, but this was so poor a performer that I didn't even include it. I should have. The above are local copies. I'll add network copying performance data for a 10Gbps network, copying in the same subnet. Once I have that, I think I'll have enough data to draw a workable conclusion for which is the best overall performer.
-
Travis Campbell over 9 years
rsync also supports --sparse for doing this style of copy. You might want to try that. It also has the added benefit of being measurable for both local and over-the-network copying.
-
Steve Amerige over 9 years
@TravisCampbell I added data for rsync. It was the worst performer of all the tests I ran.
-
bummi over 9 years
Hi Steve, please split this into a question and an answer; this is how Stack Exchange sites work.