cp or rsync, is cp really worth it?
Solution 1
cp is part of coreutils, so it is present everywhere; it was designed primarily to copy files within a single machine.
rsync is not part of coreutils and is not present in a default environment. It was designed primarily to transfer files over a network. rsync also has more dependencies than coreutils, although that difference doesn't matter much in practice.
PS: CPU usage still matters on embedded systems.
Solution 2
The main reason you don't want to use rsync for every copy operation is that rsync has calculation overhead. Before the data transfer actually starts, rsync scans all the files; then, before each file, a comparison is made. This overhead is not insignificant, even with the fast CPUs available in 2012. I do these kinds of transfers all the time, and on pretty decent-sized servers, once you start dealing with gigs of data the overhead can be time-consuming.
I'm not saying don't use rsync, not at all; use rsync any time it can save you transfer time. Just don't use rsync when cp could accomplish the same thing.
What I usually do is first bring the data over using regular copy methods, then use rsync for subsequent changes, which is when those diffs can be leveraged.
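The workflow above can be sketched as follows; the directories are throwaway stand-ins for a real source and destination, and the rsync fallback is just for systems where it is not installed:

```shell
#!/bin/sh
# Sketch of "cp first, rsync for updates", using temporary directories
# (all paths here are made up for the demo).
src=$(mktemp -d); dst=$(mktemp -d)
echo "hello" > "$src/file.txt"

# Initial transfer: plain cp, no scanning or per-file comparison overhead.
cp -a "$src/." "$dst/"

# Subsequent changes: rsync moves only the diff.
echo "changed" > "$src/file.txt"
if command -v rsync >/dev/null 2>&1; then
    rsync -a "$src/" "$dst/"
else
    cp -a "$src/." "$dst/"   # fallback where rsync is absent
fi
```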
Solution 3
Quite apart from the overhead in the case of a large or nonexistent diff, rsync does not appear to have an equivalent of cp --reflink=always, which can save an enormous amount of data when copying within a filesystem that supports it: it creates copy-on-write copies, so data common to the original and the copy (which is, of course, initially all of it) is stored only once. rsync is, however, better at updating CoW copies, using --inplace.
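A minimal sketch of the two commands, assuming GNU cp; --reflink only shares extents on a CoW filesystem such as Btrfs or XFS, and =auto falls back to a plain copy elsewhere:

```shell
#!/bin/sh
# Create a scratch file to clone (sizes are arbitrary for the demo).
tmp=$(mktemp -d); cd "$tmp" || exit 1
dd if=/dev/zero of=big.img bs=1M count=4 2>/dev/null

# Near-instant shared-extent copy where the filesystem supports it:
cp --reflink=auto big.img big-clone.img

# Refreshing the clone later: --inplace rewrites only changed blocks,
# so untouched extents stay shared with the original.
if command -v rsync >/dev/null 2>&1; then
    rsync --inplace big.img big-clone.img
fi
```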
Solution 4
I would expect cp to use less CPU when copying locally because it doesn't compute diffs, whereas rsync can reduce writes by using them. Compression should be avoided locally because you have to read and write the whole file (or diff) anyway, and it requires additional computation.
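In flag terms, the advice above amounts to dropping -z for local copies; a small sketch with throwaway paths (the remote host in the comment is hypothetical):

```shell
#!/bin/sh
# Locally, skip compression: every byte is read and written in full
# anyway, so -z only adds CPU work.
src=$(mktemp -d); dst=$(mktemp -d)
dd if=/dev/urandom of="$src/blob" bs=1024 count=64 2>/dev/null

if command -v rsync >/dev/null 2>&1; then
    rsync -a "$src/" "$dst/"            # local copy: no -z
    # rsync -az "$src/" host:/backup/   # remote copy: -z can pay for itself
else
    cp -a "$src/." "$dst/"              # fallback where rsync is absent
fi
```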
Soyuz
Updated on September 18, 2022

Comments
-
Soyuz over 1 year
I hope this does not count as a question without a real answer, as I can't seem to find a good reason to use cp(1) over rsync(1) in virtually all circumstances. Should one typically favour rsync over cp? Is there any good guideline for their use? rsync: transfers the diffs, can use compression, can be used remotely (and securely), and can be restarted after an interruption, even during the transfer of a single large file. cp: perhaps it's just simpler to use? Is it faster than rsync?
-
Daniel about 9 years
Same question with better answer: serverfault.com/questions/43014/…
-
Soyuz almost 12 years
Is it a large price to pay, given how powerful our processors are today?
-
Soyuz almost 12 years
So, it all boils down to CPU cost or I/O cost.
-
J. M. Becker almost 12 years
Large is subjective, and would change depending on the number of files vs. size of data vs. I/O. My main point would be, it's not insignificant regardless of today's processors. One thing to remember: today's processors still must deal with today's data.
-
J. M. Becker almost 12 years
@soyuz, that is true for data which has changed, relative to other data. The smaller the diff, the more rsync is preferred. But when there is no diff, rsync only brings overhead without extra benefit.
-
Izkata almost 12 years
@TechZilla I think you mean, when there's nothing in common. If there is no diff, rsync will just end early because there's nothing to copy, making it far more preferred.
-
Warren Young almost 12 years
Agreed. CPUs don't get faster in a vacuum. One of the best reasons we have for GHz CPUs is gigabit networking: a slower CPU simply cannot keep an I/O pipe that big full. Computers tend to be as well balanced as we know how to make them. Take away some CPU power, and some of the I/O capacity goes idle. Increase the I/O demand, and the CPU ends up idling more, waiting on I/O.
-
xeruf over 2 years
The scan of all files before transfer is not enabled by default anymore, and the payoff of transferring less data is usually much greater than the comparison overhead.
-
J. M. Becker over 2 years
@xeruf, I wasn't speaking about a situation where you have a payoff. If there is any amount of matching files on the other end, use rsync. I was only speaking about the initial copy, which I would not use rsync for. If it is on the same fs, I'd likely use cp -a for convenience. If it were crossing partitions, disks, or the network, I'd use a tarpipe. After that initial transfer, I would use rsync going forward.
-
xeruf over 2 years
Didn't know about cp -a, thanks! Though, as soon as the files to copy become bigger, I prefer rsync for the progress meter. I don't see any meaningful downside on a modern computer to always using it.
-
xeruf over 2 years
Though it might be possible to get a rough indication of progress with cp -v into git.jfischer.org/xeruf/dotfiles/src/branch/main/.local/bin/…