cp or rsync, is cp really worth it?

Solution 1

cp is part of coreutils, so it is present on essentially every system. It was also designed primarily to copy files within a single machine.

rsync is not part of coreutils, and it is not guaranteed to be present even in a default environment. It was designed primarily to transfer files over a network. rsync also has more dependencies than coreutils, although in practice that difference matters little.

PS: By the way, CPU usage still matters on embedded systems.
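
If portability is a concern, a script can simply probe for rsync and fall back to cp. A minimal sketch, assuming placeholder source and destination paths:

    #!/bin/sh
    # Copy the contents of src into dst, preferring rsync when installed.
    src=/path/to/src
    dst=/path/to/dst
    mkdir -p "$dst"
    if command -v rsync >/dev/null 2>&1; then
        rsync -a "$src/" "$dst/"
    else
        cp -a "$src/." "$dst/"
    fi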

Solution 2

The main reason you don't want to use rsync for every copy operation is that rsync has calculation overhead. Before the data transfer actually starts, rsync scans all the files; then, before each file, a comparison is made. This overhead is not insignificant, even with the fast CPUs available in 2012. I do these types of transfers all the time, and on pretty decent-sized servers, once you start dealing with gigabytes of data the overhead can be time consuming.

I'm not saying don't use rsync, not at all; use rsync any time you can save some transfer time. Just don't use rsync when cp could accomplish the same thing.

What I usually do is first bring the data over using regular copy methods, then use rsync for subsequent changes, which is when those diffs can be leveraged. A sketch of that workflow is below.
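
As a rough illustration of that workflow (the paths are placeholders, not from the original answer):

    # Initial copy: nothing to diff against, so plain cp is cheapest.
    cp -a /data/project /backup/project

    # Subsequent updates: rsync transfers only what changed
    # (--delete also removes files that vanished at the source).
    rsync -a --delete /data/project/ /backup/project/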

Solution 3

Quite apart from the overhead in the case of a large or nonexistent diff, rsync does not appear to have an equivalent of cp --reflink=always, which can save an enormous amount of data when copying within a filesystem that supports it: it creates copy-on-write copies, so data common to the original and the copy (which is, of course, initially all of it) is stored only once. rsync is, however, better at updating CoW copies, using --inplace. Both are sketched below.
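
A minimal sketch of the two commands (the file names are placeholders; --reflink=always requires a CoW filesystem such as Btrfs or XFS with reflink support):

    # Instant, space-sharing copy on a copy-on-write filesystem.
    cp --reflink=always big.img big-copy.img

    # Later, update the copy in place so unchanged blocks keep
    # sharing extents instead of being rewritten elsewhere.
    rsync --inplace big.img big-copy.img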

Solution 4

I would expect cp to use less CPU when copying locally because it doesn't compute diffs, whereas rsync can reduce writes by using diffs. Compression should be avoided locally, because you have to read and write the whole file/diff anyway, and it only adds computation.
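
For what it's worth, rsync already defaults to whole-file transfers when both ends are local paths; the flags below just make that choice explicit (the paths are placeholders):

    # Local copy: -W (--whole-file) skips the delta algorithm,
    # and omitting -z means no compression is attempted.
    rsync -aW /src/dir/ /dst/dir/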


Comments

  • Soyuz
    Soyuz over 1 year

    I hope this does not count as a question without a real answer, as I can't seem to find a good reason to use cp(1) over rsync(1) in virtually all circumstances. Should one typically favour rsync over cp? Is there any good guideline for their use?

    • rsync: Transfers the diffs, can use compression, can be used remotely (and securely), and can be restarted after an interruption, even during the transfer of a single large file (a typical invocation is sketched after this list).

    • cp : Perhaps it's just simpler to use? Is it faster than rsync?
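
    A hedged example combining the features above (the host and paths are placeholders):

        # -a archive mode, -z compression, -P = --partial --progress
        # (shows progress and makes interrupted transfers resumable).
        rsync -azP -e ssh /local/dir/ user@remote.example.com:/remote/dir/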

  • Soyuz
    Soyuz almost 12 years
    Is it a large price to pay, given how powerful our processors are today?
  • Soyuz
    Soyuz almost 12 years
    So, it all boils down to CPU cost or I/O cost
  • J. M. Becker
    J. M. Becker almost 12 years
    Large is subjective, and it changes depending on the number of files vs. size of data vs. I/O. My main point is that the overhead is not insignificant regardless of today's processors. One thing to remember: today's processors still must deal with today's data.
  • J. M. Becker
    J. M. Becker almost 12 years
    @soyuz, that is true for data which has changed, in reference to other data. The smaller the diff, the more rsync is preferred. But when there is no diff, rsync only brings overhead without extra benefit.
  • Izkata
    Izkata almost 12 years
    @TechZilla I think you mean, when there's nothing in common. If there is no diff, rsync will just end early 'cause there's nothing to copy, making it far more preferred.
  • Warren Young
    Warren Young almost 12 years
    Agreed. CPUs don't get faster in a vacuum. One of the best reasons we have for GHz CPUs is gigabit networking: a slower CPU simply cannot keep an I/O pipe that big full. Computers tend to be as well balanced as we know how to make them. Take away some CPU power, and some of the I/O capacity goes idle. Increase the I/O demand, and the CPU ends up idling more, waiting on I/O.
  • xeruf
    xeruf over 2 years
    The scan of all files before transfer is no longer enabled by default (rsync 3.x recurses incrementally), and the payoff of transferring less data is usually much greater than the comparison overhead.
  • J. M. Becker
    J. M. Becker over 2 years
    @xeruf, I wasn't speaking about a situation where you have a payoff. If there is any amount of matching files on the other end, use rsync. I was only speaking about making the initial copy, which I would not use rsync for. If it is on the same fs, I'd likely use cp -a for convenience. If it were crossing partitions, disks, or a network, I'd use a tarpipe (sketched below). After that initial transfer, I would use rsync going forward.
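
    For reference, a tarpipe of the kind mentioned here might look like this (the paths and host are placeholders):

        # Across filesystems: stream /src's contents into /dst.
        tar -C /src -cf - . | tar -C /dst -xf -

        # Network variant over ssh.
        tar -C /src -cf - . | ssh user@host tar -C /dst -xf -
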
  • xeruf
    xeruf over 2 years
    Didn't know about cp -a, thanks! Though as soon as the files to copy get bigger, I prefer rsync for the progress meter. I don't see any meaningful downside to always using it on a modern computer.
  • xeruf
    xeruf over 2 years
    Though it might be possible to get a rough indication of progress by piping cp -v into git.jfischer.org/xeruf/dotfiles/src/branch/main/.local/bin/…
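
    The linked script is truncated above, but a rough sketch of the idea (assuming pv is installed; the paths are placeholders, not taken from the linked script):

        # One line of cp -v output per item copied; pv -l counts those
        # lines against the expected total (directories also print a
        # line, so the estimate is approximate).
        total=$(find /src -type f | wc -l)
        mkdir -p /dst
        cp -av /src/. /dst/ | pv -l -s "$total" >/dev/null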