Speeding up rsync over SMB


Solution 1

I think you're misunderstanding the rsync algorithm and how the tool should be applied.

Rsync's performance advantage comes from doing delta transfers: that is, moving only the changed bits of a file. To determine the changed bits, both the source and destination hosts have to read the file and compare block checksums. This is the "magic" part of rsync: the rsync algorithm itself.

When you're mounting the destination volume with SMB and using rsync to copy files from what Linux "sees" as a local source and a local destination (both mounted on that machine), most modern rsync versions switch to 'whole file' copy mode, and switch off the delta copy algorithm. This is a "win" because, with the delta-copy algorithm on, rsync would read the entire destination file (over the wire from the NAS) in order to determine what bits of the file have changed.

The "right way" to use rsync is to run the rsync server on one machine and the rsync client on the other. Each machine will read files from its own local storage (which should be very fast), agree on what bits of the files have changed, and only transfer those bits. The way you're using rsync amounts to a trumped-up 'cp'. You could accomplish the same thing with 'cp' and it would probably be faster.

If your NAS device supports running an rsync server (or client) then you're in business. If you're just going to mount it on the source machine via SMB then you might as well just use 'cp' to copy the files.

Solution 2

It sounds like timestamps are your problem, as this page relates:

http://www.goodjobsucking.com/?p=16

The proposed solution is to add

--modify-window=1

to the rsync parameters.

Solution 3

Yes, you can speed it up. You need to make either the source or destination look like a remote machine, say by addressing it as "localhost:".

You stated that you are mounting the SMB share locally. This makes both the source and destination look like local paths to rsync. The rsync man page states (in the paragraph for the "--whole-file" option) that copies where the source and destination are both local paths default to copying the whole file, so the delta algorithm isn't used. Using the "localhost:" workaround restores the delta algorithm and will speed up transfers.

Solution 4

Thought I would throw my 2p in here.

My brother has just installed a Buffalo NAS on his office network. He's now looking at off-site backups, so that should the office burn down, at least he still has all his business documents elsewhere (many hundreds of miles away).

My first hurdle was to get the VPS he has (a small Linux virtual private server, nothing too beefy) to dial in as a VPN user to his broadband router (he's using a DrayTek for this), so that the VPS itself becomes part of his VPN and can then access the NAS directly, in a secure fashion. Got that sorted and working brilliantly.

The next problem was then transferring the files from the NAS to the VPS server. I started off by doing a Samba mount and ran into exactly the same (or even worse) issue that you've described. I did a dry-run rsync and it took over 1 hour 30 mins just to work out what files it was going to transfer, because, as Evan says, under this method the other end isn't rsync, so it has to do many filesystem calls/reads on the Samba mount (across a PPTP/tunnelled connection, with a round-trip time of about 40ms). Completely unworkable.

Little did I know that the Buffalo actually runs an rsync daemon, so, using that instead, the entire dry run takes only 1 minute 30 seconds for 87k files totalling 50 GB. Obviously, transferring 50 GB of files (from a NAS that is on a broadband link with only 100k/sec outbound bandwidth) is another matter entirely (it will take several days) but, once the initial rsync is complete, any incremental backups should be greased lightning (his data is not going to change much on a daily basis).

My suggestion is to use a decent NAS that supports rsync, for the reasons Evan gives above. It will solve all your problems.

Author: Pablo (blog: https://pupeno.com)

Updated on September 17, 2022

Comments

  • Pablo
    Pablo almost 2 years

I'm backing up a Linux box over SMB to a NAS. I mount the NAS locally and then I rsync a lot of data (100GB or so). I believe it's taking an awfully long time to do it: more than 12 hours. I would expect it to be much faster once everything is copied, since almost nothing changes from day to day.

    Is there a way to speed this up?

I was thinking that maybe rsync thinks it's working with local hard disks and uses checksums instead of time/size comparisons? But I didn't find a way to force time and date comparisons. Anything else I could check?

    • warren
      warren almost 15 years
I'd also suggest looking at NFS instead of SMB - I've noticed (and maybe it's just me) that it's faster than Samba
    • Pablo
      Pablo almost 15 years
      Unfortunately, this NAS doesn't have NFS and for now, I'm stuck with it.
    • Kyle__
      Kyle__ almost 13 years
      Check the NAS's capabilities using a port mapper, like nmap. I've run into several NAS units that ran a native rsync service, even though there was no mention in the documentation, and no mention in the config.
    • dtoubelis
      dtoubelis almost 13 years
      Please also check this thread [rsync to NAS copies everything every time][1] [1]: serverfault.com/questions/262411/…
  • Spence
    Spence almost 15 years
    Ooo! Downvotes! I'd be curious to hear why you downvoted the answer, considering it's technically accurate.
  • Pablo
    Pablo almost 15 years
    Evan, give me a couple of minutes to write my comment.
  • Pablo
    Pablo almost 15 years
    The same NAS, the same switch, another computer, running Windows, back up to it, much more information, in under four hours.
  • Spence
    Spence almost 15 years
    What behaviour are you seeing that's telling you that it's checksumming the files? The "quick check" behaviour is the default behaviour, so there's no way to "force" it. If you can't run rsync on the NAS just use 'cp'. It'll be as fast or faster.
  • Pablo
    Pablo almost 15 years
According to how I understand rsync works, it should compare the local date and time with the remote date and time, and not copy the file if they match. That means it shouldn't copy 99% of the files, but the fact that it takes more than 12 hours for 60GB or so tells me that it's either copying everything (which seems to be what you're implying by saying that cp will be faster) or actually checksumming, which means it's not copying everything, but it is downloading everything.
  • Spence
    Spence almost 15 years
    I'd run it with the "--dry-run" and "--verbose" arguments to see what it thinks it's doing. I wonder if your NAS device isn't representing the modification times exactly the same as the source. You could add a "--size-only" argument and see if that changes things. What filesystem are you running on the NAS device?
  • Pablo
    Pablo almost 15 years
    Thanks Evan, I'll try those recommendations. Regarding NAS' FS, I'm not sure, but I would guess it's ext3.
  • ash
    ash almost 13 years
@Evan Anderson: He's locally mounting the SMB share. According to the rsync docs, copies to and from a local path don't use delta transfers but instead copy the whole file. That, coupled with the fact that rsync has more overhead than cp, results in slow transfers.
  • Spence
    Spence almost 13 years
    @Starfish: That's what I say in my third paragraph. It switches to whole copy mode and doesn't do delta transfers in that situation.
  • Michael
    Michael over 12 years
    +1 for making me aware that Buffalo NASes run rsync -- thanks!
  • logoff
    logoff over 3 years
-W / --whole-file was the key in my case (over a local network). I went from ~3.5 MB/s to ~35 MB/s: a 10x factor!