The most *robust* remote file copy?


Solution 1

OK, I have found the solution in my case. I am indeed using the suggested while loop. It now looks like this:

# Retry until rsync succeeds; on each failure print a timestamp and wait 5 seconds.
while ! \
rsync -aiizP --append --stats . -e ssh [email protected]:./path/rfiles ; \
do now=$(date +"%T") ; echo "· Error at $now ·" ; sleep 5 ; done

Without the while loop, I would have to restart rsync manually. Now it works like a charm.

The interesting thing is: I get the error exactly ten minutes after the connection is lost, and about nine minutes after the connection is up and running again! In the meantime, nothing happens in the terminal window. I wonder where this ten-minute timeout comes from.

Thank you very much for your help.

Gary

FYI: This is the timeout error that I receive (10 mins after the fact):

...
thedirectory/afile.ext
Read from remote host myhost.com: Operation timed out
rsync: writefd_unbuffered failed to write 16385 bytes [sender]: Broken pipe (32)
rsync: connection unexpectedly closed (394 bytes received so far) [sender]
rsync error: unexplained error (code 255) at /SourceCache/rsync/rsync-40/rsync/io.c(452) [sender=2.6.9]
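
That ten-minute hang is most likely the TCP connection itself timing out at the operating-system level rather than anything rsync does. A hedged variation of the loop above (the interval and timeout values are only illustrative assumptions) adds SSH keepalives and an rsync I/O timeout so a dead link is noticed much sooner:

# Same transfer as above, but ssh probes the server every 10 seconds and gives up
# after 3 missed probes; rsync additionally aborts after 60 seconds of I/O silence,
# so the while loop can retry within about a minute instead of ten.
while ! \
rsync -aiizP --append --stats --timeout=60 \
      -e "ssh -o ServerAliveInterval=10 -o ServerAliveCountMax=3" \
      . [email protected]:./path/rfiles ; \
do now=$(date +"%T") ; echo "Error at $now" ; sleep 5 ; done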

Solution 2

The main problem with rsync is that it can't resume individual files. If you are copying a complex directory structure, that is fine, but if you want to copy, for example, a single DVD image, it won't be robust.

For such cases I use wget. More precisely,

wget -c -t 0 -T 10 http://....

Especially useful is the 10-second timeout (-T 10, combined with -c to resume partial downloads and -t 0 to retry indefinitely), which resolves the common problem of our tools effectively hanging or freezing because of a single lost packet.

Of course, it needs an HTTP server on the source side. If that is impractical, there is a tool named

split

which can split big files into smaller pieces, which you can then transfer with rsync. The pieces can later be reassembled with a simple cat.
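
A minimal sketch, assuming a hypothetical large file big.iso and 100 MB pieces (the file name, piece size, and destination are placeholders):

# Split the file into 100 MB pieces named big.iso.part.aa, .ab, ...
# (GNU split; on BSD/macOS use -b 100m)
split -b 100M big.iso big.iso.part.
# Transfer the pieces, retrying as needed, e.g.:
#   rsync -aP big.iso.part.* user@host:incoming/
# Reassemble them on the destination:
cat big.iso.part.* > big.iso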

Of course you can run rsync even in a loop until it succeeds:

while ! rsync ...; do echo next try; done

Extension after a comment

rsync can resume individual files with the --partial flag. Thank you @GregHewgill! :-)

Solution 3

I would definitely suggest rsync. I use rsync to copy files anytime I think that the connection has any possibility of being interrupted. If the copy fails, I know I can simply start it again.

It's easy to put it in a while loop if you need it to automatically restart until it succeeds.

Solution 4

If you try to solve this problem at the level of the file copy tool, rsync is as good as it gets. Be sure to use the options -au, so that rsync won't try to synchronize the same file multiple times. Rsync will make progress as long as it's able to at least exchange the file list and transfer one file fully before being interrupted; if you can't ensure that, you're going to have trouble without a network-level tool.
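
A minimal sketch of that advice, with src/ and user@host:dest/ as placeholder paths:

# -a preserves attributes (including timestamps), so files that completed in an
# earlier attempt are skipped by the quick check; -u additionally skips files
# whose copy on the destination is already newer, so finished files are not
# re-synchronized on each retry.
while ! rsync -au -e ssh src/ user@host:dest/ ; do sleep 5 ; done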

I think it is easier (and more natural) to solve this at the network level: build a reliable network tunnel on top of your unreliable network connection. Years ago I used Rocks for this very purpose; it's unmaintained and I haven't tried compiling or using it recently, but there's no fundamental reason why it wouldn't work. Watch this space for alternatives.


Comments

  • Gary Czychi over 1 year

    How would I go about to copy files over a very unstable internet connection?

    Sometimes the connection is lost; other times the IP address of one machine or the other changes, sometimes both, though dynamic DNS will catch that.

    Which tool or command would you suggest?

    I've heard that rsync is pretty nifty at copying only the differences, but that means a lot of work, either restarting it again and again or putting it into a while loop or cron job.

    I was hoping for something easier and foolproof.

    Addendum:

    It's about copying, every now and then, a couple of directories with a few very large files (>5 GB) in them from one site to the other. After the copy, both are moved locally to different locations.

    I can't do anything on the networking level, I wouldn't have the knowledge to do so.

    I'd rather not set up a web server in order to use wget. That is not secure and seems like a circuitous route.

    I have already established an SSH connection and could now rsync, as rsync is already installed on both machines (I wouldn't be able to get an rsync daemon up and running).

    Any hints on how I could make an intelligent rsync over SSH so that it tries to continue when the line is temporarily cut? But rsync itself won't be the problem when the SSH connection dies, so something like this (https://serverfault.com/questions/98745/) probably won't work:

    while ! rsync -a .... ; do sleep 5 ; done
    

    Any ideas?

    Thanks a lot!

    Gary

    • Admin almost 10 years
      That would mean installing the lftp package, and I don't know how to do this :-(
  • goldilocks almost 10 years
    +1 For wget tackling the big file. However, you don't actually need apache, you just need a simple HTTP server.
  • Greg Hewgill almost 10 years
    rsync certainly can resume individual files; see the --partial option.
  • Gary Czychi almost 10 years
    Rocks would be cool, but my knowledge is too limited to install or even compile such a tool. I was hoping for something easy.
  • i336_ over 8 years
    devd or sthttpd are usable minimal servers; DEFINITELY USE split on poor links, because the smaller the split size, the less you have to redownload in case of failure.
  • Michael about 7 years
    The main problem with rsync is that it can't resume individual files. A big problem I am seeing is that with an rsync of lots of files, the network often flakes out before it gets through the file list ("receiving incremental file list") to where it needs to start syncing again, and as a result never makes any progress...