Copying huge files between two remote machines efficiently


Solution 1

This problem can be solved with rsync; at the very least, this approach should be competitive in terms of performance.

First, rsync can be called from one of the remote systems, which works around its inability to copy between two remote systems directly.

Second, encryption/decryption can be avoided by running rsync in Daemon Access mode instead of Remote Shell Access mode.

In daemon access mode rsync does not tunnel the traffic through an ssh connection. Instead it uses its own protocol on top of TCP.

Normally you run the rsync daemon from inetd or stand-alone, but either way this requires root access to one of the remote systems. Assuming root access is not available, it is still possible to start up the daemon as a normal user.

Start the rsync daemon as a non-privileged user on the destination machine. The daemon needs at least one module that exposes a directory, and a non-root daemon cannot chroot, so "use chroot" has to be disabled (the module name "dest" and its path are placeholders):

ssh -i private_key ssh_user@destination-IP \
       "echo -e 'pid file = /tmp/rsyncd.pid\nport = 1873\nuse chroot = no\n[dest]\npath = /destination-path\nread only = no' > /tmp/rsyncd.conf"

ssh -i private_key ssh_user@destination-IP \
       rsync --config=/tmp/rsyncd.conf --daemon

Actually copy the files; the module "dest" defined above maps to destination-path on the destination machine:

ssh -i private_key ssh_user@source_ip \
       "rsync [OPTIONS] source-path \
              rsync://ssh_user@destination-IP:1873/dest/"
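
Afterwards the temporary daemon can be cleaned up again; a minimal sketch, assuming the pid file and config written above:

ssh -i private_key ssh_user@destination-IP \
       'kill $(cat /tmp/rsyncd.pid); rm -f /tmp/rsyncd.pid /tmp/rsyncd.conf'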

Solution 2

The least-overhead solution would be using netcat:

destination$ nc -l -p 12345 > /path/destinationfile
source$ cat /path/sourcefile | nc desti.nation.ip.address 12345

(some netcat versions do not need the "-p" flag for the port)

All this does is send the data, unencrypted and unauthenticated, over the network from one PC to the other. Of course it is not the most "comfortable" way to do it.
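
If whole directory trees need to be moved rather than single files, the same unencrypted channel can carry a tar stream instead; a rough sketch (paths and port are placeholders):

destination$ nc -l -p 12345 | tar -C /path/destination-dir -xf -
source$ tar -C /path/source-dir -cf - . | nc desti.nation.ip.address 12345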

Other alternatives would be changing the ssh cipher (ssh -c) or using FTP.
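
For the cipher route, a sketch with scp, assuming an older OpenSSH build that still ships the arcfour ciphers (recent releases have removed them):

source$ scp -c arcfour /path/sourcefile ssh_user@destination-IP:destination-path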

PS: rsync works fine with remote machines, but it is usually used in combination with ssh, so there is no speedup there.

Solution 3

If encryption isn't a concern, throw up an NFS daemon on C and mount the exported directory on B. Then run rsync on B, but specify local directory paths.

Whatever your use case for involving A is, just prepend ssh user@B to the rsync command.

This transfers data without encryption overhead and only transfers files that differ.
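
A rough sketch of that setup, assuming C already exports destination-path over NFS and /mnt/c-dest is a free mount point on B (names are placeholders; the mount itself needs root on B):

B$ mount -t nfs destination-IP:/destination-path /mnt/c-dest
B$ rsync -a /source-path/ /mnt/c-dest/

Driven from A, as suggested above:

A$ ssh user@B "rsync -a /source-path/ /mnt/c-dest/"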

Also, FTP was built with 3rd party server-to-server transfers as a protocol feature.

Solution 4

You can use a weaker encryption method: rsync --rsh="ssh -c arcfour" increases the speed. In my tests, the disks became the bottleneck rather than the network connection. And use rsync, it is good!
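
Since rsync needs one end of the transfer to be local, such a command would run on the source machine itself; a sketch along those lines, again assuming an OpenSSH build that still offers arcfour:

source$ rsync -a --rsh="ssh -c arcfour -i private-key" /source-path ssh_user@destination-IP:destination-path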


Comments

  • Varun
    Varun almost 2 years

    I have a shell script which keeps copying huge files (2 GB to 5 GB) between remote systems. Key-based authentication is used with agent forwarding, and everything works. For example, say the shell script is running on machine-A and copying files from machine-B to machine-C.

    "scp -Cp -i private-key ssh_user@source-IP:source-path ssh_user@destination-IP:destination-path"
    

    Now the problem is that the sshd process is continuously taking loads of CPU.
    For example, top -c on the destination machine (i.e. machine-C) shows

      PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                         
    14580 ssh_user  20   0 99336 3064  772 R 85.8  0.0   0:05.39 sshd: ssh_user@notty                                                            
    14581 ssh_user  20   0 55164 1984 1460 S  6.0  0.0   0:00.51 scp -p -d -t /home/binary/instances/instance-1/user-2993/
    

    This results in high load average.

    I believe scp is taking so much CPU because it's encrypting/decrypting data. But I don't need encrypted data transfer, as both machine-B and machine-C are in a LAN.

    What other options do I have? I considered 'rsync'. But the rsync man page says:

    GENERAL
           Rsync  copies files either to or from a remote host, or locally on the current host (it does not support copying files between two
           remote hosts).
    

    Edit 1: I am already using the ssh cipher arcfour128. It gives a little improvement, but that doesn't solve my problem.

    Edit 2: There are other binaries (my main application) running on the machines, and the high load average causes them to perform poorly.

    • Admin
      Admin about 12 years
      "rsync doesn't support copying data between remote machines" - erm...what makes you think that? that's exactly what most people use it for
    • Admin
      Admin about 12 years
      @Chopper3: IIRC, rsync doesn't support his very unusual method of copying with two remote machines. Either source or target has to be local.
    • Admin
      Admin about 12 years
      @Varun: If you don't need the files to be copied quickly, you can use the -l limit option to limit the transfer speed. This should lower the CPU usage as well.
    • Admin
      Admin about 12 years
      This is irrelevant anyway, as the usual transport backend of rsync is ssh, the same as with scp.
    • Admin
      Admin about 12 years
      @Chopper3: The 'rsync' man page says that :)
    • Admin
      Admin about 12 years
      I have modified my question and quoted what the man page says.
    • Admin
      Admin about 12 years
      "This results in high load average." - so what. If you said it was affecting performance elsewhere then it would be worth worrying about, but making your system metrics look nice is not a basis for tuning a system. BTW yes, as mulaz says, it's easy to pass the data via other means, but this may actually be more work for the TCP stack to push more packets across the network. You could still use nc and gzip/gunzip but you'll probably find little difference in the impact compared with scp -C - the encryption part does not require a lot of effort.
  • Varun
    Varun about 12 years
    I am selecting this as correct answer. The 'netcat' solution given by @mulaz is also good but rsync gives many more options like preserving permissions, timestamps etc. Thanks.