Copying huge files between two remote machines - Efficiently
Solution 1
This problem can be solved with rsync. At least this solution should be competitive in terms of performance.
First, rsync can be invoked from one of the remote systems, which works around its inability to copy between two remote hosts directly. Second, encryption/decryption can be avoided by running rsync in daemon mode instead of remote-shell mode: in daemon mode, rsync does not tunnel the traffic through an ssh connection, but uses its own protocol on top of TCP.
Normally the rsync daemon is started from inetd or stand-alone, but either way that requires root access on one of the remote systems. Assuming root access is not available, it is still possible to start the daemon as a normal user.
Start an rsync daemon as a non-privileged user on the destination machine. Note that the daemon needs at least one module (a named path it is willing to serve), otherwise it starts but rejects every transfer, and "use chroot = no" is needed because chroot requires root:
ssh -i private_key ssh_user@destination-IP \
    "echo -e 'pid file = /tmp/rsyncd.pid\nport = 1873\n[dest]\npath = destination-path\nread only = no\nuse chroot = no' > /tmp/rsyncd.conf"
ssh -i private_key ssh_user@destination-IP \
    "rsync --config=/tmp/rsyncd.conf --daemon"
Then actually copy the files, addressing the module by name:
ssh -i private_key ssh_user@source_ip \
    "rsync [OPTIONS] source-path \
    rsync://destination-IP:1873/dest/"
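When the copy is done, the ad-hoc daemon can be shut down again using the pid file it wrote. A sketch, assuming the /tmp/rsyncd.pid path from the configuration above:

```shell
# Stop the temporary rsync daemon on the destination machine and
# clean up the files it left behind in /tmp.
ssh -i private_key ssh_user@destination-IP \
    'kill "$(cat /tmp/rsyncd.pid)"; rm -f /tmp/rsyncd.pid /tmp/rsyncd.conf'
```

Leaving the daemon running is not a disaster, but it listens unauthenticated on the chosen port, so stopping it after the transfer is good hygiene.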
Solution 2
The least-overhead solution would be using netcat:
destination$ nc -l -p 12345 > /path/destinationfile
source$ cat /path/sourcefile | nc desti.nation.ip.address 12345
(some netcat versions do not need the "-p" flag for the port)
All this does is send the data unencrypted and unauthenticated over the network from one machine to the other. Of course, it is not the most "comfortable" way to do it.
Other alternatives would be changing the ssh cipher (ssh -c), or using FTP.
PS: rsync works fine with remote machines, but it is mostly used in combination with ssh, so no speedup there.
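For whole directory trees, the same trick works with tar on both ends of the pipe. A sketch; the host, port, and paths are placeholders, and the "-p" flag caveat above applies here too:

```shell
# On the destination: receive the stream and unpack it.
nc -l -p 12345 | tar -x -C /path/to/destdir

# On the source: pack the tree and push it over the wire.
tar -c -C /path/to/srcdir . | nc desti.nation.ip.address 12345
```

tar batches many small files into a single stream, which avoids per-file overhead; if the data compresses well and CPU is to spare, add "z" to both tar invocations.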
Solution 3
If encryption isn't a concern, bring up an NFS daemon on C and mount the exported directory on B. Then run rsync on B, but specify local directory paths on both sides. Ignoring whatever your use case for involving A is, just prepend ssh user@B to the rsync command.
This transfers data without encryption overhead, and only transfers the files that differ.
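A sketch of this setup; the export path, subnet, and mount point are assumptions, and the export step needs root on C:

```shell
# On C (requires root): export the destination directory over NFS.
# Add a line like this to /etc/exports, then reload with: exportfs -ra
#   /data  192.168.1.0/24(rw,sync,no_subtree_check)

# On B: mount the export, then run rsync with two local paths,
# so no ssh transport (and no encryption) is involved.
sudo mount -t nfs C-ip-address:/data /mnt/c-data
rsync -a --progress /local/source/ /mnt/c-data/
```

Since both paths are local to B, rsync skips the remote-shell transport entirely; the NFS client does the network I/O in the kernel.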
Also, FTP was built with 3rd party server-to-server transfers as a protocol feature.
Solution 4
You can use a cheaper cipher to increase the speed: rsync --rsh="ssh -c arcfour"
In my tests, the bottleneck then became the disks rather than the network connection. And do use rsync, it is good!
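Note that arcfour has been removed from recent OpenSSH releases, so it is worth checking which ciphers your client still offers before relying on it; aes128-ctr is usually a fast fallback on CPUs with AES instructions. A sketch, with host and paths as placeholders:

```shell
# List the ciphers the local ssh client supports; pick a cheap one
# from the list (aes128-ctr is a reasonable fallback if arcfour is gone).
ssh -Q cipher

# Then use the chosen cipher for the transfer.
rsync -a --rsh="ssh -c aes128-ctr" source-path ssh_user@destination-IP:destination-path
```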
Varun
Updated on September 18, 2022

Comments
-
Varun, almost 2 years ago:
I have a shell script which keeps copying huge files (2 GB to 5 GB) between remote systems. Key-based authentication is used with agent forwarding, and everything works. For example, say the shell script is running on machine-A and copying files from machine-B to machine-C:
"scp -Cp -i private-key ssh_user@source-IP:source-path ssh_user@destination-IP:destination-path"
Now the problem is that the sshd process is continuously taking loads of CPU. For example, top -c on the destination machine (i.e. machine-C) shows:
PID   USER     PR NI VIRT  RES  SHR  S %CPU %MEM TIME+   COMMAND
14580 ssh_user 20 0  99336 3064 772  R 85.8  0.0 0:05.39 sshd: ssh_user@notty
14581 ssh_user 20 0  55164 1984 1460 S  6.0  0.0 0:00.51 scp -p -d -t /home/binary/instances/instance-1/user-2993/
This results in high load average.
I believe scp is taking so much CPU because it is encrypting/decrypting the data. But I don't need encrypted data transfer, as both machine-B and machine-C are on a LAN.
What other options do I have? I considered 'rsync'. But the rsync man page says:
GENERAL Rsync copies files either to or from a remote host, or locally on the current host (it does not support copying files between two remote hosts).
Edit 1: I am already using the ssh cipher arcfour128. A little improvement, but that doesn't solve my problem.
Edit 2: There are other binaries (my main application) running on the machines, and the high load average is causing them to perform poorly.
-
Admin, about 12 years ago: "rsync doesn't support copying data between remote machines" - erm... what makes you think that? That's exactly what most people use it for.
-
Admin, about 12 years ago: @Chopper3: IIRC, rsync doesn't support his very unusual method of copying with two remote machines. Either source or target has to be local.
-
Admin, about 12 years ago: @Varun: If you don't need the files to be copied quickly, you can use the -l limit option of scp to limit the transfer speed. This should lower the CPU usage as well.
-
Admin, about 12 years ago: This is irrelevant anyway, as the usual transport backend of rsync is ssh, the same as with scp.
-
Admin, about 12 years ago: @Chopper3: The rsync man page says that :)
-
Admin, about 12 years ago: I have modified my question and quoted what the man page says.
-
Admin, about 12 years ago: "This results in high load average." - So what? If you said it was affecting performance elsewhere, then it would be worth worrying about, but making your system metrics look nice is not a basis for tuning a system. BTW, yes, as mulaz says, it's easy to pass the data via other means, but this may actually be more work for the TCP stack, pushing more packets across the network. You could still use nc and gzip/gunzip, but you'll probably find little difference in impact compared with scp -C; the encryption part does not require a lot of effort.
-
-
Varun, about 12 years ago: I am selecting this as the correct answer. The netcat solution given by @mulaz is also good, but rsync gives many more options, like preserving permissions, timestamps, etc. Thanks.