Download big file over bad connection

Solution 1

lftp (Wikipedia) is good for that. It supports a number of protocols, can download files using several concurrent parallel connections (useful where there's a lot of packet loss not caused by congestion), and can automatically resume downloads. It's also scriptable.

Here it is, including the fine-tuning you came up with (credit to you):

lftp -c 'set net:idle 10
         set net:max-retries 0
         set net:reconnect-interval-base 3
         set net:reconnect-interval-max 3
         pget -n 10 -c "https://host/file.tar.gz"'

Solution 2

I can't test this for you in your situation, but you should not be using --range with -C -. Here's what the man page has to say on the subject:

Use -C - to tell curl to automatically find out where/how to resume the transfer. It then uses the given output/input files to figure that out.

Try this instead:

curl -s --retry 9999 --retry-delay 3 --speed-limit 2048 --speed-time 10 \
    --retry-max-time 0 -C - -o "${FILENAME}.part${i}" "${URL}" &

I'd also strongly recommend that you always double-quote your variables so that the shell won't mangle them. (Consider a URL like https://example.net/param1=one&param2=two: written unquoted on the command line, the shell ends the command at the & and runs the rest as a separate command, and unquoted variable expansions are also subject to word splitting and globbing.)
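
For example (hypothetical URL and output name, purely to illustrate the difference):

# unquoted, the shell ends the command at the & and backgrounds curl with a truncated URL
curl -s -o out.bin https://example.net/param1=one&param2=two
# quoted (directly, or via a double-quoted variable), curl receives the whole URL
URL='https://example.net/param1=one&param2=two'
curl -s -o out.bin "${URL}"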

Incidentally, 120 KB/s is roughly 1 Mb/s, which is a typical xDSL upload speed in many parts of the world. At 80-120 KB/s that works out to roughly 10 seconds per MB, so a little under one hour for the entire 300 MB file. Not so slow, although I do appreciate you're more concerned with reliability than with speed.

Solution 3

Maybe you'll have more luck with wget --continue:

wget --continue "${URL}"
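
If the connection tends to stall rather than drop cleanly, wget's retry and timeout options may be worth combining with --continue (the values below are illustrative, not tuned for your server):

# retry indefinitely, back off up to 3 seconds between retries, abort a read that stalls for 10 seconds
wget --continue --tries=0 --waitretry=3 --read-timeout=10 "${URL}"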

See also https://www.cyberciti.biz/tips/wget-resume-broken-download.html

Solution 4

I had the same problem in my previous job (except with 300 GB+ offsite database backups over a connection that was unstable from the office side). Users had serious problems downloading files bigger than approximately 1 GB before the connection conked out. Since they used the standard Windows copy/paste over an RDP connection, it's no wonder.

One thing I found out was that our VPN settings were completely mismatched with the network setup (mainly the MTU size). The second thing is that Windows' file copier is NOT made for copying stuff over the internet.

My first solution was a simple FTP server; however, it didn't solve the problem of transmission time (often 3-4 hours on our connection).

My second solution was to use Syncthing to send the files directly to an in-house NAS. Each night, after the backups were complete, Syncthing sent everything we needed back to a NAS in the office. Not only was the problem of 3+ hours of transmission time solved, but I was spared the 1-2 hours it took to courier the data if there was a crisis. At 8 AM every morning the files would be updated on the NAS, and we had our backups ready. Even with huge files (at one point an almost 700 GB database), I have yet to experience any file corruption or other problems...

Syncthing is very easy to set up and manage, is available for all platforms (even phones), and handles bad connections very well: if the connection fails, Syncthing simply waits a few minutes and tries again.

You do need a local folder to sync things to, but your files will be available almost as soon as they are updated.

Another good thing about Syncthing is that it can be set to only synchronize the changes in a file (like a differential backup), possibly solving part of your bandwidth problem.

Solution 5

Outside the box: Put on an eyepatch and use bittorrent. Make the blocksize small when you create the torrent. Obviously, encrypt the file so anyone else who finds the torrent gets nothing useful.
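
A rough sketch of that approach, assuming gpg and mktorrent are available (the tracker URL, file names, and piece size are placeholders):

# encrypt first, so the torrent payload is useless to anyone else who finds it
gpg --symmetric --cipher-algo AES256 -o file.tar.gz.gpg file.tar.gz
# build the torrent with small pieces: -l 15 means 2^15 bytes = 32 KiB per piece
mktorrent -l 15 -a "http://tracker.example/announce" -o file.torrent file.tar.gz.gpg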

Comments

  • Crouching Kitten
    Crouching Kitten over 1 year

    Is there an existing tool, which can be used to download big files over a bad connection?

    I have to regularly download a relatively small file: 300 MB, but the slow (80-120 KBytes/sec) TCP connection randomly breaks after 10-120 seconds. (It's a big company's network. We contacted their admins (working from India) multiple times, but they can't or don't want to do anything.) The problem might be with their reverse proxies / load balancers.

    Up until now I used a modified version of pcurl: https://github.com/brunoborges/pcurl

    I changed this line:

    curl -s --range ${START_SEG}-${END_SEG} -o ${FILENAME}.part${i} ${URL} &
    

    to this:

    curl -s --retry 9999 --retry-delay 3 --speed-limit 2048 --speed-time 10 \
        --retry-max-time 0 -C - --range ${START_SEG}-${END_SEG} -o ${FILENAME}.part${i} ${URL} &
    

    I had to add --speed-limit 2048 --speed-time 10 because the connection mostly just hangs for minutes when it fails.

    But recently even this script can't complete.

    One problem is that it seems to ignore the -C - part, so it doesn't "continue" the segment after a retry. It seems to truncate the corresponding temp file and start from the beginning after each failure. (I think the --range and the -C options cannot be used together.)

    The other problem is that this script downloads all segments at the same time. It cannot handle 300 segments with only 10 of them being downloaded at a time.

    I was thinking of writing a download tool in C# for this specific purpose, but if there's an existing tool, or if the curl command could work properly with different parameters, then I could spare some time.

    UPDATE 1: Additional info: The parallel download functionality should not be removed, because they have a per-connection bandwidth limit (80-120 KBytes/sec, mostly 80), so 10 connections can give a 10x speedup. I have to finish the download within 1 hour, because the file is generated hourly. (A rough sketch of capping the concurrency is below.)
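
    A rough sketch of capping the concurrency (illustrative only: it assumes GNU xargs and seq, 1 MiB segments matching a roughly 300 MB file, a server that honours Range requests, and it still has the --range/--retry truncation caveat described above):

    export URL="https://host/file.tar.gz"      # placeholder URL and file name
    export FILENAME="file.tar.gz"
    # 300 segments of 1 MiB each, with at most 10 curl processes running at a time
    seq 0 299 | xargs -P 10 -I{} sh -c '
        i=$(printf "%03d" "$1")                # zero-pad so the parts concatenate in order
        start=$(( $1 * 1048576 ))
        end=$(( start + 1048575 ))
        curl -s --retry 9999 --retry-delay 3 --speed-limit 2048 --speed-time 10 \
            --range "${start}-${end}" -o "${FILENAME}.part${i}" "${URL}"
    ' sh {}
    cat "${FILENAME}".part* > "${FILENAME}"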

  • Crouching Kitten
    Crouching Kitten about 6 years
    Thank you. I tried this, but it doesn't seem to use parallel connections: lftp -e 'set net:timeout 15; set net:max-retries 0; set net:reconnect-interval-base 3; set net:reconnect-interval-max 3; pget -n 10 -c "https://host/file.tar.gz"; exit'
  • Crouching Kitten
    Crouching Kitten about 6 years
    Oh, when I removed the "net:timeout" setting, it became parallel. But it slows down after a while; I think it's because the connections start to "hang".
  • Crouching Kitten
    Crouching Kitten about 6 years
    It works perfectly with the net:idle setting. Thank you! I'll add my solution to the question.
  • RonJohn
    RonJohn about 6 years
    It's the rare corporation that internally distributes files over torrent.
  • Eric Duminil
    Eric Duminil about 6 years
    Exactly. Even if the connection is really bad and the file somehow got damaged, it should work fine. PRO-TIP: Encrypt it, rename it to 'KimKardashianNude.mp4' and let thousands of people help you with the connection. Automatic, distributed backup for free! :)
  • ivanivan
    ivanivan about 6 years
    As Linus himself said - "Only wimps use tape backup: real men just upload their important stuff on ftp, and let the rest of the world mirror it ;)"
  • slebetman
    slebetman about 6 years
    Note that lftp supports torrent as the underlying transfer protocol. Use it. All the other protocols it supports don't offer per-chunk error detection/correction and rely on TCP to provide error detection. Note that torrent uses TCP error detection but on top of it verifies the SHA-1 hash of your entire file and also of each block transferred over the network. In my experience, a 4 GB movie torrented over a 4G network typically has around two hash verification errors - this means TCP considered the received packets to be error-free even though they were corrupted.
  • slebetman
    slebetman about 6 years
    If you care about your data not being corrupted, use torrent
  • user1730706
    user1730706 about 6 years
    Note that this is not compatible with Windows beyond Cygwin github.com/lavv17/lftp/issues/431
  • Stéphane Chazelas
    Stéphane Chazelas about 6 years
    @StevenPenny, this is unix.se. Microsoft Windows is off topic here (and in my limited experience, MS Windows is useless without Cygwin anyway).
  • Stéphane Chazelas
    Stéphane Chazelas about 6 years
    @slebetman, here the OP uses HTTPS. TLS provides extra integrity check (over TCP's weak checksum) via HMAC. Also HTTP has support for checksuming content or chunks with the Content-MD5 and Digest headers (though I don't know if lftp supports those or if they would be used in the OP's case). In any case, it doesn't look like torrent would be an option for the OP.
  • Loren Pechtel
    Loren Pechtel about 6 years
    @RonJohn I know it's not commonly used but that doesn't mean it couldn't be used. The bittorrent protocol is very good at putting up with bad connections.
  • RonJohn
    RonJohn about 6 years
    @LorenPechtel a Work Order for RISK to approve the ports, a WO for the NOC to open the ports, WOs for the Linux and Windows teams to install the torrent clients, and another WO to monitor them all so that only approved files are being transferred. And none of that takes into account HIPAA, PCI, or the fact that a file that's supposed to go from Point A to Point B is now going from Point A to Points C, D, E, F, G, H, I and J before getting to Point B. RISK will disapprove for that very reason.
  • Loren Pechtel
    Loren Pechtel about 6 years
    @RonJohn Yeah, there are a lot of headaches. You can keep it from going elsewhere, though: both stations are set to connect only with the IP of the other computer.
  • RonJohn
    RonJohn about 6 years
    More than once, I've had to copy prod files to my laptop from A, and then to B. Good thing it's an official corporate laptop... :)