What protocol is used for downloading files?

Solution 1

Say I download an executable like Pycharm from Jetbrains.com. HTTP was used to deliver contents of the website - is this also used when I download the file? I read that FTP was used but also saw it's been disabled for modern browsers - what is the recommended protocol?

Look at the URL shown in your downloads list – if it says http:// or https://, then yes, HTTP was used to download the file.

Nearly all file downloads from websites (and even most downloads not from websites, such as game updates) are nowadays done via HTTP.

There aren't many alternatives. Anonymous FTP used to be common, but several aspects of its design are problematic nowadays (FTP actually predates the Internet's TCP/IP), such as its use of separate "data" connections, which causes firewall-related problems. Anonymous NFS (WebNFS) never became a thing, either.

Also, if there is a network disruption, sometimes I can resume the download without losing progress. Is this because a "session" was created and I can rejoin the session and continue the download?

No; the resumption mechanism is stateless, as is almost everything else about HTTP.

When you're requesting a static file (as opposed to a dynamically generated webpage), the browser can ask for a specific byte range instead of the whole file. For example, if your download stopped after 12300 bytes, you can resume at any time by including a Range: bytes=12300- header (byte offsets start at 0, so byte 12300 is the first one still missing).

So as long as the file still exists, all you need to do is keep re-requesting the same URL with an appropriate Range header added. (Browsers additionally use the If-Match header to make sure the file hasn't changed.)
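
For instance, here is a minimal Python sketch of that resumption logic (the URL and filename are made up, the server is assumed to support range requests, and the If-Match validation mentioned above is left out):

    import os
    import urllib.request

    url = "https://example.com/files/installer.exe"   # hypothetical URL
    path = "installer.exe"

    # How many bytes we already have on disk, if any.
    have = os.path.getsize(path) if os.path.exists(path) else 0

    req = urllib.request.Request(url)
    if have:
        # Ask only for the part we're missing; byte offsets start at 0.
        req.add_header("Range", f"bytes={have}-")

    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the server honoured the range;
        # a plain 200 means it's sending the whole file from the start.
        mode = "ab" if resp.status == 206 else "wb"
        with open(path, mode) as f:
            while chunk := resp.read(64 * 1024):
                f.write(chunk)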

There are websites which offer downloads limited to a specific session (either as a cookie, or a special token embedded in the URL). Those downloads are still resumed using the same range requests as before – while the web server may decide that your URL has expired and prevent continuing the download at all, it has nothing to do with the actual resumption mechanism.

(And, sure, a website could serve the download entirely through a dynamic script. In that case it's up to the programmer whether they handle range requests or not. For example, when downloading a zipped folder from Google Drive, the .zip file is generated on the fly; even its "total size" is unknown – in this case, the file likely won't be resumable at all.)

Solution 2

The short answer is: yes, it is HTTP/HTTPS.

However, I'd like to take a bit of your time to explain why the longer answer matters, especially to people who are interested in technology.

HTTP is nothing but a file transfer protocol. It is not special. HTTP cannot handle things other than files.

Images? They're just files. JavaScript: just text files. Web pages: again, just text files. Videos are files. Even YouTube videos are just a bunch of files (a single YouTube video is not one file but is split into hundreds of smaller files, each around 10 seconds long, so that you can rewind and skip forward; video downloaders automatically join those pieces when saving).

The core of how HTTP works is really simple. Indeed, it is stupidly simple, and this simplicity (that all things are just files to download) is what made HTTP successful compared to other networked multimedia/interactive protocols. Files, especially text files, are something programmers understand.

The complicated bits added to HTTP to make the internet what it is today are carried as metadata alongside the "files". Just like your files on disk have metadata such as file name, creation date, and ownership, files served by HTTP have metadata such as cookies, authorization information, last-modified time, and so on.
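
As a quick illustration, this Python snippet fetches one file over HTTP and prints the metadata (the response headers) that travelled with it; example.com is just a stand-in for any static file:

    import urllib.request

    # Any URL served over HTTP/HTTPS works here.
    with urllib.request.urlopen("https://example.com/") as resp:
        for name, value in resp.getheaders():
            print(f"{name}: {value}")

    # Typical output includes lines such as:
    #   Content-Type: text/html; charset=UTF-8
    #   Content-Length: 1256
    #   Last-Modified: ...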

Knowing this, you should realize that there is nothing magical about the web, and especially not about HTTP. It just allows your browser to download files; it is how your browser interprets those files that adds the magic. Still, an HTTP agent does not need to be a browser. You can write a program to download anything available via HTTP as long as you know how to craft the correct request; indeed, most people use curl and wget for this.
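
To make that concrete, here is a bare-bones sketch of "crafting the request" yourself in Python over a plain socket, with no HTTP library at all (example.com is used purely as an illustration):

    import socket

    # An HTTP request is just a few lines of text.
    request = (
        "GET / HTTP/1.1\r\n"
        "Host: example.com\r\n"
        "Connection: close\r\n"
        "\r\n"
    )

    with socket.create_connection(("example.com", 80)) as sock:
        sock.sendall(request.encode("ascii"))
        response = b""
        while data := sock.recv(4096):
            response += data

    # What comes back is text metadata (headers), a blank line,
    # and then the "file" itself (here, an HTML page).
    print(response.decode("utf-8", errors="replace"))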

Comments

  • username128437855
    username128437855 over 1 year

    Say I download an executable like Pycharm from Jetbrains.com. HTTP was used to deliver contents of the website - is this also used when I download the file? I read that FTP was used but also saw it's been disabled for modern browsers - what is the recommended protocol?

    Also, if there is a network disruption, sometimes I can resume the download without losing progress. Is this because a "session" was created and I can rejoin the session and continue the download? What determines how long this period lasts for before I have to restart the download from scratch?

    • Giacomo1968
      Giacomo1968 about 3 years
      HTTP and HTTPS are how files are downloaded nowadays. No need for FTP anymore. If you download stuff from modern package installers it is either HTTP or HTTPS, or even SSH-related protocols. Like perhaps SCP? But as I understand it SCP is not that great. So perhaps tools like Git use a proprietary SSH-based protocol? But in general, FTP might exist for some cases, but it's never really a case of better or worse; it simply is.
    • user1686
      user1686 about 3 years
      @Giacomo1968: SSH does also have a standard file-transfer protocol -- SFTP. (Which is not related to FTP or FTPS in any way, it's specifically the SSH file transfer protocol.) Among "proprietary" protocols transported via SSH, Rsync is probably more common for one-off transfers than Git is.
    • Martheen
      Martheen about 3 years
      @Giacomo1968 Git can use HTTPS, SSH, and the very rarely used Git protocol. The combination of corporate firewalls blocking anything that isn't HTTPS, Git over SSH not supporting anonymous access, and the Git protocol itself lacking any authentication makes HTTPS more popular, unless users are in an environment that already has or requires SSH.
    • user1686
      user1686 about 3 years
      @Martheen: SSH can be configured to allow anonymous access; I've seen that used by the Solaris Hg repository. It's sshd-dependent but the protocol allows skipping the authentication steps. There are often more caveats associated with allowing it in OpenSSH though (you have to explicitly restrict unwanted things, vs CGI which only allows what you tell it to allow).
  • J. Shmoe
    J. Shmoe about 3 years
    Also, there is a reason for the dedicated data connection in FTP: it allows server-to-server transfers without forcing the data through the client or exposing credentials for one server to the other (like scp does).
  • iBug
    iBug about 3 years
    Note that if the server returns HTTP 200 for a range request, it means you're downloading the whole file again, or "failed partial". A "successful partial" response should be HTTP 206.
  • user253751
    user253751 about 3 years
    @SimonRichter how often do people actually use that feature? I thought it was a side effect of the design but not the main purpose.
  • Peter Cordes
    Peter Cordes about 3 years
    Perhaps worth mentioning that web browsers know whether to save to a file or render as a web page based on the Content-Type: HTTP header. developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type
  • user1686
    user1686 about 3 years
    @user253751, SimonRichter: I'm not sure whether that was the primary intent, although such a possibility was specifically called out as early as in RFC 454. But I think its origins have something to do with the early (pre-TCP/IP) ARPANET protocols design, where the FTP "control" connection was defined as a TELNET connection (probably to make use of the existing bidirectional socket support, NVT-ASCII definition, etc.) while the separate data socket didn't even need to be 8-bit bytes.
  • user253751
    user253751 about 3 years
    The separate data socket makes sense from a simplicity perspective, IMO. The OS already has the ability to multiplex several data streams (they are called sockets) so why reinvent the wheel? Only on the modern NATted Internet has the separate data socket become a problem.
  • Dev
    Dev about 3 years
    @user253751 Less to do with NAT and more to do with stateful firewalls in general. Unless the FTP connection was in PASV mode, the server would attempt to open a new connection to the client. Unsolicited connections are commonly blocked at network edges regardless of whether NAT is in the mix.