Upload large files to S3 with resume support


Update 20150527

The AWS SDK for JavaScript (in the Browser), which has become available in the meantime, supports Amazon S3, including a ManagedUpload class that covers the multipart upload aspects of the use case at hand (see the preceding update for more on this). It is now likely the best solution for your scenario; see e.g. Uploading a local file using the File API for a concise example that uses the HTML5 File API, and the introductory blog post Announcing the Amazon S3 Managed Uploader in the AWS SDK for JavaScript for more details about this SDK feature.
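A minimal sketch of what that looks like with ManagedUpload, assuming the SDK (v2) is loaded as `AWS` and credentials are already configured; the bucket name, part size, and queue size are illustration values, not prescriptions:

```javascript
// Sketch only: assumes the AWS SDK for JavaScript (v2) is available as
// `AWS` and credentials are already set up; bucket/key names are made up.
function startManagedUpload(file) {
  const upload = new AWS.S3.ManagedUpload({
    params: { Bucket: 'my-example-bucket', Key: file.name, Body: file },
    partSize: 10 * 1024 * 1024, // 10 MiB parts (the minimum is 5 MiB)
    queueSize: 4                // number of parts uploaded in parallel
  });
  upload.on('httpUploadProgress', (p) =>
    console.log(formatProgress(p.loaded, p.total))
  );
  return upload.promise(); // upload.abort() cancels a running upload
}

// Tiny pure helper (my own illustration, not part of the SDK) for the
// progress indicator.
function formatProgress(loaded, total) {
  return Math.round((100 * loaded) / total) + '%';
}
```

`file` here would typically come from an `<input type="file">` element's `files` list via the File API.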

Update 20120412

My initial answer apparently missed the main point, so to clarify:

If you want to do browser-based upload via simple HTML forms, you are constrained to using the POST Object operation, which adds an object to a specified bucket using HTML forms:

POST is an alternate form of PUT that enables browser-based uploads as a way of putting objects in buckets. Parameters that are passed to PUT via HTTP Headers are instead passed as form fields to POST in the multipart/form-data encoded message body. [...]

Because the upload is handled in a single operation here, it doesn't support pause/resume and limits you to the original maximum object size of 5 gigabytes (GB).

You can only overcome both limitations by Using the REST API for Multipart Upload instead, which is in turn used by SDKs like the AWS SDK for PHP to implement this functionality.
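The multipart REST flow boils down to three operations: initiate the upload, upload the parts, and complete it. A hedged sketch following the AWS SDK for JavaScript's method names (the `s3` client object, bucket, and key are assumptions for illustration; error handling and the abort path are omitted):

```javascript
// Sketch of the multipart upload call sequence; any client that can sign
// S3 REST requests would work the same way.
async function multipartUpload(s3, bucket, key, body, partSize) {
  const { UploadId } = await s3
    .createMultipartUpload({ Bucket: bucket, Key: key })
    .promise();
  const parts = [];
  for (const { partNumber, start, end } of splitIntoParts(body.length, partSize)) {
    const { ETag } = await s3.uploadPart({
      Bucket: bucket, Key: key, UploadId,
      PartNumber: partNumber, Body: body.slice(start, end)
    }).promise();
    parts.push({ PartNumber: partNumber, ETag });
  }
  return s3.completeMultipartUpload({
    Bucket: bucket, Key: key, UploadId,
    MultipartUpload: { Parts: parts }
  }).promise();
}

// Pure helper: byte ranges for each part. Every part except the last
// must be at least 5 MiB.
function splitIntoParts(totalSize, partSize) {
  const parts = [];
  for (let start = 0, n = 1; start < totalSize; start += partSize, n++) {
    parts.push({ partNumber: n, start, end: Math.min(start + partSize, totalSize) });
  }
  return parts;
}
```

Because each part is an independent request, pause/resume and retry-on-failure fall out naturally: only the affected parts need to be re-sent.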

This obviously requires a server (e.g. on EC2) to handle the operation initiated via the browser (which also allows you to easily apply S3 Bucket Policies and/or IAM Policies for access control).

The one alternative might be using a JavaScript library and performing this client side; see e.g. jQuery Upload Progress and AJAX file upload for an initial pointer. Unfortunately there is no canonical JavaScript SDK for AWS available (aws-lib surprisingly doesn't even support S3 yet). Apparently some forks of knox have added multipart upload, see e.g. slakis's fork, though I haven't used either of these for the use case at hand.


Initial Answer

If it's possible to upload [large files] directly to S3, how can I handle pause/resume?

The AWS SDK for PHP supports uploading large files to Amazon S3 by means of the Low-Level PHP API for Multipart Upload:

The AWS SDK for PHP exposes a low-level API that closely resembles the Amazon S3 REST API for multipart upload (see Using the REST API for Multipart Upload ). Use the low-level API when you need to pause and resume multipart uploads, vary part sizes during the upload, or do not know the size of the data in advance. Use the high-level API (see Using the High-Level PHP API for Multipart Upload) whenever you don't have these requirements. [emphasis mine]
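The pause/resume capability this low-level API gives you amounts to persisting the upload ID, asking S3 via ListParts which parts already succeeded, and uploading only the rest. A small illustrative helper for that last computation (my own sketch, not SDK code):

```javascript
// Sketch: given the part numbers already confirmed by ListParts, compute
// which parts of an interrupted multipart upload still need uploading.
function remainingParts(totalSize, partSize, uploadedPartNumbers) {
  const done = new Set(uploadedPartNumbers);
  const partCount = Math.ceil(totalSize / partSize);
  const remaining = [];
  for (let n = 1; n <= partCount; n++) {
    if (!done.has(n)) remaining.push(n);
  }
  return remaining;
}
```

The same idea carries over directly to the PHP low-level API: store the upload ID when pausing, then resume by listing parts and re-queuing only the missing ones.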

Amazon S3 can handle objects from 1 byte all the way to 5 terabytes (TB), see the respective introductory post Amazon S3 - Object Size Limit Now 5 TB:

[...] Now customers can store extremely large files as single objects, which greatly simplifies their storage experience. Amazon S3 does the bookkeeping behind the scenes for our customers, so you can now GET that large object just like you would any other Amazon S3 object.

In order to store larger objects you would use the new Multipart Upload API that I blogged about last month to upload the object in parts. [...]

Author by style-sheets

Updated on July 07, 2022

Comments

  • style-sheets
    style-sheets almost 2 years

    (I'm new to Amazon AWS/S3, so please bear with me)

    My ultimate goal is to allow my users to upload files to S3 using their web browser, my requirements are:

    1. I must handle large files (2GB+)
    2. I must support pause/resume with progress indicator
    3. (Optional but desirable!) Ability to resume upload if connection temporarily drops out

    My two-part question is:

    • I've read about the S3 multipart upload but it's not clear how I can implement pause/resume for web-browser-based uploads.

    Is it even possible to do this for large files? If so how?

    • Should I upload files to EC2 then move them to S3 once I'm done? Can I (securely) upload files directly to S3 instead of using a temp. webserver?

    If it's possible to upload directly to S3, how can I handle pause/resume?

    PS. I'm using PHP 5.2+

  • Alfred Godoy
    Alfred Godoy about 12 years
    But would it be possible to port this multipart stuff into javascript (or flash/actionscript) and do it in the browser, without giving away aws credentials?
  • style-sheets
    style-sheets about 12 years
    Thank you Steffen, but my understanding is that the low level doesn't allow to pass from client to S3 directly (without web server), at least that's what the PHP example shows if I'm correct...am I missing something here?
  • Steffen Opel
    Steffen Opel about 12 years
    @style-sheets: I've apparently missed the main point in my initial answer and updated it accordingly now, sorry for being misleading!
  • Steffen Opel
    Steffen Opel about 12 years
    @AlfredGodoy: It should work in principle, insofar the REST API supports pre-signed URLs for GET, PUT and DELETE; I haven't tried this myself yet for the use case at hand though.
  • style-sheets
    style-sheets about 12 years
    @SteffenOpel: a lot of good info, thank you! Just to make sure, by saying This requires a server (e.g. on EC2) to handle the operation initiated via the browser; do you mean executing the PHP code or uploading the temporary file? Obviously I'm going to use PHP anyway, what I'd like to avoid is the upload to EC2 and then to S3
  • Steffen Opel
    Steffen Opel about 12 years
    @style-sheets: There is no way to avoid this other than exploring a client side JavaScript solution using the S3 REST API directly; I don't think it is much of a problem cost/performance wise, insofar EC2 to S3 connections are rather fast and free within one region. Obviously this approach offloads the pause/resume problem to HTML forms though, which again requires JavaScript as well as modern browsers supporting the File API - maybe How to resume a paused or broken file upload can get you started in case.
  • Geoff Appleford
    Geoff Appleford about 12 years
    @style-sheets I believe you can accomplish this using a browser plugin like flash, silverlight or java and directly using the REST API. I currently use a silverlight plugin to upload large files (up to 5GB) directly to S3. I haven't implement pause/resume don't use the S3 large file support but it should be possible. Using a plugin is the only way to achieve broad browser coverage. Checkout this SO thread stackoverflow.com/questions/478799/…. There are lots of links there to various free and not free plugins.
  • ipegasus
    ipegasus almost 12 years
    If that is the case, is there a plugin or demo app with pause/resume functionality available?
  • Alex
    Alex almost 9 years
    There is an official JS SDK today. Also, there is an intelligent multipart upload API available.
  • Steffen Opel
    Steffen Opel almost 9 years
    @Alex - thanks for the bump/pointer, I have updated my answer accordingly.