Proper way to implement RESTful large file upload


Solution 1

I would recommend taking a look at the Amazon S3 REST API's solution to multipart file upload. The documentation can be found here.

To summarize the procedure Amazon uses:

  1. The client sends a request to initiate a multipart upload; the API responds with an upload ID.

  2. The client uploads each file chunk with a part number (to preserve the ordering of the file), the size of the part, the MD5 hash of the part, and the upload ID; each of these is a separate HTTP request. The API validates the chunk by checking that the MD5 hash of the received chunk matches the MD5 hash the client supplied, and that the size of the chunk matches the size the client supplied. The API responds with a tag (unique ID) for the chunk. If you deploy your API across multiple locations, you will need to consider how to store the chunks and later access them in a way that is location-transparent.

  3. The client issues a request to complete the upload, containing a list of each chunk number and the associated chunk tag (unique ID) received from the API. The API validates that there are no missing chunks and that each chunk number matches the correct chunk tag, and then either assembles the file or returns an error response.
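
The three steps above can be sketched in Java, since that is what the question targets. This is a minimal in-memory illustration, not Amazon's implementation: the class and method names (`MultipartUploadStore`, `initiate`, `uploadPart`, `complete`) are hypothetical, the MD5 is assumed to arrive as a hex string, part numbers are assumed to run 1..n without gaps, and a real service would stream parts to disk or object storage instead of holding them in memory. It needs Java 17 (records, `HexFormat`).

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.SortedMap;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical in-memory sketch of the initiate / upload-part / complete flow.
// A real service would stream parts to disk or object storage instead of RAM.
final class MultipartUploadStore {
    private record Part(byte[] data, String tag) {}

    // uploadId -> (partNumber -> stored part), parts kept sorted by number
    private final Map<String, ConcurrentSkipListMap<Integer, Part>> uploads = new ConcurrentHashMap<>();

    // Step 1: create an upload and return its ID.
    String initiate() {
        String uploadId = UUID.randomUUID().toString();
        uploads.put(uploadId, new ConcurrentSkipListMap<>());
        return uploadId;
    }

    // Step 2: validate one part's declared size and MD5 (hex string, an assumption), return its tag.
    String uploadPart(String uploadId, int partNumber, long declaredSize,
                      String declaredMd5Hex, byte[] body) {
        ConcurrentSkipListMap<Integer, Part> parts = requireUpload(uploadId);
        if (partNumber < 1) {
            throw new IllegalArgumentException("part numbers start at 1");
        }
        if (body.length != declaredSize) {
            throw new IllegalArgumentException("size mismatch for part " + partNumber);
        }
        if (!md5Hex(body).equalsIgnoreCase(declaredMd5Hex)) {
            throw new IllegalArgumentException("MD5 mismatch for part " + partNumber);
        }
        String tag = UUID.randomUUID().toString(); // unique ID handed back to the client
        parts.put(partNumber, new Part(body.clone(), tag));
        return tag;
    }

    // Step 3: check the client's partNumber -> tag list, then concatenate the parts in order.
    byte[] complete(String uploadId, SortedMap<Integer, String> clientParts) {
        ConcurrentSkipListMap<Integer, Part> parts = requireUpload(uploadId);
        if (clientParts.isEmpty()) {
            throw new IllegalStateException("no parts listed");
        }
        int expected = 1; // assumption: parts must be numbered 1..n with no gaps
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (Map.Entry<Integer, String> e : clientParts.entrySet()) {
            int number = e.getKey();
            if (number != expected) {
                throw new IllegalStateException("missing part " + expected);
            }
            Part stored = parts.get(number);
            if (stored == null || !stored.tag().equals(e.getValue())) {
                throw new IllegalStateException("unknown part or tag mismatch for part " + number);
            }
            out.write(stored.data(), 0, stored.data().length);
            expected++;
        }
        uploads.remove(uploadId); // upload finished; drop its state
        return out.toByteArray();
    }

    private ConcurrentSkipListMap<Integer, Part> requireUpload(String uploadId) {
        ConcurrentSkipListMap<Integer, Part> parts = uploads.get(uploadId);
        if (parts == null) {
            throw new NoSuchElementException("unknown upload " + uploadId);
        }
        return parts;
    }

    static String md5Hex(byte[] data) {
        try {
            return HexFormat.of().formatHex(MessageDigest.getInstance("MD5").digest(data));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e); // standard JDKs ship MD5
        }
    }
}
```

In a Play controller (or any HTTP layer), you would probably map `IllegalArgumentException` and `IllegalStateException` to 400 responses and `NoSuchElementException` to 404; that mapping is left out of the sketch.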

Amazon also supplies methods to abort the upload and to list the chunks associated with the upload. You may also want to consider a timeout for the upload, after which the chunks are destroyed if the upload has not been completed.
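
Abort, list-parts, and expiry mostly come down to per-upload bookkeeping. A minimal sketch, assuming a hypothetical `UploadRegistry` that tracks only part numbers and creation times (storing and deleting the actual part data is out of scope) and an expiry window you choose yourself:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical bookkeeping for abort, list-parts and expiry; part data itself is omitted.
final class UploadRegistry {
    private record Upload(Instant createdAt, Set<Integer> partNumbers) {}

    private final Map<String, Upload> uploads = new ConcurrentHashMap<>();
    private final Duration maxAge; // assumed policy: uploads not completed within this window are purged

    UploadRegistry(Duration maxAge) {
        this.maxAge = maxAge;
    }

    void initiate(String uploadId, Instant now) {
        uploads.put(uploadId, new Upload(now, ConcurrentHashMap.newKeySet()));
    }

    void recordPart(String uploadId, int partNumber) {
        Upload u = uploads.get(uploadId);
        if (u == null) {
            throw new IllegalArgumentException("unknown upload " + uploadId);
        }
        u.partNumbers().add(partNumber);
    }

    // "List parts": sorted part numbers received so far, e.g. so a client can resume.
    List<Integer> listParts(String uploadId) {
        Upload u = uploads.get(uploadId);
        return u == null ? List.of() : List.copyOf(new TreeSet<>(u.partNumbers()));
    }

    // "Abort": forget the upload (a real system would also delete its stored parts).
    boolean abort(String uploadId) {
        return uploads.remove(uploadId) != null;
    }

    // Expiry sweep: removes uploads older than maxAge and returns how many were removed.
    int purgeExpired(Instant now) {
        int removed = 0;
        for (var it = uploads.entrySet().iterator(); it.hasNext(); ) {
            if (it.next().getValue().createdAt().plus(maxAge).isBefore(now)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

`purgeExpired` is meant to be called periodically, for example from a `ScheduledExecutorService`; passing the current `Instant` in explicitly keeps the expiry logic easy to test.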

In terms of controlling chunk sizes, you won't have much control over how the client decides to split up the upload. You could configure a maximum chunk size for the upload and return error responses for requests that contain chunks larger than that maximum.
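
One way to enforce a maximum chunk size is to check the declared `Content-Length` of each part request before reading the body. A sketch, assuming a hypothetical 64 MiB limit and these particular status choices (411 when no length is declared, 400 for an empty part, 413 when the part is too large):

```java
import java.util.OptionalInt;
import java.util.OptionalLong;

// Hypothetical per-part size policy, checked before the body is read.
final class PartSizePolicy {
    // Assumed server-wide limit; pick a value that suits your storage and timeouts.
    static final long MAX_PART_BYTES = 64L * 1024 * 1024; // 64 MiB

    // Returns the HTTP status to reject with, or empty if the part may be accepted.
    static OptionalInt rejectionStatus(OptionalLong contentLength) {
        if (contentLength.isEmpty()) {
            return OptionalInt.of(411); // Length Required: insist on a declared size
        }
        long size = contentLength.getAsLong();
        if (size <= 0) {
            return OptionalInt.of(400); // Bad Request: empty or nonsensical part
        }
        if (size > MAX_PART_BYTES) {
            return OptionalInt.of(413); // Content Too Large (formerly Payload Too Large)
        }
        return OptionalInt.empty();
    }
}
```

`Content-Length` is only what the client declares, so the server should still stop reading once it has consumed `MAX_PART_BYTES` bytes; the size and MD5 checks from step 2 catch a client whose body does not match its declaration.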

I've found that this procedure works very well for handling large file uploads in REST APIs and makes it easier to handle the many edge cases associated with file upload. Unfortunately, I've yet to find a library that makes this easy to implement in any language, so you pretty much have to write all of the logic yourself.

Solution 2

https://tus.io/ is a resumable upload protocol that supports uploading in chunks and resuming an upload after a timeout. It is open source and already has various client and server implementations in different languages.
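
The core idea of tus is that the server tracks a byte offset per upload, and the client appends data with `PATCH` requests that state the offset they start at; after a timeout, the client asks for the current offset with `HEAD` and continues from there. The sketch below shows the server-side checks on a `PATCH` as I understand the tus 1.0.0 core protocol (`Tus-Resumable` version, `application/offset+octet-stream` content type, 409 on an offset mismatch, 204 on success). The class, its in-memory offset map, and the 400 for an unparsable offset are assumptions; it does not store the bytes or handle `Upload-Length`, and one of the existing implementations is the better choice for real use.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the offset checks a tus 1.0.0 server performs on PATCH.
final class TusPatchHandler {
    // Outcome of a PATCH: HTTP status plus the offset to report in Upload-Offset (-1 if none).
    record Result(int status, long offset) {}

    private final Map<String, Long> offsets = new HashMap<>();

    synchronized void create(String uploadId) {
        offsets.put(uploadId, 0L);
    }

    synchronized Result patch(String uploadId, String tusResumable, String contentType,
                              String uploadOffsetHeader, byte[] body) {
        if (!"1.0.0".equals(tusResumable)) {
            return new Result(412, -1); // Precondition Failed: unsupported protocol version
        }
        Long current = offsets.get(uploadId);
        if (current == null) {
            return new Result(404, -1);
        }
        if (!"application/offset+octet-stream".equals(contentType)) {
            return new Result(415, current); // Unsupported Media Type
        }
        long claimed;
        try {
            claimed = Long.parseLong(uploadOffsetHeader); // also throws on a missing (null) header
        } catch (NumberFormatException e) {
            return new Result(400, current); // assumption: malformed offset is a Bad Request
        }
        if (claimed != current) {
            return new Result(409, current); // Conflict: client should HEAD for the real offset
        }
        long next = current + body.length; // a real server would persist body at this offset
        offsets.put(uploadId, next);
        return new Result(204, next); // No Content; `next` goes into the Upload-Offset header
    }
}
```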

Author: Aleksandar Stojadinovic

Updated on July 04, 2020

Comments

  • Aleksandar Stojadinovic (almost 4 years ago)

    I've been making REST APIs for some time now, and I'm still bugged by one case: large file upload. I've looked at a couple of other APIs, like Google Drive and Twitter, as well as other literature, and I got two ideas, but I'm not sure whether either of them is "proper". By proper, I mean it is somewhat standardized, there is not too much client logic needed (since other parties will be implementing that client), or, even better, it could be easily called with cURL. The plan is to implement it in Java, preferably with Play Framework.

    Obviously I'll need some file partitioning and server-side buffering mechanism since the files are large.

    So, the first solution I've got is a multipart upload (multipart/form-data). I understand this approach and have implemented it like this before, but it always feels strange to me to emulate a form on the client side, especially since the client has to set the file key name, and in my experience that is something clients kinda forget or do not understand. Also, how is the chunk size/part size dictated? What keeps the client from putting the whole file in one chunk?

    Solution two, at least as I understood it (though I haven't found an actual implementation), is that a "regular" POST request can work. The content should be chunked and the data buffered on the server side. However, I am not sure this is a proper understanding. How is the data actually chunked: does the upload span multiple HTTP requests, or is it chunked at the TCP level? What is the Content-Type?

    Bottom line: which of these two (or anything else?) would be a client-friendly, widely understood way of implementing a REST API for file upload?

  • Pronoy999 (about 3 years ago)
    I was thinking of doing the same thing, but I was wondering: for the last API call, where the server merges the file, won't it take too long and cause the HTTP connection to time out?
  • crawfobw (about 3 years ago)
    Good point. Probably best to just do the chunk validation and return 202 Accepted while assembly continues.
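
Following that suggestion, the complete call can validate the part list synchronously, hand the assembly to a background worker, reply 202 Accepted, and expose a status the client can poll. A minimal sketch; the `AsyncCompletion` class, its `State` values, and the boolean standing in for the validation result are all hypothetical:

```java
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical: "complete" validates synchronously, then assembles in the background.
final class AsyncCompletion {
    enum State { ASSEMBLING, DONE, FAILED }

    private final Map<String, State> states = new ConcurrentHashMap<>();
    private final ExecutorService assembler = Executors.newSingleThreadExecutor();

    // Returns the HTTP status for the complete request: 400 if validation failed, else 202.
    int complete(String uploadId, boolean partsValid, Runnable assemble) {
        if (!partsValid) {
            return 400; // chunk validation failed; nothing is scheduled
        }
        states.put(uploadId, State.ASSEMBLING);
        assembler.submit(() -> {
            try {
                assemble.run();
                states.put(uploadId, State.DONE);
            } catch (RuntimeException e) {
                states.put(uploadId, State.FAILED);
            }
        });
        return 202; // Accepted: assembly continues after the response is sent
    }

    // What a status endpoint would report; null means the upload is unknown.
    State status(String uploadId) {
        return states.get(uploadId);
    }

    // Stops accepting work and waits (up to the timeout) for in-flight assembly to finish.
    boolean shutdownAndWait(Duration timeout) {
        assembler.shutdown();
        try {
            return assembler.awaitTermination(timeout.toMillis(), TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A status endpoint (for example a `GET` on the upload resource) would return this state, and the client would poll it until it sees `DONE` or `FAILED`, so no single HTTP connection has to stay open for the whole merge.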