Best approach to design a REST web service with binary data to be consumed from the browser


Solution 1

My research results:

  1. Single request (data included)

    The request contains the metadata, with the data embedded as a property of it and encoded (for example, as Base64).

    Pros:

    • transactional
    • always valid (no missing metadata or data)

    Cons:

    • encoding (e.g. Base64) inflates the request by roughly a third


  2. Single request (multipart)

    The request contains one or more parts with metadata and data.

    Content types: e.g. multipart/form-data or multipart/mixed

    Pros:

    • transactional
    • always valid (no missing metadata or data)

    Cons:

    • content type negotiation is complex
    • content type for data is not visible in WADL

    Examples:

    • Confluence (with parts for data and for metadata)
    • Jira (with one part for data; metadata only in the part headers, for file name and MIME type)
    • Bitbucket (with one part for data, no metadata)
    • Google Drive (with one part for metadata and one part for data)
  3. Single request (metadata in HTTP header and URL)

    The request body contains the data, while the HTTP headers and the URL carry the metadata.

    Pros:

    • transactional
    • always valid (no missing metadata or data)

    Cons:

    • no nested metadata possible


  4. Two requests

    One request for metadata and one or more requests for data.

    Pros:

    • scalability (for example: data request could go to repository server)
    • resumable (see for example Google Drive)

    Cons:

    • not transactional
    • not always valid (until the second request completes, one part is missing)

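As a minimal sketch of approach 1 (data embedded in the metadata), assuming the hypothetical report fields from the question below, the client builds the JSON payload like this:

```python
import base64
import json

def build_report_payload(author: str, abstract: str, filename: str, data: bytes) -> str:
    """Embed binary content in the metadata JSON as a Base64 string (approach 1)."""
    return json.dumps({
        "author": author,
        "abstract": abstract,
        "filename": filename,
        "filesize": len(data),  # size of the raw bytes, not the encoded string
        "content": base64.b64encode(data).decode("ascii"),
    })

payload = build_report_payload("xxxx", "xxxx", "report.pdf", b"%PDF-1.4 ...")
```

The resulting payload is roughly a third larger than the raw bytes, which is the main con listed above.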

Solution 2

I can't think of any other approaches off the top of my head.

Of your 3 approaches, I've worked with method 3 the most. The biggest difference I see is between the first method and the other two: separating metadata and content into 2 resources.

  • Pro: Scalability
    • while your solution involves posting to the same server, this can easily be changed to point the content upload to a separate server (e.g. Amazon S3)
    • In the first method, the same server that serves metadata to users will have a process blocked by a large upload.
  • Con: Orphaned Data/Added complexity
    • failed uploads (either metadata or content) will leave orphaned data in the server DB
    • Orphaned data can be cleaned up with a scheduled job, but this adds code complexity
    • The second method reduces the chance of orphans, at the cost of a longer client wait time, since you're blocking on the response to the first POST
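The scheduled orphan clean-up mentioned above can be sketched as follows; the record shapes and the 24-hour grace period are assumptions for illustration, not from the original answer:

```python
from datetime import datetime, timedelta, timezone

def find_orphaned_uploads(uploads, metadata_file_ids, grace_hours=24):
    """Return ids of uploaded files that no metadata record references
    and that are older than the grace period (so in-flight uploads survive)."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=grace_hours)
    return [u["id"] for u in uploads
            if u["id"] not in metadata_file_ids and u["created_at"] < cutoff]

now = datetime.now(timezone.utc)
uploads = [
    {"id": "f1", "created_at": now - timedelta(hours=48)},  # old, unreferenced -> orphan
    {"id": "f2", "created_at": now - timedelta(hours=48)},  # old but referenced -> keep
    {"id": "f3", "created_at": now},                        # fresh -> keep for now
]
orphans = find_orphaned_uploads(uploads, {"f2"})  # ["f1"]
```

A cron job (or equivalent scheduler) would run this periodically and delete the returned ids, which is exactly the extra code complexity the con refers to.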

The first method seems the most straightforward to code. However, I'd only go with the first method if you anticipate this service being used infrequently and you can set a reasonable limit on user file uploads.

Solution 3

I believe the ultimate method is number 3 (separate resource), mainly because it lets you maximize the value you get from the HTTP standard, which matches how I think of REST APIs. For example, assuming a well-grounded HTTP client is in use, you get the following benefits:

  • Content compression: servers can respond with a compressed result when clients indicate they support it; your API is unchanged, existing clients continue to work, and future clients can take advantage of it
  • Caching: If-Modified-Since, ETag, etc. Clients can avoid refetching the binary data altogether
  • Content type abstraction: for example, you require an uploaded image, which can be of type image/jpeg or image/png. The HTTP headers Accept and Content-Type give us elegant semantics for negotiating this between clients and servers without having to hardcode it all as part of our schema and/or API
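The caching benefit can be sketched server-side with a conditional GET; the header names are standard HTTP (RFC 7232), while the handler interface itself is purely illustrative:

```python
def serve_binary(request_headers: dict, etag: str, body: bytes):
    """Answer a GET for the binary resource, honouring If-None-Match."""
    if request_headers.get("If-None-Match") == etag:
        # Client already holds the current bytes; skip the transfer entirely.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Type": "application/pdf"}, body

# A repeat fetch with a matching validator returns 304 with an empty body,
# so the (potentially large) binary is never re-sent.
status, headers, payload = serve_binary({"If-None-Match": '"v1"'}, '"v1"', b"%PDF...")
```

Because the binary lives at its own URL, this works with stock HTTP caches and clients; with approach 1 the binary is buried inside a JSON document and none of this applies.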

On the other hand, I believe it's fair to conclude that this method is not the simplest if the binary data in question is not optional, in which case the cons listed in Eric Hu's answer come into play.

opensas

Updated on June 12, 2022

Comments

  • opensas (almost 2 years ago)

    I'm developing a JSON REST web service that will be consumed from a single-page web app built with Backbone.js.

    This API will let the consumer upload files related to some entity, like PDF reports related to a project.

    Googling around and doing some research at Stack Overflow, I came up with these possible approaches:

    First approach: base64 encoded data field

    POST: /api/projects/234/reports
    {
      author: 'xxxx',
      abstract: 'xxxx',
      filename: 'xxxx',
      filesize: 222,
      content: '<base64 encoded binary data>'
    }
    

    Second approach: multipart form post:

    POST: /api/projects/234/reports
    {
      author: 'xxxx',
      abstract: 'xxxx',
    }
    

    As a response I'll get a report id, and with that I'll issue another POST:

    POST: /api/projects/234/reports/1/content
    enctype=multipart/form-data
    

    and then just send the binary data

    (have a look at this: https://stackoverflow.com/a/3938816/47633)

    Third approach: post the binary data to a separate resource and save the href

    first I generate a random key at the client and post the binary content there

    POST: /api/files/E4304205-29B7-48EE-A359-74250E19EFC4
    enctype=multipart/form-data
    

    and then

    POST: /api/projects/234/reports
    {
      author: 'xxxx',
      abstract: 'xxxx',
      filename: 'xxxx',
      filesize: 222,
      href: '/api/files/E4304205-29B7-48EE-A359-74250E19EFC4'
    }
    

    (see this: https://stackoverflow.com/a/4032079/47633)
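    The random key in this third approach can be produced with a standard UUID; a minimal sketch, with the endpoint paths taken from the snippets above and everything else illustrative:

```python
import uuid

def make_file_resource_path() -> str:
    """Generate a random, collision-resistant key client-side (third approach).
    The binary body is POSTed to this path before the report is created."""
    return f"/api/files/{uuid.uuid4()}"

href = make_file_resource_path()
# The follow-up POST to /api/projects/234/reports then links the upload:
report_metadata = {
    "author": "xxxx",
    "abstract": "xxxx",
    "filename": "xxxx",
    "filesize": 222,
    "href": href,
}
```

    Using a UUIDv4 makes client-side key generation safe without coordinating with the server, since accidental collisions are vanishingly unlikely.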

    I just wanted to know if there's any other approach I could use, the pros/cons of each, and whether there's any established way to deal with this kind of requirement.

    The big con I see with the first approach is that I have to fully load and Base64-encode the file on the client.

    some useful resources: