Best approach to design a REST web service with binary data to be consumed from the browser


Solution 1

My research results:

  1. Single request (data included)

    The request contains the metadata, with the data embedded as a property of it and encoded (for example, as Base64).

    Pros:

    • transactional
    • always valid (no missing metadata or data)

    Cons:

    • encoding (e.g. Base64) inflates the request by roughly a third


  2. Single request (multipart)

    The request contains one or more parts with metadata and data.

    Content types: e.g. multipart/form-data or multipart/mixed

    Pros:

    • transactional
    • always valid (no missing metadata or data)

    Cons:

    • content type negotiation is complex
    • content type for data is not visible in WADL

    Examples:

    • Confluence (with parts for data and for metadata)
    • Jira (with one part for data; metadata only in the part headers, for file name and MIME type)
    • Bitbucket (with one part for data, no metadata)
    • Google Drive (with one part for metadata and one part for data)
  3. Single request (metadata in HTTP header and URL)

    The request body contains the data, while the HTTP headers and the URL carry the metadata.

    Pros:

    • transactional
    • always valid (no missing metadata or data)

    Cons:

    • no nested metadata possible


  4. Two requests

    One request for metadata and one or more requests for data.

    Pros:

    • scalability (for example: data request could go to repository server)
    • resumable (see for example Google Drive)

    Cons:

    • not transactional
    • not always valid (until the second request completes, one part is missing)

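As a minimal sketch of approach 1 (data embedded in the metadata), assuming the hypothetical report fields from the question below, the client builds the JSON payload like this:

```python
import base64
import json

def build_report_payload(author: str, abstract: str, filename: str, data: bytes) -> str:
    """Embed binary content in the metadata JSON as a Base64 string (approach 1)."""
    return json.dumps({
        "author": author,
        "abstract": abstract,
        "filename": filename,
        "filesize": len(data),  # size of the raw bytes, not the encoded string
        "content": base64.b64encode(data).decode("ascii"),
    })

payload = build_report_payload("xxxx", "xxxx", "report.pdf", b"%PDF-1.4 ...")
```

The resulting payload is roughly a third larger than the raw bytes, which is the main con listed above.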

Solution 2

I can't think of any other approaches off the top of my head.

Of your 3 approaches, I've worked with method 3 the most. The biggest difference I see is between the first method and the other two: separating metadata and content into 2 resources.

  • Pro: Scalability
    • while your solution involves posting to the same server, this can easily be changed to point the content upload to a separate server (e.g. Amazon S3)
    • In the first method, the same server that serves metadata to users will have a process blocked by a large upload.
  • Con: Orphaned Data/Added complexity
    • failed uploads (either metadata or content) will leave orphaned data in the server DB
    • Orphaned data can be cleaned up with a scheduled job, but this adds code complexity
    • The second method reduces the chance of orphans, at the cost of a longer client wait time, since you're blocking on the response to the first POST
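The scheduled orphan clean-up mentioned above can be sketched as follows; the record shapes and the 24-hour grace period are assumptions for illustration, not from the original answer:

```python
from datetime import datetime, timedelta, timezone

def find_orphaned_uploads(uploads, metadata_file_ids, grace_hours=24):
    """Return ids of uploaded files that no metadata record references
    and that are older than the grace period (so in-flight uploads survive)."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=grace_hours)
    return [u["id"] for u in uploads
            if u["id"] not in metadata_file_ids and u["created_at"] < cutoff]

now = datetime.now(timezone.utc)
uploads = [
    {"id": "f1", "created_at": now - timedelta(hours=48)},  # old, unreferenced -> orphan
    {"id": "f2", "created_at": now - timedelta(hours=48)},  # old but referenced -> keep
    {"id": "f3", "created_at": now},                        # fresh -> keep for now
]
orphans = find_orphaned_uploads(uploads, {"f2"})  # ["f1"]
```

A cron job (or equivalent scheduler) would run this periodically and delete the returned ids, which is exactly the extra code complexity the con refers to.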

The first method seems the most straightforward to code. However, I'd only go with the first method if you anticipate this service being used infrequently and you can set a reasonable limit on user file uploads.

Solution 3

I believe the ultimate method is number 3 (separate resource), mainly because it lets you maximize the value you get from the HTTP standard, which matches how I think of REST APIs. For example, assuming a well-grounded HTTP client is in use, you get the following benefits:

  • Content compression: servers can respond with a compressed result when clients indicate they support it; your API is unchanged, existing clients continue to work, and future clients can take advantage of it
  • Caching: If-Modified-Since, ETag, etc. Clients can avoid refetching the binary data altogether
  • Content type abstraction: for example, you require an uploaded image, which can be of type image/jpeg or image/png. The HTTP headers Accept and Content-Type give us elegant semantics for negotiating this between clients and servers without having to hardcode it all as part of our schema and/or API
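The caching benefit can be sketched server-side with a conditional GET; the header names are standard HTTP (RFC 7232), while the handler interface itself is purely illustrative:

```python
def serve_binary(request_headers: dict, etag: str, body: bytes):
    """Answer a GET for the binary resource, honouring If-None-Match."""
    if request_headers.get("If-None-Match") == etag:
        # Client already holds the current bytes; skip the transfer entirely.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag, "Content-Type": "application/pdf"}, body

# A repeat fetch with a matching validator returns 304 with an empty body,
# so the (potentially large) binary is never re-sent.
status, headers, payload = serve_binary({"If-None-Match": '"v1"'}, '"v1"', b"%PDF...")
```

Because the binary lives at its own URL, this works with stock HTTP caches and clients; with approach 1 the binary is buried inside a JSON document and none of this applies.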

On the other hand, I believe it's fair to conclude that this method is not the simplest if the binary data in question is not optional, in which case the cons listed in Eric Hu's answer come into play.

opensas

Updated on June 12, 2022

Comments

  • opensas (almost 2 years ago)

    I'm developing a JSON REST web service that will be consumed from a single-page web app built with Backbone.js.

    This API will let the consumer upload files related to some entity, like PDF reports related to a project.

    Googling around and doing some research at Stack Overflow, I came up with these possible approaches:

    First approach: base64 encoded data field

    POST: /api/projects/234/reports
    {
      author: 'xxxx',
      abstract: 'xxxx',
      filename: 'xxxx',
      filesize: 222,
      content: '<base64 encoded binary data>'
    }
    

    Second approach: multipart form post:

    POST: /api/projects/234/reports
    {
      author: 'xxxx',
      abstract: 'xxxx',
    }
    

    As a response I'll get a report id, and with that I'll issue another POST:

    POST: /api/projects/234/reports/1/content
    enctype=multipart/form-data
    

    and then just send the binary data

    (have a look at this: https://stackoverflow.com/a/3938816/47633)

    Third approach: post the binary data to a separate resource and save the href

    first I generate a random key at the client and post the binary content there

    POST: /api/files/E4304205-29B7-48EE-A359-74250E19EFC4
    enctype=multipart/form-data
    

    and then

    POST: /api/projects/234/reports
    {
      author: 'xxxx',
      abstract: 'xxxx',
      filename: 'xxxx',
      filesize: 222,
      href: '/api/files/E4304205-29B7-48EE-A359-74250E19EFC4'
    }
    

    (see this: https://stackoverflow.com/a/4032079/47633)
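    The random key in this third approach can be produced with a standard UUID; a minimal sketch, with the endpoint paths taken from the snippets above and everything else illustrative:

```python
import uuid

def make_file_resource_path() -> str:
    """Generate a random, collision-resistant key client-side (third approach).
    The binary body is POSTed to this path before the report is created."""
    return f"/api/files/{uuid.uuid4()}"

href = make_file_resource_path()
# The follow-up POST to /api/projects/234/reports then links the upload:
report_metadata = {
    "author": "xxxx",
    "abstract": "xxxx",
    "filename": "xxxx",
    "filesize": 222,
    "href": href,
}
```

    Using a UUIDv4 makes client-side key generation safe without coordinating with the server, since accidental collisions are vanishingly unlikely.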

    I just wanted to know if there's any other approach I could use, the pros/cons of each, and whether there's any established way to deal with this kind of requirement.

    The big con I see with the first approach is that I have to fully load and Base64-encode the file on the client.

    some useful resources: