Consequences of POST not being idempotent (RESTful API)

17,517

Solution 1

The only advantage of POST-creation over PUT-creation is the server generation of IDs. I don't think it worths the lack of idempotency (and then the need for removing duplicates or empty objets).

Instead, I would use a PUT with a UUID in the URL. Owing to UUID generators you are nearly sure that the ID you generate client-side will be unique server-side.

Solution 2

well it all depends, to start with you should talk more about URIs, resources and representations and not be concerned about objects.

The POST Method is designed for non-idempotent requests, or requests with side affects, but it can be used for idempotent requests.

on POST of form data to /some_collection/

normalize the natural key of your data (Eg. "lowercase" the Title field for a blog post)
calculate a suitable hash value (Eg. simplest case is your normalized field value)
lookup resource by hash value
if none then
    generate a server identity, create resource
        Respond =>  "201 Created", "Location": "/some_collection/<new_id>" 
if found but no updates should be carried out due to app logic
        Respond => 302 Found/Moved Temporarily or 303 See Other 
        (client will need to GET that resource which might include fields required for updates, like version_numbers)
if found but updates may occur
   Respond => 307 Moved Temporarily, Location: /some_collection/<id> 
   (like a 302, but the client should use original http method and might do automatically) 

A suitable hash function might be as simple as some concatenated fields, or for large fields or values a truncated md5 function could be used. See [hash function] for more details2.

I've assumed you:

  • need a different identity value than a hash value
  • data fields used for identity can't be changed

Solution 3

Your method of generating ids at the server, in the application, in a dedicated request-response, is a very good one! Uniqueness is very important, but clients, like suitors, are going to keep repeating the request until they succeed, or until they get a failure they're willing to accept (unlikely). So you need to get uniqueness from somewhere, and you only have two options. Either the client, with a GUID as Aurélien suggests, or the server, as you suggest. I happen to like the server option. Seed columns in relational DBs are a readily available source of uniqueness with zero risk of collisions. Round 2000, I read an article advocating this solution called something like "Simple Reliable Messaging with HTTP", so this is an established approach to a real problem.

Reading REST stuff, you could be forgiven for thinking a bunch of teenagers had just inherited Elvis's mansion. They're excitedly discussing how to rearrange the furniture, and they're hysterical at the idea they might need to bring something from home. The use of POST is recommended because its there, without ever broaching the problems with non-idempotent requests.

In practice, you will likely want to make sure all unsafe requests to your api are idempotent, with the necessary exception of identity generation requests, which as you point out don't matter. Generating identities is cheap and unused ones are easily discarded. As a nod to REST, remember to get your new identity with a POST, so it's not cached and repeated all over the place.

Regarding the sterile debate about what idempotent means, I say it needs to be everything. Successive requests should generate no additional effects, and should receive the same response as the first processed request. To implement this, you will want to store all server responses so they can be replayed, and your ids will be identifying actions, not just resources. You'll be kicked out of Elvis's mansion, but you'll have a bombproof api.

Solution 4

But now you have two requests that can be lost? And the POST can still be repeated, creating another resource instance. Don't over-think stuff. Just have the batch process look for dupes. Possibly have some "access" count statistics on your resources to see which of the dupe candidates was the result of an abandoned post.

Another approach: screen incoming POST's against some log to see whether it is a repeat. Should be easy to find: if the body content of a request is the same as that of a request just x time ago, consider it a repeat. And you could check extra parameters like the originating IP, same authentication, ...

Solution 5

No matter what HTTP method you use, it is theoretically impossible to make an idempotent request without generating the unique identifier client-side, temporarily (as part of some request checking system) or as the permanent server id. An HTTP request being lost will not create a duplicate, though there is a concern that the request could succeed getting to the server but the response does not make it back to the client.

If the end client can easily delete duplicates and they don't cause inherent data conflicts it is probably not a big enough deal to develop an ad-hoc duplication prevention system. Use POST for the request and send the client back a 201 status in the HTTP header and the server-generated unique id in the body of the response. If you have data that shows duplications are a frequent occurrence or any duplicate causes significant problems, I would use PUT and create the unique id client-side. Use the client created id as the database id - there is no advantage to creating an additional unique id on the server.

Share:
17,517
mibollma
Author by

mibollma

Updated on June 08, 2022

Comments

  • mibollma
    mibollma almost 2 years

    I am wondering if my current approach makes sense or if there is a better way to do it.

    I have multiple situations where I want to create new objects and let the server assign an ID to those objects. Sending a POST request appears to be the most appropriate way to do that. However since POST is not idempotent the request may get lost and sending it again may create a second object. Also requests being lost might be quite common since the API is often accessed through mobile networks.

    As a result I decided to split the whole thing into a two-step process:

    1. First sending a POST request to create a new object which returns the URI of the new object in the Location header.

    2. Secondly performing an idempotent PUT request to the supplied Location to populate the new object with data. If a new object is not populated within 24 hours the server may delete it through some kind of batch job.

    Does that sound reasonable or is there a better approach?