RESTful APIs must be stateless, but what about concurrency?

api rest concurrency stateless


Solution 1

There are two basic approaches you can take:

1. Go completely stateless, and adopt a "last request wins" strategy. As odd as it might sound, it's likely the cleanest solution in terms of predictability, scalability, code complexity and implementation on both client and server sides. There's also plenty of precedent for it: look at how sites like Google paginate through search results using start=10 for page 2, start=20 for page 3, and so on.

You might find that the content changes within pages as you navigate back and forth between them, but so what? You're always getting the latest information, and Google can handle your requests on any of their many servers without having to find your session information to determine what your last query context was.

The biggest advantage of this approach is the simplicity of your server's implementation. Each request can pass straight through to the data layer at the back end, and it's ripe for caching both at the HTTP level (via ETags or Last-Modified headers) and on the server side (using something like memcache, for example).

2. Go stateful, and figure out a way to have your servers dole out some kind of per-client lock or token for each API "session". This is like fighting the ocean's tide with a stick: you'll end up failing and frustrated.

How will you identify clients? Session keys? IP address? File descriptor for the socket they rode in on (good luck with that if you're using a transport like HTTP where the connection can be closed between requests...)? The details you choose for this will have to be persisted on the server side, or you'll have to use some nasty old sticky session feature on your app server (and if so, heaven help your client if the server they are using goes down mid-session).

How will you handle API clients that disappear ungracefully? Will you time out their session locks automatically by having a reaper thread clean up idle ones? That's more code, more complexity and more places for bugs to hide. What about API clients that come back after a long idle time and try to reuse an expired lock? How should client applications be built to handle that situation?

I could go on, but hopefully you can see my point. Go with option 1, and go stateless. Otherwise you'll end up trying to track client state on the server side. And the only thing that should track a client's state is the client itself.
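As a rough illustration of option 1, here is a minimal Python sketch of a stateless page handler. Everything it needs arrives with the request (the start offset in the query string and the If-None-Match header), so any server could answer it without looking up a session. RESULTS, PAGE_SIZE and handle_get are hypothetical stand-ins rather than any framework's API, and the If-None-Match parsing is simplified (it splits on commas and does not cover every ETag syntax).

```python
import hashlib
import json
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlparse

PAGE_SIZE = 10
RESULTS = [f"result-{i}" for i in range(35)]  # hypothetical stand-in for the data layer


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the exact bytes of the representation."""
    return '"' + hashlib.sha256(body).hexdigest() + '"'


def handle_get(url: str, if_none_match: Optional[str] = None) -> Tuple[int, Dict[str, str], bytes]:
    """Serve one page using only what the request carries: no session lookup."""
    params = parse_qs(urlparse(url).query)
    start = max(int(params.get("start", ["0"])[0]), 0)  # e.g. start=10 is page 2
    body = json.dumps(RESULTS[start:start + PAGE_SIZE]).encode("utf-8")
    etag = make_etag(body)
    headers = {"ETag": etag, "Content-Type": "application/json"}

    if if_none_match is not None:
        # Simplified weak comparison: drop any W/ prefix and compare opaque tags.
        tags = [t.strip() for t in if_none_match.split(",")]
        tags = [t[2:] if t.startswith("W/") else t for t in tags]
        if "*" in tags or etag in tags:
            return 304, headers, b""  # the client's cached copy is still current
    return 200, headers, body


if __name__ == "__main__":
    status, headers, body = handle_get("/search?q=widgets&start=10")
    print(status, body)
    # Re-sending the ETag for an unchanged page yields 304 Not Modified.
    print(handle_get("/search?q=widgets&start=10", headers["ETag"])[0])  # 304
```

A client that re-sends the ETag it received gets an empty 304 if the page is unchanged and a fresh 200 if it has changed, which is the HTTP-level caching mentioned above.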

Solution 2

It is okay to maintain resource state. The "stateless prohibition" just refers to session state.

Here's an excerpt from Roy Fielding's seminal REST derivation:

We next add a constraint to the client-server interaction: communication must be stateless in nature, as in the client-stateless-server (CSS) style of Section 3.4.3 (Figure 5-3), such that each request from client to server must contain all of the information necessary to understand the request, and cannot take advantage of any stored context on the server. Session state is therefore kept entirely on the client.
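Applied to the row-review queue in the question, this means the "pop" can live in the resource itself: each row records who claimed it and until when. Below is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in for the real database. The items table, its claimed_by/claimed_until columns, CLAIM_SECONDS, and the claim_next/submit functions are hypothetical names chosen for illustration. In an HTTP API, claim_next might back a POST that creates a claim and submit a PUT on the claimed row, but that mapping is an assumption, not something the answer prescribes.

```python
import sqlite3
import time
from typing import Optional

CLAIM_SECONDS = 300.0  # hypothetical lease length for a claimed row


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE items ("
        " id INTEGER PRIMARY KEY,"
        " value TEXT,"          # the column a person fills in by hand
        " claimed_by TEXT,"     # who is currently working on the row
        " claimed_until REAL)"  # when that claim lapses
    )


def claim_next(conn: sqlite3.Connection, client_id: str, now: Optional[float] = None) -> Optional[int]:
    """Atomically claim the first unfinished row that has no live claim."""
    now = time.time() if now is None else now
    conn.execute("BEGIN IMMEDIATE")  # take the write lock so two claims cannot race
    try:
        row = conn.execute(
            "SELECT id FROM items WHERE value IS NULL"
            " AND (claimed_until IS NULL OR claimed_until < ?)"
            " ORDER BY id LIMIT 1",
            (now,),
        ).fetchone()
        if row is not None:
            conn.execute(
                "UPDATE items SET claimed_by = ?, claimed_until = ? WHERE id = ?",
                (client_id, now + CLAIM_SECONDS, row[0]),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[0]


def submit(conn: sqlite3.Connection, client_id: str, item_id: int, value: str,
           now: Optional[float] = None) -> bool:
    """Store the value only if this client still holds a live claim on the row."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE items SET value = ?, claimed_by = NULL, claimed_until = NULL"
        " WHERE id = ? AND claimed_by = ? AND claimed_until >= ?",
        (value, item_id, client_id, now),
    )
    return cur.rowcount == 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; BEGIN/COMMIT are explicit
    create_schema(conn)
    conn.executemany("INSERT INTO items (id) VALUES (?)", [(1,), (2,), (3,)])
    print(claim_next(conn, "alice", now=1000.0))  # alice gets row 1
    print(claim_next(conn, "bob", now=1000.0))    # bob gets row 2, not row 1
    print(submit(conn, "alice", 1, "checked", now=1010.0))
    # bob vanishes; once his claim lapses, row 2 goes to the next client
    print(claim_next(conn, "carol", now=1000.0 + CLAIM_SECONDS + 1))
```

Because a lapsed claim is simply treated as free by the next claim_next call, this sketch needs no reaper thread: a client that disappears just lets its claim expire. Each request still carries everything needed to process it, so no per-client session context is kept between requests.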


Author by

M. Herold

Updated on July 20, 2022

Comments

M. Herold almost 2 years

I'm curious how to solve the concurrency issue for a RESTful API. More specifically, I have a collection of objects that need manual examination and update, e.g. a number of rows that each need a column filled in by hand. However, if I open up the API to a number of clients, they will all grab these items from the top down, so many users will be filling in the same row's column at the same time. I'd prefer to avoid collisions, and the simple, stateful way is to put the items in a queue on the service and pop them off as people request them.

What is the stateless version of this? Hash by IP address, or randomly grab rows based on id?

:: update ::

"Hrm, so it must simply be stateless from the perspective of the client?

That certainly makes a lot of sense. I was just reading an article (ibm.com/developerworks/webservices/library/ws-restful) about RESTful APIs, and after the part about paging I worried that my quite stateful queue was similar to incrementing a page number. But they're actually quite different: "next page" is relative to where the client currently is, whereas "pop" is always stateless for the client, since it doesn't matter what was popped before.

Thanks for clearing my head!" -Me
Christian Gosch over 9 years

I do not think that accepting content changes while browsing result-set pages is the only way to look at this. You may see duplicate entries, which is acceptable (and the client could handle this anyway), but you may also miss entries entirely when items on earlier pages are removed, which is not nice. For Google search results that may not matter, but in other contexts it may.
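To make this concrete, here is a small Python simulation of page-number (offset) pagination over a list that another client modifies between two page requests. The list contents, PAGE_SIZE and the helper names are made up for illustration.

```python
from typing import List

PAGE_SIZE = 3


def page(items: List[str], page_number: int) -> List[str]:
    """Offset pagination: return the 1-based page_number-th slice of items."""
    start = (page_number - 1) * PAGE_SIZE
    return items[start:start + PAGE_SIZE]


def browse_with_deletion() -> List[str]:
    items = ["a", "b", "c", "d", "e", "f", "g"]
    first = page(items, 1)   # ['a', 'b', 'c']
    items.remove("b")        # another client deletes an item from page 1
    second = page(items, 2)  # ['e', 'f', 'g']: 'd' slid back onto page 1
    return first + second


def browse_with_insertion() -> List[str]:
    items = ["a", "b", "c", "d", "e", "f", "g"]
    first = page(items, 1)   # ['a', 'b', 'c']
    items.insert(0, "z")     # another client adds an item at the front
    second = page(items, 2)  # ['c', 'd', 'e']: 'c' is shown a second time
    return first + second


print(browse_with_deletion())   # 'd' is never seen
print(browse_with_insertion())  # 'c' appears twice
```

A removal on an earlier page silently hides an item, while an insertion only produces a duplicate, which matches the asymmetry described above.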
Christian Gosch over 9 years

As an alternative, the client can fetch "all IDs" up front (admittedly not feasible for Google search results most of the time) and walk through that list page by page, loading the full content for each page's subset of IDs when it reaches that page. The bad news is that the problem comes back just around the corner: some items may have been deleted in the meantime (they can no longer be loaded, which must be handled somehow), and new items may have been added that you will not see while browsing your current result set. So the question is: what matters more to you?
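A minimal Python sketch of that client-side walk, assuming a hypothetical fetch-by-IDs call (modelled here by fetch_by_ids over an in-memory store; a real API might expose something like a GET that accepts several IDs, but that endpoint is an assumption). It reports deleted IDs per page instead of failing, and shows that an item added after the snapshot is never visited.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Item = Dict[str, str]


def walk_id_snapshot(
    ids: List[int],
    page_size: int,
    fetch_by_ids: Callable[[List[int]], Dict[int, Item]],
) -> Iterator[Tuple[List[Item], List[int]]]:
    """Page through a client-held ID snapshot, loading details one page at a time.

    Yields (items_found, ids_missing) per page; ids_missing are items deleted
    since the snapshot was taken.
    """
    for start in range(0, len(ids), page_size):
        chunk = ids[start:start + page_size]
        found = fetch_by_ids(chunk)  # fresh detail data for just this page
        yield [found[i] for i in chunk if i in found], [i for i in chunk if i not in found]


# Hypothetical server-side store standing in for a fetch-by-IDs endpoint.
store: Dict[int, Item] = {i: {"id": str(i), "title": f"item {i}"} for i in range(1, 6)}


def fetch_by_ids(chunk: List[int]) -> Dict[int, Item]:
    return {i: store[i] for i in chunk if i in store}


snapshot = sorted(store)                   # client fetches "all IDs": [1, 2, 3, 4, 5]
del store[4]                               # deleted after the snapshot was taken
store[6] = {"id": "6", "title": "item 6"}  # added after the snapshot, so never visited

for items, missing in walk_id_snapshot(snapshot, 2, fetch_by_ids):
    print([it["id"] for it in items], "missing:", missing)
```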
Christian Gosch over 9 years

If you do not care about missing or duplicate items while paging back and forth through your result set, then putting the page number or index range into the URL as a parameter will work fine, at the cost of burdening the service with loading and discarding all items that would have appeared on earlier pages. (How does Google handle this? Has anyone tried an absurd query with a high initial page number, like "query=millionaire&page=59786"?)
Christian Gosch over 9 years

If, on the other hand, you want stable result sets during page browsing while still delivering fresh detail data, you should cache at least the query result's ID set (if feasible) and walk through that set while browsing pages. You will, however, have to deal with items removed in the meantime, and you will not see items added in the meantime, even on pages where they would have appeared had the query been run later. Either way, it seems there is some price to pay.
Christian Gosch over 9 years

By the way, this approach should also count as RESTful, since no state is managed on the server side. If you treat the item IDs as equivalent to hyperlinks pointing to each item's resource location, then it is also hypermedia-style (and fully so if the ID set is actually a set of resource locators, although that needs more bandwidth and client-side memory).