Transactions across REST microservices?

rest architecture transactions microservices

83,914

Solution 1

What doesn't make sense:

distributed transactions with REST services. REST services by definition are stateless, so they should not be participants in a transactional boundary that spans more than one service. Your user registration use case scenario makes sense, but the design with REST microservices to create User and Wallet data is not good.

What will give you headaches:

EJBs with distributed transactions. It's one of those things that work in theory but not in practice. Right now I'm trying to make a distributed transaction work for remote EJBs across JBoss EAP 6.3 instances. We've been talking to RedHat support for weeks, and it still doesn't work.
Two-phase commit solutions in general. I think the 2PC protocol is a great algorithm (many years ago I implemented it in C with RPC). It requires comprehensive failure recovery mechanisms, with retries, a state repository, etc. All the complexity is hidden within the transaction framework (e.g., JBoss Arjuna). However, 2PC is not fail-proof. There are situations where the transaction simply can't complete. Then you need to identify and fix database inconsistencies manually. It may happen once in a million transactions if you're lucky, but it may happen once in every 100 transactions depending on your platform and scenario.
Sagas (compensating transactions). There's the implementation overhead of creating the compensating operations, plus the coordination mechanism to activate compensation at the end. But compensation is not fail-proof either. You may still end up with inconsistencies (= some headache).

What's probably the best alternative:

Eventual consistency. Neither ACID-like distributed transactions nor compensating transactions are fail-proof, and both may lead to inconsistencies. Eventual consistency is often better than "occasional inconsistency". There are different design solutions, such as:
- You may create a more robust solution using asynchronous communication. In your scenario, when Bob registers, the API gateway could send a message to a NewUser queue and immediately reply to the user saying "You'll receive an email to confirm the account creation." A queue consumer service could process the message, perform the database changes in a single transaction, and send the email to Bob to confirm the account creation.
- The User microservice creates the user record and a wallet record in the same database. In this case, the wallet store in the User microservice is a replica of the master wallet store only visible to the Wallet microservice. There's a data synchronization mechanism that is trigger-based or kicks in periodically to send data changes (e.g., new wallets) from the replica to the master, and vice-versa.
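A minimal sketch of the queue-based option, assuming the consumer owns a single database that holds both tables. The table layout, the `handle_new_user` function, and the in-process `queue.Queue` (standing in for a real message broker) are all hypothetical names chosen for illustration:

```python
import queue
import sqlite3


def init_db(conn):
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")
    conn.execute("CREATE TABLE wallets (user_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")


def handle_new_user(conn, message, send_email):
    """Create the user and the wallet in one local transaction, then notify."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute("INSERT INTO users (email) VALUES (?)", (message["email"],))
        conn.execute("INSERT INTO wallets (user_id, balance) VALUES (?, 0)", (cur.lastrowid,))
    send_email(message["email"], "Your account has been created.")


new_user_queue = queue.Queue()  # stand-in for the NewUser queue on a real broker
conn = sqlite3.connect(":memory:")
init_db(conn)
sent = []  # records the recipients instead of sending real email

# The gateway only enqueues the message and replies to the user right away.
new_user_queue.put({"email": "bob@example.com"})

# The consumer drains the queue later.
while not new_user_queue.empty():
    handle_new_user(conn, new_user_queue.get(), lambda to, body: sent.append(to))
```

Because both inserts share one local transaction, a failure on the wallet insert also rolls back the user insert, so no half-created registration is ever visible.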

But what if you need synchronous responses?

Remodel the microservices. If the solution with the queue doesn't work because the service consumer needs a response right away, then I'd rather remodel the User and Wallet functionality to be collocated in the same service (or at least in the same VM, to avoid distributed transactions). Yes, it's a step away from microservices and closer to a monolith, but it will save you from some headache.

Solution 2

This is a classic question I was asked during an interview recently: how to call multiple web services and still preserve some kind of error handling in the middle of the task. Today, in high-performance computing, we avoid two-phase commits. I read a paper many years ago about what was called the "Starbucks model" for transactions. Think about the process of ordering, paying for, preparing, and receiving the coffee you order at Starbucks. I'm oversimplifying, but a two-phase commit model would suggest that the whole process be a single wrapping transaction covering all the steps until you receive your coffee. With that model, however, all employees would wait and stop working until you get your coffee. You see the picture?

Instead, the "Starbucks model" is more productive: it follows a "best effort" approach and compensates for errors in the process. First, they make sure that you pay! Then there are message queues, with your order attached to the cup. If something goes wrong in the process (you did not get your coffee, it is not what you ordered, etc.), we enter the compensation process and make sure you get what you want or refund you. This is the most efficient model for increased productivity.

Sometimes Starbucks wastes a coffee, but the overall process is efficient. There are other tricks to consider when you build your web services, such as designing them so that they can be called any number of times and still produce the same end result (idempotency). So, my recommendation is:

- Don't make your web services too fine-grained (I am not convinced by the microservice hype happening these days: too many risks of going too far).
- Async increases performance, so prefer being async; send notifications by email whenever possible.
- Build more intelligent services that are "recallable" any number of times, processing with a UID or task ID that follows the order through every step until the end, validating business rules at each step.
- Use message queues (JMS or others) and divert failures to error-handling processors that "roll back" by applying the opposite operations. Working asynchronously will also require some sort of queue to validate the current state of the process, so plan for that.
- As a last resort (since it should not happen often), put the case in a queue for manual processing of errors.

Let's go back to the initial problem that was posted: create an account, create a wallet, and make sure everything was done.

Let's say a web service is called to orchestrate the whole operation.

Pseudo code of the web service would look like this:

1. Call the Account creation microservice, passing it some information and a unique task id.
   1.1 The Account creation microservice first checks whether that account was already created. A task id is associated with the account's record. The microservice detects that the account does not exist, so it creates it and stores the task id. NOTE: this service can be called 2000 times and will always produce the same result. The service answers with a "receipt that contains minimal information to perform an undo operation if required".
2. Call Wallet creation, giving it the account ID and task id. Let's say a condition is not valid and the wallet creation cannot be performed. The call returns with an error, but nothing was created.
3. The orchestrator is informed of the error. It knows it needs to abort the Account creation, but it will not do it itself. It asks the Account service to do it, passing the "minimal undo receipt" received at the end of step 1.
4. The Account service reads the undo receipt and knows how to undo the operation; the undo receipt may even include information about another microservice it could have called itself to do part of the job. In this situation, the undo receipt could contain the Account ID and possibly some extra information required to perform the opposite operation. In our case, to simplify things, let's say the undo is simply deleting the account using its account id.
5. Now, let's say the web service never received the success or failure response (in this case) confirming that the Account creation's undo was performed. It will simply call the Account's undo service again. This service should normally never fail, because its goal is for the account to no longer exist. So it checks whether the account exists and sees there is nothing left to undo. So it returns that the operation is a success.
6. The web service returns to the user that the account could not be created.
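The flow above could be sketched roughly as follows. This is only an illustration under simplifying assumptions: both services are in-memory stand-ins, and every class, method, and field name (`AccountService`, `WalletService`, `register`, the receipt layout) is hypothetical rather than taken from a real API.

```python
import uuid


class AccountService:
    """In-memory stand-in for the Account microservice (hypothetical API)."""

    def __init__(self):
        self.accounts = {}  # account_id -> {"name": ..., "task_id": ...}
        self._next_id = 0

    def create(self, task_id, name):
        # Idempotent: if this task already created an account, return the same receipt.
        for account_id, record in self.accounts.items():
            if record["task_id"] == task_id:
                return {"account_id": account_id}
        self._next_id += 1
        account_id = f"acct-{self._next_id}"
        self.accounts[account_id] = {"name": name, "task_id": task_id}
        return {"account_id": account_id}  # the "minimal undo receipt"

    def undo(self, receipt):
        # Idempotent: deleting an account that is already gone still counts as success.
        self.accounts.pop(receipt["account_id"], None)
        return True


class WalletError(Exception):
    pass


class WalletService:
    """In-memory stand-in for the Wallet microservice (hypothetical API)."""

    def __init__(self, reject=False):
        self.wallets = {}  # account_id -> task_id
        self.reject = reject

    def create(self, task_id, account_id):
        if self.reject:
            raise WalletError("condition not valid")  # nothing was created
        self.wallets.setdefault(account_id, task_id)


def register(accounts, wallets, name, max_undo_attempts=3):
    """Orchestrator: create the account, then the wallet; compensate on failure."""
    task_id = str(uuid.uuid4())
    receipt = accounts.create(task_id, name)
    try:
        wallets.create(task_id, receipt["account_id"])
    except WalletError:
        # Ask the Account service to undo; repeating the call is safe.
        for _ in range(max_undo_attempts):
            if accounts.undo(receipt):
                break
        return None  # tell the caller the account could not be created
    return receipt["account_id"]


ok_accounts, ok_wallets = AccountService(), WalletService()
account_id = register(ok_accounts, ok_wallets, "Bob")

bad_accounts, bad_wallets = AccountService(), WalletService(reject=True)
failed = register(bad_accounts, bad_wallets, "Bob")
```

In a real deployment each call would cross the network, so the undo loop would also need to treat timeouts as "unknown" and retry, which is exactly why the undo has to be idempotent.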

This is a synchronous example. We could have managed it in a different way and put the case into a message queue targeted at the help desk if we don't want the system to completely recover from the error. I've seen this done in a company where not enough hooks could be provided to the back-end system to correct such situations. The help desk received messages describing what had been performed successfully and had enough information to fix things, just as our undo receipt could be used in a fully automated way.

I did a search, and the Microsoft website has a pattern description for this approach. It is called the compensating transaction pattern:

Compensating transaction pattern

Solution 3

All distributed systems have trouble with transactional consistency. The best way to do this is, like you said, a two-phase commit: have the wallet and the user created in a pending state. After they are created, make a separate call to activate the user.

This last call should be safely repeatable (in case your connection drops).

This will necessitate that the last call know about both tables (so that it can be done in a single JDBC transaction).
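A minimal sketch of the pending-then-activate idea, assuming (as the answer does) that the activation call can reach both tables in one database. The schema and the `create_pending` and `activate` functions are hypothetical names used only for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, status TEXT NOT NULL)")
conn.execute("CREATE TABLE wallets (user_id INTEGER PRIMARY KEY, status TEXT NOT NULL)")


def create_pending(conn, name):
    """Two separate writes, as if made by two separate REST calls."""
    with conn:
        cur = conn.execute("INSERT INTO users (name, status) VALUES (?, 'pending')", (name,))
        user_id = cur.lastrowid
    with conn:
        conn.execute("INSERT INTO wallets (user_id, status) VALUES (?, 'pending')", (user_id,))
    return user_id


def activate(conn, user_id):
    """Flip both rows to 'active' in one transaction; repeating the call is harmless."""
    with conn:  # commits on success, rolls back if an exception is raised
        wallet = conn.execute("SELECT 1 FROM wallets WHERE user_id = ?", (user_id,)).fetchone()
        if wallet is None:
            raise LookupError("wallet not created yet")
        conn.execute("UPDATE users SET status = 'active' WHERE id = ?", (user_id,))
        conn.execute("UPDATE wallets SET status = 'active' WHERE user_id = ?", (user_id,))


user_id = create_pending(conn, "Bob")
activate(conn, user_id)
activate(conn, user_id)  # a retry after a dropped connection changes nothing
```

Readers of the data would then filter on `status = 'active'`, so a user left in the pending state by a crash is never treated as a real account.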

Alternatively, you might want to think about why you are so worried about a user without a wallet. Do you believe this will cause a problem? If so, maybe having those as separate REST calls is a bad idea. If a user shouldn't exist without a wallet, then you should probably add the wallet to the user (in the original POST call to create the user).

Solution 4

IMHO one of the key aspects of microservices architecture is that the transaction is confined to the individual microservice (Single responsibility principle).

In the current example, the User creation would be its own transaction. The User service would push a USER_CREATED event onto an event queue. The Wallet service would subscribe to the USER_CREATED event and create the wallet.
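A rough sketch of that event flow, using a tiny in-process publish/subscribe bus in place of a real event queue. The `EventBus`, `UserService`, and `WalletService` classes are hypothetical; a real broker would also deliver events asynchronously and possibly more than once, which is why the handler below is written to tolerate redelivery:

```python
from collections import defaultdict


class EventBus:
    """In-process stand-in for an event queue (delivery here is synchronous)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self.subscribers[event_type]:
            handler(payload)


class UserService:
    def __init__(self, bus):
        self.bus = bus
        self.users = {}

    def create_user(self, user_id, name):
        self.users[user_id] = name  # the User service's own local transaction
        self.bus.publish("USER_CREATED", {"user_id": user_id})


class WalletService:
    def __init__(self, bus):
        self.wallets = {}  # user_id -> balance
        bus.subscribe("USER_CREATED", self.on_user_created)

    def on_user_created(self, event):
        # setdefault keeps a redelivered event from resetting an existing wallet
        self.wallets.setdefault(event["user_id"], 0)


bus = EventBus()
user_service = UserService(bus)
wallet_service = WalletService(bus)
user_service.create_user("u1", "Bob")
```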

Solution 5

If my wallet was just another bunch of records in the same SQL database as the user, then I would probably place the user and wallet creation code in the same service and handle that using the normal database transaction facilities.

It sounds to me like you are asking about what happens when the wallet creation code requires you to touch another system or systems. I'd say it all depends on how complex and/or risky the creation process is.

If it's just a matter of touching another reliable datastore (say, one that can't participate in your SQL transactions), then depending on the overall system parameters, I might be willing to risk the vanishingly small chance that the second write won't happen. I might do nothing but raise an exception and deal with the inconsistent data via a compensating transaction or even some ad hoc method. As I always tell my developers: "if this sort of thing is happening in the app, it won't go unnoticed".

As the complexity and risk of wallet creation increase, you must take steps to mitigate the risks involved. Let's say some of the steps require calling multiple partner APIs.

At this point you might introduce a message queue along with the notion of partially constructed users and/or wallets.

A simple and effective strategy for making sure your entities eventually get constructed properly is to have the jobs retry until they succeed, but a lot depends on the use cases for your application.
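A minimal sketch of such a retrying job. The `retry_until_done` helper and its parameters are hypothetical, and it assumes the job is safe to repeat. Unlike a literal "retry until they succeed", this version gives up after a bounded number of attempts so that a permanently failing job can be handed to some other process:

```python
import time


def retry_until_done(job, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Run job() until it succeeds, waiting longer after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller escalate
            sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff


calls = {"count": 0}


def create_wallet_via_partner():
    # Simulated partner API that fails twice before succeeding.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("partner API unavailable")
    return "wallet-created"


result = retry_until_done(create_wallet_via_partner, sleep=lambda seconds: None)
```

The `sleep` parameter is injected only so the example runs instantly; in production the default `time.sleep` (or a scheduler that re-enqueues the job) would apply the delay.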

I would also think long and hard about why I had a failure prone step in my provisioning process.



Olivier Lalonde

Updated on July 19, 2020

Comments

Olivier Lalonde almost 4 years
Let's say we have a User, Wallet REST microservices and an API gateway that glues things together. When Bob registers on our website, our API gateway needs to create a user through the User microservice and a wallet through the Wallet microservice.

Now here are a few scenarios where things could go wrong:
- User Bob creation fails: that's OK, we just return an error message to Bob. We're using SQL transactions so no one ever saw Bob in the system. Everything's good :)
- User Bob is created but before our Wallet can be created, our API gateway hard crashes. We now have a User with no wallet (inconsistent data).
- User Bob is created and as we are creating the Wallet, the HTTP connection drops. The wallet creation might have succeeded or it might have not.
What solutions are available to prevent this kind of data inconsistency from happening? Are there patterns that allow transactions to span multiple REST requests? I've read the Wikipedia page on Two-phase commit which seems to touch on this issue but I'm not sure how to apply it in practice. This Atomic Distributed Transactions: a RESTful design paper also seems interesting although I haven't read it yet.

Alternatively, I know REST might just not be suited for this use case. Would the correct way to handle this situation perhaps be to drop REST entirely and use a different communication protocol, like a message queue system? Or should I enforce consistency in my application code (for example, by having a background job that detects inconsistencies and fixes them, or by having a "state" attribute on my User model with "creating", "created" values, etc.)?
- Olivier Lalonde almost 9 years
  
  Interesting link: news.ycombinator.com/item?id=7995130
- Vladislav Rastrusny almost 9 years
  
  If a user doesn't make sense without a wallet, why create a separate microservice for it? Maybe something is not right with the architecture in the first place? Why do you need a generic API gateway, btw? Is there any specific reason for it?
- Olivier Lalonde about 7 years
  
  @VladislavRastrusny it was a fictional example, but you could think of the wallet service as being handled by Stripe for example.
- andrew pate about 6 years
  
  You could use a process manager to track the transaction (process manager pattern) or have each microservice know how to trigger a rollback (saga manager pattern) or do some sort of two phase commit (blog.aspiresys.com/software-product-engineering/producteeri‌ng/…)
- Nik almost 6 years
  
  @VladislavRastrusny "If a user doesn't make sense without a wallet, why to create a separate microservice for it" -- for example, apart from the fact a User cannot exist without a Wallet they don't have any code in common. So two teams are going to develop and deploy User and Wallet microservices independently. Isn't it the whole point of doing microservices in the first place?
- C.P. over 4 years
  
  @OlivierLalonde - Fast forward to 2019... How did you handle this problem eventually? What's the best way/solution? It would be helpful if you could write an answer to this great question.
Olivier Lalonde almost 9 years

Thanks for the suggestion. The User/Wallet services were fictional, just to illustrate the point. But I agree that I should design the system as to avoid the need for transactions as much as possible.
Sattar Imamov almost 9 years

I agree with the second point of view. It seems that the microservice that creates the user should also create the wallet, because this operation represents an atomic unit of work. Also, you can read this: eaipatterns.com/docs/IEEE_Software_Design_2PC.pdf
Roman Kharkovski over 7 years

Assuming we want to avoid any and all 2PC, and assuming that the User service writes into a database, then the User service's push of the message onto an event queue can't be made transactional, which means the message may never make it to the Wallet service.
jwg almost 7 years

Do you think you could expand on this answer to provide more specific advice to the OP. As it stands, this answer is somewhat vague and hard to understand. Although I understand how coffee is served at Starbucks, it's unclear to me what aspects of this system should be emulated in REST services.
user8098437 almost 7 years

I have added an example related to the case initially provided in the original post.
user8098437 almost 7 years

Just added a link to the compensating transaction pattern as described by Microsoft.
Ram Bavireddi almost 7 years

Eventual consistency worked for me. In this case, the "NewUser" queue should be highly available and resilient.
v.oddou about 6 years

@RamBavireddi do Kafka or RabbitMQ support resilient queues ?
Ram Bavireddi about 6 years

@v.oddou Yes, they do.
balsick almost 6 years

@PauloMerson I'm not sure how you distinguish compensating transactions from eventual consistency. What if, in your eventual consistency, the creation of the wallet fails?
Paulo Merson almost 6 years

@balsick One of the challenges of eventual consistency settings is increased design complexity. Consistency checks and correction events are often required. The design of the solution varies. In the answer, I suggest the situation where the Wallet record is created in the database when processing a message sent via a message broker. In this case, we could set a Dead Letter Channel, that is, if processing that message generates an error, we can send the message to a dead letter queue and notify the team responsible for "Wallet".
Timo almost 6 years

This is actually a great idea. Undos are a headache. But creating something in a pending state is much less invasive. Any checks have been performed, but nothing definitive is created yet. Now we only need to activate the created components. We can probably even do that non-transactionally.
Timo almost 6 years

@RomanKharkovski An important point indeed. One way to tackle it might be to start a transaction, save the User, publish the event (not part of the transaction), and then commit the transaction. (Worst case, highly unlikely, the commit fails, and those responding to the event will be unable to find the user.)
Carmine Ingaldi over 5 years

@PauloMerson "they [REST services] should not be participants in a transactional boundary that spans more than one service" Do you mean that each microservice should complete the transaction without interacting with others? This seems a violation of SRP (and DRY), because only the Wallet service should know how to instantiate a wallet, and the User service, which encapsulates the User entity, knows only the wallet ID, if any, and a data contract exposed by the Wallet service. This, though, mostly depends on the relationship between users and wallets... maybe it is just the example that is wrong.
Yan Khonski almost 5 years

Then store the event into the database as well as the entity. Have a scheduled job to process stored events and send them to the message broker. stackoverflow.com/a/52216427/4587961
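A minimal sketch of what this comment describes, assuming a single SQL database for the User service. The `outbox` table, the `create_user` and `relay_outbox` functions, and the `publish` callback (standing in for a message broker client) are hypothetical names for illustration:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, type TEXT NOT NULL, "
    "payload TEXT NOT NULL, sent INTEGER NOT NULL DEFAULT 0)"
)


def create_user(conn, name):
    """Store the entity and its event in the same local transaction."""
    with conn:  # both inserts commit together or not at all
        cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        user_id = cur.lastrowid
        conn.execute(
            "INSERT INTO outbox (type, payload) VALUES (?, ?)",
            ("USER_CREATED", json.dumps({"user_id": user_id})),
        )
    return user_id


def relay_outbox(conn, publish):
    """Scheduled job: forward unsent events to the broker, then mark them sent."""
    rows = conn.execute(
        "SELECT id, type, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, event_type, payload in rows:
        publish(event_type, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)


published = []
user_id = create_user(conn, "Bob")
first_run = relay_outbox(conn, lambda t, p: published.append((t, p)))
second_run = relay_outbox(conn, lambda t, p: published.append((t, p)))
```

If the job crashes between publishing and marking a row as sent, the event is published again on the next run, so consumers still need to handle duplicates.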
Anmol Singh Jaggi about 4 years

Note that compensating transactions might be outright impossible in certain complex scenarios (as brilliantly highlighted in the microsoft docs). In this example, imagine before the wallet creation could fail, someone could read the details about the associated account by doing a GET call on the Account service, which ideally shouldn't exist in the first place since account creation had failed. This can lead to data inconsistency. This isolation problem is well-known in the SAGAS pattern.
AjayLohani almost 4 years

APIM is the process of creating and publishing web application. I am not able to understand how it can help here. Can you explain?
David Prifti over 3 years

Reading through your answer I imagine that the "Undo" recipe involves delete operations on the newly added record. But what if the "Undo" operations fail themselves? Then data in the User database would remain inconsistent for some time until it's deleted.
harshit2811 about 3 years

If wallet creation fails and there is a requirement to remove the user (with no wallet), then what's your approach? Should the Wallet service send a WALLET_CREATE_FAILED event to a separate queue, which the User service will consume to remove the user?
DubZ about 2 years

Thanks for the term "compensating transaction". It describes what I have been thinking about over the last few days for an issue I want to solve. There are some criticisms, such as the possibility of an unwanted read happening in the meantime. Question: are there any cons to a mix of compensating transactions and two-phase commit (without transactions)? Imagine I want to insert into the DB with a column active=false. When I run into an error, the requester requests a rollback; otherwise, a commit (update active=true).