What is the Bulkhead Pattern used by Hystrix?

Solution 1

General

In general, the goal of the bulkhead pattern is to prevent faults in one part of a system from taking the entire system down. The term comes from shipbuilding: a ship's hull is divided into separate watertight compartments by bulkheads, so that a single hull breach floods only one compartment instead of the entire ship.

Implementations of the bulkhead pattern can take many forms depending on what kind of faults you want to protect the system from. I will only discuss the type of faults Hystrix handles in this answer.

I think the bulkhead pattern was popularized by the book Release It! by Michael T. Nygard.

What Hystrix Solves

The bulkhead implementation in Hystrix limits the number of concurrent calls to a component. This way, the number of resources (typically threads) that are waiting for a reply from the component is limited.

Assume you have a request-based, multi-threaded application (for example, a typical web application) that uses three different components, A, B, and C. If requests to component C start to hang, eventually all request handling threads will hang waiting for an answer from C. This would make the application entirely non-responsive. If requests to C are handled slowly, we have a similar problem if the load is high enough.

Hystrix's implementation of the bulkhead pattern limits the number of concurrent calls to a component and would have saved the application in this case. Assume we have 30 request handling threads and a limit of 10 concurrent calls to C. Then at most 10 request handling threads can hang when calling C; the other 20 threads can still handle requests and use components A and B.

Hystrix's Approaches

Hystrix has two different approaches to the bulkhead: thread isolation and semaphore isolation.

Thread Isolation

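The standard approach is to hand over all requests to component C to a separate thread pool with a fixed number of threads and no (or a small) request queue. When all threads in the pool are busy and the queue, if any, is full, further calls to C are rejected immediately.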
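A minimal sketch of thread isolation using only java.util.concurrent, not Hystrix's own API. The class `ThreadPoolBulkhead` and its method `trySubmit` are hypothetical names chosen for illustration. The sketch assumes a fixed pool with no queue at all, so a call to C is rejected as soon as every worker thread is busy:

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper for illustration; this is not a Hystrix class.
class ThreadPoolBulkhead {
    private final ThreadPoolExecutor pool;

    ThreadPoolBulkhead(int maxConcurrentCalls) {
        this.pool = new ThreadPoolExecutor(
                maxConcurrentCalls, maxConcurrentCalls,
                0L, TimeUnit.MILLISECONDS,
                // A SynchronousQueue stores nothing, so there is no request queue.
                new SynchronousQueue<>(),
                runnable -> {
                    Thread thread = new Thread(runnable, "bulkhead-C");
                    thread.setDaemon(true); // do not keep the JVM alive
                    return thread;
                },
                // Reject instead of blocking the caller when all threads are busy.
                new ThreadPoolExecutor.AbortPolicy());
    }

    // Returns an empty Optional when the bulkhead is full, otherwise a
    // future for the call, which runs on one of the pool's threads.
    <T> Optional<CompletableFuture<T>> trySubmit(Supplier<T> callToC) {
        try {
            return Optional.of(CompletableFuture.supplyAsync(callToC, pool));
        } catch (RejectedExecutionException e) {
            return Optional.empty();
        }
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

With `new ThreadPoolBulkhead(10)`, at most 10 calls to C are in flight at once. A caller that gets an empty `Optional` back can fail fast or use a fallback instead of hanging.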

Semaphore Isolation

The other approach is to have all callers acquire a permit (with 0 timeout) before making a request to C. If a permit can't be acquired from the semaphore, the call to C is not passed through.
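A minimal sketch of semaphore isolation with `java.util.concurrent.Semaphore`. The class `SemaphoreBulkhead` and the fallback parameter are hypothetical and not Hystrix's API. The call to C runs on the caller's own thread, and `tryAcquire()` without arguments returns immediately, which corresponds to the 0 timeout described above:

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper for illustration; this is not a Hystrix class.
class SemaphoreBulkhead {
    private final Semaphore permits;

    SemaphoreBulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    // Runs callToC on the calling thread if a permit is free right now;
    // otherwise skips C entirely and returns the fallback value.
    <T> T call(Supplier<T> callToC, Supplier<T> fallback) {
        if (!permits.tryAcquire()) { // non-blocking: never waits for a permit
            return fallback.get();
        }
        try {
            return callToC.get();
        } finally {
            permits.release(); // always hand the permit back
        }
    }
}
```

Releasing the permit in a `finally` block matters: if a failing call leaked its permit, the bulkhead would gradually close for good.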

Differences

The advantage of the thread pool approach is that requests that are passed to C can be timed out, something that is not possible when using semaphores.
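The difference can be sketched with plain java.util.concurrent (the class `TimedCall` and the method `callWithTimeout` are hypothetical, not Hystrix's API). Because the call to C runs on a pool thread, the caller only waits on a `Future` and can give up after a timeout. With a semaphore, the call runs on the caller's own thread, so the caller has nothing to walk away from:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper for illustration; this is not a Hystrix class.
class TimedCall {
    // Runs callToC on the given pool and waits at most timeoutMillis for it.
    static <T> T callWithTimeout(ExecutorService pool, Supplier<T> callToC,
                                 long timeoutMillis, T fallback) {
        Callable<T> task = callToC::get;
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The caller stops waiting; the worker is asked to stop via
            // interruption, which only helps if the call reacts to it.
            future.cancel(true);
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            future.cancel(true);
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the call to C itself failed
        }
    }
}
```

Note that the timeout frees the caller, not necessarily the worker thread. A call that ignores interruption keeps occupying its pool thread until it returns, which is one reason the pool size is bounded.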

Solution 2

Here is a good example, with a runtime explanation, of the bulkhead in Resilience4j, which is inspired by Netflix Hystrix.

The example configuration below might clarify the usage.

Example configuration: allow a maximum of 5 concurrent calls at any given time. Keep other calls waiting until one of the 5 in-progress calls finishes, or for a maximum of 2 seconds.

The idea is not to burden any system with more load than it can consume. If the incoming load is greater than what the system can consume, wait for a reasonable time, or time out and take an alternate path.
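The behaviour described above can be sketched with a plain `java.util.concurrent.Semaphore`. This is not the Resilience4j API (its semaphore bulkhead exposes comparable settings, `maxConcurrentCalls` and `maxWaitDuration`); the class `WaitingBulkhead` and the fallback parameter are hypothetical names for illustration:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical helper for illustration; this is not a Resilience4j class.
class WaitingBulkhead {
    private final Semaphore permits;
    private final long maxWaitMillis;

    WaitingBulkhead(int maxConcurrentCalls, long maxWaitMillis) {
        // Fair semaphore: waiting callers get permits in arrival order.
        this.permits = new Semaphore(maxConcurrentCalls, true);
        this.maxWaitMillis = maxWaitMillis;
    }

    // Waits up to maxWaitMillis for a free slot, then either runs the call
    // or takes the alternate path supplied as the fallback.
    <T> T call(Supplier<T> call, Supplier<T> fallback) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            acquired = false;
        }
        if (!acquired) {
            return fallback.get();
        }
        try {
            return call.get();
        } finally {
            permits.release();
        }
    }
}
```

The configuration from the example corresponds to `new WaitingBulkhead(5, 2000)`: five concurrent calls, with each additional caller waiting up to 2 seconds before taking the fallback.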

Author: Sashank

Updated on June 28, 2021

Comments

  • Sashank about 3 years
    Hystrix, a Netflix API for latency and fault tolerance in complex distributed systems, uses the Bulkhead Pattern technique for thread isolation. Can someone please elaborate on it?

  • Dmitry about 7 years
    As an addition, the original Hystrix wiki now has a detailed description of both approaches: github.com/Netflix/Hystrix/wiki/How-it-Works
  • voipp over 5 years
    What is the difference between a circuit breaker and a bulkhead?
  • K Erlandsson over 5 years
    @voipp Circuit breakers are quite a different thing. They detect when a service is in an unhealthy state and move callers into a "fail fast" state, where they don't call the unhealthy service but return an error code instead until the service is fine again. This avoids overloading the unhealthy service so that it can recover, and it prevents cascading failures since callers are not slowed down.
  • Jeremy Caney almost 3 years
    Is this really adding value beyond the accepted answer, which is very comprehensive?