Sleek Software - Circuit breaker implementation

The circuit breaker design pattern is used to provide stability and prevent cascading failures in distributed systems. A circuit breaker is placed between a service and each of its remote dependencies - it can then isolate that service from a failed dependency to prevent that failure affecting the rest of the system.

As an example, imagine you've created a web application that depends on a remote third-party service. Let's say the third party has oversold its system capacity and its database struggles under load. Assume that the database fails in such a way that it takes a long time to hand back an error to the third-party service. This in turn makes your calls to that service fail only after a significant period of time. Back in your web application, the end-users have noticed that their form submissions are taking a long time, indeed appearing to hang. Of course, the users are likely to start hammering the refresh button, adding yet more requests. Each pending request ties up resources such as threads and connections, so this will eventually cause your web application to fail through resource exhaustion. The failure will affect all users, even those who aren't using functionality that depends on this third-party service.

Introducing a circuit breaker between your web application and the third-party service lets your requests fail fast, so your end-users learn promptly that something is wrong and that there is no point in refreshing. This also confines the failure to those users who are using functionality dependent on the third party; other users are no longer affected, as there is no resource exhaustion. Circuit breakers also allow developers to mark as unavailable those parts of the application that use the broken functionality, or perhaps to show some cached content as appropriate while the circuit breaker is open.

As another example, imagine a distributed system where one of its services has experienced a failure and a subsequent restart. While it's restarting, the service may be bombarded with requests, potentially causing it to fail again. A circuit breaker can give the service sufficient time to initialise itself properly before having to deal with new requests.

My circuit breaker implementation is basically a state machine with the following behaviour.

During normal operation, the circuit breaker is in the Closed state:

Any call that fails with an exception or that exceeds the configured timeout increments a failure counter.
A successful call resets the failure counter to zero.
When the failure count reaches the configured maximum, the breaker is tripped into the Open state.

While the circuit breaker is in the Open state:

All calls fail fast with a CircuitBreakerOpenException.
After the configured reset interval, the circuit breaker enters a Half-Open state.

While the circuit breaker is in the Half-Open state:

The first call is allowed through without failing fast.
If the first call succeeds, the breaker is reset back to the Closed state.
If the first call fails, the breaker is tripped again into the Open state.
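
To make the transitions above concrete, here is a minimal sketch of the same rules. It is an illustration only and not the library's actual code: the names SketchBreaker and BreakerState are hypothetical, it throws a plain InvalidOperationException where the library throws CircuitBreakerOpenException, and it leaves out call timeouts. It is deliberately single-threaded, and it assumes the Open to Half-Open transition is checked lazily when the next call arrives rather than by a timer.

```csharp
using System;

public enum BreakerState { Closed, Open, HalfOpen }

// Hypothetical, single-threaded illustration of the state rules.
// Not the library's implementation; no timeout handling, no thread safety.
public sealed class SketchBreaker
{
    private readonly int _maxFailures;
    private readonly TimeSpan _resetInterval;
    private readonly Func<DateTime> _clock; // injected so tests can control time
    private int _failures;
    private DateTime _openedAt;

    public SketchBreaker(int maxFailures, TimeSpan resetInterval, Func<DateTime> clock)
    {
        _maxFailures = maxFailures;
        _resetInterval = resetInterval;
        _clock = clock;
        State = BreakerState.Closed;
    }

    public BreakerState State { get; private set; }

    public T Execute<T>(Func<T> command)
    {
        if (State == BreakerState.Open)
        {
            // Open: fail fast until the reset interval has elapsed.
            if (_clock() - _openedAt < _resetInterval)
                throw new InvalidOperationException("Circuit is open; failing fast.");

            // Reset interval elapsed: this call becomes the Half-Open trial call.
            State = BreakerState.HalfOpen;
        }

        try
        {
            T result = command();
            // Success resets the counter and (re)closes the breaker.
            _failures = 0;
            State = BreakerState.Closed;
            return result;
        }
        catch
        {
            _failures++;
            // A failed trial call, or too many failures while Closed, trips the breaker.
            if (State == BreakerState.HalfOpen || _failures >= _maxFailures)
            {
                State = BreakerState.Open;
                _openedAt = _clock();
            }
            throw;
        }
    }
}
```

Passing the clock in as a delegate is a design choice for the sketch: it lets a test advance time past the reset interval without actually sleeping.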

My implementation can handle the following types of call:

Synchronous call without a result, for example:

Action COMMAND_TIMEOUT = () => { Thread.Sleep(10000); };

Synchronous call with a result:

Func<object> COMMAND_TIMEOUT = () => { Thread.Sleep(10000); return new object(); };

Asynchronous call without a result:

Func<Task> COMMAND_TIMEOUT = () => Task.Delay(10000);

Asynchronous call with a result:

Func<Task<bool>> COMMAND_TIMEOUT = () => { return Task.Delay(10000).ContinueWith(t => false); };
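
The commands above all take ten seconds, which is how they end up exceeding a shorter call timeout. As a rough idea of how an asynchronous call can be timed, the sketch below races the command against Task.Delay using Task.WhenAny. This is one common technique and not necessarily how the library does it: TimeoutSketch and WithTimeout are hypothetical names, and the sketch throws the standard TimeoutException rather than the library's own exception type.

```csharp
using System;
using System.Threading.Tasks;

public static class TimeoutSketch
{
    // Hypothetical helper: completes with the command's result, or throws
    // TimeoutException if the command has not finished within 'timeout'.
    public static async Task<T> WithTimeout<T>(Func<Task<T>> command, TimeSpan timeout)
    {
        Task<T> call = command();

        // Task.WhenAny returns whichever task finishes first.
        Task winner = await Task.WhenAny(call, Task.Delay(timeout));
        if (winner != call)
            throw new TimeoutException("Call exceeded the timeout of " + timeout + ".");

        // Awaiting the finished task yields its result or rethrows its exception.
        return await call;
    }
}
```

Note that this approach only stops waiting for the slow call; it does not cancel it. The underlying work keeps running unless the command also supports cancellation.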

You can also supply an alternative "fall-back" command in case the original command failed and couldn't return a result. Note that the fall-back command is designed to provide a default value for the result, perhaps from a cache. It should really be executed locally if possible, as a fall-back command executed against a remote service could quite likely fail as well.
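
The idea of a fall-back can be sketched independently of the library. The helper below is hypothetical (FallbackSketch and ExecuteWithFallbackAsync are names made up for this illustration, not part of the library's API): it tries the primary command and, if that throws, returns a value produced locally, for instance from an in-memory cache.

```csharp
using System;
using System.Threading.Tasks;

public static class FallbackSketch
{
    // Hypothetical helper: returns the command's result, or a locally
    // computed default if the command throws for any reason.
    public static async Task<T> ExecuteWithFallbackAsync<T>(
        Func<Task<T>> command, Func<T> localFallback)
    {
        try
        {
            return await command();
        }
        catch (Exception)
        {
            // The fall-back is synchronous and local on purpose: it should
            // not depend on another remote call that could fail as well.
            return localFallback();
        }
    }
}
```

A caller might pass something like `() => cachedValue` as the fall-back. Real code would probably also log the swallowed exception so the failure is not hidden.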

The library is "thread-safe" in the sense that each shared state mutation is performed with an atomic compare-and-swap operation. When contention is low this is typically much faster than taking a lock, because it never blocks the calling thread and usually maps to a single atomic CPU instruction (for example LOCK CMPXCHG on x86).
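
For readers unfamiliar with compare-and-swap in .NET, here is a small sketch using the standard Interlocked class. The class and member names (CasStateSketch, TryTrip and so on) are hypothetical and only illustrate the technique; the library's own fields and methods may look quite different.

```csharp
using System.Threading;

// Hypothetical illustration of lock-free state changes with Interlocked.
public sealed class CasStateSketch
{
    private const int Closed = 0;
    private const int Open = 1;

    private int _state = Closed;
    private int _failures;

    public bool IsOpen
    {
        get { return Volatile.Read(ref _state) == Open; }
    }

    // Atomically changes Closed -> Open. CompareExchange returns the value
    // that was in _state before the call, so only the one thread that saw
    // Closed (and therefore performed the swap) gets 'true'.
    public bool TryTrip()
    {
        return Interlocked.CompareExchange(ref _state, Open, Closed) == Closed;
    }

    // Atomically increments the failure counter and returns the new count.
    public int RecordFailure()
    {
        return Interlocked.Increment(ref _failures);
    }

    // Atomically resets the failure counter to zero.
    public void RecordSuccess()
    {
        Interlocked.Exchange(ref _failures, 0);
    }
}
```

If several threads detect the failure threshold at the same moment, exactly one of them wins the swap in TryTrip, so any work tied to tripping the breaker (such as recording the time it opened) happens once.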

I've placed my C# open-source circuit breaker implementation on GitHub. It has 40 unit tests that you can look at to help you understand how to use the library. You can see some asynchronous call snippets below:

[TestClass]
public sealed class TestAsyncWithResult
{
    private const int MAX_FAILURES_BEFORE_TRIP = 3;
    private readonly TimeSpan CIRCUIT_RESET_TIMEOUT = TimeSpan.FromSeconds(30);
    private readonly TimeSpan CALL_TIMEOUT = TimeSpan.FromSeconds(5);

    private readonly Func<Task<bool>> COMMAND_EMPTY = () => Task.FromResult(false);
    private readonly Func<Task<bool>> COMMAND_TIMEOUT = () => { return Task.Delay(10000).ContinueWith(t => false); };
    private readonly Func<Task<bool>> COMMAND_EXCEPTION = () => { throw new ArithmeticException(); };

    private readonly Circuit m_Circuit;

    public TestAsyncWithResult()
    {
        m_Circuit = new Circuit(MAX_FAILURES_BEFORE_TRIP, CALL_TIMEOUT, CIRCUIT_RESET_TIMEOUT);
    }

    [TestMethod]
    public async Task AsyncResult_MultipleCallsShouldSucceed()
    {
        m_Circuit.Close();
        for (int i = 0; i < 20; i++)
        {
            await m_Circuit.ExecuteAsync(COMMAND_EMPTY);
        }
        Assert.IsTrue(m_Circuit.IsClosed);
    }

    [TestMethod]
    [ExpectedException(typeof(CircuitBreakerTimeoutException))]
    public async Task AsyncResult_TimeoutShouldThrow()
    {
        m_Circuit.Close();
        await m_Circuit.ExecuteAsync(COMMAND_TIMEOUT);
    }

    [TestMethod]
    [ExpectedException(typeof(ArithmeticException))]
    public async Task AsyncResult_ExceptionShouldThrow()
    {
        m_Circuit.Close();
        await m_Circuit.ExecuteAsync(COMMAND_EXCEPTION);
    }