Circuit Breaker Pattern

Chapter 07 · Microservices and Architecture

When a downstream service is failing, hammering it harder makes things worse. A circuit breaker stops calls from going through, gives the failing service time to recover, and protects your own system from cascading failure.

The cascading failure problem

Service A calls service B. B is slow. A's threads start piling up waiting on B. A runs out of threads. A starts failing too. C calls A; same thing happens to C. Within minutes, half the system is down because of one slow service.

The circuit breaker cuts the chain. When A notices B is failing, A stops calling B for a while. A's threads stay free. A degrades gracefully (returns a default, errors quickly, falls back). B gets breathing room.

The three states

Closed. Normal operation. Calls pass through. The breaker counts failures.
Open. Failure threshold exceeded. Calls fail immediately without reaching the downstream. The breaker waits for a set timeout.
Half-open. After a timeout, the breaker lets a few calls through to test the waters. If they succeed, go back to closed. If they fail, back to open.

The circuit breaker moves between three states: closed (normal), open (failing fast), and half-open (testing).
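
The state transitions above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation. The names CircuitBreaker, CircuitOpenError, failure_threshold and open_seconds are hypothetical. It counts consecutive failures rather than failures in a time window, lets a single probe call through in half-open, and is not thread-safe.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitOpenError(Exception):
    """Raised when a call is rejected without reaching the downstream."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, open_seconds=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed: consecutive failures
        self.open_seconds = open_seconds
        self.clock = clock  # injectable so time can be controlled in tests
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.open_seconds:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: let this call through as a probe.
            self.state = State.HALF_OPEN
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        # Any success, including a half-open probe, closes the breaker.
        self.failures = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self.failures += 1
        # A failed probe reopens at once; otherwise open at the threshold.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

A caller wraps each downstream call as breaker.call(fetch_profile, user_id), where fetch_profile is a placeholder for whatever function talks to service B, and handles CircuitOpenError as the signal to fall back.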

Tuning

Failure threshold: N failures in M seconds before opening. Set it too low and the breaker flaps open on transient blips; set it too high and it opens too late to stop a cascade.
Open duration: how long to stay open before testing. Long enough for the downstream to recover; short enough to test recovery promptly.
Half-open allowance: how many test calls to send. Keep it small so a downstream that is still recovering is not flooded.

Pair with fallbacks

When the breaker is open, what does your service return? Three options:

Fail fast. Return an error. Acceptable for non-critical paths.
Cached response. Show stale data. Better than nothing for read-heavy paths.
Default value. A reasonable placeholder (zero items in cart, default avatar).
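
A rough sketch of how the three options can be layered: try the live call, fall back to a cached value, then to a default, and fail fast only when neither exists. The function fetch_with_fallback and the CircuitOpenError exception are hypothetical names for illustration; the cache here is a plain dict with no expiry.

```python
class CircuitOpenError(Exception):
    """Hypothetical error a breaker raises when it rejects a call."""


def fetch_with_fallback(key, fetch, cache, default=None):
    """Return (value, source), where source is 'live', 'cached' or 'default'."""
    try:
        value = fetch(key)
    except CircuitOpenError:
        if key in cache:
            return cache[key], "cached"   # stale data beats no data
        if default is not None:
            return default, "default"     # reasonable placeholder
        raise                             # no fallback available: fail fast
    cache[key] = value  # refresh the cache on every successful call
    return value, "live"
```

Returning the source alongside the value lets the caller flag stale or placeholder data to the user instead of presenting it as fresh.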

Hystrix and successors

Netflix's Hystrix popularized the circuit breaker. It is now in maintenance mode. Modern alternatives: resilience4j (Java), Polly (.NET), Failsafe (Java). Service meshes like Istio do it at the proxy level.

Bulkhead pattern (worth pairing)

Isolate resources so one slow dependency can't drain everything. Allocate separate thread pools or connection pools per downstream. If A's calls to service B exhaust their pool, A's pool for service C is unaffected. Often deployed alongside circuit breakers.
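
As a minimal sketch of the idea, a semaphore per downstream can cap concurrent calls in place of a dedicated thread pool. The names Bulkhead, BulkheadFullError and the service_b / service_c keys are illustrative assumptions. Rejecting immediately when the limit is reached, rather than queueing, is one design choice among several.

```python
import threading


class BulkheadFullError(Exception):
    """Raised when a downstream's concurrency limit is already in use."""


class Bulkhead:
    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, **kwargs):
        # Non-blocking acquire: reject at once rather than queue up callers.
        if not self._slots.acquire(blocking=False):
            raise BulkheadFullError("no free slots for this downstream")
        try:
            return fn(*args, **kwargs)
        finally:
            self._slots.release()


# One bulkhead per downstream, so saturating B leaves C's slots untouched.
bulkheads = {"service_b": Bulkhead(2), "service_c": Bulkhead(2)}
```

Each call site picks the bulkhead for its own downstream, for example bulkheads["service_b"].call(fn), so a slow B can hold at most two callers at a time in this configuration.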

Circuit breakers prevent cascading failure. They're not magic; they're an explicit choice to fail fast when something is sick. Combined with retries, timeouts, and bulkheads, they keep your system robust under partial failure.
