Saga Pattern

When a transaction spans multiple services, you can't use a single ACID transaction. The saga pattern splits the work into a sequence of local transactions and gives each one a compensating step, so a failure partway through can be undone.

The distributed transaction problem

You're processing an order. You need to: charge the customer, reserve inventory, schedule shipping, send a confirmation email. Each lives in a different service with its own database. There is no single ACID transaction across them.

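Two-phase commit (2PC) tries to coordinate such a transaction, but participants hold locks until the coordinator decides, every commit costs extra network round trips, and a coordinator crash or network partition between prepare and commit can leave participants blocked. Most modern systems avoid it.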

The saga pattern is the practical alternative.

The idea

Break the long transaction into a sequence of local transactions, each committed by one service against its own database. Each step does its work and emits an event; the next step listens for that event, does its own work, and emits the next one. If any step fails, run compensating transactions to undo the steps that already committed.

Two flavors

Choreographed saga. No central coordinator. Each service listens for events and reacts. Pure event-driven. Pros: decentralized, no single point of failure. Cons: hard to see the full flow; debugging is painful.
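To make the choreographed flavor concrete, here is a minimal sketch using an in-memory, synchronous event bus as a stand-in for a real broker. The `EventBus` class, the event names (`OrderPlaced`, `CardCharged`, and so on), and the `shipping_available` flag are all assumptions made for illustration; a real system would publish to something like Kafka and each handler would live in its own service.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Hypothetical in-memory stand-in for a message broker (synchronous)."""

    def __init__(self) -> None:
        self.handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.log: list[str] = []  # every event published, in order

    def subscribe(self, event: str, handler: Callable[[dict], None]) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, payload: dict) -> None:
        self.log.append(event)
        for handler in self.handlers[event]:
            handler(payload)


def wire_services(bus: EventBus, shipping_available: bool) -> None:
    # Payment service: charges on a new order, refunds once stock is released.
    bus.subscribe("OrderPlaced", lambda o: bus.publish("CardCharged", o))
    bus.subscribe("StockReleased", lambda o: bus.publish("CardRefunded", o))
    # Inventory service: reserves after payment, releases if shipping fails.
    bus.subscribe("CardCharged", lambda o: bus.publish("StockReserved", o))
    bus.subscribe("ShippingFailed", lambda o: bus.publish("StockReleased", o))

    # Shipping service: the step that may fail in this sketch.
    def schedule(order: dict) -> None:
        event = "ShippingScheduled" if shipping_available else "ShippingFailed"
        bus.publish(event, order)

    bus.subscribe("StockReserved", schedule)
    # Email service: only reacts once shipping is scheduled.
    bus.subscribe("ShippingScheduled", lambda o: bus.publish("ConfirmationSent", o))


def run_order(shipping_available: bool) -> list[str]:
    bus = EventBus()
    wire_services(bus, shipping_available)
    bus.publish("OrderPlaced", {"order_id": "o-1"})
    return bus.log
```

Notice that no single function describes the whole flow: the sequence only emerges from who subscribes to what, which is exactly why choreographed sagas are hard to follow and debug.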

Orchestrated saga. A central orchestrator (Temporal, Camunda, AWS Step Functions, or a custom service) tells each step what to do. Pros: clear flow, easier to debug, easier to add new steps. Cons: orchestrator becomes a critical service.

Figure: Saga with compensation. Steps: charge card (succeeded), reserve stock (succeeded), ship order (failed), send email (not reached). Step 3 fails, so compensations run in reverse: release stock, then refund card. The failed shipping step has nothing to undo.
If a step fails, run compensations in reverse to undo the previous successful steps.
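The failure scenario above can be sketched as a tiny orchestrator. This is an illustration under assumptions, not how Temporal, Camunda, or Step Functions work internally: the `Step` dataclass, `run_saga`, and the `log` helper are hypothetical names, and the "services" just append to an in-memory log.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    name: str
    action: Callable[[dict], None]
    compensate: Optional[Callable[[dict], None]] = None  # None: nothing to undo


def run_saga(steps: list[Step], ctx: dict) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: list[Step] = []
    for step in steps:
        try:
            step.action(ctx)
        except Exception as exc:
            ctx["log"].append(f"{step.name} failed: {exc}")
            # The failed step never completed, so only earlier steps are undone.
            for done in reversed(completed):
                if done.compensate is not None:
                    done.compensate(ctx)
            return False
        completed.append(step)
    return True


def log(message: str) -> Callable[[dict], None]:
    """Build a fake service call that just records what it did."""
    return lambda ctx: ctx["log"].append(message)


def ship_order(ctx: dict) -> None:
    raise RuntimeError("no carrier available")  # simulated failure at step 3


order_saga = [
    Step("charge card", log("card charged"), log("card refunded")),
    Step("reserve stock", log("stock reserved"), log("stock released")),
    Step("ship order", ship_order, log("shipment cancelled")),
    Step("send email", log("email sent")),
]

ctx = {"log": []}
ok = run_saga(order_saga, ctx)
```

After this run, `ok` is `False` and `ctx["log"]` shows the stock released before the card is refunded. The sketch assumes compensations themselves succeed; a production orchestrator also has to persist progress and retry compensations that fail.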

Designing compensations

Each step needs an inverse. "Charge card" is undone by "refund card". "Reserve stock" by "release stock". "Send email" by, well, you can't unsend an email. Some actions have no real inverse; the saga has to either tolerate them or only do them at the very end.
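One way to enforce the "only do them at the very end" rule is a design-time check on the step list. The sketch below is a coarse illustration: `StepSpec` and `misplaced_irreversible_steps` are hypothetical names, and the check only verifies that steps without a compensation form the tail of the saga.

```python
from typing import NamedTuple


class StepSpec(NamedTuple):
    name: str
    compensable: bool  # False for actions with no real inverse, e.g. sending email


def misplaced_irreversible_steps(steps: list[StepSpec]) -> list[str]:
    """Return non-compensable steps that are followed by a compensable step."""
    misplaced = []
    for i, step in enumerate(steps):
        later = steps[i + 1:]
        if not step.compensable and any(s.compensable for s in later):
            misplaced.append(step.name)
    return misplaced


good_order = [
    StepSpec("charge card", True),
    StepSpec("reserve stock", True),
    StepSpec("ship order", True),
    StepSpec("send email", False),
]
bad_order = [
    StepSpec("charge card", True),
    StepSpec("send email", False),  # sent before later steps can still fail
    StepSpec("reserve stock", True),
    StepSpec("ship order", True),
]

good_result = misplaced_irreversible_steps(good_order)
bad_result = misplaced_irreversible_steps(bad_order)
```

Here `good_result` is empty and `bad_result` flags the email step. If a saga has several irreversible steps at the end, this check is not enough on its own, because a later one can still fail after an earlier one has run.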

Compensations are not exact reversals. A refund is not a charge in reverse: processing fees may not be returned, and the money can take days to reach the customer. The saga reaches a consistent end state, but not necessarily the original state.
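A small worked example shows why the end state can differ from the starting state. The fee schedule here (2.9% plus 0.30, kept by the processor on refund) is a made-up assumption for illustration, not any specific provider's pricing, and `merchant_balance_after_refund` is a hypothetical helper.

```python
from decimal import Decimal


def merchant_balance_after_refund(
    amount: Decimal, fee_rate: Decimal, fixed_fee: Decimal
) -> Decimal:
    """Net merchant balance after a charge and a full refund, fee not returned."""
    fee = (amount * fee_rate + fixed_fee).quantize(Decimal("0.01"))
    after_charge = amount - fee           # merchant receives the charge minus the fee
    after_refund = after_charge - amount  # the full amount goes back to the customer
    return after_refund


net = merchant_balance_after_refund(
    Decimal("100.00"), Decimal("0.029"), Decimal("0.30")
)
```

Under these assumed numbers the customer is made whole, but the merchant ends at -3.20 rather than zero: consistent, yet not the original state.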

Idempotency is mandatory

Steps and compensations might be retried, so they must be idempotent. Charging the card twice or refunding twice is exactly the kind of bug that ruins trust.
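A common way to get idempotency is an idempotency key: the caller sends the same key on every retry, and the service replays the stored result instead of repeating the side effect. The sketch below assumes an in-memory `PaymentService` and keys built from a saga ID plus step name; both are hypothetical choices for illustration.

```python
class PaymentService:
    """Hypothetical in-memory payment service keyed by idempotency key."""

    def __init__(self) -> None:
        self.net_charged_cents = 0
        self.seen: dict[str, str] = {}  # idempotency key -> receipt

    def _apply_once(self, key: str, delta_cents: int) -> str:
        if key in self.seen:
            return self.seen[key]  # retry: replay the stored result, no side effect
        self.net_charged_cents += delta_cents
        receipt = f"receipt-{len(self.seen) + 1}"
        self.seen[key] = receipt
        return receipt

    def charge(self, key: str, amount_cents: int) -> str:
        return self._apply_once(key, amount_cents)

    def refund(self, key: str, amount_cents: int) -> str:
        return self._apply_once(key, -amount_cents)


payments = PaymentService()
first = payments.charge("saga-42:charge", 5000)
retry = payments.charge("saga-42:charge", 5000)   # e.g. after a timeout
charged_after_retry = payments.net_charged_cents

payments.refund("saga-42:refund", 5000)
payments.refund("saga-42:refund", 5000)           # compensation retried
net_after_refunds = payments.net_charged_cents
```

The retry returns the same receipt and the customer is charged and refunded exactly once. In a real service the key lookup and the side effect would need to commit atomically, for example in the same database transaction, or two concurrent retries could both slip through.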

When to use sagas

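When to: a business transaction spans several services, each with its own database, and every step can either be compensated or deferred to the end.

When not to: simple CRUD, single-service work, anything truly atomic. Don't fan out a tiny operation just to use a saga.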

Tools

Temporal is a widely used orchestration platform. It handles retries, timeouts, state, and resumption out of the box. AWS Step Functions suits AWS-native shops, and Camunda suits BPMN-style workflows. For choreographed sagas, plain Kafka with careful event design is a common choice.

Sagas trade ACID for eventual consistency with explicit compensation. They're a heavier mental model than transactions, but often the only realistic option when a transaction spans multiple services. Design the steps and compensations together; that's where the bugs hide.