Saga Pattern
When a business transaction spans multiple services, a single ACID transaction is not available. The saga pattern coordinates a sequence of local transactions and defines compensation steps to undo them when something fails.
The distributed transaction problem
You're processing an order. You need to: charge the customer, reserve inventory, schedule shipping, send a confirmation email. Each lives in a different service with its own database. There is no single ACID transaction across them.
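Two-phase commit (2PC) can coordinate them, but participants hold locks until the coordinator decides, which hurts throughput, and a coordinator failure or network partition between the prepare and commit phases can leave participants blocked. Many modern systems avoid it.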
The saga pattern is the practical alternative.
The idea
Break the long transaction into a sequence of local transactions. Each step commits its own work and emits an event; the next step reacts to that event, commits its work, and emits its own. If any step fails, run compensating transactions for the steps that already completed, typically in reverse order.
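The core loop can be sketched in a few lines. This is a minimal in-process illustration, not a production implementation: the `Step` type, `run_saga`, and the order steps are hypothetical names, and direct function calls stand in for the events and service calls a real saga would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # the local transaction
    compensate: Callable[[], None]  # undoes the effect of the action


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    done: List[Step] = []
    for step in steps:
        try:
            step.action()
            done.append(step)
        except Exception:
            # The failed step made no change, so only earlier steps are undone.
            for finished in reversed(done):
                finished.compensate()
            return False
    return True


# Hypothetical order flow in which scheduling shipping fails.
log: List[str] = []


def fail_shipping() -> None:
    raise RuntimeError("no carrier available")


steps = [
    Step("charge", lambda: log.append("charge"), lambda: log.append("refund")),
    Step("reserve", lambda: log.append("reserve"), lambda: log.append("release")),
    Step("ship", fail_shipping, lambda: log.append("cancel shipment")),
]

ok = run_saga(steps)
print(ok, log)  # False ['charge', 'reserve', 'release', 'refund']
```

The sketch assumes a failed step leaves nothing behind. A step that can partially succeed would need its own compensation to run as well.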
Two flavors
Choreographed saga. No central coordinator. Each service listens for events and reacts. Pure event-driven. Pros: decentralized, no single point of failure. Cons: hard to see the full flow; debugging is painful.
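A choreographed flow might look like the sketch below. The `EventBus` class is a hypothetical in-memory stand-in for a real broker with synchronous delivery, and the event names and handlers are made up for illustration. No function ever sees the whole flow, which is exactly the debugging cost mentioned above.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict], None]


class EventBus:
    """In-memory stand-in for a message broker (synchronous delivery)."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.history: List[str] = []  # every event published, in order

    def subscribe(self, event: str, handler: Handler) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, payload: Dict) -> None:
        self.history.append(event)
        for handler in list(self.handlers[event]):
            handler(payload)


bus = EventBus()
stock = {"widget": 0}  # hypothetical inventory: the item is out of stock


def payment_on_order_placed(order: Dict) -> None:
    # Payment service: charge the customer, then announce it.
    bus.publish("PaymentCharged", order)


def inventory_on_payment_charged(order: Dict) -> None:
    # Inventory service: reserve stock if available, otherwise report failure.
    if stock.get(order["sku"], 0) >= order["qty"]:
        stock[order["sku"]] -= order["qty"]
        bus.publish("StockReserved", order)
    else:
        bus.publish("StockReservationFailed", order)


def payment_on_reservation_failed(order: Dict) -> None:
    # Payment service again: compensate by refunding the charge.
    bus.publish("PaymentRefunded", order)


bus.subscribe("OrderPlaced", payment_on_order_placed)
bus.subscribe("PaymentCharged", inventory_on_payment_charged)
bus.subscribe("StockReservationFailed", payment_on_reservation_failed)

bus.publish("OrderPlaced", {"order_id": 1, "sku": "widget", "qty": 1})
print(bus.history)
# ['OrderPlaced', 'PaymentCharged', 'StockReservationFailed', 'PaymentRefunded']
```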
Orchestrated saga. A central orchestrator (Temporal, Camunda, AWS Step Functions, or a custom service) tells each step what to do. Pros: clear flow, easier to debug, easier to add new steps. Cons: orchestrator becomes a critical service.
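Because the orchestrator is critical, it usually needs to survive its own restarts. The sketch below shows one way to think about that, assuming a journal of completed steps. The `Orchestrator` class and the plain dict standing in for a durable store are hypothetical, and this is not how any particular platform implements it.

```python
from typing import Callable, Dict, List, Tuple

Command = Callable[[], None]  # in a real system, a call to another service


class Orchestrator:
    def __init__(self, journal: Dict[str, List[str]]) -> None:
        # saga_id -> names of completed steps; a dict stands in for a database.
        self.journal = journal

    def run(self, saga_id: str, steps: List[Tuple[str, Command]]) -> None:
        completed = self.journal.setdefault(saga_id, [])
        for name, command in steps:
            if name in completed:
                continue  # finished before the previous crash; do not repeat
            command()
            completed.append(name)  # record progress only after success


calls: List[str] = []
attempts = {"reserve": 0}


def charge() -> None:
    calls.append("charge")


def reserve() -> None:
    attempts["reserve"] += 1
    if attempts["reserve"] == 1:
        raise ConnectionError("inventory service unreachable")
    calls.append("reserve")


def ship() -> None:
    calls.append("ship")


journal: Dict[str, List[str]] = {}
steps = [("charge", charge), ("reserve", reserve), ("ship", ship)]

try:
    Orchestrator(journal).run("order-1", steps)
except ConnectionError:
    pass  # simulate the orchestrator dying mid-saga

# A fresh orchestrator with the same journal resumes where the first stopped.
Orchestrator(journal).run("order-1", steps)
print(calls)  # ['charge', 'reserve', 'ship']
```

The customer is charged once even though the saga ran twice. A crash between running a command and journaling it would still repeat that command, so steps should be safe to retry.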
Designing compensations
Each step needs an inverse. "Charge card" is undone by "refund card". "Reserve stock" by "release stock". "Send email" by, well, you can't unsend an email. Some actions have no real inverse; the saga has to either tolerate them or only do them at the very end.
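One way to apply the "only do them at the very end" rule is to mark steps that have no compensation and push them to the back of the plan. The sketch below is illustrative only: `PlannedStep` and `defer_irreversible` are hypothetical names, and the reordering assumes the steps do not depend on each other's output.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PlannedStep:
    name: str
    compensate: Optional[Callable[[], None]] = None  # None: no real inverse


def defer_irreversible(steps: List[PlannedStep]) -> List[PlannedStep]:
    """Stable reorder: compensable steps first, irreversible steps last."""
    reversible = [s for s in steps if s.compensate is not None]
    irreversible = [s for s in steps if s.compensate is None]
    return reversible + irreversible


plan = [
    PlannedStep("send confirmation email"),  # cannot be unsent
    PlannedStep("charge card", compensate=lambda: None),    # placeholder refund
    PlannedStep("reserve stock", compensate=lambda: None),  # placeholder release
]

ordered = [s.name for s in defer_irreversible(plan)]
print(ordered)
# ['charge card', 'reserve stock', 'send confirmation email']
```

With the email last, every step that can still fail happens before anything that cannot be taken back.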
Compensations are not free, and they are not exact inverses. A refund is not a charge in reverse: fees may not be returned, and bank settlement takes time. The saga reaches a consistent end state, but not necessarily the original state.
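A small worked example makes the gap concrete. All numbers are made up: the sketch assumes a processing fee of 320 cents on a 10,000 cent charge and assumes the fee is not returned on refund, which varies by payment provider.

```python
from dataclasses import dataclass


@dataclass
class Ledger:
    customer_cents: int
    merchant_cents: int


FEE_CENTS = 320  # hypothetical processing fee, assumed non-refundable


def charge(ledger: Ledger, amount: int) -> None:
    ledger.customer_cents -= amount
    ledger.merchant_cents += amount - FEE_CENTS  # processor keeps the fee


def refund(ledger: Ledger, amount: int) -> None:
    ledger.customer_cents += amount
    ledger.merchant_cents -= amount  # merchant returns the full amount


ledger = Ledger(customer_cents=50_000, merchant_cents=0)
charge(ledger, 10_000)
refund(ledger, 10_000)
print(ledger)  # Ledger(customer_cents=50000, merchant_cents=-320)
```

The customer is made whole, which is the consistent end state. The merchant is not back where it started.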
When to use sagas
- Multi-service business processes with clear failure semantics.
- Anywhere you'd previously have reached for a distributed transaction.
When not to: simple CRUD, single-service work, or anything that needs true atomicity or isolation, since a saga's intermediate states are visible to other requests. Don't fan out a tiny operation just to use a saga.
Tools
Temporal is a widely used orchestration platform; it handles retries, timeouts, state, and resumption out of the box. AWS Step Functions suits AWS-native shops, and Camunda suits BPMN-style workflows. For choreographed sagas, plain Kafka with careful event design is a common choice.
Sagas trade ACID guarantees for eventual consistency with explicit compensation. They're a heavier mental model than transactions, but often the only realistic option when a process spans services that each own their data. Design the steps and compensations together; that's where the bugs hide.