Database Replication

Replication makes copies of your data on other machines. It buys you availability, read scale, and disaster recovery. It also creates the most subtle bugs in distributed systems.

Why we replicate

If your database is on one machine and that machine dies, you are down. If the disk corrupts, you may lose data. If a region has an outage, your global users are stuck. Replication solves all three by keeping copies on other machines.

Three goals, often overlapping:

  1. Availability. When one node dies, traffic continues on a replica.
  2. Read scale. Reads can hit any replica, multiplying read capacity.
  3. Disaster recovery. A replica in another region survives a regional outage.

The basic patterns

Single-leader (master-slave) replication

One node is the leader. All writes go to the leader. The leader streams its changes to followers. Reads can be served by any node.

Pro: simple. Easy to reason about. Good consistency on the leader.

Con: writes are bottlenecked by leader capacity. If the leader dies, you need a failover.

This is the default in Postgres, MySQL, and MongoDB, and in most relational databases generally. Production-grade for most workloads.
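The write/read split above can be sketched as a tiny router. This is an illustrative toy, not any driver's real API; the node names are made up.

```python
import itertools

class ReplicatedDB:
    """Toy single-leader routing: all writes go to the leader,
    reads round-robin across read-only replicas."""

    def __init__(self, leader, replicas):
        self.leader = leader
        self.replicas = itertools.cycle(replicas)

    def route_write(self, query):
        return (self.leader, query)          # writes always hit the leader

    def route_read(self, query):
        return (next(self.replicas), query)  # reads spread across replicas

db = ReplicatedDB("pg-leader", ["pg-replica-1", "pg-replica-2"])
print(db.route_write("UPDATE users SET ..."))  # goes to pg-leader
print(db.route_read("SELECT ..."))             # goes to a replica
```

Real connection poolers (PgBouncer, ProxySQL) and many drivers do this routing for you, but the decision rule is exactly this simple.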

Multi-leader replication

Multiple nodes accept writes. They sync with each other. Used in geo-distributed setups (one leader per region, lower write latency).

Pro: low-latency writes from any region. No single bottleneck.

Con: conflicts when two leaders accept writes to the same row simultaneously. Resolution is complex (last-write-wins, vector clocks, custom logic).
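Last-write-wins is the simplest of those resolution strategies. A minimal sketch, with made-up row versions, showing both how it works and its cost: the losing write is silently discarded.

```python
def last_write_wins(versions):
    """Resolve a multi-leader conflict by keeping the value with the
    highest timestamp. Simple, but the losing writes vanish silently."""
    return max(versions, key=lambda v: v["ts"])

# Two leaders accepted writes to the same row at nearly the same time.
conflict = [
    {"value": "alice@old.example", "ts": 1700000000.120},  # leader in eu-west
    {"value": "alice@new.example", "ts": 1700000000.250},  # leader in us-east
]
print(last_write_wins(conflict)["value"])  # alice@new.example
```

Note that "highest timestamp" depends on clocks being in sync across leaders; clock skew can make an older write win, which is why vector clocks or application-level merge logic exist.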

Leaderless replication

Every node accepts writes. Clients write to W nodes and read from R nodes out of N total replicas. If W + R > N, every read set must overlap every write set, so at least one node you read from holds the latest acknowledged write. Cassandra and Dynamo use this.

Pro: high availability. No leader to fail.

Con: tunable consistency comes with mental overhead. Application logic gets harder.
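The quorum condition is a one-line check. A sketch, using a Cassandra-style N=3 setup as the example:

```python
def quorum_overlaps(n, w, r):
    """True when every read quorum must intersect every write quorum,
    i.e. at least one node in any read set has the latest
    acknowledged write."""
    return r + w > n

# Common configuration: N=3, quorum writes (W=2) and quorum reads (R=2).
print(quorum_overlaps(n=3, w=2, r=2))  # True
# W=1, R=1 is fast but allows stale reads.
print(quorum_overlaps(n=3, w=1, r=1))  # False
```

The tuning knob is right there: raise W for durability, raise R for freshness, lower either for latency, as long as you know which side of the inequality you are on.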

[Diagram: single-leader replication. The app writes to the leader, which streams a replication log to read-only Replicas 1, 2, and 3. App reads can hit any replica.]
Single-leader replication. Writes funnel to one node, reads spread across many.

Sync vs async replication

Synchronous: the leader waits for replicas to acknowledge before confirming the write to the client. Stronger durability. Higher write latency. If a replica is down, writes stall.

Asynchronous: the leader confirms immediately and pushes to replicas in the background. Fast writes. Risk: if the leader dies before replicas catch up, you lose recent writes.

Semi-synchronous: wait for at least one replica. The pragmatic middle.

Most production systems use async replication for read replicas and synchronous (or quorum-based) replication for high-stakes writes.
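The three acknowledgment policies differ only in when the leader confirms the write. A sketch of that decision; the mode names are illustrative, not any specific database's settings:

```python
def write_acknowledged(acks, total_replicas, mode):
    """Should the leader confirm the write to the client yet?
    'acks' is the number of replicas that have acknowledged so far."""
    if mode == "async":
        return True                       # confirm immediately
    if mode == "semi-sync":
        return acks >= 1                  # wait for at least one replica
    if mode == "sync":
        return acks >= total_replicas     # wait for everyone
    raise ValueError(f"unknown mode: {mode}")

print(write_acknowledged(0, 2, "async"))      # True
print(write_acknowledged(0, 2, "semi-sync"))  # False -- still waiting
print(write_acknowledged(1, 2, "semi-sync"))  # True
```

In Postgres this knob is `synchronous_commit` plus `synchronous_standby_names`; in MySQL it is the semi-synchronous replication plugin.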

Replication lag and what it does to your app

Async replicas are always slightly behind: usually milliseconds, sometimes seconds, occasionally minutes when a replica is overwhelmed. Your application can write to the leader, then read from a replica that has not yet received that write. Classic bug:

  1. User updates their profile photo.
  2. Write goes to leader.
  3. App redirects user to profile page.
  4. Profile page reads from a replica. Replica is 200ms behind.
  5. User sees their old photo. Looks like the update failed.

Mitigations: read-your-writes consistency (route same-user reads to the leader briefly after a write), session-level routing, or just reading from the leader for the post-write screen.
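The first mitigation can be sketched as session-level routing: pin a user's reads to the leader for a short window after their own write. The window length and structure here are assumptions for illustration; in practice you would size the window to your observed replication lag and store the timestamps in the user's session.

```python
import time

STICKY_WINDOW = 5.0  # seconds; assumed value, tune to your replication lag

last_write_at = {}   # user_id -> timestamp of their most recent write

def record_write(user_id, now=None):
    last_write_at[user_id] = now if now is not None else time.time()

def read_target(user_id, now=None):
    """Route a user's reads to the leader briefly after their own write,
    so they always see it; otherwise use a replica."""
    now = now if now is not None else time.time()
    if now - last_write_at.get(user_id, 0.0) < STICKY_WINDOW:
        return "leader"
    return "replica"

record_write("u42", now=100.0)                 # profile photo updated
print(read_target("u42", now=102.0))  # leader  -- within the sticky window
print(read_target("u42", now=110.0))  # replica -- lag has had time to clear
```

Note this only fixes reading your *own* writes; other users reading this profile can still see the old photo until the replica catches up.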

Failover

When the leader dies, one replica must be promoted. Steps:

  1. Detect the failure (heartbeats, health checks).
  2. Pick the replica with the latest data.
  3. Reconfigure clients to send writes to the new leader.
  4. Stop the old leader from accepting writes if it comes back (split-brain).

Each step has failure modes. Automated failover (Patroni for Postgres, Orchestrator for MySQL, MongoDB's replica set elections) is essential at scale. Manual failover at 3 AM is a humbling experience you should not need to repeat.

Split brain

If the old leader and the new leader both think they are the leader (say, a network partition heals at the wrong moment), they accept conflicting writes. Quorum-based election (more than half the nodes must agree on the new leader) prevents this.
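Why a strict majority rules out split brain can be shown in a few lines: two disjoint partitions of a cluster cannot both contain more than half the nodes, so at most one side can elect.

```python
def can_elect(votes_received, cluster_size):
    """A candidate becomes leader only with a strict majority.
    Two disjoint partitions can't both have one, so there can
    never be two simultaneous elected leaders."""
    return votes_received > cluster_size // 2

# A 5-node cluster splits 3 / 2 during a network partition:
print(can_elect(3, 5))  # True  -- the majority side elects a leader
print(can_elect(2, 5))  # False -- the minority side cannot
```

This is also why clusters have odd sizes: a 4-node cluster still needs 3 votes, so it tolerates no more failures than a 3-node one.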

What to remember

Replication gives you availability and read scale at the cost of complexity and possibly stale reads. Single-leader is the safest default. Multi-leader and leaderless are powerful but require thinking about conflict resolution. Failover is when your design gets tested. Plan and rehearse it.