Distributed Caching
When one cache node is not enough, you spread the cache across many. Sharding by key, dealing with hot keys, and surviving node failures are the three main concerns.
Why distribute the cache
One Redis instance can hold tens of GBs in memory and serve hundreds of thousands of ops per second. Plenty for many systems. But once you cross those limits, or once you need redundancy in case the cache box dies, you split the cache across multiple nodes.
The two distribution strategies
Client-side sharding. The application chooses which node to talk to (often via consistent hashing). Simple, no extra hops, but every client must know the topology. Memcached works this way.
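To make the client-side approach concrete, here is a minimal consistent-hash ring sketch in Python. It is an illustration of the technique, not any particular client library's implementation; the node names and virtual-node count are made up for the example.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes for smoother key spread."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # First ring position at or after the key's hash, wrapping around.
        h = self._hash(key)
        i = bisect.bisect(self.keys, h) % len(self.keys)
        return self.ring[i][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
owner = ring.node_for("article:42")  # same key always maps to the same node
```

The payoff of consistent hashing over naive `hash(key) % N`: when a node joins or leaves, only the keys adjacent to it on the ring move, instead of nearly all of them.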
Server-side cluster. The cache nodes form a cluster. The client sends to any node; the cluster routes internally if needed. Redis Cluster, DynamoDB DAX. More moving parts, easier client.
Hot keys
The single biggest problem in distributed caches. One key gets 10x more traffic than all others combined (think: the front-page article). The node owning that key is overloaded, the rest are idle.
Mitigations:
- Key replication. Store hot keys on multiple nodes. Reads can hit any of them.
- Local cache layer. Cache hot keys in the application process for a few seconds. Massive read volume becomes local.
- Key splitting. Append a random suffix when reading (article:42:rand0 through article:42:rand9) and write to all variants. Spreads the load across the nodes that own the variants.
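The key-splitting pattern can be sketched in a few lines. A plain dict stands in for the distributed cache here (an assumption for the sake of a self-contained example); in practice the reads and writes would go to Redis or Memcached.

```python
import random

N_VARIANTS = 10
cache = {}  # stand-in for a distributed cache client

def write_hot(key, value):
    # Write every variant so a reader of any variant sees the fresh value.
    for i in range(N_VARIANTS):
        cache[f"{key}:{i}"] = value

def read_hot(key):
    # Read a random variant; traffic spreads across the owning nodes.
    return cache.get(f"{key}:{random.randrange(N_VARIANTS)}")

write_hot("article:42", {"title": "Front page"})
value = read_hot("article:42")
```

The trade-off is explicit: writes get N times more expensive so that reads fan out across N nodes. That is exactly the right trade for a key that is read millions of times and written rarely.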
What happens when a node dies
Two scenarios.
Cache is just a performance layer. Losing a node means a temporary spike in DB queries while keys re-warm on other nodes. Annoying, survivable.
Cache holds critical data with no DB backing. Now data is gone forever. Don't do this. Always treat the cache as derivable from a source of truth.
Production caches usually run with replicas (Redis Sentinel or Cluster) so a node failure triggers failover, not data loss.
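For reference, the Sentinel side of that setup is a few lines of configuration. A minimal sketch, with an assumed master name, address, and timeouts:

```
sentinel monitor mymaster 10.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
```

The trailing 2 on the monitor line is the quorum: how many Sentinels must agree the master is down before a failover starts.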
Redis vs Memcached: the eternal question
| | Redis | Memcached |
|---|---|---|
| Data structures | Strings, hashes, lists, sets, sorted sets, streams | Strings only |
| Persistence | Optional (RDB, AOF) | None |
| Replication | Built-in | Client-side or via add-ons |
| Pub/Sub | Yes | No |
| Performance | Mostly single-threaded, ~100K ops/sec per node | Multi-threaded, can scale a single node |
| Use case | General purpose, cache + queue + leaderboards | Plain key-value cache |
Today, Redis wins almost every "we need a cache" decision. Memcached is still around for very simple, very high-throughput scenarios.
Common architectures
Single Redis with replica. Plenty for most teams. Redis Sentinel handles failover.
Redis Cluster. 3-6+ shards, each with a replica. Scales out as needed.
Tiered cache. In-process cache (per-app instance) plus Redis. Hot keys served from process; warm keys from Redis. Best latency, cost in complexity.
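The tiered pattern boils down to a tiny in-process TTL cache in front of a slower backend. A minimal sketch, where the backend callable stands in for a Redis GET (names and the TTL value are illustrative assumptions):

```python
import time

class TieredCache:
    """In-process L1 cache with a per-entry TTL, fronting a slower backend."""

    def __init__(self, backend_get, ttl=5.0):
        self.backend_get = backend_get  # e.g. a wrapper around Redis GET
        self.ttl = ttl
        self.local = {}  # key -> (value, expires_at)

    def get(self, key):
        hit = self.local.get(key)
        if hit and hit[1] > time.monotonic():
            return hit[0]                  # L1 hit: served in-process
        value = self.backend_get(key)      # L1 miss: go to the remote tier
        self.local[key] = (value, time.monotonic() + self.ttl)
        return value

calls = []
def slow_backend(key):
    calls.append(key)
    return f"value-for-{key}"

tc = TieredCache(slow_backend, ttl=60)
first = tc.get("article:42")
second = tc.get("article:42")  # within TTL: backend is not called again
```

The complexity cost mentioned above shows up here as staleness: each app instance may serve a value up to one TTL old, so the TTL has to be chosen per workload.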
Distributed caching is one of those topics where the implementation looks fine until traffic shifts and you discover hot keys, replication lag, or a missed invalidation. Build for failure modes from day one. Monitor hit rate. Have a runbook for "cache is down".