Distributed Caching

When one cache node is not enough, you spread the cache across many. Sharding by key, dealing with hot keys, and surviving node failures are the three main concerns.

Why distribute the cache

One Redis instance can hold tens of GBs in memory and serve hundreds of thousands of ops per second. Plenty for many systems. But once you cross those limits, or once you need redundancy in case the cache box dies, you split the cache across multiple nodes.

The two distribution strategies

Client-side sharding. The application chooses which node to talk to (often via consistent hashing). Simple, no extra hops, but every client must know the topology. Memcached works this way.
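To make this concrete, here is a minimal consistent-hash ring in Python. Everything in it is illustrative: the node addresses, the virtual-node count, and the MD5 hash are one reasonable set of choices, not a prescription.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring. Each node gets many virtual
    points so keys spread evenly and only ~1/N of keys move when
    a node joins or leaves."""

    def __init__(self, nodes, vnodes=100):
        self.ring = []  # sorted list of (hash, node) points
        for node in nodes:
            for i in range(vnodes):
                point = self._hash(f"{node}#vn{i}")
                bisect.insort(self.ring, (point, node))

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # Walk clockwise from the key's hash to the first ring point.
        idx = bisect.bisect_left(self.ring, (self._hash(key), ""))
        return self.ring[idx % len(self.ring)][1]

ring = HashRing(["cache-a:6379", "cache-b:6379", "cache-c:6379"])
print(ring.node_for("user:42"))  # deterministic node choice
```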

Server-side cluster. The cache nodes form a cluster. The client sends to any node; the cluster routes internally if needed. Redis Cluster, DynamoDB DAX. More moving parts, easier client.
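If you run Redis Cluster, the client side is genuinely small. A sketch using redis-py's cluster support, assuming redis-py 4.1+ and a cluster node reachable on localhost:7000 (both assumptions):

```python
from redis.cluster import RedisCluster

# Connect through any one node; the client discovers the full
# topology and routes each command to the shard that owns the
# key's hash slot.
rc = RedisCluster(host="localhost", port=7000, decode_responses=True)

rc.set("article:123", "rendered-body", ex=300)  # 5-minute TTL
print(rc.get("article:123"))
```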

Hot keys

The single biggest problem in distributed caches. One key gets 10x more traffic than all others combined (think: the front-page article). The node that owns that key is overloaded while the rest sit idle.

Mitigations:

- Split the key: write N copies under suffixed keys and read a random copy, spreading the load across shards (see the sketch below).
- Cache the hot key in-process at the app tier, so most reads never leave the application instance.
- Coalesce concurrent misses so a stampede triggers one backend fetch instead of thousands.
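A sketch of the key-splitting mitigation, assuming a plain redis-py client; the fan-out of 8 and the `#i` suffix format are arbitrary illustrations:

```python
import random
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
FANOUT = 8  # arbitrary: number of copies of each hot key

def set_hot(key: str, value: str, ttl: int = 60) -> None:
    # Write every copy so whichever one a reader picks is fresh.
    for i in range(FANOUT):
        r.set(f"{key}#{i}", value, ex=ttl)

def get_hot(key: str) -> str | None:
    # Read a random copy. In a sharded cache the suffixed keys hash
    # to different nodes, so the hot key's load spreads across them.
    return r.get(f"{key}#{random.randrange(FANOUT)}")

set_hot("frontpage:article", "rendered-html")
print(get_hot("frontpage:article"))
```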

What happens when a node dies

Two scenarios.

Cache is just a performance layer. Losing a node means a temporary spike in DB queries while keys re-warm on other nodes. Annoying, survivable.

Cache holds critical data with no DB backing. Now data is gone forever. Don't do this. Always treat the cache as derivable from a source of truth.

Production caches usually run with replicas (Redis Sentinel or Cluster) so a node failure triggers failover, not data loss.
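With redis-py, Sentinel-aware connections look roughly like this; the sentinel addresses and the service name `mymaster` are placeholders for whatever your deployment configures:

```python
from redis.sentinel import Sentinel

# Ask the sentinels who the current master is; after a failover the
# same call returns the newly promoted node without a config change.
sentinel = Sentinel(
    [("sentinel-1", 26379), ("sentinel-2", 26379), ("sentinel-3", 26379)],
    socket_timeout=0.5,
)
master = sentinel.master_for("mymaster", socket_timeout=0.5)
replica = sentinel.slave_for("mymaster", socket_timeout=0.5)

master.set("session:abc", "payload", ex=3600)
print(replica.get("session:abc"))  # reads can be served by a replica
```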

[Figure: distributed cache with consistent hashing. The app tier maps key → consistent hash → owning node (Node A, B, or C, each with a replica); the replica takes over on failure.]
Each key lives on its primary node and a replica. Consistent hashing decides ownership.

Redis vs Memcached: the eternal question

| | Redis | Memcached |
| --- | --- | --- |
| Data structures | Strings, hashes, lists, sets, sorted sets, streams | Strings only |
| Persistence | Optional (RDB, AOF) | None |
| Replication | Built-in | Client-side or via add-ons |
| Pub/Sub | Yes | No |
| Performance | Single-threaded, ~100K ops/sec | Multi-threaded, scales up within a single node |
| Use case | General purpose: cache + queue + leaderboards | Plain key-value cache |

Today, Redis wins almost every "we need a cache" decision. Memcached is still around for very simple, very high-throughput scenarios.

The first thing to monitor

Hit rate. Cache hit rate is the single most important cache metric. If it drops, your DB load goes up and latency follows. Alert on hit rate below a threshold (say 90% for production). Investigate before performance degrades visibly.
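Redis exposes the raw counters via INFO. A sketch of deriving the hit rate with redis-py (the 90% threshold echoes the text; real monitoring should diff successive samples, since these counters are cumulative):

```python
import redis

r = redis.Redis(host="localhost", port=6379)

def cache_hit_rate() -> float:
    # keyspace_hits / keyspace_misses are cumulative since server
    # start; a real monitor should diff successive samples instead.
    stats = r.info("stats")
    hits, misses = stats["keyspace_hits"], stats["keyspace_misses"]
    total = hits + misses
    return hits / total if total else 1.0

rate = cache_hit_rate()
if rate < 0.90:  # threshold from the text; tune per workload
    print(f"ALERT: cache hit rate {rate:.1%} is below 90%")
```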

Common architectures

Single Redis with replica. Plenty for most teams. Redis Sentinel handles failover.

Redis Cluster. 3-6+ shards, each with a replica. Scales out as needed.

Tiered cache. In-process cache (per-app instance) plus Redis. Hot keys served from the process; warm keys from Redis. Best latency, at the cost of extra complexity.
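A sketch of the two-tier read path, assuming redis-py; the in-process layer is a bare dict with expiry timestamps, standing in for a proper bounded LRU:

```python
import time
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
local: dict[str, tuple[str, float]] = {}  # key -> (value, expiry time)
LOCAL_TTL = 5  # seconds; short, to bound staleness across instances

def tiered_get(key: str, loader) -> str:
    # Tier 1: in-process. Fastest, but per-instance and stale-prone.
    hit = local.get(key)
    if hit and hit[1] > time.monotonic():
        return hit[0]
    # Tier 2: shared Redis. One network hop, shared by all instances.
    value = r.get(key)
    if value is None:
        # Miss in both tiers: fall back to the source of truth.
        value = loader(key)
        r.set(key, value, ex=300)
    local[key] = (value, time.monotonic() + LOCAL_TTL)
    return value

print(tiered_get("article:1", lambda k: f"body-for-{k}"))
```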

Distributed caching is one of those topics where the implementation looks fine until traffic shifts and you discover hot keys, replication lag, or a missed invalidation. Build for failure modes from day one. Monitor hit rate. Have a runbook for "cache is down".