Consistent Hashing

Consistent hashing distributes data across nodes so that adding or removing one node moves only about 1/N of the keys. It is the trick that makes elastic clusters and content delivery networks possible.

The problem with plain hash mod N

You have N servers. To pick which server owns a key, you compute hash(key) % N. Simple. Fast. Correct.

Now you add a server. N goes from 4 to 5. hash(key) % 4 and hash(key) % 5 give different answers for almost every key. You have to re-shuffle roughly 80% of your data. While the system is running. With users complaining.
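A quick way to see the damage is to count the moves directly. Here is a small sketch (the key names and the choice of MD5 are arbitrary) that simulates going from 4 to 5 servers under plain hash-mod-N:

import hashlib

def bucket(key: str, n: int) -> int:
    # Plain hash-mod-N placement.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(bucket(k, 4) != bucket(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys changed servers")  # prints roughly 80%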

This is fine for a textbook. It is unacceptable in production. Consistent hashing fixes it.

The hash ring

Imagine a ring numbered from 0 to a huge number (2^32, say). Each server gets a position on the ring, picked by hashing its name. Each key gets a position on the ring, picked by hashing the key. To find which server owns a key, you walk clockwise from the key's position until you hit a server.

[Figure: consistent hash ring with servers A, B, C, D and keys key1–key4 placed on the same circle; walking clockwise, key1 → B, key2 → C, key3 → A, key4 → C.]
Servers and keys placed on the same ring. Each key belongs to the next server clockwise.
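In code, the ring is just a sorted list of positions plus a binary search. A minimal sketch of the ring described above, assuming MD5 for placement and plain server-name strings as node IDs (both arbitrary choices):

import hashlib
from bisect import bisect_left

RING_SIZE = 2**32

def position(name: str) -> int:
    # Hash a server name or a key to a point on the ring.
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % RING_SIZE

class HashRing:
    def __init__(self, nodes):
        # The "ring" is a sorted list of (position, node) pairs.
        self.ring = sorted((position(n), n) for n in nodes)
        self.points = [p for p, _ in self.ring]

    def lookup(self, key: str) -> str:
        # Walk clockwise: the first node at or after the key's position,
        # wrapping around to the start if we fall off the end.
        i = bisect_left(self.points, position(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["A", "B", "C", "D"])
print(ring.lookup("user:42"))  # one of A, B, C, D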

Why this is better

Add a new server E. Place it on the ring. Only the keys that were going to fall between E's position and the next server clockwise need to move. Roughly 1/N of the data, where N is the new server count.

Remove server B. Its keys now go to the next server clockwise. Again, only B's slice moves. The other N-1 servers are unaffected.

This is the property that makes consistent hashing magical. Resharding is bounded. You move just enough data, and the rest of the system keeps humming.
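That bound is easy to check. The sketch below builds the same kind of hypothetical MD5 ring, adds a fifth server, and counts how many keys change owners:

import hashlib
from bisect import bisect_left

def position(name: str) -> int:
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % 2**32

def owner(key: str, ring, points) -> str:
    # ring is a sorted list of (position, node); points are just the positions.
    return ring[bisect_left(points, position(key)) % len(ring)][1]

keys = [f"user:{i}" for i in range(100_000)]
ring4 = sorted((position(n), n) for n in "ABCD")
ring5 = sorted(ring4 + [(position("E"), "E")])   # add server E

points4 = [p for p, _ in ring4]
points5 = [p for p, _ in ring5]
moved = sum(owner(k, ring4, points4) != owner(k, ring5, points5) for k in keys)
print(f"{moved / len(keys):.1%} of keys moved")  # only E's new slice, versus ~80% under mod-N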

[Interactive demo: add and remove nodes and watch the remap cost. Compare hash-mod-N (everything moves) with consistent hashing (only one slice moves) when the cluster changes.]

Virtual nodes (the practical refinement)

The naive ring has a problem: the random positions of just a few servers can lead to uneven load. Server A might own a huge slice; server D might own a tiny one. Solution: each physical server is represented by many virtual nodes (vnodes) at different ring positions. With 100-256 vnodes per server, the load distribution evens out beautifully.

Bonus: when a server fails, its vnodes are scattered across the ring, so its load gets redistributed across all remaining servers, not concentrated on the next one.
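A sketch of the same idea with vnodes (the vnode count and the "A#3"-style naming are arbitrary), comparing how evenly 100,000 keys spread with 1 vnode versus 150 vnodes per server:

import hashlib
from bisect import bisect_left
from collections import Counter

def position(name: str) -> int:
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % 2**32

def build_ring(nodes, vnodes):
    # Each physical node appears at many ring positions: "A#0", "A#1", ...
    return sorted((position(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))

keys = [f"user:{i}" for i in range(100_000)]
for vnodes in (1, 150):
    ring = build_ring("ABCD", vnodes)
    points = [p for p, _ in ring]
    load = Counter(ring[bisect_left(points, position(k)) % len(ring)][1] for k in keys)
    print(vnodes, dict(load))  # keys per server: typically far more even with 150 vnodes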

Where you actually see consistent hashing

Amazon's Dynamo popularized it for key-value storage, and Cassandra and Riak partition data the same way. Memcached client libraries use it (the well-known ketama scheme) to spread cache keys across a pool. It also underpins request routing in content delivery networks; distributed web caching is the problem the original 1997 paper set out to solve.

The interview tip: whenever a system design problem says "scale this cache", "shard this data", or "we keep adding more servers", the answer almost always includes consistent hashing. Knowing why, and being able to draw the ring diagram from memory, is high-value interview prep.

Limits and refinements

Consistent hashing is great but not magic. Two known issues:

Hot keys. The ring balances where keys live, not how hard they are hit. A single wildly popular key still lands on one server, and adding more servers does not spread that load.

Ring state. Every client or router needs an up-to-date copy of the ring (all vnode positions, kept sorted), which means extra memory, a binary search per lookup, and some mechanism for keeping membership in sync across the fleet.

Other partitioning algorithms worth knowing

Rendezvous hashing (also called highest-random-weight, or HRW). A simpler scheme that gives similar properties without a ring: for each key, every server computes a score, hash(key + server name), and the highest score wins. No ring to maintain and no coordinator needed. Slightly less code, equally good distribution.
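The whole idea fits in a few lines. A sketch, with MD5 as an arbitrary choice of hash:

import hashlib

def score(key: str, server: str) -> int:
    return int(hashlib.md5(f"{key}|{server}".encode()).hexdigest(), 16)

def owner(key: str, servers) -> str:
    # Highest score wins; adding or removing a server only moves the keys
    # for which that server scores (or scored) highest.
    return max(servers, key=lambda s: score(key, s))

print(owner("user:42", ["A", "B", "C", "D"]))

The trade-off is an O(N) scoring pass per lookup, which is fine for small and medium server counts.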

Jump consistent hash. From a short Google paper (Lamping and Veach, 2014). Two integers in (a key and a bucket count), one bucket number out. No memory overhead. Used at very large scale.
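The entire algorithm is one short loop. A Python port of the paper's C++ sketch (the key must already be an integer, for example a 64-bit hash of your real key):

def jump_hash(key: int, num_buckets: int) -> int:
    # Jump consistent hash (Lamping and Veach, 2014): maps a 64-bit key
    # to a bucket in [0, num_buckets) with no ring stored anywhere.
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) % 2**64
        j = int((b + 1) * (2**31 / ((key >> 33) + 1)))
    return b

print(jump_hash(123456789, 10))  # a stable bucket in 0..9

The catch: buckets are just the numbers 0..N-1, so you can only grow or shrink at the end, which makes it a better fit for sharding data than for clusters where arbitrary servers come and go.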

You may go years without implementing consistent hashing yourself. But understanding it lets you read any distributed system's docs and instantly grasp how it scales. That mental model is worth the small upfront investment.