Load Balancing

A load balancer is the traffic cop of your system. It spreads requests across servers, hides failed servers, and is the place you implement most cross-cutting concerns. Critical, often invisible.

What a load balancer does

A load balancer sits in front of your application servers. Every client request hits the LB first; the LB picks a healthy server and forwards the request. To the client, it looks like there is one big server. Behind the scenes, work is distributed.

Three jobs:

  1. Distribute load. Spread requests so no server is overwhelmed.
  2. Health check. If a server fails, stop sending it traffic.
  3. Cross-cutting concerns. SSL termination, rate limiting, request logging.
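The three jobs can be sketched in a few lines. This is a toy in-memory model, not a real proxy; the server names, the `mark_down` method, and the logging format are all made up for illustration.

```python
import itertools

class LoadBalancer:
    """Toy model of the three jobs: distribute, health-check, log."""

    def __init__(self, servers):
        self.servers = servers             # list of server names
        self.healthy = set(servers)        # maintained by health checks
        self._rr = itertools.cycle(servers)

    def mark_down(self, server):
        # In a real LB this happens when health checks fail.
        self.healthy.discard(server)

    def pick(self):
        # Round-robin, but skip servers marked unhealthy.
        for _ in range(len(self.servers)):
            s = next(self._rr)
            if s in self.healthy:
                return s
        raise RuntimeError("no healthy servers")

    def handle(self, request):
        server = self.pick()
        print(f"LOG {request} -> {server}")  # cross-cutting concern: logging
        return server
```

To the client, `handle` looks like a single server answering; behind it, work spreads across whatever is healthy.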

L4 vs L7 load balancers

The difference is where in the network stack the LB looks at traffic.

L4 (transport layer). Operates on TCP/UDP. Looks at IP addresses and ports. Fast, simple, but doesn't understand HTTP. Examples: AWS NLB, HAProxy in TCP mode.

L7 (application layer). Operates on HTTP. Sees URLs, headers, cookies. Can route based on path or hostname, do SSL termination, content-based routing. Heavier than L4 but vastly more flexible. Examples: NGINX, Envoy, AWS ALB, HAProxy in HTTP mode.

Modern systems usually use L7 for everything except very high-throughput TCP traffic.
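What L7 routing buys you is easiest to see in code: the LB inspects the Host header and URL path and picks a backend pool, something an L4 LB (which sees only IPs and ports) cannot do. The hostnames and pool names below are illustrative, not any particular product's config.

```python
# Hypothetical L7 routing table: (host, path prefix, backend pool).
# Rules are checked in order; first match wins.
ROUTES = [
    ("api.example.com", "/v1/",     "api-pool"),
    ("example.com",     "/static/", "cdn-pool"),
    ("example.com",     "/",        "web-pool"),
]

def route(host: str, path: str) -> str:
    for rule_host, prefix, pool in ROUTES:
        if host == rule_host and path.startswith(prefix):
            return pool
    return "default-pool"
```

An L4 LB making the same decision would have nothing to go on but `client_ip:port -> server_ip:port`.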

[Diagram: clients → load balancer (health checks · routing) → server 1 ✓, server 2 ✓, server 3 ✗, server 4 ✓]
A load balancer routes around failed servers and spreads traffic across the healthy ones.

Routing algorithms

| Algorithm | How it picks | When to use |
| --- | --- | --- |
| Round robin | Cycle through servers in order | Simple default, fine when servers are equal |
| Least connections | Server with fewest active connections | When request times vary widely |
| Weighted | Round robin with weights for bigger servers | Heterogeneous fleet |
| IP hash | Hash of client IP picks server | Sticky sessions without cookies |
| Least response time | Fastest-responding server | Mixed-performance servers |
| Consistent hash | Hash of request key | Cache affinity, stateful systems |
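A few of these algorithms, sketched minimally. The server names and connection counts are illustrative; real LBs track connections and latency from live traffic.

```python
import hashlib
import itertools

servers = ["s1", "s2", "s3"]

# Round robin: cycle through servers in order.
rr = itertools.cycle(servers)

# Least connections: pick the server with the fewest active connections.
active = {"s1": 4, "s2": 1, "s3": 7}  # illustrative live counts

def least_connections() -> str:
    return min(active, key=active.get)

# IP hash: the same client IP always maps to the same server,
# as long as the server list doesn't change.
def ip_hash(client_ip: str) -> str:
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]
```

Note the weakness of plain IP hash: adding or removing a server remaps almost every client, which is exactly what consistent hashing is designed to avoid.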

Health checks

Health checks are how the LB knows which servers are alive. Two flavors: active checks, where the LB periodically probes a health endpoint on each server, and passive checks, where the LB watches real traffic and marks a server down after repeated errors or timeouts.

Make your health endpoint deep enough to detect real problems (DB connection works, dependencies reachable) but shallow enough to be fast. A common mistake is making the health check itself fail under load, taking down healthy servers.
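A minimal active health checker might look like the sketch below. The `/health` path, the one-second timeout, and the failure threshold are assumptions for illustration; real LBs make all three configurable, and the grace period before eviction is what prevents one slow response from taking a healthy server out of rotation.

```python
import urllib.request

FAIL_THRESHOLD = 3  # consecutive failures before eviction (assumed value)

def check(server_url: str, timeout: float = 1.0) -> bool:
    """Probe a hypothetical /health endpoint; any error counts as a failure."""
    try:
        with urllib.request.urlopen(f"{server_url}/health", timeout=timeout) as r:
            return r.status == 200
    except OSError:
        return False

def update_health(servers: dict) -> set:
    """servers maps url -> consecutive failure count. Returns the healthy set."""
    healthy = set()
    for url in servers:
        if check(url):
            servers[url] = 0           # success resets the counter
            healthy.add(url)
        else:
            servers[url] += 1
            if servers[url] < FAIL_THRESHOLD:
                healthy.add(url)       # grace period before eviction
    return healthy
```

Run `update_health` on a timer and feed the resulting set to the routing layer.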

SSL termination

The LB decrypts incoming HTTPS, talks plain HTTP to the backend servers, and encrypts responses on the way back to the client. Why? Backend servers are spared TLS handshakes, which are CPU-heavy; the LB can use hardware acceleration or optimized crypto libraries; and certificates are managed in one place.

Trade-off: traffic between LB and backend is plain text. Fine inside a private network; not fine across the internet.

Sticky sessions

Sometimes you need the same user to keep hitting the same backend server (in-memory session, websocket connection). Sticky sessions (session affinity) bind a user to one server, usually via a cookie.

Sticky sessions are a smell. They prevent free horizontal scaling and make rollouts harder. Prefer stateless services with shared session storage. Use stickiness only when you must (websockets, some legacy apps).
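When you do need stickiness, cookie-based affinity is the usual mechanism: pick a server on the first request, pin it in a cookie, and honor the cookie afterward. A sketch, with a made-up cookie name:

```python
import itertools

class StickyLB:
    """Toy cookie-based session affinity. Cookie name is an assumption."""
    COOKIE = "SERVERID"

    def __init__(self, servers):
        self._rr = itertools.cycle(servers)
        self.servers = set(servers)

    def handle(self, cookies: dict) -> tuple:
        """Return (chosen server, cookies to send back to the client)."""
        pinned = cookies.get(self.COOKIE)
        if pinned in self.servers:       # honor existing affinity
            return pinned, cookies
        server = next(self._rr)          # first visit: pick and pin
        return server, {**cookies, self.COOKIE: server}
```

Note what this sketch omits: if the pinned server dies, the user's session state dies with it. That failure mode is exactly why the text above calls stickiness a smell.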

The single point of failure

A single load balancer is itself a SPOF. Run LBs in pairs (active-passive or active-active) with health monitoring between them. Cloud LBs (AWS ELB, GCP LB) do this for you under the hood.

Where the LB fits

Most production systems have multiple layers:

  1. DNS / global LB: routes users to the nearest region.
  2. Edge LB: per-region public-facing LB. SSL termination, DDoS protection, public IP.
  3. Internal LB: in front of each service tier. L7 routing.

Each layer has its own scaling and failure characteristics. Skip a layer at your peril.