Load Balancing
A load balancer is the traffic cop of your system. It spreads requests across servers, routes around failed ones, and is where you implement most cross-cutting concerns. Critical, often invisible.
What a load balancer does
A load balancer sits in front of your application servers. Every client request hits the LB first; the LB picks a healthy server and forwards the request. To the client, it looks like there is one big server. Behind the scenes, work is distributed.
Three jobs:
- Distribute load. Spread requests so no server is overwhelmed.
- Health check. If a server fails, stop sending it traffic.
- Cross-cutting concerns. SSL termination, rate limiting, request logging.
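To make the forwarding concrete, here is a minimal sketch in Go: a reverse proxy that round-robins requests across a fixed pool using the standard library's httputil.ReverseProxy. The backend addresses are placeholders, and a real LB would layer health checks and retries on top.

```go
// Minimal load balancer sketch: clients hit one address; requests are
// round-robined across a fixed backend pool. Addresses are hypothetical.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync/atomic"
)

func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

func main() {
	// The pool the LB distributes across (assumed internal addresses).
	backends := []*url.URL{
		mustParse("http://10.0.0.1:8080"),
		mustParse("http://10.0.0.2:8080"),
		mustParse("http://10.0.0.3:8080"),
	}

	var next uint64
	proxy := &httputil.ReverseProxy{
		Director: func(req *http.Request) {
			// Pick the next backend in order; the client never sees this.
			target := backends[atomic.AddUint64(&next, 1)%uint64(len(backends))]
			req.URL.Scheme = target.Scheme
			req.URL.Host = target.Host
		},
	}

	log.Fatal(http.ListenAndServe(":80", proxy))
}
```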
L4 vs L7 load balancers
The difference is where in the network stack the LB looks at traffic.
L4 (transport layer). Operates on TCP/UDP. Looks at IP addresses and ports. Fast, simple, but doesn't understand HTTP. Examples: AWS NLB, HAProxy in TCP mode.
L7 (application layer). Operates on HTTP. Sees URLs, headers, cookies. Can route based on path or hostname, do SSL termination, content-based routing. Heavier than L4 but vastly more flexible. Examples: NGINX, Envoy, AWS ALB, HAProxy in HTTP mode.
Modern systems usually use L7 for everything except very high-throughput TCP traffic.
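A small sketch of what L7 buys you: the balancer parses the HTTP request and routes by URL path, something an L4 balancer, which only sees IPs and ports, cannot do. The upstream hostnames here are invented for illustration.

```go
// L7 (HTTP-aware) routing sketch: /api traffic goes to one upstream,
// everything else to another. Upstream hostnames are assumptions.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func main() {
	api, _ := url.Parse("http://api-pool.internal:8080")
	web, _ := url.Parse("http://web-pool.internal:8080")

	apiProxy := httputil.NewSingleHostReverseProxy(api)
	webProxy := httputil.NewSingleHostReverseProxy(web)

	// Content-based routing: only possible because we parse HTTP.
	// An L4 balancer has no concept of a URL path.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if strings.HasPrefix(r.URL.Path, "/api/") {
			apiProxy.ServeHTTP(w, r)
		} else {
			webProxy.ServeHTTP(w, r)
		}
	})

	log.Fatal(http.ListenAndServe(":80", nil))
}
```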
Routing algorithms
| Algorithm | How it picks | When to use |
|---|---|---|
| Round robin | Cycle through servers in order | Simple, default, fine when servers are equal |
| Least connections | Server with fewest active connections | When request times vary widely |
| Weighted | Round robin with weights for bigger servers | Heterogeneous fleet |
| IP hash | Hash of client IP picks server | Sticky sessions without cookies |
| Least response time | Fastest-responding server | Mixed performance servers |
| Consistent hash | Hash of request key | Cache affinity, stateful systems |
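As one example from the table, here is a sketch of least-connections selection: track in-flight requests per backend and pick the lowest. The types and field names are illustrative, not from any particular load balancer.

```go
// Least-connections sketch: acquire picks the backend with the fewest
// in-flight requests; release must be called when the request finishes.
package main

import (
	"fmt"
	"sync"
)

type backend struct {
	addr   string
	active int // current in-flight request count
}

type pool struct {
	mu       sync.Mutex
	backends []*backend
}

// acquire returns the backend with the fewest active connections and
// increments its count.
func (p *pool) acquire() *backend {
	p.mu.Lock()
	defer p.mu.Unlock()
	best := p.backends[0]
	for _, b := range p.backends[1:] {
		if b.active < best.active {
			best = b
		}
	}
	best.active++
	return best
}

// release records that a request to b has finished.
func (p *pool) release(b *backend) {
	p.mu.Lock()
	b.active--
	p.mu.Unlock()
}

func main() {
	p := &pool{backends: []*backend{
		{addr: "10.0.0.1:8080"},
		{addr: "10.0.0.2:8080"},
	}}
	b := p.acquire()
	fmt.Println("routing to", b.addr)
	p.release(b)
}
```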
Health checks
Health checks are how the LB knows which servers are alive. Two flavors:
- Active. LB pings each server every few seconds (often GET /health). If a server fails N checks in a row, it is marked unhealthy.
- Passive. LB watches actual traffic. Too many failures or timeouts → server pulled.
Make your health endpoint deep enough to detect real problems (DB connection works, dependencies reachable) but shallow enough to be fast. A common mistake is making the health check itself fail under load, taking down healthy servers.
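Here is a rough sketch of an active checker along those lines, assuming each backend exposes GET /health. The threshold, interval, and address are placeholder values.

```go
// Active health check sketch: probe each backend on a timer. N consecutive
// failures mark it unhealthy; one success marks it healthy again.
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
	"time"
)

type backend struct {
	url     string
	fails   int32
	healthy atomic.Bool
}

// check probes one backend once and updates its health state.
func check(b *backend, threshold int32) {
	client := &http.Client{Timeout: 2 * time.Second} // keep the probe fast
	resp, err := client.Get(b.url + "/health")
	if err == nil {
		resp.Body.Close()
		if resp.StatusCode == http.StatusOK {
			atomic.StoreInt32(&b.fails, 0)
			b.healthy.Store(true)
			return
		}
	}
	// Require consecutive failures so a single blip doesn't evict a server.
	if atomic.AddInt32(&b.fails, 1) >= threshold {
		b.healthy.Store(false)
	}
}

func main() {
	b := &backend{url: "http://10.0.0.1:8080"}
	b.healthy.Store(true)
	for range time.Tick(5 * time.Second) {
		check(b, 3)
		fmt.Println(b.url, "healthy:", b.healthy.Load())
	}
}
```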
SSL termination
The LB terminates TLS: it decrypts incoming HTTPS, talks plain HTTP to the backend servers, and encrypts responses on the way back to the client. Why? Backends skip CPU-heavy TLS work, the LB can centralize hardware acceleration, and certificates are managed in one place.
Trade-off: traffic between LB and backend is plain text. Fine inside a private network; not fine across the internet.
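In Go's standard library, termination can be sketched in a few lines: accept TLS on :443, forward plain HTTP inside the private network. The certificate paths and backend address are assumptions.

```go
// SSL termination sketch: TLS is decrypted here, once, instead of on
// every backend server. Cert paths and backend address are placeholders.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Plain HTTP inside the private network.
	backend, _ := url.Parse("http://10.0.0.1:8080")
	proxy := httputil.NewSingleHostReverseProxy(backend)

	log.Fatal(http.ListenAndServeTLS(":443", "cert.pem", "key.pem", proxy))
}
```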
Sticky sessions
Sometimes you need the same user to keep hitting the same backend server (in-memory session, websocket connection). Sticky sessions (session affinity) bind a user to one server, usually via a cookie.
Sticky sessions are a smell. They prevent free horizontal scaling and make rollouts harder. Prefer stateless services with shared session storage. Use stickiness only when you must (websockets, some legacy apps).
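For completeness, a sketch of cookie-based stickiness: the first response sets a cookie naming the chosen backend, and later requests carrying that cookie return to it. The cookie name lb_backend and the addresses are invented for the example.

```go
// Cookie-based session affinity sketch: honor an existing affinity cookie;
// otherwise round-robin and set the cookie on the way out.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strconv"
	"sync/atomic"
)

func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

var backends = []*url.URL{
	mustParse("http://10.0.0.1:8080"),
	mustParse("http://10.0.0.2:8080"),
}

var rr uint64

// pick returns a backend index and whether it came from an existing cookie.
func pick(r *http.Request) (int, bool) {
	if c, err := r.Cookie("lb_backend"); err == nil {
		if i, err := strconv.Atoi(c.Value); err == nil && i >= 0 && i < len(backends) {
			return i, true
		}
	}
	// First request from this client: fall back to round robin.
	return int(atomic.AddUint64(&rr, 1) % uint64(len(backends))), false
}

func main() {
	proxies := make([]*httputil.ReverseProxy, len(backends))
	for i, b := range backends {
		proxies[i] = httputil.NewSingleHostReverseProxy(b)
	}

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		i, had := pick(r)
		if !had {
			http.SetCookie(w, &http.Cookie{Name: "lb_backend", Value: strconv.Itoa(i), Path: "/"})
		}
		proxies[i].ServeHTTP(w, r)
	})

	log.Fatal(http.ListenAndServe(":80", nil))
}
```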
Where the LB fits
Most production systems have multiple layers:
- DNS / global LB: routes users to the nearest region.
- Edge LB: per-region public-facing LB. SSL termination, DDoS protection, public IP.
- Internal LB: in front of each service tier. L7 routing.
Each layer has its own scaling and failure characteristics. Skip a layer at your peril.