Service Discovery
When services come and go (scaling up, scaling down, redeploying), how does service A find a healthy instance of service B? Service discovery is the mechanism that answers that question dynamically.
Why dynamic discovery
In a static world, you hardcode service B's address in service A's config. Easy. In a dynamic world (containers, autoscaling, rolling deploys), service B's instances change addresses constantly. Hardcoding doesn't work.
Service discovery solves this. Each service registers itself with a registry on startup. Other services query the registry to find healthy instances. As things come and go, the registry stays current.
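To make the register/query cycle concrete, here is a minimal in-memory sketch. The `Registry` and `Instance` names, and the addresses, are invented for illustration; a real registry is a separate networked service with replication and health tracking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One running copy of a service (hypothetical shape)."""
    host: str
    port: int


class Registry:
    """Toy registry: maps a service name to its currently registered instances."""

    def __init__(self) -> None:
        self._services: dict[str, set[Instance]] = {}

    def register(self, service: str, instance: Instance) -> None:
        self._services.setdefault(service, set()).add(instance)

    def deregister(self, service: str, instance: Instance) -> None:
        self._services.get(service, set()).discard(instance)

    def lookup(self, service: str) -> list[Instance]:
        # Sorted so callers see a stable order.
        found = self._services.get(service, set())
        return sorted(found, key=lambda i: (i.host, i.port))


registry = Registry()
registry.register("service-b", Instance("10.0.0.5", 8080))
registry.register("service-b", Instance("10.0.0.6", 8080))
registry.deregister("service-b", Instance("10.0.0.5", 8080))  # instance went away
print(registry.lookup("service-b"))  # [Instance(host='10.0.0.6', port=8080)]
```

Service A only ever asks for "service-b" by name; the addresses behind that name can change freely.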
Two patterns
Client-side discovery. The client queries the registry, gets a list of instances, picks one (with its own load balancing), and sends the request directly. Example: Netflix Eureka with Ribbon. Pro: no extra hop. Con: every client needs a registry-aware library, in every language you use.
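A rough sketch of the client-side pattern, assuming a hypothetical `fetch_instances` callable that stands in for the registry query. Round-robin is just one possible balancing policy, and this is not how Ribbon itself is implemented.

```python
import itertools
from typing import Callable

Address = tuple[str, int]


class RoundRobinClient:
    """Client-side discovery: the client itself asks the registry and picks an instance."""

    def __init__(self, fetch_instances: Callable[[], list[Address]]) -> None:
        self._fetch = fetch_instances      # hypothetical registry query
        self._counter = itertools.count()  # 0, 1, 2, ...

    def pick(self) -> Address:
        instances = self._fetch()          # fresh list on every call
        if not instances:
            raise LookupError("no healthy instances registered")
        return instances[next(self._counter) % len(instances)]


# Stand-in for a real registry response.
fake_registry = [("10.0.0.5", 8080), ("10.0.0.6", 8080)]
client = RoundRobinClient(lambda: list(fake_registry))
picks = [client.pick() for _ in range(3)]
print(picks)  # [('10.0.0.5', 8080), ('10.0.0.6', 8080), ('10.0.0.5', 8080)]
```

A production client would usually cache the instance list and refresh it periodically rather than query the registry on every request.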
Server-side discovery. The client sends requests to a fixed endpoint (a load balancer or proxy). The proxy queries the registry and forwards each request to a healthy instance. Examples: Kubernetes Services, AWS ELB with target groups. Pro: clients are simpler. Con: extra hop.
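For contrast, a sketch of the server-side pattern: the lookup and the choice of backend live in the proxy, so the caller only knows one fixed entry point. `DiscoveryProxy`, `lookup`, `forward`, and `choose` are hypothetical names for this illustration, not the API of any real load balancer.

```python
import random
from typing import Callable

Address = tuple[str, int]


class DiscoveryProxy:
    """Server-side discovery: the proxy, not the client, talks to the registry."""

    def __init__(
        self,
        lookup: Callable[[str], list[Address]],   # hypothetical registry query
        forward: Callable[[Address, str], str],   # hypothetical "send request to backend"
        choose: Callable[[list[Address]], Address] = random.choice,
    ) -> None:
        self._lookup = lookup
        self._forward = forward
        self._choose = choose

    def handle(self, service: str, request: str) -> str:
        backends = self._lookup(service)
        if not backends:
            raise LookupError(f"no healthy instances for {service}")
        return self._forward(self._choose(backends), request)


backends = {"service-b": [("10.0.0.5", 8080), ("10.0.0.6", 8080)]}
proxy = DiscoveryProxy(
    lookup=lambda name: backends.get(name, []),
    forward=lambda addr, req: f"{req} -> {addr[0]}:{addr[1]}",
    choose=lambda options: options[0],  # deterministic choice for the demo
)
print(proxy.handle("service-b", "GET /orders"))  # GET /orders -> 10.0.0.5:8080
```

The client-facing surface is just `handle`; swapping the registry or the balancing policy does not touch any caller.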
How instances register
- Self-registration. The service calls the registry on boot ("I'm here, port 8080") and then sends periodic heartbeats. On shutdown it deregisters, or simply stops heartbeating and lets its entry expire.
- Third-party registration. A platform component watches for new instances and registers them. Kubernetes works this way: pods whose labels match a Service's selector are added to that Service's endpoints automatically.
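Self-registration can be sketched as a small lifecycle helper. The `register`, `heartbeat`, and `deregister` callables are hypothetical stand-ins for calls to a registry client, and the explicit deregister on shutdown is an assumption of this sketch (a service could also just go silent and let its entry expire).

```python
import threading
import time
from typing import Callable


class HeartbeatRegistration:
    """Registers on start, heartbeats in the background, deregisters on stop."""

    def __init__(
        self,
        register: Callable[[], None],
        heartbeat: Callable[[], None],
        deregister: Callable[[], None],
        interval_s: float = 10.0,
    ) -> None:
        self._register = register
        self._heartbeat = heartbeat
        self._deregister = deregister
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self) -> None:
        self._register()
        self._thread.start()

    def _run(self) -> None:
        # Event.wait returns False on timeout and True once stop() sets the event.
        while not self._stop.wait(self._interval_s):
            self._heartbeat()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
        self._deregister()


events: list[str] = []
registration = HeartbeatRegistration(
    register=lambda: events.append("register"),
    heartbeat=lambda: events.append("heartbeat"),
    deregister=lambda: events.append("deregister"),
    interval_s=0.01,  # unrealistically short, only so the demo finishes quickly
)
registration.start()
time.sleep(0.2)
registration.stop()
print(events[0], events[-1])  # register deregister
```

In a real service, `stop` would typically be wired to the process's shutdown signal handler.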
How instances are removed
Two flavors of failure detection:
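- Heartbeats. Each instance pings the registry every N seconds. Miss too many → marked unhealthy.
- Active health checks. The registry probes each instance itself, for example by calling an HTTP health endpoint or opening a TCP connection.
Many production systems combine the two: heartbeats show that the process is alive, and health checks confirm that it is ready to serve traffic.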
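The heartbeat side of failure detection comes down to tracking when each instance was last heard from. A minimal sketch, assuming a hypothetical `HeartbeatTracker` with a single time-to-live threshold; real registries differ in how many missed beats they tolerate and in what they do with expired entries.

```python
import time
from typing import Callable


class HeartbeatTracker:
    """Treats an instance as healthy if it has heartbeated within the last ttl_s seconds."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._clock = clock                       # injectable so the demo can fake time
        self._last_seen: dict[str, float] = {}

    def heartbeat(self, instance_id: str) -> None:
        self._last_seen[instance_id] = self._clock()

    def healthy(self) -> list[str]:
        now = self._clock()
        return sorted(
            instance_id
            for instance_id, seen in self._last_seen.items()
            if now - seen <= self._ttl_s
        )


now = [0.0]  # fake clock, in seconds
tracker = HeartbeatTracker(ttl_s=30.0, clock=lambda: now[0])
tracker.heartbeat("b-1")   # last seen at t=0
now[0] = 20.0
tracker.heartbeat("b-2")   # last seen at t=20
now[0] = 40.0
print(tracker.healthy())   # ['b-2']  (b-1 has been silent for 40s > 30s)
```

An active health check would complement this by having the registry call out to each instance instead of waiting to hear from it.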
The tools
- Consul. Full-featured registry with health checks, KV store, and DNS interface.
- etcd. Distributed, strongly consistent KV store. Kubernetes keeps its cluster state in it.
- Eureka. Netflix's registry, popular in Java/Spring shops.
- Kubernetes Services. Built-in service discovery for k8s. DNS-based: my-service.namespace.svc.cluster.local.
- AWS Cloud Map / ELB. Hosted options.
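With DNS-based discovery, finding a service is an ordinary name lookup. The sketch below builds a Kubernetes-style name and resolves it with the standard library. The helper names are hypothetical, `cluster.local` is assumed as the cluster domain (clusters can be configured differently), and `resolve` only returns addresses when run somewhere that can resolve the name, such as inside a pod.

```python
import socket


def service_fqdn(service: str, namespace: str = "default",
                 cluster_domain: str = "cluster.local") -> str:
    """Build the DNS name of a Service: <service>.<namespace>.svc.<cluster domain>."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve(service: str, namespace: str, port: int) -> list[str]:
    """Return the IP addresses DNS currently gives for the Service name."""
    infos = socket.getaddrinfo(
        service_fqdn(service, namespace), port, proto=socket.IPPROTO_TCP
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


print(service_fqdn("my-service", "namespace"))  # my-service.namespace.svc.cluster.local
```

The calling code needs no registry library at all, which is much of the appeal of the DNS approach.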
Service mesh
A service mesh (Istio, Linkerd) is a deeper layer that handles discovery, routing, security, and observability across all service-to-service traffic. A sidecar proxy runs alongside each service. Powerful for complex environments. Heavy for small ones. Adopt when complexity warrants it.