Pub/Sub and Kafka

Pub/sub broadcasts a message to many interested consumers. Kafka takes that idea and makes it durable, partitioned, and replayable, which is why it sits at the center of many modern data pipelines.

Queue vs pub/sub

A queue is one-to-one. A message put in by a producer is consumed by exactly one worker. Pub/sub is one-to-many. A message published to a topic is delivered to every subscriber.

Pub/sub fits when many parts of your system care about the same event. A user signs up; the analytics service wants to know, the email service wants to know, the recommendation service wants to know. With pub/sub each one subscribes; the producer doesn't even know they exist.
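The difference between the two delivery models can be sketched in a few lines of in-memory Python. This is only an illustration, not how any real broker works: the class names WorkQueue and Topic, and the example payloads, are made up for this sketch.

```python
from collections import deque


class WorkQueue:
    """One-to-one: each message is handed to exactly one worker."""

    def __init__(self):
        self._messages = deque()

    def put(self, message):
        self._messages.append(message)

    def take(self):
        # Taking a message removes it, so no other worker can see it.
        return self._messages.popleft() if self._messages else None


class Topic:
    """One-to-many: every subscriber receives each published message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, message):
        # The publisher only knows the topic, not who is listening.
        for callback in self._subscribers:
            callback(message)


# Queue: the first worker gets the job, the second finds nothing.
queue = WorkQueue()
queue.put("resize image 17")
first_worker = queue.take()
second_worker = queue.take()  # None: the message is already gone

# Pub/sub: three independent services all see the same signup event.
analytics, email, recommendations = [], [], []
signups = Topic()
signups.subscribe(analytics.append)
signups.subscribe(email.append)
signups.subscribe(recommendations.append)
signups.publish({"event": "user_signed_up", "user_id": 42})
```

Adding a fourth service later means one more subscribe call; the publishing code does not change.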

Kafka: pub/sub on steroids

Kafka is the de facto standard for pub/sub at scale. It has three core ideas:

  1. Topic. A named log of messages. Producers append; consumers read.
  2. Partition. Each topic is split into partitions. Different partitions live on different brokers, enabling horizontal scale.
  3. Offset. Every message has a sequential position, its offset, within its partition. Each consumer tracks the offset it has reached in each partition. Messages aren't deleted on read; they expire by time or size.

The append-only log mental model

Kafka is not a queue, despite being used like one. It is a distributed, replicated, append-only log. Producers add to the end. Consumers read from any retained position they like. Messages stick around for the retention period (often 7 days), so multiple consumers can read the same data at different speeds and even replay history.

[Figure: a Kafka topic with partitions P0, P1, and P2. Each partition is append-only, with offsets running from 0 upward. Consumer A is at offset 4, Consumer B at offset 6, and Consumer C at offset 0 (replaying). Each consumer tracks its own position.]
Topic split into partitions. Each consumer group reads at its own pace.
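The log model can be sketched as a plain list plus one integer per consumer. This is a toy, single-partition, in-memory version with no replication or retention; PartitionLog and LogReader are hypothetical names chosen for the sketch, not Kafka classes.

```python
class PartitionLog:
    """A single partition: an append-only list addressed by offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the record just written

    def read(self, offset, max_records=10):
        # Reading never removes anything; it only slices the log.
        return self._records[offset:offset + max_records]


class LogReader:
    """A consumer that remembers only its own position in the log."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self, max_records=10):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # advance past what was just read
        return batch


log = PartitionLog()
for i in range(6):
    log.append(f"event-{i}")

consumer_a = LogReader(log, start_offset=4)  # nearly caught up
consumer_c = LogReader(log, start_offset=0)  # replaying from the start

tail = consumer_a.poll()                  # ["event-4", "event-5"]
history = consumer_c.poll(max_records=3)  # ["event-0", "event-1", "event-2"]
```

Because the position lives with the reader rather than the log, replaying history amounts to resetting an integer.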

Consumer groups

A consumer group is a set of consumers that cooperatively read a topic. Kafka assigns each partition to exactly one consumer in the group. To scale, add more consumers, up to the partition count; any consumers beyond that sit idle. Multiple consumer groups can independently read the same topic without affecting each other.
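To see why the partition count caps parallelism, here is a simplified round-robin assignment. Kafka's actual assignment strategies are more involved; assign_partitions is a hypothetical helper written only for this illustration.

```python
def assign_partitions(partitions, consumers):
    """Round-robin: each partition goes to exactly one consumer in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]

# Two consumers share three partitions.
two_consumers = assign_partitions(partitions, ["c1", "c2"])
# {"c1": [0, 2], "c2": [1]}

# Four consumers: only three partitions exist, so one consumer gets nothing.
four_consumers = assign_partitions(partitions, ["c1", "c2", "c3", "c4"])
# {"c1": [0], "c2": [1], "c3": [2], "c4": []}
```

A second consumer group would run the same assignment independently over the same partitions, which is why groups do not interfere with each other.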

Why people pick Kafka

Why people regret picking Kafka

Pick the partition key carefully. The partition is decided by hashing the message key. Same key → same partition → guaranteed ordering for that key. Ordering holds only within a partition, not across the topic, so pick a key that gives you ordering where you need it (user_id, order_id) and even distribution otherwise.
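A minimal sketch of key-based partitioning, assuming string keys. CRC32 is used here purely as a stable stand-in hash; Kafka's Java client uses a murmur2 hash by default, so the partition numbers below will not match what a real producer computes. The function name partition_for and the key format are assumptions for the example.

```python
import zlib
from collections import Counter


def partition_for(key, num_partitions):
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3

# Same key -> same partition, so events for one user stay in order.
first = partition_for("user-42", NUM_PARTITIONS)
second = partition_for("user-42", NUM_PARTITIONS)

# Count how many of 3000 distinct keys land on each partition.
spread = Counter(partition_for(f"user-{i}", NUM_PARTITIONS) for i in range(3000))
```

Inspecting spread for a sample of real keys is a cheap way to spot a skewed key, such as one tenant that produces most of the traffic, before it becomes a hot partition.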

Alternatives

AWS Kinesis, Google Pub/Sub, Apache Pulsar, and Redpanda are the usual alternatives. Each has its angle: Pulsar separates compute from storage, Redpanda is Kafka-API-compatible without ZooKeeper, and Kinesis offers hosted simplicity. Pick by operational fit.