Design YouTube / Video Streaming
Design YouTube: video upload, transcoding, storage at petabyte scale, global delivery via CDN, recommendations, and the surprisingly hard problem of view counting at scale.
The problem
Users upload videos. The system transcodes them into multiple resolutions and bitrates. Users worldwide stream them with low buffering. Comments, likes, view counts, recommendations.
Scale
500 hours of video uploaded per minute. 1 billion hours watched per day. Petabytes of storage. The dominant cost is bandwidth and storage, not compute.
Upload pipeline
Browser uploads via resumable chunked upload (so a dropped connection does not lose the whole file). Raw video lands in object storage (S3 or equivalent). A message goes onto a queue saying "video X needs processing".
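A minimal sketch of the resumable part, with hypothetical names: the server tracks which fixed-size chunks have arrived, so a reconnecting client only re-sends the missing ones, and a queue message is emitted once the set is complete.

```python
class ResumableUpload:
    """Tracks which chunks of one video upload have arrived, so a dropped
    connection resumes from the missing set instead of restarting."""

    def __init__(self, total_chunks: int):
        self.total = total_chunks
        self.received: set[int] = set()

    def put_chunk(self, index: int) -> None:
        # In a real system the chunk body goes to object storage here.
        self.received.add(index)

    def missing(self) -> list[int]:
        return [i for i in range(self.total) if i not in self.received]

    def complete(self) -> bool:
        return not self.missing()


def enqueue_transcode(video_id: str) -> dict:
    # The "video X needs processing" message placed on the queue.
    return {"event": "video.uploaded", "video_id": video_id}
```

The client asks for `missing()` after a reconnect and uploads only those indexes; real services (e.g. S3 multipart upload) expose the same idea via part numbers and a final "complete" call.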
A transcoding fleet picks up jobs. For each video, it produces multiple renditions: 144p, 240p, 360p, 480p, 720p, 1080p, sometimes 4K. Each rendition is split into 2-10 second chunks for adaptive streaming (HLS or DASH). A manifest is generated per rendition, plus a master manifest that lists them all.
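The fan-out step can be sketched as follows. The bitrate ladder and field names are illustrative, not real YouTube values; the one real rule shown is skipping renditions taller than the source.

```python
# Hypothetical bitrate ladder: (label, frame height, video bitrate in kbps).
LADDER = [
    ("144p", 144, 100), ("240p", 240, 250), ("360p", 360, 500),
    ("480p", 480, 1000), ("720p", 720, 2500), ("1080p", 1080, 5000),
]

def plan_transcode_jobs(video_id: str, source_height: int,
                        segment_seconds: int = 6) -> list[dict]:
    """One job per rendition, skipping anything taller than the source
    (no point upscaling a 720p upload to 1080p)."""
    return [
        {"video_id": video_id, "label": label, "height": height,
         "bitrate_kbps": kbps, "segment_seconds": segment_seconds}
        for label, height, kbps in LADDER
        if height <= source_height
    ]
```

Each job is independent, so the fleet can transcode all renditions of one video in parallel.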
Streaming protocol
HLS (HTTP Live Streaming) or MPEG-DASH. The video is split into chunks (each maybe 6 seconds). The client downloads a manifest listing all chunks at all qualities. As the player plays, it monitors bandwidth and switches between qualities chunk by chunk. This is "adaptive bitrate streaming" and is why YouTube smoothly degrades quality on slow connections.
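The chunk-by-chunk quality decision in the player can be sketched like this. The headroom factor and ladder are assumptions; real players also consider buffer level and switch-stability, but the core rule is "highest bitrate that comfortably fits measured throughput."

```python
def pick_rendition(measured_kbps: float,
                   ladder: list[tuple[str, int]],
                   headroom: float = 0.8) -> str:
    """Choose the highest-bitrate rendition whose bitrate fits within a
    safety margin of measured throughput; fall back to the lowest.

    ladder: list of (label, bitrate_kbps) pairs.
    """
    affordable = [r for r in ladder if r[1] <= measured_kbps * headroom]
    if affordable:
        return max(affordable, key=lambda r: r[1])[0]
    return min(ladder, key=lambda r: r[1])[0]
```

Running this before each chunk request is what makes quality degrade (and recover) smoothly as bandwidth fluctuates.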
CDN delivery
You cannot serve global users from a single datacenter. Encoded chunks are pushed to a CDN (Cloudflare, Akamai, Google's own). Edge nodes cache popular videos near users. A user in Tokyo hitting a popular video gets it from a Tokyo edge, not from Virginia. This is the difference between 200 ms first frame and 2 seconds first frame.
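A toy pull-through edge cache captures the mechanism: the first viewer in a region pays the origin round trip, everyone after gets the local copy.

```python
from typing import Callable

class EdgeCache:
    """Toy pull-through edge cache: serve a chunk from local storage if
    present, otherwise fetch it from origin and keep a copy."""

    def __init__(self, origin_fetch: Callable[[str], bytes]):
        self.origin_fetch = origin_fetch
        self.store: dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, chunk_key: str) -> bytes:
        if chunk_key in self.store:
            self.hits += 1
            return self.store[chunk_key]
        self.misses += 1
        data = self.origin_fetch(chunk_key)
        self.store[chunk_key] = data
        return data
```

Real edges add eviction (LRU by popularity) and TTLs, but the hit/miss split above is exactly where the 200 ms vs 2 s first-frame difference comes from.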
Metadata storage
Video metadata (title, description, uploader, tags, etc.) goes in a relational or document store. Comments are huge volume — stored separately, sharded by video_id. Likes and views are counters with their own scaling problem.
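Sharding comments by video_id can be as simple as a stable hash mod shard count, a sketch with made-up shard numbers. The key property: all comments for one video land on one shard, so loading a comment thread touches a single shard.

```python
import hashlib

def comment_shard(video_id: str, num_shards: int = 64) -> int:
    """Route a comment to a shard by hashing its video_id.

    Uses a stable hash (md5) rather than Python's built-in hash(),
    which is randomized per process and unusable for routing."""
    digest = hashlib.md5(video_id.encode()).hexdigest()
    return int(digest, 16) % num_shards
```

The downside is hot shards when one video goes viral; real systems layer caching or further split hot threads on top of this.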
The view count problem
A popular video gets 100K views per second. If every view did UPDATE videos SET views = views + 1 on a single row, you would have a hot row that would melt the database. Solutions: write events to Kafka instead of incrementing inline, then have a batch job aggregate counts every few seconds and update the metadata store. Or use approximate counters (HyperLogLog, if what you actually need is unique viewers). Or per-shard counters that get summed at display time.
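The aggregate-then-flush idea can be sketched in a few lines: accumulate deltas in memory and emit one update per video per flush interval, turning 100K writes per second into one.

```python
from collections import Counter
from typing import Callable

class BufferedViewCounter:
    """Accumulate view events in memory and periodically flush aggregated
    deltas -- one UPDATE (or Kafka record) per video per flush, instead of
    one per view."""

    def __init__(self, flush_fn: Callable[[str, int], None]):
        self.flush_fn = flush_fn  # e.g. views = views + delta, batched
        self.pending: Counter[str] = Counter()

    def record_view(self, video_id: str) -> None:
        self.pending[video_id] += 1

    def flush(self) -> None:
        # Called every few seconds by a timer in a real system.
        for video_id, delta in self.pending.items():
            self.flush_fn(video_id, delta)
        self.pending.clear()
```

The trade-off is that counts lag by one flush interval and an unclean crash loses the buffered deltas, both usually acceptable for view counts.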
Recommendations
Out of scope here, but worth mentioning. A separate ML pipeline ingests watch history, click events, dwell time. Trains models offline. Serves recommendations from an online service. The recommendation feed is heavily personalized and pre-computed for active users.
Other interesting bits
- Live streaming: Different beast. Uses RTMP for ingest, ultra-low-latency HLS or LL-DASH for playback. End-to-end latency around 5 seconds is the typical target.
- Copyright (Content ID): Fingerprint every uploaded video, match against rights-holder registry, take action. Massive audio/video matching pipeline.
- Storage tiering: Hot videos on fast SSD, long tail on cheaper cold storage. Tiered automatically based on access patterns.
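The tiering decision from the last bullet reduces to a small policy function. The thresholds here are invented for illustration; real systems tune them against storage cost and retrieval latency.

```python
def choose_tier(views_last_30d: int, age_days: int) -> str:
    """Toy tiering policy: keep recently-busy videos on fast storage,
    demote the long tail to cold storage. Thresholds are made up."""
    if views_last_30d > 10_000:
        return "hot-ssd"
    if views_last_30d > 100 or age_days < 30:
        return "warm"
    return "cold"
```

A background job re-evaluates each video periodically and migrates chunks between tiers when the answer changes.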