Design a URL Shortener
Design a URL shortener like Bitly. The classic system design warm-up. Looks simple, but it forces you to think about ID generation, write-heavy vs read-heavy traffic, caching, and analytics at scale.
The problem
Build a service where users paste a long URL and get back a short code. When someone hits the short URL, redirect them to the original. Stretch goals: analytics on clicks, custom aliases, expiration, link previews.
Step 1: clarify and estimate
In an interview, never start drawing boxes. Ask questions first. How many URLs created per day? How many redirects? How long are short codes? Custom aliases allowed? Analytics required?
Reasonable assumptions for the exercise: 100 million new URLs per month (about 40 writes per second), a 100:1 read-to-write ratio (so roughly 4,000 redirects per second), 7-character codes, and 5-year retention. That gives us roughly 6 billion URLs total. At maybe 500 bytes per row, that's around 3 TB of storage over 5 years. Easy.
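Sanity-checking that arithmetic in a few lines (every number here is one of the assumptions above, nothing measured):

```python
# Back-of-envelope estimates from the stated assumptions.
SECONDS_PER_MONTH = 30 * 24 * 3600          # ~2.6 million

new_urls_per_month = 100_000_000
writes_per_sec = new_urls_per_month / SECONDS_PER_MONTH   # ~39/s
reads_per_sec = writes_per_sec * 100                      # ~3,900/s
total_urls = new_urls_per_month * 12 * 5                  # 6 billion over 5 years
storage_bytes = total_urls * 500                          # ~3 TB
```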
Step 2: API
POST /shorten
body: { "long_url": "https://...", "custom_alias": "..." }
response: { "short_url": "https://sd.in/aB3xZ9k" }
GET /:code
→ 301 redirect to the long URL. One nuance worth raising: a 301 (permanent) gets cached by browsers, which cuts your traffic but hides repeat clicks from analytics; return a 302 if every click should reach your servers.
Step 3: short code generation
Three real options.
Hash and truncate
MD5 the URL, base62-encode the digest, and keep the first 7 characters. Simple, but collisions happen: different URLs can map to the same code. You need to check the database for an existing code and retry on conflict, which adds a read to every write.
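A sketch of hash-and-truncate, assuming base62 re-encoding of the MD5 digest; the collision check against the database is deliberately omitted:

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # base62

def hash_code(long_url: str, length: int = 7) -> str:
    # Treat the 16-byte MD5 digest as one big integer, re-encode in base62.
    n = int.from_bytes(hashlib.md5(long_url.encode()).digest(), "big")
    chars = []
    while n:
        n, r = divmod(n, 62)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))[:length]
```

Note the trade-off: the same URL always hashes to the same code (free deduplication), but two different URLs occasionally share a prefix, and that conflict must be caught at insert time.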
Random + collision check
Generate a random 7-char string and insert it under a unique constraint, retrying on conflict. With 6 billion codes in use out of 62^7 ≈ 3.5 trillion possibilities, the per-insert collision probability is under 0.2 percent, so retries are rare.
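A sketch of the random approach, with a plain dict standing in for the table's unique constraint:

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
db: dict[str, str] = {}  # stand-in for the table with a UNIQUE constraint

def insert_random(long_url: str) -> str:
    while True:
        code = "".join(secrets.choice(ALPHABET) for _ in range(7))
        if code not in db:   # in SQL: the INSERT fails on conflict and we retry
            db[code] = long_url
            return code
```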
Counter and base62 encode (recommended)
Maintain a global counter. For each new URL, increment the counter and base62-encode the result. Six billion fits in 6 base62 chars, and 62^7 is 3.5 trillion, which gives us decades of runway. No collision check, no retry. The trick is making the counter scale: use a coordination-free ID generator like Twitter's Snowflake, or pre-allocate batches of IDs to each application server (a server grabs 1,000 IDs at a time from a central ticket server and hands them out locally).
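Both pieces can be sketched as follows, with itertools.count standing in for the central ticket server; the IdAllocator name and batch size are illustrative:

```python
import itertools
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

def base62(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

class IdAllocator:
    """Hands out IDs locally from a pre-allocated batch."""
    def __init__(self, batch_size: int = 1000):
        self.batch_size = batch_size
        self._central = itertools.count(0, batch_size)  # stand-in for the ticket server
        self._batch = iter(())                          # empty until first allocation

    def next_id(self) -> int:
        try:
            return next(self._batch)
        except StopIteration:
            start = next(self._central)                 # one round-trip per 1,000 IDs
            self._batch = iter(range(start, start + self.batch_size))
            return next(self._batch)
```

The batch scheme trades a little ID continuity for throughput: if a server dies mid-batch, its unused IDs are simply lost, which is harmless since codes never need to be dense.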
Step 4: storage
Schema: (short_code PRIMARY KEY, long_url, user_id, created_at, expires_at). Reads are exact-match on short_code, so a key-value store like DynamoDB, Cassandra, or even sharded MySQL works perfectly. Shard by hash of short_code so reads go to one node.
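The schema sketched in SQLite as a stand-in (production would be DynamoDB, Cassandra, or sharded MySQL, with the shard chosen by hashing short_code):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE urls (
        short_code TEXT PRIMARY KEY,
        long_url   TEXT NOT NULL,
        user_id    TEXT,
        created_at INTEGER,
        expires_at INTEGER
    )
""")
db.execute("INSERT INTO urls (short_code, long_url) VALUES (?, ?)",
           ("aB3xZ9k", "https://example.com/long"))

# Reads are exact-match on the primary key; in a sharded setup something like
# shard = hash(short_code) % num_shards routes this lookup to one node.
(row,) = db.execute("SELECT long_url FROM urls WHERE short_code = ?", ("aB3xZ9k",))
```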
Step 5: caching
Reads are hugely skewed. The top 1 percent of links get 90 percent of clicks. Cache hot codes in Redis with a TTL. Cache hit rate over 95 percent is realistic, which means the database mostly handles writes plus the long tail of cold reads.
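Cache-aside with a TTL, sketched with a plain dict in place of Redis:

```python
import time

CACHE: dict[str, tuple[str, float]] = {}
TTL_SECONDS = 3600.0

def lookup(code: str, db: dict[str, str]) -> str:
    hit = CACHE.get(code)
    if hit and hit[1] > time.monotonic():
        return hit[0]                                  # cache hit: no DB touch
    url = db[code]                                     # cold read falls through
    CACHE[code] = (url, time.monotonic() + TTL_SECONDS)
    return url
```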
Step 6: analytics
On every click, fire an event to Kafka with code, timestamp, IP, user agent, referrer. A separate consumer aggregates these into a warehouse for click-count queries. Do not write analytics inline with the redirect — you want the redirect path to be as fast as possible.
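The decoupling can be sketched with queue.Queue standing in for Kafka: the redirect handler only enqueues, and a background consumer does the aggregation.

```python
import queue
import threading
import time

events: queue.Queue = queue.Queue()
counts: dict[str, int] = {}   # stand-in for the analytics warehouse

def record_click(code: str, ip: str, user_agent: str) -> None:
    # Fire-and-forget: the redirect path never waits on analytics storage.
    events.put({"code": code, "ts": time.time(), "ip": ip, "ua": user_agent})

def consumer() -> None:
    while True:
        ev = events.get()
        if ev is None:        # sentinel: shut down
            break
        counts[ev["code"]] = counts.get(ev["code"], 0) + 1

t = threading.Thread(target=consumer)
t.start()
record_click("aB3xZ9k", "203.0.113.9", "curl/8.0")
record_click("aB3xZ9k", "203.0.113.10", "curl/8.0")
events.put(None)
t.join()
```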
Step 7: things they will probe
- Custom aliases: Insert with the chosen code, fail on conflict, and let the user pick another.
- Expiration: Add expires_at. Background job removes expired rows, or check at read time.
- Abuse prevention: Rate limit by IP on POST. Reject blacklisted target domains. Scan for malware via a service like Google Safe Browsing.
- Hot keys: If one code goes massively viral, even Redis gets hot-keyed. Solution: replicate that key to multiple cache shards or move it to a CDN edge cache.
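The rate-limiting bullet above can be sketched as a per-IP token bucket; in production the bucket state would live in Redis rather than process memory, and the rate and capacity here are illustrative:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = float(capacity), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets: dict[str, TokenBucket] = {}

def allow_request(ip: str) -> bool:
    # 1 request/second sustained, bursts of 5 — tune per endpoint.
    bucket = buckets.setdefault(ip, TokenBucket(rate=1.0, capacity=5))
    return bucket.allow()
```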