SQL vs NoSQL
SQL gives you structure, joins, and ACID. NoSQL gives you flexibility, scale, and schema freedom. You pick by access pattern and growth profile, not by religion.
The decision that shapes everything
The database is the heart of your system. Picking the wrong one early is a mistake you pay for daily. The first decision is rarely "which product" but "which family". SQL or NoSQL? Even within NoSQL, which kind?
SQL (relational) databases
Relational databases organize data into tables with strict schemas. Rows are linked by foreign keys. SQL (Structured Query Language) lets you join, filter, aggregate, and transact across them. Postgres, MySQL, Oracle, SQL Server are the headliners. They have been the default for forty years for good reason.
Strengths:
- ACID transactions across multiple tables.
- Powerful querying. JOINs, GROUP BY, window functions, full-text search.
- Strict schemas catch errors at write time, not query time.
- Mature tooling. Decades of indexes, optimizers, monitoring.
Weaknesses:
- Hard to scale horizontally. Sharding a relational DB is doable but painful.
- Schema changes (migrations) on huge tables are operationally tricky.
- Some workloads (graphs, time series, blob storage) fit poorly.
NoSQL: four distinct families
"NoSQL" is a marketing term that hides four very different things. Confusing them is common.
Key-value stores
Store: a key maps to a blob of data. Get by key, set by key. That's it. Redis, Memcached (in memory), DynamoDB (durable). Built for fast lookups at huge scale. Bad at anything that requires querying by value.
Document stores
Store: each record is a JSON-like document. Schema is flexible per document. MongoDB, CouchDB, Firestore. Good when records have nested structure or differ in shape (e.g., a product catalog where each category has different attributes).
Wide-column stores
Store: each row can have wildly different columns. Optimized for write-heavy workloads at planet scale. Cassandra, HBase, Bigtable. Used by Netflix, Apple, big telcos. Great for time series, event logs, sensor data. Cumbersome for ad-hoc queries.
Graph databases
Store: nodes and edges. Querying friend-of-friend relationships, fraud rings, dependency chains. Neo4j is the reference. Niche but unbeatable for the right problem.
The honest selection guide
| Need | Pick |
|---|---|
| You're building a SaaS app and don't know yet | Postgres. Default. You will not regret it. |
| You need ACID across multiple records | SQL. |
| Cache or session store | Redis. |
| Hierarchical or flexible product data | MongoDB or Postgres with JSONB. |
| Billions of writes per day, time-ordered | Cassandra. |
| Social graphs or recommendations | Graph DB. |
| Massive read fanout, eventually consistent OK | DynamoDB or Cassandra. |
| Search across text | Elasticsearch (alongside the primary DB). |
The "polyglot persistence" reality
Mature systems use multiple databases. The orders table in Postgres for ACID. Redis for sessions and rate limit counters. Elasticsearch for search. Cassandra for click events. Each tool for its job. This adds operational complexity, but it stops happening when you try to force one DB to do everything.
Pick by access pattern (how will you query?), scale (how big will it grow?), and consistency need (does staleness hurt?). Not by what's trendy. The boring database is usually the right one.