5 min read

#11. SQL vs NoSQL Decision Guide - Flexibility vs Scale

The One Thing to Remember

SQL = relationships and consistency. NoSQL = flexibility and scale. Choose SQL when you need complex queries and ACID. Choose NoSQL when you need horizontal scaling, flexible schemas, or specific data models (documents, graphs, time-series).


Building on Article 10

In Article 10: Isolation Levels & Anomalies, you learned how SQL databases provide different isolation levels. But here's the question: When should you use SQL vs NoSQL, and what are the real trade-offs?

Choosing the wrong database is expensive—migrations take months, performance problems are hard to fix, and team expertise is wasted.

Previous: Article 10 - Isolation Levels & Anomalies


Why This Matters (A Migration Horror Story)

I once watched a team migrate from MongoDB to PostgreSQL after 6 months of data consistency issues. They used MongoDB for financial transactions, then struggled with eventual consistency. The migration took 3 months and cost hundreds of thousands in engineering time. The lesson? Choose the right database from the start.

This isn't academic knowledge—it's the difference between:

  • Building on the right foundation vs rebuilding later

    • Understanding trade-offs = you choose correctly
    • Not understanding = you migrate later, waste months
  • Scaling successfully vs hitting walls

    • Understanding scaling models = you plan for growth
    • Not understanding = you hit limits, panic, rebuild
  • Team productivity vs constant firefighting

    • Understanding database strengths = you use the right tool
    • Not understanding = you fight the database constantly

The Categories

┌─────────────────────────────────────────────────────────────────┐
│                    DATABASE LANDSCAPE                            │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  RELATIONAL (SQL)                                               │
│  ═════════════════                                              │
│  PostgreSQL, MySQL, SQLite, SQL Server, Oracle                  │
│  • Tables with rows and columns                                 │
│  • Fixed schema                                                 │
│  • ACID transactions                                            │
│  • Complex joins and queries                                    │
│                                                                 │
│  DOCUMENT                                                       │
│  ══════════                                                     │
│  MongoDB, CouchDB, Amazon DocumentDB                            │
│  • JSON-like documents                                          │
│  • Flexible schema                                              │
│  • Good for nested data                                         │
│  • Limited joins                                                │
│                                                                 │
│  KEY-VALUE                                                      │
│  ═══════════                                                    │
│  Redis, Memcached, Amazon DynamoDB, etcd                        │
│  • Simple get/set by key                                        │
│  • Extremely fast                                               │
│  • Limited query capability                                     │
│  • Good for caching, sessions                                   │
│                                                                 │
│  WIDE-COLUMN                                                    │
│  ════════════                                                   │
│  Cassandra, HBase, ScyllaDB                                     │
│  • Rows with dynamic columns                                    │
│  • Optimized for writes                                         │
│  • Horizontal scaling                                           │
│  • Good for time-series, IoT                                    │
│                                                                 │
│  GRAPH                                                          │
│  ═══════                                                        │
│  Neo4j, Amazon Neptune, ArangoDB                                │
│  • Nodes and relationships                                      │
│  • Optimized for traversals                                     │
│  • Good for social networks, recommendations                    │
│                                                                 │
│  TIME-SERIES                                                    │
│  ════════════                                                   │
│  InfluxDB, TimescaleDB, Prometheus                              │
│  • Optimized for time-stamped data                              │
│  • Compression for time data                                    │
│  • Good for metrics, IoT, financial data                        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Common Mistakes (I've Made These)

Mistake #1: "NoSQL because it scales"

Why it's wrong: PostgreSQL handles millions of rows easily. Most apps never need NoSQL scale. SQL + read replicas handles most load.

Right approach: Start with SQL, migrate if needed. Don't prematurely optimize with NoSQL.

Mistake #2: "MongoDB for everything"

Why it's wrong: MongoDB's flexible schema becomes a mess. Lack of transactions (historically) caused real data loss. Joins across collections are painful.

Right approach: Use MongoDB for actual document data (nested, varying structure). Don't use MongoDB as a "schemaless SQL."

Mistake #3: "One database for all"

Why it's wrong: Different data has different needs. Logging doesn't need ACID. User auth doesn't need Cassandra's scale.

Right approach: Use the right tool for each job. Most production systems use multiple databases.


SQL vs NoSQL: Core Trade-offs

┌─────────────────────────────────────────────────────────────────┐
│                                                                 │
│          SQL (Relational)          NoSQL (Various)              │
│          ═════════════════         ═══════════════              │
│                                                                 │
│  Schema:   Fixed (strict)          Flexible (schemaless)        │
│                                                                 │
│  Scaling:  Vertical (bigger box)   Horizontal (more boxes)      │
│            + Read replicas         Native sharding              │
│                                                                 │
│  ACID:     Full support            Varies (often eventual)      │
│                                                                 │
│  Queries:  Complex joins,          Limited joins,               │
│            aggregations            document/key lookups         │
│                                                                 │
│  Best for: Complex relationships   High volume, simple access   │
│            Transactions            Flexible data models         │
│            Unknown query patterns  Known access patterns        │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

When to Choose SQL

Strong Use Cases

✅ USE SQL WHEN:

1. Complex relationships between entities
   • E-commerce: Users → Orders → Products → Categories
   • ERP systems with hundreds of related tables
   
2. ACID transactions are critical
   • Financial systems
   • Inventory management
   • Any "money" operations

3. Unknown query patterns
   • Business intelligence
   • Ad-hoc reporting
   • Data exploration

4. Data integrity is paramount
   • Healthcare records
   • Audit trails
   • Compliance requirements

5. Your data is naturally tabular
   • Spreadsheet-like data
   • Consistent structure across records

SQL Example: E-commerce

-- Natural fit: Related entities with constraints
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL
);

CREATE TABLE orders (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    status VARCHAR(50),
    total DECIMAL(10,2),
    created_at TIMESTAMP DEFAULT NOW()
);

-- Complex query: Top customers by category
SELECT u.email, c.name as category, SUM(oi.price * oi.quantity) as total
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
JOIN categories c ON p.category_id = c.id
GROUP BY u.email, c.name
ORDER BY total DESC;

When to Choose NoSQL

Document Database (MongoDB)

✅ USE DOCUMENT DB WHEN:

1. Data is naturally hierarchical/nested
   • Blog posts with comments and tags
   • Product catalogs with varying attributes
   
2. Schema varies per record
   • Different product types have different fields
   • User-generated content

3. Rapid iteration on schema
   • Startup MVPs
   • Prototyping

4. Self-contained documents
   • Each document has all needed data
   • Minimal need for joins

Key-Value Store (Redis)

✅ USE KEY-VALUE WHEN:

1. Simple access patterns
   • Get user session by ID
   • Cache computed results
   
2. Extreme performance needed
   • Sub-millisecond latency
   • High throughput

3. Ephemeral data
   • Sessions, caches
   • Rate limiting counters

Wide-Column (Cassandra)

✅ USE WIDE-COLUMN WHEN:

1. Write-heavy workloads
   • IoT sensor data
   • User activity logs
   
2. Time-series data at scale
   • Metrics, events
   • Financial ticks

3. Horizontal scaling required
   • Petabytes of data
   • Global distribution

The Hybrid Approach (Most Real Systems)

┌─────────────────────────────────────────────────────────────────┐
│           TYPICAL PRODUCTION ARCHITECTURE                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│                      ┌─────────────────┐                        │
│                      │   Application   │                        │
│                      └────────┬────────┘                        │
│                               │                                 │
│     ┌────────────┬────────────┼────────────┬─────────────┐     │
│     ▼            ▼            ▼            ▼             ▼     │
│ ┌────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐ │
│ │  Redis │ │PostgreSQL│ │ MongoDB  │ │Cassandra │ │  S3     │ │
│ │        │ │          │ │          │ │          │ │         │ │
│ │Sessions│ │  Users   │ │ Product  │ │ Events   │ │  Files  │ │
│ │ Cache  │ │  Orders  │ │ Catalog  │ │ Metrics  │ │  Media  │ │
│ │Rate Lim│ │ Payments │ │ Reviews  │ │  Logs    │ │ Backups │ │
│ └────────┘ └──────────┘ └──────────┘ └──────────┘ └─────────┘ │
│                                                                 │
│ Each database does what it's best at!                          │
└─────────────────────────────────────────────────────────────────┘

Real example: Netflix

  • Cassandra: Viewing history (write-heavy, time-series)
  • MySQL: Billing and accounts (ACID transactions)
  • ElasticSearch: Search and discovery
  • Redis: Session cache, real-time data
  • S3: Video file storage

Lesson: Most production systems use multiple databases. Use the right tool for each job.


Decision Framework

START HERE
    │
    ▼
┌───────────────────────────────────────────┐
│ Do you need complex joins and             │
│ transactions across multiple entities?    │
└───────────────────┬───────────────────────┘
                    │
        ┌───────────┴───────────┐
        ▼                       ▼
       YES                      NO
        │                       │
        ▼                       ▼
   ┌─────────┐     ┌───────────────────────────────┐
   │   SQL   │     │ Is your data naturally        │
   │PostgreSQL│    │ nested/hierarchical?          │
   │  MySQL  │     └───────────────┬───────────────┘
   └─────────┘                     │
                       ┌───────────┴───────────┐
                       ▼                       ▼
                      YES                      NO
                       │                       │
                       ▼                       ▼
                ┌───────────┐    ┌───────────────────────────┐
                │  Document │    │ Is it simple key-value    │
                │  MongoDB  │    │ access patterns?          │
                └───────────┘    └───────────────┬───────────┘
                                                 │
                                     ┌───────────┴───────────┐
                                     ▼                       ▼
                                    YES                      NO
                                     │                       │
                                     ▼                       ▼
                              ┌───────────┐    ┌──────────────────────┐
                              │ Key-Value │    │ Is it time-series   │
                              │   Redis   │    │ or write-heavy?     │
                              │  DynamoDB │    └──────────┬───────────┘
                              └───────────┘               │
                                             ┌────────────┴────────────┐
                                             ▼                        ▼
                                            YES                       NO
                                             │                        │
                                             ▼                        ▼
                                       ┌───────────┐    ┌──────────────────┐
                                       │Wide-Column│    │ Relationships    │
                                       │ Cassandra │    │ are the focus?   │
                                       │InfluxDB  │    │                  │
                                       └───────────┘    │ YES → Graph DB  │
                                                        │ NO → SQL/Document│
                                                        └──────────────────┘

Quick Reference

Need Best Choice Why
E-commerce orders PostgreSQL ACID, relationships
Product catalog MongoDB Varying attributes
User sessions Redis Fast, ephemeral
Event logs Cassandra Write-heavy, scale
Social graph Neo4j Relationship traversals
Metrics InfluxDB Time-series optimized
Search Elasticsearch Full-text, facets
Files S3 Blob storage

Key Takeaways

  1. SQL is the safe default - handles most use cases well
  2. NoSQL for specific needs - not just "because scale"
  3. Hybrid is normal - use multiple databases for different data (Netflix uses 5+)
  4. Consider operations - can your team run Cassandra? (it's complex)
  5. Plan for migration - wrong choice isn't forever, but it's expensive (months of work)
  6. Start simple - PostgreSQL handles most apps. Migrate when you hit real limits.

What's Next

Now that you understand when to choose SQL vs NoSQL, the next question is: How do you scale beyond a single machine?

In the next article, CAP Theorem Demystified - Consistency vs Availability, you'll learn:

  • What CAP theorem actually means (most people get it wrong)
  • Why partition tolerance isn't optional
  • How to choose between consistency and availability
  • Real-world examples of CP vs AP systems

This builds on what you learned here—NoSQL databases often make different CAP trade-offs than SQL databases.

Continue to Article 12: CAP Theorem


This article is part of the Backend Engineering Mastery series. Choosing the right database is one of the most important architectural decisions.