#12. CAP Theorem Demystified - Consistency vs Availability
The One Thing to Remember
During a network partition, choose Consistency or Availability. You cannot have both. This isn't about picking 2 of 3—Partition tolerance isn't optional in distributed systems. The real choice is: when the network fails, do you want correct answers or any answers?
Building on Article 11
In Article 11: SQL vs NoSQL, you learned when to choose different database types. But here's the question: How do distributed databases handle failures, and what trade-offs do they make?
CAP theorem is the most misunderstood concept in distributed systems. Understanding it correctly helps you choose the right database and design systems with realistic expectations.
← Previous: Article 11 - SQL vs NoSQL
Why This Matters (A CAP Misunderstanding Story)
I once debugged a production issue where users saw wrong account balances during a network partition. The mystery: the team had chosen MongoDB with w:1 (AP mode) for "performance" without understanding what that meant. During the partition, users saw stale balances. The fix? Switch to w:majority (CP mode) for financial data. But it took two days to diagnose, because no one understood the CAP trade-offs.
This isn't academic knowledge; it's the difference between:
- Choosing the right database vs the wrong one
  - Understanding CAP = you choose based on requirements
  - Not understanding = you choose for "performance" and get the wrong behavior
- Designing with realistic expectations vs false promises
  - Understanding CAP = you know what's possible
  - Not understanding = you promise both consistency and availability during partitions
- Debugging distributed failures vs being confused
  - Understanding CAP = you know what to expect during partitions
  - Not understanding = you're confused when partitions happen
The Three Properties
C - Consistency
Every read receives the most recent write or an error.
CLIENT A SYSTEM CLIENT B
│ │ │
│── Write X=5 ────────────► │ │
│◄── Success ───────────────│ │
│ │ │
│ │ ◄──────────── Read X ──────│
│ │ ────────────► X=5 ─────────│
Consistent: B sees 5 (the latest write)
A - Availability
Every request receives a (non-error) response, without guarantee it's the most recent.
CLIENT NODE A NODE B
│ │ │
│── Request ───────────────►│ │
│◄── Response (maybe stale)─│ │
│ │ │
│───────────────────────────────── Request ────────────►│
│◄──────────────────────────────── Response ───────────│
Available: Always get a response (even if nodes disagree)
P - Partition Tolerance
System continues to operate despite network partitions (messages lost or delayed between nodes).
Node A ◄───────✂────────► Node B
(Network partition)
Partition tolerant: System still works (somehow)
The Real CAP Insight
Partition Tolerance isn't optional. In any distributed system:
- Network switches fail
- Cables get cut
- Data centers lose connectivity
- Cloud availability zones have issues
You WILL have partitions. The question is: what do you do when they happen?
┌─────────────────────────────────────────────────────────────────┐
│ THE ACTUAL CHOICE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Normal operation: You can have both C and A │
│ ═══════════════════════════════════════════ │
│ │
│ Node A ◄───────────────────────────────► Node B │
│ (Network is fine) │
│ │
│ • Writes go to A, replicate to B │
│ • Reads from either node get latest data │
│ • Everyone is happy! │
│ │
├─────────────────────────────────────────────────────────────────┤
│ │
│ During partition: Choose C or A │
│ ═══════════════════════════════ │
│ │
│ Node A ◄──────────✂───────────► Node B │
│ (Network down!) │
│ │
│ CHOICE CP: Consistency + Partition tolerance │
│ ───────────────────────────────────────────── │
│ Node B: "I can't reach A. I refuse to answer." │
│ Client: Gets ERROR │
│ Result: ✓ Consistent (no wrong answers) │
│ ✗ Not available (some clients get errors) │
│ │
│ CHOICE AP: Availability + Partition tolerance │
│ ───────────────────────────────────────────── │
│ Node B: "I'll answer with what I have." │
│ Client: Gets (possibly stale) data │
│ Result: ✓ Available (always responds) │
│ ✗ Not consistent (might be stale) │
│ │
└─────────────────────────────────────────────────────────────────┘
Trade-offs: When to Choose CP vs AP
Choose CP (Consistency) When:
✅ CORRECTNESS IS NON-NEGOTIABLE
• Banking / Financial systems
- Can't show wrong balance
- Can't allow double-spending
- Better: "Service unavailable" than wrong data
• Inventory systems
- Can't oversell products
- Last item must go to one customer
• Leader election / Coordination
- Must have exactly one leader
- Split-brain is catastrophic
• Configuration management
- All nodes must see same config
- Inconsistent config = bugs
TRADE-OFF ACCEPTED: Some requests fail during partitions
Choose AP (Availability) When:
✅ AVAILABILITY IS MORE IMPORTANT THAN PERFECT ACCURACY
• Social media feeds
- Missing a few likes is okay
- Service down = users leave
• Shopping carts (Amazon's example)
- Accept items even during partition
- Merge conflicts later
- Worst case: duplicate item, user removes it
• DNS
- Stale record is better than no record
- Eventually updates
• Analytics / Metrics
- Approximate counts are fine
- Data loss is acceptable
• Caching layers
- Stale cache is better than cache miss
TRADE-OFF ACCEPTED: Might return stale/inconsistent data
Real Systems on the CAP Spectrum
CONSISTENCY ◄────────────────────────────► AVAILABILITY
│ │
│ │
┌───────┴───────┐ ┌──────────┴───────────┐
│ │ │ │
PostgreSQL MongoDB Cassandra DynamoDB
(single node) (w:majority) (CL: ONE) (eventual)
│ │ │ │
MySQL etcd CouchDB DNS
│ │ │ │
│ Zookeeper │ │
│ │ │ │
└───────────────┴───────────────────────┴──────────────────────┘
│
CONFIGURABLE
(you choose per operation!)
System Details
| System | Default | CP Mode | AP Mode |
|---|---|---|---|
| PostgreSQL | CP | Single node, strong consistency | N/A (single node) |
| MongoDB | Configurable | w:majority, readConcern:majority | w:1, readConcern:local |
| Cassandra | Configurable | CL: ALL or QUORUM | CL: ONE or ANY |
| DynamoDB | AP | Strongly consistent reads (option) | Eventual consistency |
| Redis | AP | WAIT command for sync replication | Default async replication |
| etcd | CP | Raft consensus always | N/A (always consistent) |
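The Redis row is easy to try yourself. Here's a minimal sketch with the redis-py client, assuming a primary at localhost with at least one attached replica (both assumptions, not part of the table above):

```python
import redis

r = redis.Redis(host="localhost", port=6379)  # assumed local primary

# AP-leaning default: the write is acknowledged as soon as the primary
# accepts it; replication to replicas happens asynchronously.
r.set("balance", 100)

# CP-leaning: WAIT blocks until at least 1 replica acknowledges the write,
# or 500 ms pass. It returns the number of replicas that acknowledged.
acked = r.wait(1, 500)
if acked < 1:
    raise RuntimeError("Replication not confirmed; treat the write as at risk")
```

Note that WAIT doesn't make Redis fully CP (a failover can still lose acknowledged writes), which is why the table calls it a mode rather than a guarantee.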
Code Examples: Demonstrating the Trade-off
Simulating CP vs AP Behavior
```python
class CPNode:
    """Consistency-preferring: refuse to answer if uncertain."""

    def __init__(self):
        self.data = {}
        self.peer_reachable = True

    def write(self, key, value):
        if not self.peer_reachable:
            raise Exception("Cannot write: peer unreachable, consistency at risk")
        self.data[key] = value
        return "OK"

    def read(self, key):
        if not self.peer_reachable:
            raise Exception("Cannot read: peer unreachable, might be stale")
        return self.data.get(key)


class APNode:
    """Availability-preferring: always respond, even if stale."""

    def __init__(self):
        self.data = {}
        self.peer_reachable = True

    def write(self, key, value):
        # Write locally, sync to peer later (if reachable)
        self.data[key] = value
        if self.peer_reachable:
            pass  # would sync to peer here
        return "OK (will sync when partition heals)"

    def read(self, key):
        # Always return what we have
        value = self.data.get(key)
        if not self.peer_reachable:
            return f"{value} (WARNING: might be stale)"
        return value


# Simulate a partition
cp_node = CPNode()
ap_node = APNode()

print("=== Normal Operation ===")
cp_node.write("balance", 100)
print(f"CP read: {cp_node.read('balance')}")
ap_node.write("balance", 100)
print(f"AP read: {ap_node.read('balance')}")

print("\n=== During Partition ===")
cp_node.peer_reachable = False
ap_node.peer_reachable = False

try:
    print(f"CP read: {cp_node.read('balance')}")
except Exception as e:
    print(f"CP error: {e}")

print(f"AP read: {ap_node.read('balance')}")
```
MongoDB: Configuring Consistency
```javascript
// CP MODE: strong consistency
// Write waits for a majority of replicas
db.users.insertOne(
  { name: "Alice" },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)

// Read only data acknowledged by a majority (won't be rolled back)
db.users.find({ name: "Alice" }).readConcern("majority")

// AP MODE: availability over consistency
// Write returns immediately (might be lost if the primary fails)
db.users.insertOne(
  { name: "Bob" },
  { writeConcern: { w: 1 } }
)

// Read from any node (might be stale)
db.users.find({ name: "Bob" }).readPref("nearest")
```
Cassandra: Tuning Consistency Per Query
```sql
-- cqlsh sets the consistency level per session with the CONSISTENCY
-- command (USING CONSISTENCY was removed from the CQL language itself).

-- AP: fast, eventually consistent (read from any one node)
CONSISTENCY ONE;
SELECT * FROM users WHERE id = 123;

-- CP: strong consistency (read from a majority of replicas)
CONSISTENCY QUORUM;
SELECT * FROM users WHERE id = 123;

-- Write to ALL nodes (strongest, slowest)
CONSISTENCY ALL;
INSERT INTO users (id, name) VALUES (456, 'Alice');
```
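In application code, the consistency level is set per statement through the driver rather than in CQL. Here's a sketch with the DataStax Python driver, assuming a local cluster and a hypothetical shop keyspace:

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("shop")  # assumed cluster and keyspace

# AP-leaning read: any single replica may answer (fast, possibly stale)
fast = SimpleStatement(
    "SELECT * FROM users WHERE id = 123",
    consistency_level=ConsistencyLevel.ONE,
)

# CP-leaning read: a majority of replicas must agree before answering
safe = SimpleStatement(
    "SELECT * FROM users WHERE id = 123",
    consistency_level=ConsistencyLevel.QUORUM,
)

rows = session.execute(safe)
```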
Real-World Trade-off Stories
Amazon's Shopping Cart (The AP Choice)
Situation: Amazon needed shopping carts that work during outages.
The choice: AP - always accept items into cart
What happens during partition:
- User adds item on Node A
- A request served by Node B can't see that item yet
- When partition heals: carts merge
Potential problems:
- Duplicate items
- Old items reappear
Why it works: Users can easily remove duplicates. Service being down = lost sales.
The paper: Dynamo: Amazon's Highly Available Key-value Store
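A minimal sketch of the merge step, assuming carts are plain item sets (the real system used vector clocks and richer conflict resolution, as the paper describes):

```python
def merge_carts(cart_a: set, cart_b: set) -> set:
    """Union the diverged copies so nothing a user added is lost.

    The cost is visible in the union: an item deleted on one side can
    reappear, and duplicates are left for the user to clean up.
    """
    return cart_a | cart_b

# During the partition, each side accepted writes independently:
node_a = {"book", "headphones"}
node_b = {"book", "charger"}        # the same user added a charger via node B

print(merge_carts(node_a, node_b))  # {'book', 'headphones', 'charger'}
```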
GitHub's 2018 Outage (The CP Cost)
Situation: Network partition between data centers.
What happened:
- MySQL primary in DC1 became unreachable from DC2
- Orchestrator promoted replica in DC2 to primary
- Partition healed - now TWO primaries!
- Writes happened on both = data divergence
Recovery: 24+ hours to reconcile data
Lesson: CP systems can have split-brain too if failover is automated. Need careful orchestration.
Google Spanner (Having Your Cake)
Situation: Google wanted global strong consistency.
How they "cheated":
- Use GPS and atomic clocks in every data center
- Bound clock uncertainty to ~7ms
- Wait out uncertainty before committing
The catch:
- Requires specialized hardware
- Still has latency cost (waiting for uncertainty)
- Not truly beating CAP, just minimizing partition impact
Lesson: You can minimize trade-offs with engineering, but can't eliminate them.
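A toy illustration of the "wait out uncertainty" idea, with a made-up clock API standing in for TrueTime (the constant and all names here are assumptions for the sketch, not Spanner's API):

```python
import time

CLOCK_UNCERTAINTY_S = 0.007  # stand-in for TrueTime's ~7ms bound

def truetime_now():
    """Toy TrueTime: an interval [earliest, latest] guaranteed to
    contain the true physical time."""
    t = time.time()
    return t - CLOCK_UNCERTAINTY_S, t + CLOCK_UNCERTAINTY_S

def commit(apply_txn):
    _, latest = truetime_now()   # pick a commit timestamp
    apply_txn()                  # make the write durable
    # Commit wait: don't acknowledge until 'latest' has definitely passed,
    # so any transaction that starts afterward sees a strictly later time.
    while truetime_now()[0] < latest:
        time.sleep(0.001)
    return latest
```

The latency cost mentioned above is that final loop: every commit pays up to twice the clock uncertainty before acknowledging.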
Common Mistakes (I've Made These)
Mistake #1: "We'll just avoid partitions"
Why it's wrong: You cannot avoid partitions. They happen: network switches fail, cables get cut, data centers lose connectivity. I've seen teams design systems assuming partitions won't happen, then panic when they do.
Right approach: Design for partitions, don't try to prevent them. Choose CP or AP based on what happens during partitions.
Mistake #2: "AP means no consistency"
Why it's wrong: AP systems have consistency—just eventual, not immediate. Most AP systems converge within seconds. I've seen teams avoid AP systems thinking they'll have no consistency, when eventual consistency is fine for their use case.
Right approach: Understand that AP = eventual consistency (seconds to minutes), not no consistency. Use AP when eventual consistency is acceptable.
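To make "eventual" concrete, here's a toy convergence step using last-write-wins by timestamp, one common (and lossy) resolution rule; real AP systems vary:

```python
def lww_merge(a, b):
    """Last-write-wins: keep whichever (value, timestamp) pair is newer."""
    return a if a[1] >= b[1] else b

# Replicas diverged during a partition:
node_a = ("balance=120", 1_700_000_005)  # accepted a write during the partition
node_b = ("balance=100", 1_700_000_001)  # still holds the older value

# Once the partition heals, anti-entropy runs the merge on both replicas:
converged = lww_merge(node_a, node_b)
print(converged)  # ('balance=120', 1700000005) everywhere: the replicas agree
```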
Mistake #3: "CAP says we can only have 2 of 3"
Why it's wrong: This is misleading. During normal operation, you can have all three. CAP only applies during partitions. I've seen teams avoid distributed systems thinking they can't have consistency and availability.
Right approach: Understand that CAP is about partition scenarios. Normal operation: you have C + A + P. Partition: choose C or A.
Mistake #4: "We chose CP, so we're always consistent"
Why it's wrong: CP systems are consistent during partitions, but they might not be available. I've seen teams choose CP for everything, then wonder why their service is down during network issues.
Right approach: Choose CP when correctness is critical (money, inventory). Choose AP when availability is more important (social feeds, shopping carts).
Decision Framework
□ What happens if users see stale data?
→ Catastrophic (money, inventory): CP
→ Annoying but recoverable: AP
□ What happens if service is unavailable?
→ Users leave (social, e-commerce): AP
→ Users wait (banking, coordination): CP
□ Can conflicts be resolved automatically?
→ Yes (timestamps, merging): AP works well
→ No (business logic required): CP safer
□ What's your partition frequency?
→ Rare, short partitions: CP cost is low
→ Frequent, long partitions: AP keeps you running
□ Can you afford to over-provision?
→ Yes: CP with many replicas reduces unavailability
→ No: AP with eventual consistency is cheaper
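To show how mechanical the first few questions are, here's the checklist condensed into a toy function; the parameter names and the fallback default are my own simplification, not a formal rule:

```python
def pick_mode(stale_data_catastrophic: bool,
              downtime_drives_users_away: bool,
              conflicts_auto_resolvable: bool) -> str:
    """Condensed version of the checklist above."""
    if stale_data_catastrophic:
        return "CP"   # money, inventory: refuse bad answers
    if downtime_drives_users_away or conflicts_auto_resolvable:
        return "AP"   # feeds, carts: always respond, reconcile later
    return "CP"       # when unsure, default to correctness

print(pick_mode(True, False, False))   # CP: a banking balance
print(pick_mode(False, True, True))    # AP: a shopping cart
```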
Memory Trick
"In a SPLIT, pick:"
- Safe = Consistent (CP) - refuse bad answers
- Present = Available (AP) - always respond
- LIve with the trade-off
- There's no magic option
Self-Assessment
Before moving on:
- [ ] Can you explain why Partition Tolerance isn't optional?
- [ ] Do you know whether your database is CP or AP by default?
- [ ] Can you justify choosing AP over CP for a shopping cart?
- [ ] Do you understand what "eventual consistency" actually means?
- [ ] Do you know how to configure MongoDB or Cassandra for CP vs AP?
Key Takeaways
- CAP is about partitions - during normal operation, you have both C and A
- P is mandatory - partitions WILL happen in distributed systems
- Most systems are configurable - you choose CP or AP per operation
- Know your defaults - many databases default to AP for performance
- Match to business needs - money/inventory = CP, social/caching = AP
What's Next
Now that you understand CAP theorem, the next question is: How do you actually split your data across multiple machines?
In the next article, Sharding Strategies - Query Flexibility vs Scale, you'll learn:
- Hash vs range vs composite sharding strategies
- How to choose shard keys (this decision will haunt you)
- The trade-offs between query flexibility and scale
- Real-world sharding examples from production systems
This builds on what you learned here—CAP helps you understand trade-offs, but sharding is how you actually scale.
→ Continue to Article 13: Sharding Strategies
This article is part of the Backend Engineering Mastery series. CAP theorem is foundational for distributed systems design.