#14. Replication Patterns - Consistency vs Latency
The One Thing to Remember
Replication is about trade-offs between consistency, availability, and latency. Synchronous replication = strong consistency but higher latency. Asynchronous replication = lower latency but risk of data loss. Choose based on what you can afford to lose.
Building on Article 13
In Article 13: Sharding Strategies, you learned how to split data across machines. But here's the question: How do you keep copies of that data for availability and performance?
Replication is where I've seen the most subtle bugs—replica lag causes issues that only appear in production under load.
← Previous: Article 13 - Sharding Strategies
Why This Matters (A Replica Lag Bug)
I once debugged a production issue where users saw old messages after posting. The mystery: User posts message (write to primary), then refreshes (read from lagging replica). Message not visible yet! The fix? Read-your-writes consistency—track recent writes, route to primary if user wrote recently. Two lines of code, but it took 3 days to find.
This isn't academic knowledge—it's the difference between:
- Handling failures gracefully vs cascading failures
  - Understanding replication = you design for failures
  - Not understanding = one failure brings down everything
- Fast reads vs consistent reads
  - Understanding replica lag = you route reads correctly
  - Not understanding = users see stale data, confused
- Avoiding split-brain vs data corruption
  - Understanding failover = you prevent split-brain
  - Not understanding = two primaries, data divergence
Why Replicate?
WHY REPLICATE DATA?
═══════════════════

1. HIGH AVAILABILITY
   Primary dies → Replica takes over
   Uptime: 99.9% → 99.99%

2. READ SCALING
   Primary handles writes
   Replicas handle reads (scale horizontally)

3. GEOGRAPHIC DISTRIBUTION
   Replica in each region
   Lower latency for local users

4. DISASTER RECOVERY
   Data center fire? Replica is safe.
   Ransomware? Restore from replica.
Replication Topologies
1. Leader-Follower (Primary-Replica)
LEADER-FOLLOWER REPLICATION
═══════════════════════════
                  ┌────────────────┐
Writes ─────────► │    PRIMARY     │
                  │    (Leader)    │
                  └────────┬───────┘
                           │
          ┌────────────────┼────────────────┐
          │                │                │
          ▼                ▼                ▼
     ┌──────────┐     ┌──────────┐     ┌──────────┐
     │ Replica 1│     │ Replica 2│     │ Replica 3│
     └──────────┘     └──────────┘     └──────────┘
          ▲                ▲                ▲
          └────────────────┴────────────────┘
                           │
Reads ◄────────────────────┘
How it works:
- All writes go to primary
- Primary logs changes (WAL/binlog)
- Replicas pull and apply changes
- Reads can go to any node
Trade-offs:
| Pros | Cons |
|---|---|
| ✅ Simple to understand | ❌ Single point of write failure |
| ✅ No write conflicts | ❌ Failover complexity |
| ✅ Consistent reads from primary | ❌ Replica lag |
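To make the leader-follower write path concrete, here's a toy in-memory sketch (my own illustration, not any particular database's implementation): the primary appends every write to a log, and each replica applies the log entries it hasn't applied yet. The gap before a replica calls sync() is exactly the replica lag discussed later in this article.

class Primary:
    """Accepts all writes and records them in an append-only log
    (a stand-in for a real WAL/binlog)."""
    def __init__(self):
        self.data = {}
        self.log = []  # (key, value) changes in commit order

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Replica:
    """Pulls log entries from the primary and applies the ones it hasn't seen."""
    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0  # index into the primary's log

    def sync(self):
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)


primary = Primary()
replicas = [Replica(primary) for _ in range(3)]

primary.write("balance", 50)
print(replicas[0].data.get("balance"))  # None -- replica hasn't synced yet (lag)
replicas[0].sync()
print(replicas[0].data.get("balance"))  # 50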
2. Multi-Leader (Master-Master)
MULTI-LEADER REPLICATION
════════════════════════
    Region: US                         Region: EU
┌────────────────┐                 ┌────────────────┐
│    Leader 1    │◄──── Async ────►│    Leader 2    │
│  (US Primary)  │   Replication   │  (EU Primary)  │
└───────┬────────┘                 └───────┬────────┘
        │                                  │
 US Writes/Reads                    EU Writes/Reads
Trade-offs:
| Pros | Cons |
|---|---|
| ✅ Low write latency (local) | ❌ Conflict resolution required |
| ✅ Survives region failure | ❌ Complex to implement correctly |
| ✅ Geographic distribution | ❌ Eventual consistency |
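Multi-leader's hard part is the conflict resolution noted in that table. Here's a minimal sketch of one common policy, last-write-wins (LWW) by timestamp; this is an illustrative assumption, not how every multi-leader system behaves, and LWW silently discards the losing write (which is why some systems prefer CRDTs or application-level merging).

import time

def lww_merge(local, remote):
    """Last-write-wins: keep whichever (value, timestamp) pair is newer.
    Simple, but the losing concurrent update is silently dropped."""
    return local if local[1] >= remote[1] else remote

# Two leaders accept conflicting writes for the same key:
us_write = ("Alice", time.time())         # accepted by the US leader
eu_write = ("Alicia", time.time() + 0.5)  # accepted slightly later by the EU leader

print(lww_merge(us_write, eu_write))  # ('Alicia', ...) -- the later write wins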
3. Leaderless (Quorum-Based)
LEADERLESS REPLICATION
══════════════════════
Write path (N=3, wait for W=2 acks):

  Client ──► Node1 ✓    Node2 ✓    Node3 ✗
             W=2 acks received → write succeeds

Read path (N=3, wait for R=2 responses):

  Client ──► Node1 v=10    Node2 v=10    Node3 v=9
             R=2 responses → return v=10 (highest version wins)
Quorum Configurations:
| Config | Meaning | Use Case |
|---|---|---|
| W=N, R=1 | Slow writes, fast consistent reads | Analytics (read-heavy) |
| W=1, R=N | Fast writes, slow reads | Logs |
| W=2, R=2 (N=3) | Balanced (W + R > N) | General purpose |
| W=3, R=3 (N=3) | Strongest consistency, lowest availability | Finance |
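A toy sketch of the quorum logic above (hypothetical in-memory nodes, not a real client library): send the write to all N nodes but succeed once W acknowledge, then read R nodes and return the value with the highest version. With W + R > N, the read set always overlaps the latest successful write set.

import random

NODES = [{"value": None, "version": 0} for _ in range(3)]  # N = 3

def quorum_write(value, version, w=2):
    """Send the write to every node; succeed once w of them acknowledge."""
    acks = 0
    for node in NODES:
        if random.random() < 0.9:  # pretend ~10% of requests to a node fail
            node.update(value=value, version=version)
            acks += 1
    return acks >= w

def quorum_read(r=2):
    """Read r nodes and return the reply with the highest version."""
    replies = random.sample(NODES, r)
    return max(replies, key=lambda n: n["version"])

if quorum_write("v10", version=10):
    print(quorum_read())  # W + R > N, so at least one reply carries version 10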
Common Mistakes (I've Made These)
Mistake #1: "Reading from replicas is always safe"
Why it's wrong: Replica lag means you might read stale data. I've seen users post a message, refresh, and not see it because they read from a lagging replica.
Right approach: Use read-your-writes consistency—track recent writes per user, route to primary if user wrote recently.
Mistake #2: "Async replication is fine for everything"
Why it's wrong: Async replication can lose data if primary crashes before replication completes. I've seen financial transactions lost because of this.
Right approach: Use synchronous replication for critical data (money, inventory). Use async for less critical data (logs, analytics).
Mistake #3: "Failover is automatic, we don't need to test it"
Why it's wrong: Failover is complex. I've seen split-brain scenarios where two primaries accepted writes, causing data divergence.
Right approach: Test failover regularly. Use proper fencing (STONITH, quorum-based) to prevent split-brain.
Synchronous vs Asynchronous
SYNCHRONOUS REPLICATION:
════════════════════════
Client ──► Primary ──► Replica ──► ACK ──► Primary ──► Client: "Committed!"

✓ No data loss on primary failure
✓ Strong consistency
✗ Higher latency (wait for replica)
✗ Replica failure blocks writes

ASYNCHRONOUS REPLICATION:
═════════════════════════
Client ──► Primary ──► Client: "Committed!"
              │
              └───► Replica (eventually)

✓ Low latency
✓ Replica failure doesn't block writes
✗ Data loss possible (commits not yet replicated when the primary dies)
✗ Stale reads from replicas

SEMI-SYNCHRONOUS:
═════════════════
Client ──► Primary ──► [any 1 of N replicas ACKs] ──► Client: "Committed!"
              │
              └───► Other replicas (async)

✓ Balance of safety and speed
✓ At least one replica has the data
✗ Still adds some latency
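The practical difference between the three modes is simply how many replica acknowledgements the primary waits for before telling the client "Committed!". A toy sketch under that assumption (real databases ship WAL/binlog records with timeouts and fallbacks, which this ignores):

def commit(write, replicas, mode="async"):
    """Wait for 0 (async), 1 (semi-sync), or all (sync) replica ACKs
    before acknowledging the client."""
    wait_for = {"async": 0, "semi-sync": 1, "sync": len(replicas)}[mode]

    acked = 0
    for replica in replicas:
        if acked >= wait_for:
            break  # remaining replicas get the change in the background (not modeled)
        replica.append(write)  # synchronously ship the change and wait for its ACK
        acked += 1

    return f"client ACKed after waiting for {acked} replica(s)"

# Replicas modeled as plain lists that record shipped writes:
replicas = [[], [], []]
print(commit({"balance": 50}, replicas, mode="semi-sync"))  # waits for 1 replica
print(commit({"balance": 60}, replicas, mode="sync"))       # waits for all 3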
Replica Lag: The Silent Problem
| Time | Primary | Replica |
|---|---|---|
| 0 | balance = $100 | balance = $100 |
| 1 | UPDATE balance = $50 ✓ committed | balance = $100 |
| 2 | balance = $50 | balance = $100 ← user reads STALE data (lag hasn't caught up) |
| 3 | balance = $50 | balance = $50 (replication catches up) |
Measuring and Handling Lag
PostgreSQL:
SELECT
    client_addr,
    state,
    pg_wal_lsn_diff(sent_lsn, write_lsn)  AS write_lag_bytes,
    pg_wal_lsn_diff(sent_lsn, flush_lsn)  AS flush_lag_bytes,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
-- PostgreSQL 10+ also exposes write_lag, flush_lag, and replay_lag
-- directly (as intervals) in pg_stat_replication.
MySQL:
SHOW REPLICA STATUS;  -- MySQL 8.0.22+ (older versions: SHOW SLAVE STATUS)
-- Look at: Seconds_Behind_Source (Seconds_Behind_Master on older versions)
Application-level: Read-your-writes consistency
import random
import time

class ReadYourWritesRouter:
    """Routes reads to the primary for users who wrote recently."""

    def __init__(self, primary, replicas, window_seconds=5):
        self.primary = primary
        self.replicas = replicas
        self.window_seconds = window_seconds
        self.recent_writes = {}  # user_id -> timestamp of last write

    def record_write(self, user_id):
        self.recent_writes[user_id] = time.time()

    def get_connection(self, user_id, is_write=False):
        if is_write:
            return self.primary
        # If the user wrote recently, read from the primary so they
        # always see their own writes despite replica lag.
        last_write = self.recent_writes.get(user_id, 0)
        if time.time() - last_write < self.window_seconds:
            return self.primary
        return random.choice(self.replicas)
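Hypothetical usage, assuming primary_conn, replica_1, and replica_2 are already-open database connections:

router = ReadYourWritesRouter(primary=primary_conn, replicas=[replica_1, replica_2])

router.record_write(user_id=42)           # call right after a successful write
conn = router.get_connection(user_id=42)  # within the window: returns the primary
conn = router.get_connection(user_id=7)   # no recent write: returns a random replica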
Real-World Trade-off Stories
MySQL Replication Lag in Production
Situation: Replication lag is a common production issue that can impact data consistency and application performance. The delay between when transactions execute on the master/primary database and when they're applied on the slave/replica node can cause significant problems.
Common causes:
- Bad queries lacking primary or unique keys
- Poor network hardware or malfunctioning network cards
- Disk I/O issues
- Physical backups running on the replica
- Geographic distance between regions
Solutions:
- Configure parallel replication to run multiple slave threads
- Use semi-synchronous replication where stronger durability is needed (note that it waits for only one replica, not all)
- Monitor using Performance_Schema views and metrics (MySQL 8+)
- Optimize queries to include proper indexing and keys
Key diagnostic tool: SHOW SLAVE STATUS (or SHOW REPLICA STATUS in MySQL 8+) shows Seconds_Behind_Master/Source—how many seconds the replica is behind.
Lesson: Monitor replica lag continuously. High lag (>1 second) means users are seeing stale data. Use read-your-writes consistency or route critical reads to primary.
GitHub: MySQL Replication Disaster (2018)
Situation: Network partition between data centers during maintenance.
What happened:
- Primary in DC1 became unreachable
- Orchestrator promoted replica in DC2
- Network healed - now TWO primaries!
- Both accepted writes → data divergence
Impact: 24+ hours to reconcile data.
The problem: Automated failover without robust fencing. When the network partition healed, both nodes thought they were primary.
Lesson: Automated failover needs robust fencing (STONITH, quorum-based). Split-brain is the worst outcome—better to have downtime than data corruption.
Slack: Read Replica Lag (Real User Impact)
Situation: Users saw old messages after posting.
Cause:
- User posts message (write to primary)
- User refreshes (read from lagging replica)
- Message not visible yet!
Solution: Read-your-writes consistency
- Track recent write timestamps per user
- Route to primary if user wrote recently (5-second window)
- Simple fix, but took days to diagnose
Lesson: Replica lag causes real user-facing bugs. Always implement read-your-writes consistency for user-generated content.
Failover: When Primary Dies
AUTOMATIC FAILOVER SEQUENCE:
════════════════════════════

1. DETECTION
   ─────────
   Heartbeat timeout (e.g., 30 seconds)
   Multiple monitors must agree (avoid false positives)

2. ELECTION
   ────────
   Choose replica with least lag
   (or use consensus algorithm)

3. PROMOTION
   ─────────
   Replica becomes new primary
   Stops accepting replication
   Opens for writes

4. RECONFIGURATION
   ───────────────
   Other replicas point to new primary
   Application routes to new primary
   DNS/load balancer updated

5. OLD PRIMARY HANDLING
   ────────────────────
   When it comes back: becomes replica
   Or: fence it (prevent split-brain)

RISKS:
══════
• Split-brain: Two nodes think they're primary
• Data loss: Async replication lag
• Application confusion: Cached connections to old primary
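A toy sketch of the detection and election steps (illustrative only; real failover orchestrators handle far more edge cases, and the thresholds below are assumptions):

import time

def should_fail_over(last_heartbeat, monitors_agreeing, timeout=30, quorum=2):
    """Fail over only when the heartbeat has timed out AND a quorum of
    independent monitors agrees, to avoid false positives from one
    monitor's network blip."""
    timed_out = (time.time() - last_heartbeat) > timeout
    return timed_out and monitors_agreeing >= quorum

def pick_new_primary(replicas):
    """Promote the replica with the least replication lag to minimise data loss."""
    return min(replicas, key=lambda r: r["lag_seconds"])

replicas = [
    {"name": "replica-1", "lag_seconds": 0.4},
    {"name": "replica-2", "lag_seconds": 3.1},
]
if should_fail_over(last_heartbeat=time.time() - 45, monitors_agreeing=3):
    print("Promoting", pick_new_primary(replicas)["name"])  # replica-1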
Split-Brain Prevention
Fencing strategies:
- STONITH (Shoot The Other Node In The Head) - Power off old primary via IPMI/BMC
- Quorum-based - Node needs majority of votes to be primary
- Shared storage lock - Primary holds lock on shared storage
- Network fencing - Block old primary's network access
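One way to picture quorum-based fencing is a monotonically increasing fencing token: each newly promoted primary receives a larger token, and the storage layer rejects writes carrying an older one. A minimal sketch of that idea (my own illustration, not a specific product's API):

class FencedStore:
    """Rejects writes that carry a stale fencing token, so a deposed
    primary that wakes up after a partition can no longer corrupt data."""
    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, key, value, token):
        if token < self.highest_token:
            raise PermissionError(f"stale fencing token {token}: a newer primary exists")
        self.highest_token = token
        self.data[key] = value

store = FencedStore()
store.write("balance", 50, token=1)      # original primary
store.write("balance", 60, token=2)      # newly promoted primary after failover
try:
    store.write("balance", 40, token=1)  # old primary comes back with its old token
except PermissionError as err:
    print("Rejected:", err)              # split-brain write is refused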
Decision Framework
□ What's the acceptable data loss (RPO)?
  → Zero: Synchronous replication
  → Seconds: Semi-synchronous
  → Minutes: Asynchronous

□ What's the acceptable downtime (RTO)?
  → Seconds: Automated failover
  → Minutes: Manual failover
  → Hours: Cold standby

□ What's your read/write ratio?
  → Read-heavy: Add read replicas
  → Write-heavy: Consider sharding instead

□ Do you need geographic distribution?
  → Yes: Multi-region replicas
  → Cross-region writes: Consider multi-leader

□ Can your application handle stale reads?
  → Yes: Route reads to replicas freely
  → No: Read-your-writes or read from primary
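If it helps, the checklist can be rough-coded as a starting-point heuristic (the thresholds and return values below are my assumptions for illustration, not rules):

def choose_replication(rpo_seconds, rto_seconds, stale_reads_ok):
    """Map RPO/RTO/staleness tolerance to a rough starting point."""
    if rpo_seconds == 0:
        mode = "synchronous replication"
    elif rpo_seconds <= 5:
        mode = "semi-synchronous replication"
    else:
        mode = "asynchronous replication"

    failover = "automated failover" if rto_seconds <= 60 else "manual failover"
    reads = "replicas" if stale_reads_ok else "primary (or read-your-writes)"
    return f"{mode}, {failover}, route reads to {reads}"

print(choose_replication(rpo_seconds=0, rto_seconds=30, stale_reads_ok=False))
# synchronous replication, automated failover, route reads to primary (or read-your-writes)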
Key Takeaways
- Sync vs Async is the fundamental trade-off (consistency vs latency)
- Replica lag causes the most common replication bugs - plan for it (read-your-writes)
- Failover is hard - test it regularly, use proper fencing
- Split-brain is the worst outcome - use quorum-based or STONITH
- Read-your-writes is often necessary for good UX
- Multi-leader is complex - use only when necessary (geographic distribution)
What's Next
Now that you understand replication patterns, the next question is: How do distributed systems actually agree on a single value when nodes can fail?
In the next article, Consensus & Raft - Availability vs Strong Consistency, you'll learn:
- How Raft achieves consensus (simpler than Paxos)
- Why etcd, Consul, and Kubernetes use Raft
- The trade-offs between availability and strong consistency
- How leader election works in practice
This builds on what you learned here—replication keeps copies, but consensus ensures they agree.
→ Continue to Article 15: Consensus & Raft
This article is part of the Backend Engineering Mastery series.