6 min read

#14. Replication Patterns - Consistency vs Latency

The One Thing to Remember

Replication is about trade-offs between consistency, availability, and latency. Synchronous replication = strong consistency but higher latency. Asynchronous replication = lower latency but risk of data loss. Choose based on what you can afford to lose.


Building on Article 13

In Article 13: Sharding Strategies, you learned how to split data across machines. But here's the question: How do you keep copies of that data for availability and performance?

Replication is where I've seen the most subtle bugs—replica lag causes issues that only appear in production under load.

Previous: Article 13 - Sharding Strategies


Why This Matters (A Replica Lag Bug)

I once debugged a production issue where users saw old messages after posting. The mystery: User posts message (write to primary), then refreshes (read from lagging replica). Message not visible yet! The fix? Read-your-writes consistency—track recent writes, route to primary if user wrote recently. Two lines of code, but it took 3 days to find.

This isn't academic knowledge—it's the difference between:

  • Handling failures gracefully vs cascading failures

    • Understanding replication = you design for failures
    • Not understanding = one failure brings down everything
  • Fast reads vs consistent reads

    • Understanding replica lag = you route reads correctly
    • Not understanding = users see stale data, confused
  • Avoiding split-brain vs data corruption

    • Understanding failover = you prevent split-brain
    • Not understanding = two primaries, data divergence

Why Replicate?

┌─────────────────────────────────────────────────────────────────┐
│                    WHY REPLICATE DATA?                          │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. HIGH AVAILABILITY                                           │
│     Primary dies → Replica takes over                           │
│     Uptime: 99.9% → 99.99%                                     │
│                                                                 │
│  2. READ SCALING                                                │
│     Primary handles writes                                      │
│     Replicas handle reads (scale horizontally)                  │
│                                                                 │
│  3. GEOGRAPHIC DISTRIBUTION                                     │
│     Replica in each region                                      │
│     Lower latency for local users                               │
│                                                                 │
│  4. DISASTER RECOVERY                                           │
│     Data center fire? Replica is safe.                          │
│     Ransomware? Restore from replica.                           │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

Replication Topologies

1. Leader-Follower (Primary-Replica)

                    LEADER-FOLLOWER REPLICATION
                    ═══════════════════════════

                      ┌────────────────┐
                      │    PRIMARY     │
    Writes ─────────► │   (Leader)     │
                      └───────┬────────┘
                              │
              ┌───────────────┼───────────────┐
              │               │               │
              ▼               ▼               ▼
        ┌──────────┐   ┌──────────┐   ┌──────────┐
        │ Replica  │   │ Replica  │   │ Replica  │
        │    1     │   │    2     │   │    3     │
        └──────────┘   └──────────┘   └──────────┘
              ▲               ▲               ▲
              └───────────────┴───────────────┘
                           │
    Reads ◄────────────────┘

How it works:

  1. All writes go to primary
  2. Primary logs changes (WAL/binlog)
  3. Replicas pull and apply changes
  4. Reads can go to any node
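
To make steps 1-4 concrete, here is a minimal in-memory sketch (hypothetical Primary and Replica classes, not any real database): the primary appends each write to an ordered log, and replicas replay that log from their last applied position.

# Minimal in-memory sketch of leader-follower replication (illustrative only)
class Primary:
    def __init__(self):
        self.data = {}
        self.log = []                     # stands in for the WAL/binlog

    def write(self, key, value):
        self.log.append((key, value))     # 1-2. log the change, then apply it locally
        self.data[key] = value


class Replica:
    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0                  # position in the primary's log we've replayed

    def pull_and_apply(self):
        # 3. Pull new log entries and apply them in order
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)


primary = Primary()
replica = Replica(primary)
primary.write("user:1", "Ada")
replica.pull_and_apply()                  # until this runs, the replica is lagging
print(replica.data["user:1"])             # 4. reads can now be served from the replica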

Trade-offs:

Pros                              Cons
────────────────────────────────  ──────────────────────────────────
✅ Simple to understand           ❌ Single point of write failure
✅ No write conflicts             ❌ Failover complexity
✅ Consistent reads from primary  ❌ Replica lag

2. Multi-Leader (Master-Master)

                    MULTI-LEADER REPLICATION
                    ════════════════════════

    Region: US                           Region: EU
    ┌────────────────┐                   ┌────────────────┐
    │    Leader 1    │◄─── Async ───────►│    Leader 2    │
    │   (US Primary) │    Replication    │  (EU Primary)  │
    └───────┬────────┘                   └───────┬────────┘
            │                                    │
   US Writes/Reads                      EU Writes/Reads

Trade-offs:

Pros                          Cons
────────────────────────────  ──────────────────────────────────
✅ Low write latency (local)  ❌ Conflict resolution required
✅ Survives region failure    ❌ Complex to implement correctly
✅ Geographic distribution    ❌ Eventual consistency
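
Because both leaders accept writes for the same keys, conflicting versions will eventually arrive on both sides. One common but lossy strategy is last-write-wins by timestamp, sketched below; real systems may instead use vector clocks, CRDTs, or application-level merge logic.

# Last-write-wins conflict resolution: simple, but it silently drops the
# losing write, which is why multi-leader needs careful design
def resolve_conflict(version_a, version_b):
    """Each version is (timestamp, value); keep the one written last."""
    return version_a if version_a[0] >= version_b[0] else version_b

us_write = (1700000001.2, "Hello from US")
eu_write = (1700000003.7, "Hello from EU")
print(resolve_conflict(us_write, eu_write)[1])  # EU write wins: later timestamp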

3. Leaderless (Quorum-Based)

                    LEADERLESS REPLICATION
                    ══════════════════════

        Client Write                    Client Read
             │                               │
             ▼                               ▼
    ┌────────────────┐              ┌────────────────┐
    │ Write to N=3   │              │ Read from N=3  │
    │ Wait for W=2   │              │ Wait for R=2   │
    └────────┬───────┘              └────────┬───────┘
             │                               │
     ┌───────┼───────┐               ┌───────┼───────┐
     ▼       ▼       ▼               ▼       ▼       ▼
  ┌─────┐ ┌─────┐ ┌─────┐        ┌─────┐ ┌─────┐ ┌─────┐
  │Node1│ │Node2│ │Node3│        │Node1│ │Node2│ │Node3│
  │ ✓   │ │ ✓   │ │ ✗   │        │v=10 │ │v=10 │ │v=9  │
  └─────┘ └─────┘ └─────┘        └─────┘ └─────┘ └─────┘

   W=2 achieved!                   R=2: Return v=10
   Write succeeds                  (highest version)

Quorum Configurations:

Config           Meaning                              Use Case
───────────────  ───────────────────────────────────  ───────────────
W=N, R=1         Slow writes, fast consistent reads   Analytics
W=1, R=N         Fast writes, slow consistent reads   Logs
W=2, R=2 (N=3)   Balanced                             General purpose
W=3, R=3 (N=3)   Strongest consistency                Finance
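
Here is a rough in-memory sketch of the quorum mechanics pictured above (N=3, W=2, R=2; the node list and version numbers are made up for illustration): a write succeeds once W nodes acknowledge it, and a read gathers R responses and returns the value with the highest version.

# Quorum read/write sketch: N=3 replicas, W=2 write acks, R=2 read responses
N, W, R = 3, 2, 2
nodes = [{} for _ in range(N)]            # each node maps key -> (version, value)

def quorum_write(key, version, value, reachable):
    acks = 0
    for i in reachable:                   # send the write to every reachable node
        nodes[i][key] = (version, value)
        acks += 1
    return acks >= W                      # succeed only once W nodes acknowledged

def quorum_read(key, reachable):
    responses = [nodes[i][key] for i in reachable[:R] if key in nodes[i]]
    return max(responses)[1]              # highest version wins (W + R > N guarantees overlap)

quorum_write("x", version=1, value=9, reachable=[0, 1, 2])
quorum_write("x", version=2, value=10, reachable=[0, 1])    # node 2 was unreachable
print(quorum_read("x", reachable=[1, 2]))                   # 10: the R-node set includes the newer version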

Common Mistakes (I've Made These)

Mistake #1: "Reading from replicas is always safe"

Why it's wrong: Replica lag means you might read stale data. I've seen users post a message, refresh, and not see it because they read from a lagging replica.

Right approach: Use read-your-writes consistency—track recent writes per user, route to primary if user wrote recently.

Mistake #2: "Async replication is fine for everything"

Why it's wrong: Async replication can lose data if primary crashes before replication completes. I've seen financial transactions lost because of this.

Right approach: Use synchronous replication for critical data (money, inventory). Use async for less critical data (logs, analytics).

Mistake #3: "Failover is automatic, we don't need to test it"

Why it's wrong: Failover is complex. I've seen split-brain scenarios where two primaries accepted writes, causing data divergence.

Right approach: Test failover regularly. Use proper fencing (STONITH, quorum-based) to prevent split-brain.


Synchronous vs Asynchronous

SYNCHRONOUS REPLICATION:
════════════════════════

Client ──► Primary ──► Replica ──► ACK ──► Primary ──► Client
                                              │
                                         "Committed!"

✓ No data loss on primary failure
✓ Strong consistency
✗ Higher latency (wait for replica)
✗ Replica failure blocks writes


ASYNCHRONOUS REPLICATION:
═════════════════════════

Client ──► Primary ──► Client
              │      "Committed!"
              │
              └───► Replica (eventually)

✓ Low latency
✓ Replica failure doesn't block
✗ Data loss possible (uncommitted changes)
✗ Stale reads from replica


SEMI-SYNCHRONOUS:
═════════════════

Client ──► Primary ──► [Any 1 of N Replicas] ──► Client
                              │              "Committed!"
                              └──► Other replicas (async)

✓ Balance of safety and speed
✓ At least one replica has data
✗ Still some latency added
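
To see the three modes side by side in code, here is a toy sketch (an in-memory ReplicaStub stands in for the network round trip; durability details are omitted):

# Toy comparison of the three modes; ReplicaStub.acknowledge just records the write
class ReplicaStub:
    def __init__(self):
        self.records = []

    def acknowledge(self, record):
        self.records.append(record)

def commit_sync(log, replicas, record):
    log.append(record)
    for r in replicas:
        r.acknowledge(record)            # block until every replica has it: no loss, slowest
    return "committed"

def commit_semi_sync(log, replicas, record):
    log.append(record)
    replicas[0].acknowledge(record)      # block until any one replica has it
    return "committed"                   # the rest catch up asynchronously

def commit_async(log, replicas, record):
    log.append(record)                   # replicas catch up later; a crash now loses data
    return "committed"

replicas = [ReplicaStub(), ReplicaStub()]
print(commit_semi_sync([], replicas, "UPDATE balance ..."))  # committed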

Replica Lag: The Silent Problem

PRIMARY                    REPLICA
════════                   ═══════

Time 0: balance = $100     balance = $100

Time 1: UPDATE balance = $50
        ✓ Committed

Time 2: User reads from    balance = $100  ← STALE!
        replica            (lag hasn't caught up)

Time 3:                    Replication catches up
                           balance = $50

Measuring and Handling Lag

PostgreSQL:

-- Per-replica lag in bytes (PostgreSQL 10+ also exposes write_lag, flush_lag,
-- and replay_lag directly as time intervals in this view)
SELECT
    client_addr,
    state,
    pg_wal_lsn_diff(sent_lsn, write_lsn)  AS write_lag_bytes,
    pg_wal_lsn_diff(sent_lsn, flush_lsn)  AS flush_lag_bytes,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

MySQL:

SHOW SLAVE STATUS;  -- SHOW REPLICA STATUS in MySQL 8.0.22+
-- Look at: Seconds_Behind_Master (Seconds_Behind_Source in 8.0.22+)
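
A monitoring sketch for the MySQL check above, assuming a DB-API connection to the replica (for example from pymysql) and a hypothetical alert() helper; the 5-second threshold is arbitrary:

# Poll replica lag and alert past a threshold (sketch; assumes a DB-API
# connection to the replica, e.g. from pymysql, and a hypothetical alert())
def check_replica_lag(conn, max_lag_seconds=5):
    with conn.cursor() as cur:
        cur.execute("SHOW REPLICA STATUS")  # SHOW SLAVE STATUS before MySQL 8.0.22
        row = cur.fetchone()
        columns = [col[0] for col in cur.description]
    status = dict(zip(columns, row))
    lag = status.get("Seconds_Behind_Source", status.get("Seconds_Behind_Master"))
    if lag is None or lag > max_lag_seconds:
        alert(f"replica lag is {lag}s (limit {max_lag_seconds}s)")  # hypothetical helper
    return lag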

Application-level: Read-your-writes consistency

import random
import time

class ReadYourWritesRouter:
    def __init__(self, primary, replicas):
        self.primary = primary            # connection (or DSN) for the primary
        self.replicas = replicas          # list of replica connections (or DSNs)
        self.recent_writes = {}           # user_id -> timestamp of last write

    def record_write(self, user_id):
        self.recent_writes[user_id] = time.time()

    def get_connection(self, user_id, is_write=False):
        if is_write:
            return self.primary

        # If the user wrote recently, read from the primary to avoid replica lag
        last_write = self.recent_writes.get(user_id, 0)
        if time.time() - last_write < 5:  # 5-second window (at least your worst-case lag)
            return self.primary

        return random.choice(self.replicas)
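
Hypothetical usage inside a request handler (the DSN strings are placeholders):

router = ReadYourWritesRouter(
    primary="primary-dsn",
    replicas=["replica-1-dsn", "replica-2-dsn"],
)

router.record_write(user_id=42)           # user 42 just posted a message
router.get_connection(user_id=42)         # "primary-dsn" for the next 5 seconds
router.get_connection(user_id=7)          # a random replica: user 7 hasn't written recently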

Real-World Trade-off Stories

MySQL Replication Lag in Production

Situation: Replication lag is a common production issue that impacts data consistency and application performance. The delay between a transaction committing on the master/primary and being applied on the slave/replica is where the problems appear.

Common causes:

  • Bad queries lacking primary or unique keys
  • Poor network hardware or malfunctioning network cards
  • Disk I/O issues
  • Physical backups running on the replica
  • Geographic distance between regions

Solutions:

  • Configure parallel replication to run multiple applier threads (see the config sketch after this list)
  • Enable semi-synchronous replication so at least one replica acknowledges each commit
  • Monitor using the performance_schema replication views and metrics (MySQL 8+)
  • Optimize queries to include proper indexing and keys
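
For the parallel-replication item, a typical replica-side configuration looks roughly like this (variable names as in MySQL 8.0.26+; older versions use the slave_* equivalents, and the right worker count depends on the workload):

# my.cnf on the replica (sketch, not a drop-in config)
replica_parallel_workers      = 4               # apply transactions with several threads
replica_parallel_type         = LOGICAL_CLOCK   # parallelize by commit group, not by schema
replica_preserve_commit_order = ON              # commit on the replica in source order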

Key diagnostic tool: SHOW SLAVE STATUS (or SHOW REPLICA STATUS in MySQL 8+) shows Seconds_Behind_Master/Source—how many seconds the replica is behind.

Lesson: Monitor replica lag continuously. High lag (>1 second) means users are seeing stale data. Use read-your-writes consistency or route critical reads to primary.

GitHub: MySQL Replication Disaster (2018)

Situation: Network partition between data centers during maintenance.

What happened:

  1. Primary in DC1 became unreachable
  2. Orchestrator promoted replica in DC2
  3. Network healed - now TWO primaries!
  4. Both accepted writes → data divergence

Impact: 24+ hours to reconcile data.

The problem: Automated failover without robust fencing. When the network partition healed, both nodes thought they were primary.

Lesson: Automated failover needs robust fencing (STONITH, quorum-based). Split-brain is the worst outcome—better to have downtime than data corruption.

Slack: Read Replica Lag (Real User Impact)

Situation: Users saw old messages after posting.

Cause:

  • User posts message (write to primary)
  • User refreshes (read from lagging replica)
  • Message not visible yet!

Solution: Read-your-writes consistency

  • Track recent write timestamps per user
  • Route to primary if user wrote recently (5-second window)
  • Simple fix, but took days to diagnose

Lesson: Replica lag causes real user-facing bugs. Always implement read-your-writes consistency for user-generated content.


Failover: When Primary Dies

AUTOMATIC FAILOVER SEQUENCE:
════════════════════════════

1. DETECTION
   ───────────
   Heartbeat timeout (e.g., 30 seconds)
   Multiple monitors must agree (avoid false positives)

2. ELECTION
   ─────────
   Choose replica with least lag
   (or use consensus algorithm)

3. PROMOTION
   ──────────
   Replica becomes new primary
   Stops accepting replication
   Opens for writes

4. RECONFIGURATION
   ────────────────
   Other replicas point to new primary
   Application routes to new primary
   DNS/load balancer updated

5. OLD PRIMARY HANDLING
   ─────────────────────
   When it comes back: becomes replica
   Or: fence it (prevent split-brain)


RISKS:
══════
• Split-brain: Two nodes think they're primary
• Data loss: Async replication lag
• Application confusion: Cached connections to old primary
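
Condensing the five steps into a toy coordinator (the dict-based nodes and field names are made up for this sketch; real tools such as Orchestrator or Patroni handle many more edge cases):

import time

# Toy failover coordinator following the five steps above (illustrative only)
HEARTBEAT_TIMEOUT = 30  # seconds

def failover(primary, replicas, now=None):
    now = now or time.time()

    # 1. DETECTION: has the primary missed its heartbeat window?
    if now - primary["last_heartbeat"] < HEARTBEAT_TIMEOUT:
        return primary                                  # still healthy, nothing to do

    # 2. ELECTION: pick the replica with the least replication lag
    candidate = min(replicas, key=lambda r: r["lag_seconds"])

    # 3. PROMOTION: candidate stops replicating and opens for writes
    candidate["role"] = "primary"

    # 4. RECONFIGURATION: every other replica follows the new primary
    for r in replicas:
        if r is not candidate:
            r["replicates_from"] = candidate["name"]

    # 5. OLD PRIMARY HANDLING: fence it so it can't take writes if it returns
    primary["fenced"] = True
    return candidate

old_primary = {"name": "db1", "last_heartbeat": time.time() - 60}
replicas = [
    {"name": "db2", "lag_seconds": 0.2, "role": "replica"},
    {"name": "db3", "lag_seconds": 3.5, "role": "replica"},
]
print(failover(old_primary, replicas)["name"])  # db2: the least-lagged replica is promoted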

Split-Brain Prevention

Fencing strategies:

  1. STONITH (Shoot The Other Node In The Head) - Power off old primary via IPMI/BMC
  2. Quorum-based - Node needs a majority of votes to remain primary (see the sketch after this list)
  3. Shared storage lock - Primary holds lock on shared storage
  4. Network fencing - Block old primary's network access
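
A sketch of the quorum-based idea: a node keeps acting as primary only while it can reach a strict majority of the cluster.

# Quorum-based fencing sketch: demote yourself if you can't reach a majority
def may_stay_primary(cluster_size, reachable_peers):
    votes = reachable_peers + 1          # this node counts its own vote
    return votes > cluster_size // 2     # strict majority required

print(may_stay_primary(cluster_size=3, reachable_peers=1))  # True: 2 of 3 nodes
print(may_stay_primary(cluster_size=3, reachable_peers=0))  # False: minority side must step down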

Decision Framework

□ What's the acceptable data loss (RPO)?
  → Zero: Synchronous replication
  → Seconds: Semi-synchronous
  → Minutes: Asynchronous

□ What's the acceptable downtime (RTO)?
  → Seconds: Automated failover
  → Minutes: Manual failover
  → Hours: Cold standby

□ What's your read/write ratio?
  → Read-heavy: Add read replicas
  → Write-heavy: Consider sharding instead

□ Do you need geographic distribution?
  → Yes: Multi-region replicas
  → Cross-region writes: Consider multi-leader

□ Can your application handle stale reads?
  → Yes: Route reads to replicas freely
  → No: Read-your-writes or read from primary

Key Takeaways

  1. Sync vs Async is the fundamental trade-off (consistency vs latency)
  2. Replica lag causes most application bugs - plan for it (read-your-writes)
  3. Failover is hard - test it regularly, use proper fencing
  4. Split-brain is the worst outcome - use quorum-based or STONITH
  5. Read-your-writes is often necessary for good UX
  6. Multi-leader is complex - use only when necessary (geographic distribution)

What's Next

Now that you understand replication patterns, the next question is: How do distributed systems actually agree on a single value when nodes can fail?

In the next article, Consensus & Raft - Availability vs Strong Consistency, you'll learn:

  • How Raft achieves consensus (simpler than Paxos)
  • Why etcd, Consul, and Kubernetes use Raft
  • The trade-offs between availability and strong consistency
  • How leader election works in practice

This builds on what you learned here—replication keeps copies, but consensus ensures they agree.

Continue to Article 15: Consensus & Raft


This article is part of the Backend Engineering Mastery series.