#7. Load Balancing (L4 vs L7) - Speed vs Features

The One Thing to Remember

L4 = blind forwarding, L7 = smart routing. Layer 4 sees only IP addresses and ports (fast but dumb). Layer 7 sees the full HTTP request (smart but slower). Choose based on whether you need content-aware routing.


Building on Article 6

In Article 6: HTTP Evolution, you learned how HTTP/2 and HTTP/3 work. But here's the question: How do you distribute traffic across multiple servers efficiently?

A load balancer operating at L7 needs to understand those HTTP versions to distribute traffic correctly. Understanding L4 vs L7 helps you choose the right tool.

Previous: Article 6 - HTTP Evolution


Why This Matters (A Production Story)

I once debugged a service where one backend was getting 90% of traffic while others sat idle. The culprit: an L4 load balancer using IP hash, while most requests came from a few client IPs (NAT, proxies). The fix? Switching to L7 with a least-connections algorithm. Traffic distributed evenly and performance improved.

This isn't academic knowledge—it's the difference between:

  • Choosing the right load balancer

    • Understanding L4 vs L7 = you pick the right tool
    • Not understanding = you use L7 everywhere and wonder why latency is high
  • Debugging uneven load

    • Understanding algorithms = you know why traffic is uneven
    • Not understanding = you blame everything except the LB
  • Configuring health checks correctly

    • Understanding health check types = you catch failures early
    • Not understanding = cascading failures when one backend dies

Quick Win: Check Your Load Distribution

Before we dive deeper, let's see if your load is balanced:

# Make 100 requests, see distribution
for i in {1..100}; do
    curl -s http://localhost/api/whoami | grep server
done | sort | uniq -c

# Expected (balanced):
#   34 server1
#   33 server2
#   33 server3

# Problematic (uneven):
#   90 server1
#    5 server2
#    5 server3

The Mental Model

Layer 4: The Blind Forwarder

Think of L4 as a receptionist who only knows addresses:

  • "Someone wants to talk to anyone at port 443"
  • "I'll connect them to Server 2 (my turn in round-robin)"
  • "I don't know or care what they're talking about"

What L4 sees: IP addresses and ports ONLY

  • Source IP: 203.0.113.50
  • Dest IP: 10.0.0.100 (LB VIP)
  • Source port: 52431
  • Dest port: 443

What L4 CANNOT see: HTTP headers, URL, body (it never parses the payload, and with HTTPS the payload is encrypted anyway)

Layer 7: The Smart Router

Think of L7 as a concierge who reads the request:

  • "Someone wants /api/users with a Premium-User header"
  • "That goes to the Premium API cluster"
  • "Oh, and decrypt their SSL first, then re-encrypt to backend"

What L7 sees: Everything

  • Full HTTP request (method, URL, headers, body)
  • Can route based on content
  • Can terminate SSL, inspect, modify

Visual Comparison

LAYER 4 LOAD BALANCING:
══════════════════════

    Client                  L4 Load Balancer              Servers
       │                         │                           │
       │──── TCP to LB:443 ─────►│                           │
       │                         │── Rewrites dest IP ──►    │
       │                         │   to Server 1:443         │
       │                         │                      [Server 1]
       │◄────────────────────────│◄─────────────────────[Server 2]
       │    (Connection passed   │                      [Server 3]
       │     through directly)   │                           │
    
    Latency added: ~10μs (just packet forwarding)
    Throughput: Millions of connections/sec


LAYER 7 LOAD BALANCING:
══════════════════════

    Client                  L7 Load Balancer              Servers
       │                         │                           │
       │─── HTTPS to LB ────────►│                           │
       │                         │ (SSL terminated here!)    │
       │                         │                           │
       │                         │ Now I can read:           │
       │                         │ GET /api/users HTTP/1.1   │
       │                         │ Host: api.example.com     │
       │                         │ X-User-Tier: premium      │
       │                         │                           │
       │                         │ Route decision:           │
       │                         │   /api/* → API pool       │
       │                         │   premium → fast servers │
       │                         │                           │
       │                         │──── HTTP to Server ─────► [API-1]
       │◄────────────────────────│◄───────────────────────── [API-2]
       │                         │                           [Web-1]
       │                         │                           [Web-2]
    
    Latency added: ~1-5ms (SSL termination, parsing, routing)
    Throughput: Thousands of connections/sec
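
In code terms, the routing decision in the diagram above looks roughly like this (a hypothetical Python sketch; the pool names are illustrative, and real balancers express this as configuration, like the nginx example later in this article):

from dataclasses import dataclass

@dataclass
class Request:
    path: str
    headers: dict

def choose_pool(req: Request) -> str:
    # Header-based routing takes priority: premium users go to fast servers
    if req.headers.get("X-User-Tier") == "premium":
        return "premium_pool"
    # Path-based routing: /api/* goes to the API pool
    if req.path.startswith("/api/"):
        return "api_pool"
    # Everything else goes to the general web pool
    return "web_pool"

# Example: choose_pool(Request("/api/users", {"X-User-Tier": "premium"}))
# returns "premium_pool"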

Common Mistakes (I've Made These)

Mistake #1: "L7 is always better"

Why it's wrong: L7 adds 1-5ms latency and has lower throughput. For non-HTTP protocols or ultra-low latency requirements, L4 is better.

Right approach: Use L4 for:

  • Non-HTTP protocols (databases, Redis, gRPC)
  • Ultra-low latency (<1ms budget)
  • Very high connection counts (millions)

Use L7 for:

  • HTTP/HTTPS traffic
  • Content-based routing needed
  • SSL termination at edge

Mistake #2: "Round-robin is always fair"

Why it's wrong: Round-robin ignores server capacity and current load. If Server 1 is handling slow requests, it still gets the same share as fast servers.

Right approach: Use least-connections for variable request durations. Use weighted round-robin for different server capacities.

Mistake #3: "TCP health checks are enough"

Why it's wrong: TCP health checks only verify the port is open, not that the application works. I've seen backends accept connections but return 500 errors—TCP check says healthy, but the app is broken.

Right approach: Use HTTP health checks for L7. For critical services, use application-level health endpoints that check dependencies.


Trade-offs: The Complete Picture

L4 vs L7 Feature Comparison

Feature              L4                        L7
────────────────────────────────────────────────────────────────────
Speed                ~10μs latency added       ~1-5ms latency added
Throughput           Millions of connections   Thousands of connections
SSL handling         Pass-through              Terminate & inspect
Routing by URL       No                        Yes
Routing by header    No                        Yes
Health checks        TCP port only             HTTP endpoints
Sticky sessions      IP hash only              Cookie-based
Caching              No                        Yes
Logging              Limited                   Full HTTP logs
Connection pooling   No                        Yes (to backends)

When to Use L4

✅ USE L4 FOR:

1. Non-HTTP protocols
   - Database connections (MySQL, PostgreSQL)
   - Redis/Memcached
   - gRPC (though L7 gRPC LBs exist)
   - Custom TCP protocols

2. Ultra-low latency requirements
   - High-frequency trading
   - Gaming servers
   - Real-time applications

3. Pass-through encryption
   - When backend must terminate SSL
   - mTLS where both sides need certs

4. Very high connection counts
   - IoT applications (millions of connections)
   - WebSocket servers

TRADE-OFF ACCEPTED: No content-aware routing, limited visibility

When to Use L7

✅ USE L7 FOR:

1. HTTP/HTTPS traffic (most web apps!)
   - SSL termination at the edge
   - Path-based routing (/api/* vs /*)
   - Header-based routing

2. Multi-tenant applications
   - Route by Host header
   - Route by customer ID in header

3. A/B testing & canary deployments
   - Route 10% to new version
   - Route based on cookie/header

4. Security features needed
   - WAF integration
   - Request filtering
   - Rate limiting

5. Debugging & monitoring
   - Need HTTP status codes
   - Need request logging
   - Need latency per endpoint

TRADE-OFF ACCEPTED: Higher latency, lower throughput, more complexity

Load Balancing Algorithms: Trade-offs

Algorithm Comparison

1. Round Robin

  • Request 1 → Server 1, Request 2 → Server 2, Request 3 → Server 3
  • ✅ Simple, fair distribution
  • ❌ Ignores server capacity and current load
  • Use when: All servers are identical, requests are similar

2. Weighted Round Robin

  • Server 1 (weight=3): Gets 3 of every 5 requests
  • Server 2 (weight=2): Gets 2 of every 5 requests
  • ✅ Handles different server capacities
  • ❌ Weights are static
  • Use when: Servers have different specs

3. Least Connections

  • Current state: Server 1 has 5 active connections, Server 2 has 2 (next request goes here), Server 3 has 7
  • ✅ Adapts to actual load, handles slow requests well
  • ❌ More overhead to track
  • Use when: Request durations vary significantly (most common!)

4. IP Hash

  • hash(client_ip) % num_servers = server_index
  • ✅ Session persistence without cookies
  • ❌ Uneven distribution if clients have different request rates
  • Use when: Need basic sticky sessions without L7

5. Consistent Hashing

  • Each key hashes to a point on a ring, then walks clockwise to the next server
  • ✅ Minimal redistribution on server change, great for caches
  • ❌ More complex to implement
  • Use when: Cache servers, stateful services (see the sketch after this list)

6. Least Response Time

  • Track response times, send to fastest server
  • ✅ Optimizes for latency
  • ❌ Can overload fastest server
  • Use when: Latency is critical, servers have varying performance
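
To make the mechanics concrete, here is a minimal Python sketch of four of these algorithms (illustrative only: the server addresses are placeholders, and production balancers implement these natively):

import bisect
import hashlib
from itertools import cycle

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round robin: rotate through servers regardless of load.
_rr = cycle(servers)
def round_robin():
    return next(_rr)

# Least connections: pick the server with the fewest active connections.
# (A real balancer updates these counts as requests start and finish.)
active = {s: 0 for s in servers}
def least_connections():
    return min(active, key=active.get)

# IP hash: the same client IP always maps to the same server, but adding
# or removing a server remaps almost every client.
def ip_hash(client_ip):
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

# Consistent hashing: each server owns many points ("virtual nodes") on a
# ring; a key walks clockwise to the next point. Removing a server only
# remaps the keys that pointed at it.
ring = sorted(
    (int(hashlib.md5(f"{s}#{i}".encode()).hexdigest(), 16), s)
    for s in servers
    for i in range(100)
)
_points = [p for p, _ in ring]
def consistent_hash(key):
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return ring[bisect.bisect_left(_points, h) % len(ring)][1]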

Health Checks: Critical Trade-offs

The Health Check Spectrum

AGGRESSIVE (Catch failures fast)     PASSIVE (Less overhead)
     │                                    │
     ▼                                    ▼
┌─────────────┐  ┌─────────────┐  ┌─────────────┐
│ Check every │  │ Check every │  │ Check every │
│ 1 second    │  │ 5 seconds   │  │ 30 seconds  │
│             │  │             │  │             │
│ Detection:  │  │ Detection:  │  │ Detection:  │
│ 1-3 seconds │  │ 5-15 seconds│  │ 30-90 sec   │
│             │  │             │  │             │
│ Overhead:   │  │ Overhead:   │  │ Overhead:   │
│ High        │  │ Medium      │  │ Low         │
└─────────────┘  └─────────────┘  └─────────────┘

Health Check Types

TCP Health Check (L4):

  • Check: Can I connect to port 8080?
  • ✅ Very fast, low overhead
  • ❌ Doesn't verify application works
  • Problem: Backend might accept connections but return 500 errors
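
In Python terms, a bare TCP check is essentially this (a minimal sketch; real balancers perform the same handshake natively):

import socket

def tcp_healthy(host: str, port: int, timeout: float = 1.0) -> bool:
    # Succeeds whenever something accepts the connection, even if the
    # application behind the port is returning 500 errors.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False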

HTTP Health Check (L7):

  • Check: Does GET /health return 200?
  • ✅ Verifies application code runs, can check dependencies
  • ❌ Slower, more load on backend
  • Example (a minimal Flask sketch; db and cache stand in for whatever client handles your app already uses):

from flask import Flask

app = Flask(__name__)

@app.route('/health')
def health():
    # Check the database can run a trivial query
    try:
        db.execute("SELECT 1")
    except Exception:
        return "DB DOWN", 500

    # Check the cache responds to a ping
    try:
        cache.ping()
    except Exception:
        return "CACHE DOWN", 500

    return "OK", 200

Recommendation:

  • Simple /health for LB (fast, frequent)
  • Deep /health/full for monitoring (slower, less frequent)

Real-World Trade-off Stories

GitHub's Load Balancer Outage (2018)

Situation: Network maintenance caused 43 seconds of MySQL primary unreachability.

What went wrong:

  1. Load balancer health checks saw primary as down
  2. Routed writes to replica (which couldn't accept writes)
  3. Data inconsistency when primary came back

Root cause: Health check was TCP-only, didn't verify MySQL was actually primary. The port was open, but MySQL was in read-only replica mode.

Fix: Application-level health checks that verify role (primary vs replica). The health endpoint checks SHOW STATUS LIKE 'wsrep_local_state' or similar to ensure the database can accept writes.

Lesson: TCP health checks aren't enough for stateful services. You need to verify the service is in the correct state, not just that the port is open.
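
A minimal sketch of that kind of role-aware check (illustrative, not GitHub's actual implementation; conn is assumed to be any Python DB-API connection to the node being probed):

def can_accept_writes(conn) -> bool:
    # A replica normally runs with @@global.read_only = 1; only report
    # healthy when this node is writable, i.e. it is the primary.
    cur = conn.cursor()
    cur.execute("SELECT @@global.read_only")
    (read_only,) = cur.fetchone()
    return int(read_only) == 0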

Netflix's Load Shedding Strategy

Situation: During traffic spikes, backends were overwhelmed. Traditional load balancers would queue requests until they eventually timed out, by which point clients had already given up.

Netflix's approach:

  • When backends are slow, actively DROP requests
  • Better to fail fast than queue forever
  • Clients can retry against a different server

Implementation: Custom LB logic that measures backend latency and rejects early. If a backend's p99 latency exceeds a threshold, the LB stops sending it new requests and returns 503 immediately.
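
A toy version of that idea (a sketch with assumed thresholds and window sizes, not Netflix's actual code):

from collections import deque

class LatencyShedder:
    """Reject new requests when recent p99 latency exceeds a budget."""

    def __init__(self, p99_budget_ms=200.0, window=1000, min_samples=100):
        self.samples = deque(maxlen=window)  # rolling window of latencies
        self.p99_budget_ms = p99_budget_ms
        self.min_samples = min_samples

    def record(self, latency_ms):
        self.samples.append(latency_ms)

    def should_shed(self):
        if len(self.samples) < self.min_samples:
            return False  # not enough data to judge yet
        ordered = sorted(self.samples)
        p99 = ordered[int(0.99 * (len(ordered) - 1))]
        return p99 > self.p99_budget_ms

# On each request: if shedder.should_shed(), return 503 immediately
# instead of queueing; the client retries against another server.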

Trade-off accepted:

  • Some requests fail that might have succeeded
  • But: System stays responsive, clients can retry

Lesson: Sometimes the right behavior is to reject requests, not queue them. Queueing can cause cascading failures—better to fail fast and let clients retry.

The Great Sticky Session Debate

Situation: Team wanted sticky sessions for shopping cart data stored in-memory on backend servers.

Option A: Sticky sessions via LB

  • Pros: Simple, no code changes
  • Cons: Uneven load, server failure loses session (lost shopping cart = lost revenue)

Option B: External session store (Redis)

  • Pros: Any server can handle any request, fault tolerant
  • Cons: Additional infrastructure, latency (but Redis is fast—<1ms)

Decision: Redis for session storage, stateless backends.

Why: Server failures are common. Losing a shopping cart = lost revenue. The small latency cost (<1ms) is worth the reliability gain.
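
A minimal sketch of Option B using redis-py (the key prefix and one-hour TTL are illustrative choices):

import json
import redis

r = redis.Redis(host="localhost", port=6379)

def save_cart(session_id: str, cart: dict) -> None:
    # Any backend can write the cart, and it survives backend failures.
    r.setex(f"cart:{session_id}", 3600, json.dumps(cart))  # TTL: 1 hour

def load_cart(session_id: str) -> dict:
    raw = r.get(f"cart:{session_id}")
    return json.loads(raw) if raw else {}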

Lesson: Sticky sessions are usually a crutch. Externalize state when possible. The reliability gain is worth the small latency cost.


Configuration Examples

Nginx L7 Load Balancer

# Define upstream pools
upstream api_servers {
    least_conn;  # Best for variable request times
    server api1.internal:8080 weight=3;
    server api2.internal:8080 weight=2;
    server api3.internal:8080 backup;  # Only if others down
    
    keepalive 32;  # Connection pool to backends
}

server {
    listen 443 ssl http2;
    
    # SSL termination
    ssl_certificate /etc/ssl/cert.pem;
    ssl_certificate_key /etc/ssl/key.pem;
    
    # Route by path
    location /api/ {
        proxy_pass http://api_servers;
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # Enable keepalive
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        
        # Timeouts
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
    }
}

HAProxy L4 Load Balancer

# L4 load balancing for database
frontend mysql_front
    bind *:3306
    mode tcp
    default_backend mysql_back

backend mysql_back
    mode tcp
    balance roundrobin
    
    # TCP health check
    option tcp-check
    
    server mysql1 10.0.0.1:3306 check
    server mysql2 10.0.0.2:3306 check backup

Debugging Load Balancer Issues

Check Backend Health

# Nginx: Check upstream status
curl http://localhost/nginx_status

# HAProxy: Stats socket
echo "show servers state" | socat stdio /var/run/haproxy.sock

# AWS ALB: Check target health
aws elbv2 describe-target-health --target-group-arn <arn>

Test Distribution

Use the same 100-request loop from the Quick Win section above: balanced pools show roughly equal per-server counts, while a 90/5/5 split means something is wrong (hash-based stickiness, unhealthy backends, or misconfigured weights).

Debug Headers

# See what headers LB is adding
curl -v http://localhost/api/test 2>&1 | grep -i "x-"

# Common headers to check:
# X-Forwarded-For: Original client IP
# X-Forwarded-Proto: Original protocol (http/https)
# X-Request-ID: Request tracing ID

Decision Framework

□ Is traffic HTTP/HTTPS?
  → Yes: Default to L7 (more features)
  → No: Must use L4

□ Do I need content-based routing?
  → Path routing (/api vs /web): L7
  → Header routing: L7
  → Just distribute load: L4 is fine

□ What's my latency budget?
  → <1ms added: L4
  → <10ms acceptable: L7 is fine

□ Do I need SSL termination at LB?
  → Yes: L7
  → Backend must terminate: L4 pass-through

□ How do backends handle sessions?
  → Stateless: Any algorithm works
  → Stateful: IP hash, consistent hash, or sticky sessions (but prefer external state!)

□ What health checks are needed?
  → TCP port: L4
  → HTTP endpoint: L7
  → Application logic: Custom health endpoints

Key Takeaways

  1. L4 is fast, L7 is smart - choose based on needs (L4 for non-HTTP, L7 for HTTP)
  2. Health checks must match your failure modes - TCP check won't catch app errors
  3. Sticky sessions are usually a crutch - externalize state instead (Redis, database)
  4. Algorithm matters - least-connections handles variable request times better than round-robin
  5. Debug with metrics - watch distribution, latency, error rates
  6. Fail fast, don't queue - Netflix's load shedding shows rejecting requests can be better than queuing

What's Next

Now that you understand how to distribute traffic, the next question is: How do databases actually find your data efficiently?

In the next article, Database Indexes Deep Dive - Read Speed vs Write Speed, you'll learn:

  • How B-tree indexes work (the most common type)
  • Why indexes speed up reads but slow down writes
  • How to design indexes for your queries
  • When to use different index types (B-tree, hash, GIN, etc.)

This builds on what you learned here—load balancers distribute requests, but databases need indexes to find data quickly.

Continue to Article 8: Database Indexes


This article is part of the Backend Engineering Mastery series. Load balancing knowledge is essential for scaling systems.