8 min read

#5. TCP Deep Dive - Reliability vs Latency

The One Thing to Remember

TCP trades latency for reliability. Every feature of TCP—the three-way handshake, acknowledgments, retransmissions—adds delay but guarantees delivery. Understanding this trade-off helps you choose the right protocol and debug network issues.


Building on Article 4

In Article 4: CPU Scheduling & Context Switches, you learned how the OS switches between processes when they block on I/O. But here's the question: What are those processes actually waiting for when they do network I/O?

Understanding TCP helps you understand those I/O waits—and why network issues are among the hardest to debug.

Previous: Article 4 - CPU Scheduling & Context Switches


Why This Matters (A Production Horror Story)

I once debugged a service that suddenly stopped accepting new connections. The error: "Cannot assign requested address." Investigation showed 60,000 sockets in TIME_WAIT state. The service was creating a new TCP connection for every HTTP request, and each closed connection sat in TIME_WAIT for 60 seconds. Port exhaustion. The fix? HTTP connection pooling. Two lines of code.

This isn't academic knowledge—it's the difference between:

  • Debugging network issues in hours vs days

    • Understanding TCP states = you know what to check (ss -tan)
    • Not understanding = you blame the load balancer, the firewall, everything
  • Choosing the right protocol

    • Understanding TCP vs UDP = you pick the right tool
    • Not understanding = you use TCP for everything, hit latency limits
  • Building high-throughput systems

    • Understanding connection pooling = you avoid port exhaustion
    • Not understanding = your service crashes under load

Quick Win: Check Your TCP Connections

Before we dive deeper, let's see what your system is doing:

# Count connections by state
ss -tan | awk '{print $1}' | sort | uniq -c | sort -rn

# Expected healthy output:
#   1500 ESTAB
#     50 TIME-WAIT
#      1 LISTEN

# Problematic outputs:
#  60000 TIME-WAIT    → Connection pooling issue!
#    500 CLOSE-WAIT   → App not closing connections!
#    500 SYN-SENT     → Server not responding!

The TCP Mental Model

TCP is a Reliable Pipe

Imagine sending letters through an unreliable postal service:

  • Letters might get lost
  • Letters might arrive out of order
  • Letters might arrive twice

TCP transforms this into a reliable phone call:

  • Everything arrives
  • Everything arrives in order
  • Everything arrives exactly once

The cost? Extra paperwork (headers, ACKs) and waiting (retransmissions).


The Three-Way Handshake

CLIENT                                        SERVER
   │                                             │
   │──────── SYN (seq=100) ───────────────────► │
   │         "Hi, I want to connect"             │
   │         "My sequence starts at 100"         │
   │                                             │
   │◄─────── SYN-ACK (seq=300, ack=101) ─────── │
   │         "Hi back! I acknowledge your 100"   │
   │         "My sequence starts at 300"         │
   │                                             │
   │──────── ACK (seq=101, ack=301) ──────────► │
   │         "Great, I acknowledge your 300"     │
   │                                             │
   │         CONNECTION ESTABLISHED              │
   │◄────────────────────────────────────────►  │

Why three steps?
1. Client proves it can send
2. Server proves it can send AND receive
3. Client proves it can receive

Both sides now know: "We can communicate bidirectionally"

Cost: roughly 1 RTT before the client can send its first byte of data (the final ACK can carry data), plus additional round trips if TLS runs on top. On a local network, this is ~1ms. Across continents, it's 100-200ms. This is why connection pooling matters.
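You can observe this cost directly. Below is a minimal sketch (assuming example.com is reachable on port 443 from your machine) that times socket.create_connection(), which returns once the handshake has completed:

import socket
import time

def measure_connect_latency(host, port, attempts=5):
    """Time connection setup -- roughly one RTT for the 3-way handshake."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=5):
            pass  # connect() returns once the handshake completes
        samples.append((time.perf_counter() - start) * 1000)
    return min(samples)

# A nearby host shows ~1 ms; a cross-continent host often shows 100 ms or more
print(f"Handshake + connect: {measure_connect_latency('example.com', 443):.1f} ms")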


TCP State Machine (The Important States)

                    ┌──────────────┐
                    │    CLOSED    │
                    └──────┬───────┘
                           │
        ┌──────────────────┼──────────────────┐
        │ Server           │           Client │
        │ listen()         │          connect()│
        ▼                  │                  ▼
 ┌──────────────┐          │         ┌──────────────┐
 │   LISTEN     │          │         │   SYN_SENT   │
 └──────┬───────┘          │         └──────┬───────┘
        │ recv SYN         │                │ recv SYN-ACK
        ▼                  │                │ send ACK
 ┌──────────────┐          │                │
 │   SYN_RCVD   │          │                │
 └──────┬───────┘          │                │
        │ recv ACK         │                │
        └──────────────────┼────────────────┘
                           │
                    ┌──────▼───────┐
                    │ ESTABLISHED  │◄─── Normal data transfer
                    └──────┬───────┘
                           │
                           │ close()
                           ▼
                    ┌──────────────┐
                    │   FIN_WAIT   │
                    └──────┬───────┘
                           │ recv ACK + FIN
                           ▼
                    ┌──────────────┐
                    │  TIME_WAIT   │◄─── The famous 2*MSL wait!
                    └──────┬───────┘
                           │ 60-120 seconds
                           ▼
                    ┌──────────────┐
                    │    CLOSED    │
                    └──────────────┘

TCP States: What Each Means

State          What's Happening                   Common Problem
LISTEN         Server waiting for connections     None
SYN_SENT       Client waiting for SYN-ACK         Server not responding
SYN_RCVD       Server waiting for ACK             SYN flood attack
ESTABLISHED    Normal data flow                   None
FIN_WAIT_1/2   Closing, waiting for ACK           Slow close
TIME_WAIT      Waiting 2*MSL (60-120s)            Port exhaustion!
CLOSE_WAIT     Received FIN, app hasn't closed    App bug!
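
If you prefer doing this from code, here is a minimal sketch (Linux only, IPv4 sockets only; it assumes /proc/net/tcp is readable) that produces the same per-state counts as the ss one-liner above:

# Count TCP socket states by reading /proc/net/tcp (Linux, IPv4;
# add /proc/net/tcp6 for IPv6 sockets).
from collections import Counter

STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT",   "03": "SYN_RECV",
    "04": "FIN_WAIT1",   "05": "FIN_WAIT2",  "06": "TIME_WAIT",
    "07": "CLOSE",       "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN",      "0B": "CLOSING",
}

def tcp_state_counts(path="/proc/net/tcp"):
    counts = Counter()
    with open(path) as f:
        next(f)  # skip the header line
        for line in f:
            state_hex = line.split()[3]  # 4th column is the state code
            counts[STATES.get(state_hex, state_hex)] += 1
    return counts

for state, n in tcp_state_counts().most_common():
    print(f"{n:6d} {state}")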

Common Mistakes (I've Made These)

Mistake #1: "Creating a new connection per request is fine"

Why it's wrong: Each connection requires a 3-way handshake (1-2 RTT), and each closed connection sits in TIME_WAIT for 60-120 seconds. At high throughput, you exhaust ephemeral ports.

Real example: An application sending 710 HTTP POST requests/second without keep-alive accumulated ~28,000 TIME_WAIT sockets and exhausted the ephemeral port range; the full case study is below.

Right approach: Always use connection pooling (a stdlib keep-alive sketch follows this list) for:

  • Database connections
  • HTTP/1.1 (keep-alive) or HTTP/2
  • gRPC channels
  • Redis/Memcached
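
A minimal stdlib-only sketch of connection reuse (example.com, the port, and the request count are placeholders; it assumes the server honors HTTP/1.1 keep-alive):

# Reuse one TCP connection for many HTTP/1.1 requests instead of paying
# a handshake plus a TIME_WAIT socket per request.
import http.client

conn = http.client.HTTPConnection("example.com", 80, timeout=5)
try:
    for _ in range(100):
        conn.request("GET", "/")   # reuses the same socket (keep-alive)
        resp = conn.getresponse()
        resp.read()                # drain the body before the next request
finally:
    conn.close()                   # one close -> one TIME_WAIT, not 100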

Mistake #2: "TIME_WAIT is a problem, I should disable it"

Why it's wrong: TIME_WAIT exists for a reason—it prevents old packets from corrupting new connections. Disabling it can cause data corruption.

Right approach: Fix the root cause—use connection pooling. TIME_WAIT is normal, but you shouldn't have thousands of them.

Mistake #3: "CLOSE_WAIT is normal"

Why it's wrong: CLOSE_WAIT means the remote side closed the connection, but your application hasn't called close() yet. This is a bug—you're leaking connections.

Right approach: Find where connections are opened but not closed (often in error handling paths). Use finally blocks or context managers.
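A minimal sketch of the pattern (host, port, and payload are placeholders): close the socket in a finally block or context manager so the error path can't leak it into CLOSE_WAIT:

import socket

def send_request(host, port, payload):
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(payload)
        return sock.recv(4096)
    finally:
        sock.close()  # runs on success AND on error -> no leaked socket

# Equivalent, since Python sockets are context managers:
def send_request_ctx(host, port, payload):
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
        return sock.recv(4096)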


Trade-offs: TCP Design Decisions

Trade-off #1: Reliability vs Latency

┌─────────────────────────────────────────────────────────────────┐
│                     RELIABLE (TCP)                              │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Sender ──► [Data 1] ──► [Data 2] ──► [Data 3] ──►             │
│                ◄─── ACK ───┘     ◄─── ACK ───┘                 │
│                                                                 │
│  If packet lost:                                                │
│  ┌────────────────────────────────────────────────────────┐    │
│  │ Sender waits... timeout... retransmit... wait for ACK │    │
│  │ Total delay: 100-500ms for retransmission!            │    │
│  └────────────────────────────────────────────────────────┘    │
│                                                                 │
│  ✓ Every byte delivered                                        │
│  ✗ One lost packet blocks everything behind it (HOL blocking)  │
│                                                                 │
├─────────────────────────────────────────────────────────────────┤
│                    UNRELIABLE (UDP)                             │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Sender ──► [Data 1] ──► [Data 2] ──► [Data 3] ──►             │
│              (lost!)        ✓            ✓                      │
│                                                                 │
│  ✓ No waiting for lost packets                                 │
│  ✓ No head-of-line blocking                                    │
│  ✗ Some data never arrives                                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

When to accept UDP's trade-off (a minimal UDP sketch follows this list):
- Video streaming (old frame not useful)
- Gaming (current position > old position)
- DNS (will retry if lost)
- VoIP (missing audio < delayed audio)
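
For contrast, here is a minimal sketch of the UDP side (example.com and port 9999 are placeholders): one sendto() call, no handshake, no ACK, and no notification if the datagram is lost:

import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(b"position update", ("example.com", 9999))  # no connection setup
sock.close()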

Trade-off #2: Connection Setup Cost vs State Management

NEW CONNECTION EVERY REQUEST:               CONNECTION POOLING:
─────────────────────────────               ───────────────────

For 1000 requests:

1000 × (3-way handshake)                    1 × (3-way handshake)
= 1000 × ~30ms = 30 seconds wasted          + Reuse connection 999 times
                                            = ~30ms total overhead

Plus:                                       Complexity:
- 1000 TIME_WAIT sockets                    - Connection pool management
- Port exhaustion risk                      - Health checking
- TLS renegotiation each time               - Proper cleanup

Always use connection pooling for:
- Database connections
- HTTP/2 connections
- gRPC channels
- Redis/Memcached

Trade-off #3: Nagle's Algorithm vs Latency

The Nagle + Delayed ACK Disaster

This is the most common TCP performance bug I've seen:

CLIENT (Nagle ON)                    SERVER (Delayed ACK ON)
─────────────────                    ───────────────────────

write("GET /")                       
  → Sent immediately (buffer empty)
                              ──────►recv: "GET /"
                                     Wait for more data to ACK together
                                     (delayed ACK: wait up to 40ms)

write(" HTTP/1.1")
  → Nagle says: wait for ACK first
  → Waiting...                      
                                     Timer expires (40ms)
                              ◄──────ACK finally sent!

  → Now send " HTTP/1.1"
                              ──────►Request complete after 40ms delay!

TOTAL ADDED LATENCY: 40ms for a simple HTTP request!

FIX: TCP_NODELAY for request-response protocols

Enable TCP_NODELAY for:

  • Interactive protocols (SSH)
  • Request-response patterns (HTTP, gRPC)
  • Gaming

Keep Nagle (default) for:

  • Bulk transfers
  • Streaming large data
  • File transfers
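
If you're unsure whether a library has already disabled Nagle, here is a minimal sketch for checking and enabling TCP_NODELAY on an existing socket (sock is assumed to be a connected TCP socket):

import socket

def ensure_nodelay(sock):
    # getsockopt returns 0 when Nagle is still active
    if not sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)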

Real-World Trade-off Stories

AppsFlyer: TCP Connections That Refused to Die

Situation: AppsFlyer Engineering documented a production TCP connection leak that required deep Linux networking stack analysis. The service had connections stuck in various states, preventing proper cleanup.

Investigation: Using ss, netstat, and tcpdump, they traced the issue to application-level socket leaks where connections weren't properly closed, leading to CLOSE_WAIT accumulation.

Key insight: Understanding which side enters TIME_WAIT is critical—TIME_WAIT appears on the side that actively closes (sends FIN first), while CLOSE_WAIT appears on the side receiving the FIN. This distinction is often misunderstood in troubleshooting.


Lesson: Use proper debugging tools (ss, netstat, tcpdump) to monitor connection states. CLOSE_WAIT accumulation means your application has a bug—it's not closing connections properly.

TIME_WAIT Port Exhaustion (Real Production Case)

Situation: An application sending 710 HTTP POST requests/second with non-keep-alive connections accumulated ~28,000 TIME_WAIT connections, matching the ephemeral port range (32768-61000) and preventing new connections.

The math:

  • 710 connections/second closed
  • Each sits in TIME_WAIT for ~30-60 seconds
  • 710 × 30 = ~21,300 TIME_WAITs at any given time
  • Ephemeral port range: ~28,000 ports
  • Result: Port exhaustion, service can't accept new connections

Root causes:

  • Not reusing connections (no connection pooling)
  • High connection throughput without pooling
  • Application bugs: Socket leaks where applications don't properly close sockets

Solutions:

  1. Connection pooling (the best solution)
  2. SO_REUSEADDR on the server (lets a restarted listener rebind a port still in TIME_WAIT)
  3. net.ipv4.tcp_tw_reuse=1 (use with care; it only helps outgoing connections)
  4. Increase the ephemeral port range (net.ipv4.ip_local_port_range)


Lesson: Never create a new TCP connection per request in high-throughput scenarios. Connection pooling is not optional—it's essential.


Code Examples

Setting TCP Options

import socket

def create_optimized_connection(host, port):
    """Create a TCP connection with optimal settings"""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    
    # Disable Nagle's algorithm for request-response
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    
    # Enable keepalive to detect dead connections
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    
    # Linux-specific: Tune keepalive timing
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)
    
    sock.connect((host, port))
    return sock
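
With the keepalive values above, an unresponsive peer is detected after roughly 60 seconds of idle time plus 5 probes at 10-second intervals, i.e. about 110 seconds, instead of the kernel's 2-hour default. Note that TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT are Linux-specific; on other platforms guard them with hasattr() checks.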

Simple Connection Pool

import socket
import queue

class ConnectionPool:
    """Simple connection pool to avoid TCP overhead"""
    
    def __init__(self, host, port, max_connections=10):
        self.host = host
        self.port = port
        self.pool = queue.Queue(maxsize=max_connections)
        
        # Pre-create connections
        for _ in range(max_connections):
            conn = self._create_connection()
            self.pool.put(conn)
    
    def _create_connection(self):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        sock.connect((self.host, self.port))
        return sock
    
    def get_connection(self, timeout=5):
        """Borrow a connection from the pool"""
        try:
            return self.pool.get(timeout=timeout)
        except queue.Empty:
            return self._create_connection()
    
    def return_connection(self, conn):
        """Return a connection to the pool"""
        try:
            conn.getpeername()  # raises OSError if the socket was never connected (a weak liveness check)
            self.pool.put_nowait(conn)
        except (socket.error, queue.Full):
            try:
                conn.close()
            except OSError:
                pass

# Usage
pool = ConnectionPool('localhost', 8080, max_connections=10)

# Instead of: socket.connect() for every request
conn = pool.get_connection()
try:
    conn.send(b"GET / HTTP/1.1\r\n\r\n")
    response = conn.recv(1024)
finally:
    pool.return_connection(conn)  # Don't close, return to pool!

Debugging TCP Issues

See Connection States

# Count connections by state
ss -tan | awk '{print $1}' | sort | uniq -c | sort -rn

# Find process using a port
ss -tlnp | grep :8080

# See all connections to a specific port
ss -tan 'dport = :443'

# Watch connections in real-time
watch -n 1 'ss -tan | head -20'

Analyze TCP Settings

# Check buffer sizes
sysctl net.core.rmem_max
sysctl net.core.wmem_max

# Check TIME_WAIT settings
sysctl net.ipv4.tcp_tw_reuse

# Check port range
sysctl net.ipv4.ip_local_port_range

Decision Framework

□ What's my latency requirement?
  → <10ms: TCP_NODELAY, connection pooling
  → <100ms: Standard TCP is fine
  → <1s: Anything works

□ What's my throughput requirement?
  → >10K requests/sec: Connection pooling mandatory
  → >100K: Consider HTTP/2 or gRPC multiplexing
  
□ Is data loss acceptable?
  → No: TCP
  → Yes (real-time): UDP

□ Am I seeing TIME_WAIT accumulation?
  → Fix: Connection pooling
  → Temporary: SO_REUSEADDR, tcp_tw_reuse

□ Am I seeing CLOSE_WAIT accumulation?  
  → Fix: Find and fix connection leak in your code

Key Numbers to Know

Metric                  Typical Value    Notes
Handshake latency       1-2 RTT          3-way handshake
TIME_WAIT duration      60-120s          2 * MSL
Default backlog         128              Increase for high-traffic servers
Ephemeral port range    32768-60999      ~28K ports
TCP keepalive default   2 hours          Too long! Customize it

Memory Trick

"SYN-ACK-DATA-FIN" is like a phone call:

  • SYN: Dialing (ring ring...)
  • SYN-ACK: "Hello?" (picked up)
  • ACK: "Hi, it's me" (confirmed)
  • DATA: The conversation
  • FIN: "Bye!" "Bye!" (mutual hang up)

Self-Assessment

Before moving on:

  • [ ] Can you draw the three-way handshake from memory?
  • [ ] Do you know why TIME_WAIT exists and how to manage it?
  • [ ] Can you diagnose connection problems from ss -tan output?
  • [ ] Do you know when to use TCP_NODELAY?
  • [ ] Do you understand the Nagle + delayed ACK interaction?
  • [ ] Do you know the difference between TIME_WAIT and CLOSE_WAIT?

Key Takeaways

  1. TCP trades latency for reliability - every byte delivered, in order, exactly once
  2. Connection pooling is essential for high-throughput systems (not optional!)
  3. TIME_WAIT is normal but can exhaust ports without pooling
  4. CLOSE_WAIT is a bug in your application (not closing connections)
  5. TCP_NODELAY for request-response protocols (avoids Nagle + Delayed ACK trap)
  6. Always measure - network issues are subtle, use ss, tcpdump, netstat

What's Next

Now that you understand TCP, the next question is: How has HTTP evolved to work better over TCP?

In the next article, HTTP Evolution (1.1→2→3) - Simplicity vs Performance, you'll learn:

  • Why HTTP/2 uses multiplexing (solving TCP head-of-line blocking)
  • Why HTTP/3 uses UDP (QUIC protocol)
  • The trade-offs between simplicity and performance
  • When to use each version in production

This builds on what you learned here—HTTP/2 and HTTP/3 are attempts to work around TCP's limitations while keeping its benefits.

Continue to Article 6: HTTP Evolution


This article is part of the Backend Engineering Mastery series. TCP knowledge is fundamental for debugging network issues.