Backend Engineering Mastery - Complete Article Series

A comprehensive guide for backend engineers, engineering managers, and principal engineers
27 Core Articles | ~50,000 words | ~5 hours of reading
Plus: System Design Mastery Series (4 articles)


Series Philosophy

After 10 years of building and breaking systems in production, I've learned one thing: the best engineers don't memorize facts—they understand trade-offs.

This series isn't about theory. It's about the decisions you'll face at 2 AM when your system is on fire. Each article includes:

  • The One Thing to Remember - One insight that changes how you think
  • Why This Matters - Real production incidents I've seen (and caused)
  • Visual Models - ASCII diagrams you can draw on a whiteboard
  • Trade-off Analysis - Decision frameworks, not just definitions
  • Code Examples - Runnable snippets that demonstrate the concept
  • Real-World Stories - War stories from companies you know
  • Self-Assessment - Verify that you actually understand, not just memorize



Part 1: OS & Systems Foundation (Articles 1-4)

Understanding the operating system layer that everything runs on. Skip this at your own peril—I've seen too many engineers blame the database when the real problem was in the OS.

# | Article | Key Trade-off | Time
01 | Process vs Thread | Isolation vs Efficiency | 10 min
02 | Memory Management | Virtual vs Physical, Swap vs OOM | 10 min
03 | File I/O & Durability | Performance vs Durability (fsync) | 10 min
04 | CPU Scheduling & Context Switches | Throughput vs Latency | 10 min

After Part 1, you'll understand: Why processes are isolated, how memory really works (not what you think), what fsync actually does, and why context switches can kill your performance.

Next: Part 2: Networking


Part 2: Networking (Articles 5-7)

How data moves between machines—the foundation of distributed systems. This is where most engineers get confused, and I don't blame them.

# | Article | Key Trade-off | Time
05 | TCP Deep Dive | Reliability vs Latency | 12 min
06 | HTTP Evolution (1.1→2→3) | Simplicity vs Performance | 11 min
07 | Load Balancing (L4 vs L7) | Speed vs Features | 11 min

After Part 2, you'll understand: TCP states and how to debug connection issues, why HTTP/3 uses UDP (it's not what you think), and when to use L4 vs L7 load balancers.

Next: Part 3: Storage & Databases


Part 3: Storage & Databases (Articles 8-11)

How data is stored, indexed, and queried efficiently. I've lost count of how many "slow query" issues were actually index problems.

# | Article | Key Trade-off | Time
08 | Database Indexes Deep Dive | Read Speed vs Write Speed | 11 min
09 | ACID Transactions Explained | Consistency vs Performance | 10 min
10 | Isolation Levels & Anomalies | Safety vs Concurrency | 10 min
11 | SQL vs NoSQL Decision Guide | Flexibility vs Scale | 10 min

After Part 3, you'll understand: Why indexes slow writes (and when that's okay), what SERIALIZABLE actually means (hint: it's not what most people think), and when to choose NoSQL (it's rarer than you think).

Next: Part 4: Distributed Systems


Part 4: Distributed Systems (Articles 12-16)

Scaling beyond a single machine—where things get interesting. This is where I've made the most mistakes, and learned the most.

# | Article | Key Trade-off | Time
12 | CAP Theorem Demystified | Consistency vs Availability | 10 min
13 | Sharding Strategies | Query Flexibility vs Scale | 11 min
14 | Replication Patterns | Consistency vs Latency | 10 min
15 | Consensus & Raft | Availability vs Strong Consistency | 12 min
16 | Time, Clocks & Ordering | Simplicity vs Accuracy | 10 min

After Part 4, you'll understand: What CAP really means (most people get it wrong), how to choose shard keys (this decision will haunt you), how leader election works, and why distributed time is harder than it should be.

Next: Part 5: Production Engineering


Part 5: Production Engineering (Articles 17-20)

Running systems reliably in production. This is where theory meets reality, and reality usually wins.

# | Article | Key Trade-off | Time
17 | Reliability Patterns | Availability vs Complexity | 11 min
18 | Caching Strategies | Performance vs Consistency | 10 min
19 | Observability (Metrics, Logs, Traces) | Coverage vs Overhead | 11 min
20 | Security Fundamentals | Security vs Convenience | 10 min

After Part 5, you'll understand: Circuit breakers (and when they backfire), cache invalidation (one of the two hard things in CS), the RED/USE methods, and OAuth2 flows (without the confusion).

Next: Part 6: Cloud-Native & Modern Patterns


Part 6: Cloud-Native & Modern Patterns (Articles 21-24)

Building for the cloud era. Containers, orchestration, and message queues—the tools that make distributed systems manageable.

# | Article | Key Trade-off | Time
21 | Containers & Docker | Isolation vs Overhead | 10 min
22 | Kubernetes Essentials | Abstraction vs Complexity | 12 min
23 | Message Queues (Kafka vs RabbitMQ) | Throughput vs Latency | 10 min
24 | Event-Driven Architecture | Decoupling vs Complexity | 11 min

After Part 6, you'll understand: Container best practices (and anti-patterns), K8s core concepts (without the marketing), when to use Kafka vs RabbitMQ (they're not interchangeable), and event-driven architecture (when it helps, when it hurts).

Next: Part 7: Engineering Leadership


Part 7: Engineering Leadership (Articles 25-27)

Skills for senior engineers, managers, and principal engineers. This is what separates good engineers from great ones.

# | Article | Audience | Time
25 | Architecture Decision Records | Senior+ | 12 min
26 | Technical Debt Strategy | Manager/Principal | 11 min
27 | Build vs Buy Decisions | Principal/Director | 10 min

After Part 7, you'll know: How to document decisions (so future you thanks past you), manage tech debt strategically (not reactively), and make build vs buy choices (without regret).


System Design Mastery Series (Separate Series)

Applying everything you've learned to real design problems. This is a separate series because system design deserves its own deep dive.

Note: The System Design Mastery series builds on the Backend Engineering Mastery series. I recommend completing Parts 1-4 before diving into system design.

# | Article | Focus | Time
SD-01 | System Design Framework | 5-step approach for any problem | 13 min
SD-02 | Design: URL Shortener | Simple, scalable system | 10 min
SD-03 | Design: Distributed Cache | High-performance caching | 11 min
SD-04 | Design: Real-Time Chat | WebSockets, ordering, fan-out | 12 min

After the System Design series, you'll have: A repeatable framework for any system design problem, practice with common patterns, and the confidence to design systems that scale.

View System Design Mastery Series Index


Reading Paths by Role

Junior Engineer (0-2 years)

Focus: Foundations first. Don't skip the basics—I've seen too many engineers try to learn distributed systems without understanding processes and threads.

Week 1-2: OS Foundation (CRITICAL)
├── 01. Process vs Thread
├── 02. Memory Management
├── 03. File I/O & Durability
└── 04. CPU Scheduling

Week 3-4: Networking
├── 05. TCP Deep Dive
├── 06. HTTP Evolution
└── 07. Load Balancing

Week 5-6: Database Fundamentals
├── 08. Database Indexes
├── 09. ACID Transactions
└── 10. Isolation Levels

Week 7-8: Production Patterns
├── 17. Reliability Patterns
├── 18. Caching Strategies
└── 19. Observability

Mid-Level Engineer (2-5 years)

Focus: Distributed systems and system design. This is where you level up.

Week 1: Foundation Review (skim if familiar)
├── 01-04 (OS & Systems)
└── 05-07 (Networking)

Week 2-3: Distributed Systems (MUST DO)
├── 12. CAP Theorem
├── 13. Sharding Strategies
├── 14. Replication Patterns
├── 15. Consensus & Raft
└── 16. Time & Ordering

Week 4: System Design Practice
├── SD-01. System Design Framework
├── SD-02. URL Shortener
├── SD-03. Distributed Cache
└── SD-04. Chat System

Week 5: Production & Cloud
├── 17-20 (Production Engineering)
└── 21-24 (Cloud-Native)

Senior Engineer (5+ years)

Focus: Depth, leadership, and system design mastery. You know the basics—now master the trade-offs.

Week 1: Distributed Systems Mastery
├── 12-16 (all distributed systems)
└── Focus on trade-off analysis

Week 2: System Design Excellence
├── SD-01 to SD-04 (all system design)
└── Practice explaining out loud

Week 3: Leadership Skills
├── 25. Architecture Decision Records
├── 26. Technical Debt Strategy
└── 27. Build vs Buy Decisions

Engineering Manager

Focus: Leadership articles + enough technical depth to guide teams. You don't need to code, but you need to understand the decisions.

Priority 1: Leadership Track
├── 25. Architecture Decision Records
├── 26. Technical Debt Strategy
└── 27. Build vs Buy Decisions

Priority 2: Key Technical Concepts
├── 12. CAP Theorem (for data decisions)
├── 17. Reliability Patterns (for SRE work)
└── SD-01. System Design Framework (for reviews)

Quick Reference: All Trade-offs

Topic | Trade-off
Process vs Thread | Isolation vs Efficiency
Virtual Memory | Flexibility vs Page Fault Cost
fsync() | Durability vs Performance
Context Switches | Throughput vs Latency
TCP | Reliability vs Latency
HTTP versions | Simplicity vs Performance
L4 vs L7 LB | Speed vs Features
Indexes | Read Speed vs Write Speed
ACID | Consistency vs Performance
Isolation Levels | Safety vs Concurrency
SQL vs NoSQL | Flexibility vs Scale
CAP | Consistency vs Availability
Sharding | Query Flexibility vs Scale
Replication | Consistency vs Latency
Consensus | Availability vs Strong Consistency
Time/Clocks | Simplicity vs Accuracy
Circuit Breaker | Availability vs Complexity
Caching | Performance vs Consistency
Observability | Coverage vs Overhead
Security | Security vs Convenience
Containers | Isolation vs Overhead
Kubernetes | Abstraction vs Complexity
Kafka vs RabbitMQ | Throughput vs Latency
Event-Driven | Decoupling vs Complexity
Build vs Buy | Control vs Speed

How to Use This Series

For Self-Study

  1. Read one article per day (or per sitting—don't rush)
  2. Run all "Try It Yourself" commands (actually do them, don't just read)
  3. Complete self-assessment checkboxes (be honest with yourself)
  4. Revisit after one week to reinforce (spaced repetition works)
  5. Teach concepts to someone else (best way to learn)

For Interview Prep

  1. Focus on System Design series (SD-01 to SD-04)
  2. Internalize the trade-off tables in each article (interviewers love these)
  3. Practice drawing diagrams from memory (whiteboard skills matter)
  4. Explain concepts out loud (rubber duck method)
  5. Do 2-3 mock system design sessions (get feedback)

For Team Education

  1. Use as reading group material (1 article/week)
  2. Discuss trade-offs as a team (apply to your systems)
  3. Create team-specific examples (make it relevant)
  4. Build team ADR practice (Article 25)

Files Reference

Article # | File Name | URL Slug
01 | 01-process-vs-thread.md | process-vs-thread-the-foundation-every-backend-engineer
02 | 02-memory-management.md | memory-management-demystified-virtual-memory-page-faults-performance
03 | 03-file-io-durability.md | file-io-durability-why-fsync-is-your-best-friend-and-worst-enemy
04 | 04-cpu-scheduling.md | cpu-scheduling-context-switches-throughput-vs-latency
05 | 05-tcp-deep-dive.md | tcp-deep-dive-reliability-vs-latency
06 | 06-http-evolution.md | http-evolution-1-1-2-3-simplicity-vs-performance
07 | 07-load-balancing.md | load-balancing-l4-vs-l7-speed-vs-features
08 | 08-database-indexes.md | database-indexes-deep-dive-read-speed-vs-write-speed
09 | 09-acid-transactions.md | acid-transactions-explained-consistency-vs-performance
10 | 10-isolation-levels.md | isolation-levels-anomalies-safety-vs-concurrency
11 | 11-sql-vs-nosql.md | sql-vs-nosql-decision-guide-flexibility-vs-scale
12 | 12-cap-theorem.md | cap-theorem-demystified-consistency-vs-availability
13 | 13-sharding-strategies.md | sharding-strategies-query-flexibility-vs-scale
14 | 14-replication-patterns.md | replication-patterns-consistency-vs-latency
15 | 15-consensus-raft.md | consensus-raft-availability-vs-strong-consistency
16 | 16-time-clocks-ordering.md | time-clocks-ordering-simplicity-vs-accuracy
17 | 17-reliability-patterns.md | reliability-patterns-availability-vs-complexity
18 | 18-caching-strategies.md | caching-strategies-performance-vs-consistency
19 | 19-observability.md | observability-metrics-logs-traces-coverage-vs-overhead
20 | 20-security-fundamentals.md | security-fundamentals-security-vs-convenience
21 | 21-containers-docker.md | containers-docker-isolation-vs-overhead
22 | 22-kubernetes-essentials.md | kubernetes-essentials-abstraction-vs-complexity
23 | 23-message-queues.md | message-queues-kafka-vs-rabbitmq-throughput-vs-latency
24 | 24-event-driven-architecture.md | event-driven-architecture-decoupling-vs-complexity
25 | 25-architecture-decision-records.md | architecture-decision-records-making-decisions-legible
26 | 26-technical-debt-strategy.md | technical-debt-strategy-intentional-debt-not-accidental
27 | 27-build-vs-buy.md | build-vs-buy-decisions-control-vs-speed
SD-01 | ../system-design/01-system-design-framework.md | system-design-framework-5-step-approach-any-problem
SD-02 | ../system-design/02-design-url-shortener.md | design-url-shortener-simple-scalable-system
SD-03 | ../system-design/03-design-distributed-cache.md | design-distributed-cache-high-performance-caching
SD-04 | ../system-design/04-design-chat-system.md | design-real-time-chat-websockets-ordering-fan-out

Contributing

Found an error? Have a better example? This series is continuously improved based on feedback from engineers who use it in production.


Congratulations on exploring the Backend Engineering Mastery series! This comprehensive guide covers everything from OS fundamentals to engineering leadership. Bookmark it, share it with your team, and return to it throughout your career.

Remember: understanding trade-offs beats memorizing facts every time.