5-YEAR PROGRAM · YEAR 1 · PHASE 2
UPCOMING

Networking: TCP/IP to L7

Second phase. Networking from cable to HTTP. The patterns survive whatever the next-decade L7 protocol looks like. ~7 weeks, ~80 hrs.

Phase 1 gave you a single machine you can reason about end-to-end. Phase 2 is the moment that reasoning has to cross machine boundaries — and almost every distributed-systems pathology in Years 2-5 traces back to a network behavior you didn’t fully understand here.

The frame is the same one the Master Plan committed to: HTTP/2 will give way to HTTP/3, gRPC will be challenged by something on QUIC, the next mesh will not be Istio. But the patterns (layering, routing-and-addressing, reliability via acknowledgement, defense-in-depth) survive every cycle. Learn the patterns; treat each tool as a snapshot.


Prerequisites

  • Phase 1 complete — process model, syscalls, filesystem internalized
  • You accept: you’re not learning “how to run nginx.” You’re learning what L4 vs L7 actually mean, why TCP exists, what a load balancer is doing under the hood.

Why this phase exists

Every distributed system is a network problem. Year 2’s CAP theorem rests on network partition behavior. Year 3’s lakehouse depends on cheap L4 to object storage. Year 4’s llm-gateway is an L7 routing problem. Year 5’s zero-trust mesh is L4+L7+identity. If you can’t read tcpdump output, you’ll plateau the moment a system crosses a machine boundary.

This phase also begins the zero-trust mesh mental model — the architectural choice for the homelab’s network, and the substrate that Phase 7 Kubernetes Services and NetworkPolicy will sit on top of.

→ Pattern: layering-and-abstraction (the OSI/TCP layered model is the canonical example, deepened from Phase 1)


1. PROBLEM

You have multiple machines on a wire. They want to exchange data. Wires are unreliable, machines are heterogeneous, addresses change, attackers exist. The network stack solves: addressing, routing, reliability, ordering, congestion, encryption, application semantics.

OSI gives you 7 layers; TCP/IP collapses them into 4. Both are mental tools — neither is the truth. The truth is “bytes moving over physical media with patterns at each abstraction level.”


2. PRINCIPLES

2.1 Layered protocols

Each layer adds a header, hands the rest to the layer below, then the receiver peels headers off. Layering keeps each problem solvable in isolation.

→ Pattern: layering-and-abstraction

Investigate:

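  • Capture a plain-HTTP GET with tcpdump -X. Count the headers (Ethernet, IP, TCP, HTTP) and identify each. Then repeat over HTTPS: a TLS record layer now sits between TCP and HTTP, and the HTTP bytes are no longer readable.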
  • What happens if a layer assumes the layer below works? (Spoiler: TCP-over-bad-Wi-Fi is misery.)
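
The header-wrapping described in 2.1 can be sketched in a few lines. This is a toy model, not the real Ethernet/IP/TCP layouts: the 3-byte header format and the layer tags below are assumptions made up for illustration. It only shows the shape of the idea, that the sender wraps top-down and the receiver peels bottom-up, with each layer reading nothing but its own header.

```python
import struct

# Toy header: 1-byte layer tag + 2-byte payload length (network byte order).
# Hypothetical format for illustration only; real headers are much richer.
HEADER = struct.Struct("!BH")
LAYERS = {"link": 1, "network": 2, "transport": 3}


def encapsulate(payload: bytes, order=("transport", "network", "link")) -> bytes:
    """Wrap payload top-down, so the link header ends up outermost."""
    for name in order:
        payload = HEADER.pack(LAYERS[name], len(payload)) + payload
    return payload


def decapsulate(frame: bytes, order=("link", "network", "transport")) -> bytes:
    """Peel headers bottom-up; each layer inspects only its own header."""
    for name in order:
        tag, length = HEADER.unpack_from(frame)
        if tag != LAYERS[name]:
            raise ValueError(f"expected {name} header, got tag {tag}")
        frame = frame[HEADER.size:HEADER.size + length]
    return frame
```

Calling decapsulate(encapsulate(b"GET / HTTP/1.1\r\n\r\n")) returns the original bytes. When you read a real tcpdump -X dump you are doing decapsulate by eye.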

2.2 Addressing and routing

Every network interface has an L2 address (MAC) and an L3 address (IP); each socket on the host adds an L4 port. Routing decides which next hop a packet is sent to.

→ Pattern: routing-and-addressing

Investigate:

  • What’s in your machine’s routing table? ip route. Why those entries?
  • ARP — how does a machine find its neighbor’s MAC?
  • Trace a packet from your ThinkPad to google.com via traceroute + mtr. What can you infer about the path?
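
When you read ip route output, the rule to keep in mind is longest-prefix match: of all entries that contain the destination, the most specific one wins. A minimal sketch, assuming a made-up routing table (every prefix, next hop, and interface name below is hypothetical) and ignoring metrics and policy routing, which a real kernel also considers:

```python
import ipaddress

# Hypothetical routing table: (destination prefix, next hop, interface).
ROUTES = [
    ("0.0.0.0/0", "192.168.1.1", "wlan0"),      # default route
    ("192.168.1.0/24", None, "wlan0"),          # directly connected LAN
    ("10.0.0.0/8", "192.168.1.254", "wlan0"),   # lab network via a second router
    ("10.0.5.0/24", "192.168.1.253", "wlan0"),  # more specific lab subnet
]


def lookup(dst: str, routes=ROUTES):
    """Return (network, next_hop, iface) for the longest prefix containing dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop, iface in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop, iface)
    return best
```

A next hop of None stands for "on-link": the destination is a neighbor, so the host resolves its MAC via ARP instead of handing the packet to a router.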

2.3 Reliability + ordering (TCP)

UDP is fire-and-forget. TCP adds: connection establishment, ordered delivery, retransmission, flow control, congestion control.

Investigate:

  • Three-way handshake — capture with tcpdump and identify SYN, SYN-ACK, ACK
  • What’s the difference between ESTABLISHED, TIME_WAIT, CLOSE_WAIT? When does each become a problem?
  • BBR vs CUBIC vs Reno — three congestion-control algorithms. Why have multiple?
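
Reliability via acknowledgement is easier to see in its simplest form. The sketch below is stop-and-wait, not TCP: real TCP uses a sliding window, cumulative ACKs, and an adaptive retransmission timer. The function name and the drop_attempts parameter are assumptions for illustration; losses are scripted rather than random so the run is repeatable.

```python
def send_stop_and_wait(segments, drop_attempts=frozenset(), max_tries=5):
    """Deliver segments in order over a simulated lossy link.

    drop_attempts holds the global attempt numbers (0-based) that the
    link loses. Returns (received, attempts_used).
    """
    received = []
    attempt = 0
    for seq, data in enumerate(segments):
        for _ in range(max_tries):
            lost = attempt in drop_attempts
            attempt += 1
            if lost:
                continue  # no ACK before the timer fires: retransmit
            # Receiver accepts only the sequence number it expects next.
            if seq == len(received):
                received.append(data)
            break  # ACK for seq arrived; move on to the next segment
        else:
            raise TimeoutError(f"segment {seq} not acknowledged")
    return received, attempt
```

With three segments and attempts 1 and 2 dropped, the data still arrives complete and in order, but it costs five transmissions instead of three. That gap between attempts and useful segments is the price of reliability on a bad link.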

2.4 Names (DNS)

IPs change. Names stay. DNS is the directory.

Investigate:

  • Walk through a DNS lookup: stub resolver → recursive → authoritative → response. Use dig +trace.
  • What’s a TTL? What’s negative caching? Why does “I changed DNS but it’s not propagating” happen?
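
Most "it’s not propagating" confusion is a cache honoring a TTL. A minimal sketch of a resolver cache, assuming a hypothetical upstream callable that returns (address or None, ttl_seconds) and an injected clock so the behavior can be stepped through by hand. Real negative-cache lifetimes come from the zone’s SOA record (RFC 2308); the sketch simply trusts whatever TTL the upstream reports.

```python
class TtlCache:
    """Toy resolver cache: positive answers and 'no such name' both expire by TTL."""

    def __init__(self, upstream, clock):
        self.upstream = upstream  # callable: name -> (address or None, ttl_seconds)
        self.clock = clock        # callable returning the current time in seconds
        self.entries = {}         # name -> (address or None, expires_at)

    def resolve(self, name):
        hit = self.entries.get(name)
        if hit and hit[1] > self.clock():
            return hit[0]  # may be None: a cached negative answer
        address, ttl = self.upstream(name)
        self.entries[name] = (address, self.clock() + ttl)
        return address
```

Two things fall out of this. Changing a record upstream has no visible effect until the old entry expires, and querying a name before it exists caches the negative answer, so creating the record afterwards still looks broken until that entry ages out.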

DNS will reappear in Phase 7 as service discovery inside Kubernetes (CoreDNS, the successor to kube-dns). Same pattern, different scope: instead of resolving public names through root → TLD → authoritative, you’ll resolve cluster-local names through CoreDNS → API server → Service objects. Internalizing DNS as a directory pattern now means K8s service discovery isn’t novel later; it’s just DNS aimed inward.

2.5 Application protocols (HTTP, gRPC)

Above TCP, applications speak something. HTTP/1.1 is text-based, gRPC is binary messages framed over HTTP/2, and SSE streams events over a single long-lived HTTP response. Each makes trade-offs.

Investigate:

  • HTTP/1.1 vs HTTP/2 vs HTTP/3 — what’s different at each layer?
  • gRPC over HTTP/2 — why? What does it buy?
  • Server-Sent Events vs WebSockets — when each wins.
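
Part of why SSE is attractive is how little protocol there is: the body is plain text lines, and a blank line ends an event. A simplified sketch of the parsing side, assuming LF-only line endings and handling just the data and event fields (the real format also allows CR and CRLF and defines id and retry):

```python
def parse_sse(stream: str):
    """Yield (event_type, data) for each complete event in an SSE text stream."""
    event, data = "message", []
    for line in stream.split("\n"):
        if line == "":
            if data:  # a blank line dispatches the buffered event
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "data":
                data.append(value)
            elif field == "event":
                event = value
```

Compare that with WebSockets, which needs an upgrade handshake and its own binary framing. SSE is one direction only, and that constraint is what keeps it this small.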

2.6 Defense at L4 + L7

Encryption (TLS), authentication (mutual TLS, OIDC tokens), filtering (firewalls, WAFs).

→ Pattern: defense-in-depth (first encounter)

Investigate:

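  • What does a TLS handshake actually do? Capture one; identify ClientHello, server cert, key exchange. (Under TLS 1.3 the certificate travels encrypted, so you will only see it in the capture if you negotiate TLS 1.2 or decrypt with logged session keys, e.g. via SSLKEYLOGFILE.)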
  • What’s the homelab firewall (nftables on bastion + Proxmox), and why default-deny?
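
Default-deny is a property of how rules are evaluated, not of any one tool. The sketch below is a model of first-match evaluation with a default verdict; it is not nftables syntax and does not call nftables. The Rule type, the LAN prefix, and the ports (22 for SSH, 8006 assumed for the Proxmox web UI) are illustrative assumptions.

```python
import ipaddress
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    source: str          # CIDR the packet must come from
    port: Optional[int]  # destination TCP port; None matches any port
    verdict: str         # "accept" or "drop"


# Hypothetical policy: SSH and the Proxmox web UI, from the LAN only.
RULES = [
    Rule("192.168.1.0/24", 22, "accept"),
    Rule("192.168.1.0/24", 8006, "accept"),
]


def decide(src: str, dport: int, rules=RULES, default="drop") -> str:
    """First matching rule wins; anything unmatched falls to the default."""
    addr = ipaddress.ip_address(src)
    for rule in rules:
        in_net = addr in ipaddress.ip_network(rule.source)
        if in_net and rule.port in (None, dport):
            return rule.verdict
    return default
```

Flip default to "accept" and every service you forgot to list is exposed. With "drop", forgetting a rule fails closed: something you wanted stops working and you notice, which is the cheaper failure.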

3. TRADE-OFFS

| Decision | Option A | Option B | Cost |
| --- | --- | --- | --- |
| L4 protocol | TCP (reliable, ordered) | UDP (fast, lossy) | QUIC (UDP + TCP-features) |
| HTTP version | HTTP/1.1 (universal) | HTTP/2 (multiplex) | HTTP/3 (QUIC) |
| LB layer | L4 (TCP) | L7 (HTTP) | L4: blind, fast; L7: content-aware, more CPU |
| TLS termination | At edge (LB) | At service (app) | Mixed |
| Network model | NAT’d home | Bridged + DHCP reservations | DHCP makes lab reproducible |

Each row is a real architectural decision you’ll face by Year 2. The L4-vs-L7 row in particular shows up again in Phase 7 — a Kubernetes Service is L4, an Ingress/Gateway is L7, and choosing which to expose is exactly this trade-off scaled to a cluster.
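
The "blind, fast" versus "content-aware, more CPU" split in the LB row comes down to what each balancer is allowed to look at. A rough sketch, with hypothetical backend addresses, pool names, and routes; a real L7 proxy handles header case-insensitivity, malformed input, and much more:

```python
import zlib

BACKENDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]  # hypothetical upstreams


def pick_l4(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> str:
    """L4: hash the connection tuple; the payload is never inspected."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return BACKENDS[zlib.crc32(key) % len(BACKENDS)]


# Hypothetical L7 routing table: (host, path prefix, pool name).
L7_ROUTES = [
    ("api.lab", "/v1/", "api-pool"),
    ("api.lab", "/", "web-pool"),
]


def pick_l7(request_head: bytes) -> str:
    """L7: parse the request line and Host header, then route on them."""
    lines = request_head.decode("ascii").split("\r\n")
    _method, path, _version = lines[0].split(" ")
    headers = dict(line.split(": ", 1) for line in lines[1:] if line)
    # Header names are case-insensitive in HTTP; this sketch assumes "Host".
    host = headers.get("Host", "")
    for route_host, prefix, pool in L7_ROUTES:
        if host == route_host and path.startswith(prefix):
            return pool
    return "default-pool"
```

pick_l4 can run on the first SYN and never needs to decrypt anything. pick_l7 cannot decide until it has terminated TLS and buffered at least the request head, which is where the extra CPU goes.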


4. TOOLS (as of 2025-10)

  • ip(8), ss(8) — modern interface + socket inspection
  • tcpdump, wireshark — packet capture
  • dig, mtr, traceroute — DNS + path
  • curl -v, httpie — HTTP introspection
  • iperf3, qperf — throughput
  • nmap — surface mapping
  • iptables / nftables — packet filtering
  • wireguard / tailscale — modern VPN/mesh

5. MASTERY

5.1 Reading list

| Required | Why |
| --- | --- |
| “Computer Networking: A Top-Down Approach” (Kurose & Ross), Ch. 1-5 | The textbook |
| Beej’s Guide to Network Programming | Sockets at the API level |
| RFC 793 (TCP), the sections you can stomach; note it was obsoleted by RFC 9293 in 2022 | The actual spec |

| Recommended | Why |
| --- | --- |
| “High Performance Browser Networking” (Grigorik) | L7 + TLS depth |

5.2 Operational depth checklist

[ ] Capture + decode a full HTTP request with tcpdump (Ethernet → HTTP)
[ ] Capture a TLS handshake; identify ClientHello, cert, key exchange, finished
[ ] Trace `dig +trace example.com` end-to-end; identify each step
[ ] Use mtr to find the slow hop on a path; explain what causes it
[ ] Configure nftables on the bastion: default-deny + explicit allow for SSH + Proxmox web UI
[ ] Set up Tailscale across ThinkPad + Mac + bastion; verify zero-trust mesh
[ ] Run iperf3 between two homelab VMs; tune kernel buffers; observe throughput change
[ ] Build an HTTP/1.1 client in C on plain TCP sockets (the BSD sockets API, not SOCK_RAW; no libcurl): `connect`, `write`, `read`
[ ] Diagnose "this connection is hung" — is it SYN-SENT, ESTABLISHED-no-data, TIME_WAIT exhaustion, or DNS?
[ ] Configure rate-limiting on nginx; observe 429s under load

The hand-written sockets HTTP client is the exercise most people skip and most regret skipping. Writing connect/write/read by hand makes HTTP stop being a black box: you’ve literally typed the bytes on the wire. That muscle memory is what lets you read a tcpdump capture later without flinching.
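
To make "the bytes on the wire" concrete before you start the C version, here is the byte layout alone, sketched in Python with the socket calls deliberately left out. The function names are hypothetical, and the parser assumes a well-formed response with no chunked encoding.

```python
def build_get(host: str, path: str = "/") -> bytes:
    """Return the exact bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",      # the one header HTTP/1.1 requires
        "Connection: close",  # ask the server to close so read() reaches EOF
        "",
        "",                   # blank line terminates the header block
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(response: bytes):
    """Split a response into (status_code, headers, body)."""
    head, _, body = response.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, *_reason = status_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body
```

In the C exercise, build_get is what you hand to write and parse_response is what you do with the buffer that read fills. Everything else is connection setup and looping until EOF.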


6. COMPARE: nginx vs Caddy vs HAProxy

Pick two. Configure each as a reverse proxy in front of a small backend. Compare:

  • Config style (declarative vs imperative; YAML vs DSL vs config-as-code)
  • Defaults (TLS, HTTP/2, security headers)
  • Operational ergonomics (reload, hot-config, zero-downtime restart)

Write 300 words: which would you reach for and why?

This is your first real practice with the pattern-first reflex: all three are L7 reverse proxies implementing the same job (terminate TLS, route by host/path, balance to upstreams). The differences live in trade-offs, not capability. By Phase 7 you’ll meet a fourth implementation of the same idea (the Kubernetes Ingress controller — which is often nginx or Traefik repackaged with a control loop on top).


7. OPERATE

  • 3+ runbooks in ops-handbook/runbooks/networking/ (DNS broken, TLS cert expired, “the internet is slow”, firewall debug)
  • Weekly log

8. CONTRIBUTE

Networking-adjacent OSS to consider: iperf3, mtr, tcpdump docs, BIND or PowerDNS docs, nftables wiki.


Validation criteria

[ ] All 10 operational depth checks
[ ] nginx vs Caddy vs HAProxy comparison written up
[ ] 3+ networking runbooks; 1+ postmortem if you broke the homelab network
[ ] 7+ weekly log entries
[ ] Pattern entries deepened:
- routing-and-addressing → OUTLINE
- layering-and-abstraction → reinforced (already OUTLINE from Phase 1)
[ ] Exit Test passed

Exit Test

Time: 3 hours.

  1. Build (60 min) — set up a clean reverse-proxy config for triage.local (placeholder for Phase 7) with TLS, HTTP/2, default-deny except 80/443.
  2. Debug (90 min) — scenario from Phase 2 catalog (DNS resolution broken, or TLS cert misconfigured, or asymmetric routing).
  3. Articulate (30 min) — 600 words: “Why is TCP slow on bad Wi-Fi? Walk through the layers.”

Anti-patterns

| Anti-pattern | Why |
| --- | --- |
| “Just allow all on the firewall” | Defeats the whole point; default-deny is a one-time cost |
| Skipping tcpdump because “it’s hard to read” | The hard thing is the actual skill; build the muscle |
| Treating DNS as black magic | DNS is just hierarchical key-value with caching; demystify it |

Patterns touched this phase


→ Next: Phase 3: Databases