Networking: TCP/IP to L7
Second phase. Networking from cable to HTTP. The patterns survive whatever the next-decade L7 protocol looks like. ~7 weeks, ~80 hrs.
Phase 1 gave you a single machine you can reason about end-to-end. Phase 2 is the moment that reasoning has to cross machine boundaries — and almost every distributed-systems pathology in Years 2-5 traces back to a network behavior you didn’t fully understand here.
The frame is the same one the Master Plan committed to: HTTP/2 will give way to HTTP/3, gRPC will be challenged by something on QUIC, the next mesh will not be Istio. But the patterns — layering, routing-and-addressing, reliability via acknowledgement, defense-in-depth — survive every cycle. Learn the patterns; treat each tool as a snapshot.
Prerequisites
- Phase 1 complete — process model, syscalls, filesystem internalized
- You accept: you’re not learning “how to run nginx.” You’re learning what L4 vs L7 actually mean, why TCP exists, what a load balancer is doing under the hood.
Why this phase exists
Every distributed system is a network problem. Year 2’s CAP theorem rests on network partition behavior. Year 3’s lakehouse depends on cheap L4 to object storage. Year 4’s llm-gateway is an L7 routing problem. Year 5’s zero-trust mesh is L4+L7+identity. If you can’t read tcpdump output, you’ll plateau the moment a system crosses a machine boundary.
This phase also begins the zero-trust mesh mental model — the architectural choice for the homelab’s network, and the substrate that Phase 7 Kubernetes Services and NetworkPolicy will sit on top of.
→ Pattern: layering-and-abstraction (the OSI/TCP layered model is the canonical example, deepened from Phase 1)
1. PROBLEM
You have multiple machines on a wire. They want to exchange data. Wires are unreliable, machines are heterogeneous, addresses change, attackers exist. The network stack solves: addressing, routing, reliability, ordering, congestion, encryption, application semantics.
OSI gives you 7 layers; TCP/IP collapses them into 4. Both are mental tools — neither is the truth. The truth is “bytes moving over physical media with patterns at each abstraction level.”
2. PRINCIPLES
2.1 Layered protocols
Each layer wraps the payload handed down from the layer above in its own header and passes the result to the layer below; the receiver peels the headers off in reverse order. Layering keeps each problem solvable in isolation.
→ Pattern: layering-and-abstraction
Investigate:
- Capture an HTTP GET with `tcpdump -X`. Count the headers (Ethernet, IP, TCP, then HTTP; over HTTPS you’ll see TLS records instead, because the HTTP inside is encrypted). Identify each.
- What happens if a layer assumes the layer below works? (Spoiler: TCP-over-bad-Wi-Fi is misery.)
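To make wrap-and-peel concrete, here is a minimal Python sketch that builds a frame the way a sending stack would, then peels it the way a receiver (or tcpdump’s decoder) would. It assumes plain HTTP over IPv4 on Ethernet, no IP or TCP options, and checksums left at zero; the MAC and IP addresses are made up for illustration.

```python
import struct

# Hypothetical MAC/IP addresses, chosen only for illustration.
SRC_MAC, DST_MAC = bytes.fromhex("020000000001"), bytes.fromhex("020000000002")
SRC_IP, DST_IP = bytes([192, 168, 1, 10]), bytes([192, 168, 1, 20])


def tcp_wrap(payload: bytes, sport: int, dport: int) -> bytes:
    # 20-byte TCP header: ports, seq, ack, data offset (5 x 32-bit words) + PSH|ACK
    # flags, window, checksum (left at 0 here), urgent pointer.
    offset_flags = (5 << 12) | 0x18
    return struct.pack("!HHIIHHHH", sport, dport, 1, 1, offset_flags, 65535, 0, 0) + payload


def ip_wrap(segment: bytes) -> bytes:
    # 20-byte IPv4 header: version 4 + IHL 5, total length, DF flag, TTL 64,
    # protocol 6 (TCP), checksum (left at 0 here), source, destination.
    total_len = 20 + len(segment)
    header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, 0, 0x4000, 64, 6, 0, SRC_IP, DST_IP)
    return header + segment


def eth_wrap(packet: bytes) -> bytes:
    # 14-byte Ethernet II header: destination MAC, source MAC, EtherType 0x0800 (IPv4).
    return struct.pack("!6s6sH", DST_MAC, SRC_MAC, 0x0800) + packet


def peel(frame: bytes):
    """Strip Ethernet, IPv4 and TCP headers in order; return (fields, payload)."""
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    packet = frame[14:]
    ihl = (packet[0] & 0x0F) * 4                  # IP header length in bytes
    total_len, = struct.unpack("!H", packet[2:4])
    proto = packet[9]
    segment = packet[ihl:total_len]               # slicing to total_len drops any Ethernet padding
    sport, dport = struct.unpack("!HH", segment[:4])
    data_offset = (segment[12] >> 4) * 4          # TCP header length in bytes
    fields = {"ethertype": ethertype, "ip_proto": proto, "sport": sport, "dport": dport}
    return fields, segment[data_offset:]


request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
frame = eth_wrap(ip_wrap(tcp_wrap(request, 51000, 80)))
fields, payload = peel(frame)
print(fields)              # ethertype 2048 (0x0800), ip_proto 6, sport 51000, dport 80
print(payload == request)  # True
```

The hex dump from `tcpdump -X` is these same bytes: 14 bytes of Ethernet, 20 of IP, 20 of TCP (more if options are present), then the HTTP text.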
2.2 Addressing and routing
Every network interface has an L2 address (MAC) and one or more L3 addresses (IP); a port (L4) identifies a specific endpoint, such as a listening socket, on that host. Routing decides which next hop to send a packet to.
→ Pattern: routing-and-addressing
Investigate:
- What’s in your machine’s routing table (`ip route`)? Why those entries?
- ARP: how does a machine find its neighbor’s MAC?
- Trace a packet from your ThinkPad to `google.com` via `traceroute` + `mtr`. What can you infer about the path?
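Route lookup is longest-prefix match: among all routes whose prefix contains the destination, the most specific one wins. A minimal sketch over a hypothetical table (an assumed Wi-Fi interface `wlan0`, a router at 192.168.1.1, and Docker’s default bridge), ignoring route metrics and policy routing (`ip rule`), which real kernels also apply. It also shows where ARP comes in: the machine ARPs for the next hop, not for the final destination.

```python
import ipaddress
from typing import Tuple

# Hypothetical routing table, roughly the shape `ip route` prints on a laptop on a
# home LAN that also runs Docker: (prefix, gateway or None if directly connected, device).
ROUTES = [
    ("0.0.0.0/0", "192.168.1.1", "wlan0"),   # default route: everything else goes to the router
    ("192.168.1.0/24", None, "wlan0"),       # the LAN itself: neighbours are reachable directly
    ("172.17.0.0/16", None, "docker0"),      # Docker's default bridge network
]


def lookup(dst: str) -> Tuple[str, str, str]:
    """Longest-prefix match: the most specific route containing dst wins."""
    addr = ipaddress.ip_address(dst)
    candidates = [
        (ipaddress.ip_network(prefix), gw, dev)
        for prefix, gw, dev in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    net, gw, dev = max(candidates, key=lambda c: c[0].prefixlen)
    # ARP resolves the MAC of the next hop: the gateway if there is one, else dst itself.
    arp_target = gw if gw is not None else dst
    return str(net), dev, arp_target


for dst in ["192.168.1.42", "172.17.0.5", "8.8.8.8"]:
    print(dst, "->", lookup(dst))
```

Only IPv4 prefixes are in the table, so an IPv6 destination would find no candidates here; a real stack keeps a separate IPv6 table (`ip -6 route`).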
2.3 Reliability + ordering (TCP)
UDP is fire-and-forget. TCP adds: connection establishment, ordered delivery, retransmission, flow control, congestion control.
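Those guarantees have a price under packet loss. For Reno-style, loss-based congestion control, a widely cited approximation from Mathis et al. (1997) bounds steady-state throughput by segment size, round-trip time, and loss probability p. It is a simplified model (it ignores timeouts, and BBR does not follow it), but the worked numbers below, for an assumed 1460-byte MSS and 20 ms RTT, show why a lossy Wi-Fi hop hurts so much:

```latex
\begin{aligned}
\text{throughput} &\lesssim \frac{\mathrm{MSS}}{\mathrm{RTT}} \cdot \frac{C}{\sqrt{p}}, \qquad C \approx \sqrt{3/2} \approx 1.22 \\[4pt]
\mathrm{MSS} = 1460\,\text{B},\ \mathrm{RTT} = 20\,\text{ms},\ p = 10^{-2}: &\quad \frac{11680\,\text{bit}}{0.02\,\text{s}} \cdot \frac{1.22}{0.1} \approx 584\,\text{kbit/s} \times 12.2 \approx 7.1\,\text{Mbit/s} \\
\text{same link},\ p = 10^{-4}: &\quad 584\,\text{kbit/s} \times \frac{1.22}{0.01} \approx 71\,\text{Mbit/s}
\end{aligned}
```

Because of the square root, cutting loss 100× raises the ceiling only 10×, and the bound does not depend on link bandwidth at all.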
Investigate:
- Three-way handshake — capture with tcpdump and identify SYN, SYN-ACK, ACK
- What’s the difference between `ESTABLISHED`, `TIME_WAIT`, and `CLOSE_WAIT`? When does each become a problem?
- BBR vs CUBIC vs Reno: three congestion-control algorithms. Why have multiple?
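For the state question, Linux exposes every IPv4 TCP socket in `/proc/net/tcp`, with the state as a hex code (IPv6 sockets are in `/proc/net/tcp6`). `ss -tan` gives the same view, but parsing the raw file makes the states concrete. A minimal sketch over a made-up, trimmed sample: many `CLOSE_WAIT` sockets usually mean your application is not calling `close()` after the peer hung up, while `TIME_WAIT` sockets sit on whichever side closed first and become a problem when lots of short-lived outbound connections exhaust ephemeral ports.

```python
from collections import Counter

# Hex state codes as Linux prints them in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def summarize(proc_net_tcp: str) -> Counter:
    """Count sockets per (state, local port) in /proc/net/tcp-formatted text."""
    counts: Counter = Counter()
    for line in proc_net_tcp.splitlines()[1:]:         # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_port = int(fields[1].split(":")[1], 16)  # "0100007F:1F90" -> 0x1F90 = 8080
        counts[(TCP_STATES.get(fields[3], fields[3]), local_port)] += 1
    return counts


# Made-up sample: a listener on 8080, a server-side socket whose client already hung
# up (CLOSE_WAIT), and a client-side socket that closed first (TIME_WAIT).
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 1001
   1: 0100007F:1F90 0100007F:C350 08 00000000:00000000 00:00000000 00000000  1000        0 1002
   2: 0100007F:C351 0100007F:1F90 06 00000000:00000000 03:00001770 00000000     0        0 0
"""

for (state, port), n in sorted(summarize(SAMPLE).items()):
    print(f"{state:<12} port {port:<6} x{n}")
```

On a real Linux box, `summarize(open("/proc/net/tcp").read())` gives the live picture.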
2.4 Names (DNS)
IPs change. Names stay. DNS is the directory.
Investigate:
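- Walk through a DNS lookup: stub resolver → recursive resolver → root → TLD → authoritative → response. Use `dig +trace`, which performs the iteration itself starting from the root servers, so you see the recursive resolver’s work step by step.
- What’s a TTL? What’s negative caching? Why does “I changed DNS but it’s not propagating” happen?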
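TTL and negative caching reduce to one rule: a cache answers from memory until the entry expires, whether the cached answer is an address or “no such name”. The toy cache below (a hypothetical class with a fake clock and made-up zone data) shows why a DNS change appears not to propagate: every cache on the path keeps serving its old answer until its own TTL runs out. For NXDOMAIN, a real negative TTL comes from the zone’s SOA record (RFC 2308); here the fake upstream simply returns 60 s.

```python
import time
from typing import Callable, Dict, Optional, Tuple

Answer = Tuple[Optional[str], int]   # (address, or None for NXDOMAIN; TTL in seconds)


class CachingResolver:
    """Toy cache: positive answers live for their TTL, NXDOMAIN for the negative TTL."""

    def __init__(self, upstream: Callable[[str], Answer], clock: Callable[[], float] = time.monotonic):
        self.upstream = upstream
        self.clock = clock
        self.cache: Dict[str, Tuple[Optional[str], float]] = {}   # name -> (answer, expiry)

    def resolve(self, name: str) -> Optional[str]:
        now = self.clock()
        if name in self.cache and self.cache[name][1] > now:
            return self.cache[name][0]            # fresh: answer locally, never ask upstream
        address, ttl = self.upstream(name)        # miss or expired: ask upstream
        self.cache[name] = (address, now + ttl)   # None is cached too: negative caching
        return address


zone = {"app.example.com": "203.0.113.10"}
now = [0.0]                                      # fake clock so the demo is instant


def authoritative(name: str) -> Answer:
    # Hypothetical upstream: 300 s TTL on answers, 60 s negative TTL on NXDOMAIN.
    return (zone[name], 300) if name in zone else (None, 60)


r = CachingResolver(upstream=authoritative, clock=lambda: now[0])
print(r.resolve("app.example.com"))              # 203.0.113.10
print(r.resolve("new.example.com"))              # None (NXDOMAIN, now negatively cached)

zone["app.example.com"] = "203.0.113.99"         # "I changed DNS..."
zone["new.example.com"] = "203.0.113.7"          # "...and added a record"
now[0] = 30
print(r.resolve("app.example.com"), r.resolve("new.example.com"))  # 203.0.113.10 None
now[0] = 301
print(r.resolve("app.example.com"), r.resolve("new.example.com"))  # 203.0.113.99 203.0.113.7
```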
DNS will reappear in Phase 7 as service discovery inside Kubernetes (CoreDNS, which replaced kube-dns but still sits behind a Service named `kube-dns`). Same pattern, different scope: instead of resolving public names through root → TLD → authoritative, you’ll resolve cluster-local names through CoreDNS, which answers from the Service objects (and their endpoints) it watches via the API server. Internalizing DNS as a directory pattern now means K8s service discovery isn’t novel later: it’s just DNS aimed inward.
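Pods typically get a `resolv.conf` with cluster search domains and `ndots:5`, and the stub resolver expands short names against that list. The sketch below assumes glibc-style search behaviour (other libcs, such as musl, differ) and kubelet-style defaults for the namespace `default` and cluster domain `cluster.local`. It shows why `my-svc` resolves inside the cluster, and why an external name like `api.github.com` (only 2 dots) first tries several cluster suffixes before the real lookup.

```python
from typing import List

# Typical pod /etc/resolv.conf (assumed defaults: namespace "default", cluster domain
# "cluster.local"):
#   search default.svc.cluster.local svc.cluster.local cluster.local
#   options ndots:5
POD_SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def candidate_names(name: str, search: List[str], ndots: int = 5) -> List[str]:
    """Order in which a glibc-style stub resolver tries a name, given search + ndots."""
    if name.endswith("."):                       # trailing dot: fully qualified, no search
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = [f"{name}."]
    # Fewer dots than ndots: search domains first, then the name as typed.
    return expanded + absolute if name.count(".") < ndots else absolute + expanded


for name in ["my-svc", "my-svc.other-ns", "api.github.com", "api.github.com."]:
    print(name, "->", candidate_names(name, POD_SEARCH))
```

Writing external names with a trailing dot (or lowering `ndots`) skips the wasted cluster lookups; that is the same search-list mechanism you can inspect on your laptop today in `/etc/resolv.conf`.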
2.5 Application protocols (HTTP, gRPC)
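Above TCP (or QUIC, for HTTP/3), applications speak something. HTTP/1.1 is text-based while HTTP/2 and HTTP/3 use binary framing; gRPC sends binary messages (typically Protocol Buffers) over HTTP/2; SSE streams events over one long-lived HTTP response. Each makes trade-offs.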
Investigate:
- HTTP/1.1 vs HTTP/2 vs HTTP/3 — what’s different at each layer?
- gRPC over HTTP/2 — why? What does it buy?
- Server-Sent Events vs WebSockets — when each wins.
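For the HTTP/1.1 vs HTTP/2 question, the key L7 difference is framing. Below is a sketch of the 9-byte HTTP/2 frame header (24-bit length, 8-bit type, 8-bit flags, 1 reserved bit + 31-bit stream id, per RFC 9113) and how frames from two streams interleave on one connection. Payloads are placeholder bytes, not real HPACK-encoded headers, and only a few frame types are named. This interleaving is what gRPC rides on; a lost TCP segment still stalls every stream, which is the head-of-line blocking HTTP/3 moves to QUIC to avoid.

```python
import struct

FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x4: "SETTINGS", 0x8: "WINDOW_UPDATE"}
END_STREAM, END_HEADERS = 0x1, 0x4


def encode_frame(ftype: int, flags: int, stream_id: int, payload: bytes) -> bytes:
    # 9-byte header: 24-bit length (split into 8 + 16 bits), type, flags,
    # then the reserved bit (0) and a 31-bit stream id.
    length = len(payload)
    header = struct.pack("!BHBBI", length >> 16, length & 0xFFFF, ftype, flags, stream_id & 0x7FFFFFFF)
    return header + payload


def decode_frames(buf: bytes):
    """Split a connection's byte stream into (type, flags, stream_id, payload) tuples."""
    frames, offset = [], 0
    while offset + 9 <= len(buf):
        hi, lo, ftype, flags, sid = struct.unpack("!BHBBI", buf[offset:offset + 9])
        length = (hi << 16) | lo
        payload = buf[offset + 9:offset + 9 + length]
        frames.append((FRAME_TYPES.get(ftype, hex(ftype)), flags, sid & 0x7FFFFFFF, payload))
        offset += 9 + length
    return frames


# Two responses interleaved on ONE TCP connection (client-initiated stream ids are odd).
wire = b"".join([
    encode_frame(0x1, END_HEADERS, 1, b"<hpack headers for stream 1>"),
    encode_frame(0x1, END_HEADERS, 3, b"<hpack headers for stream 3>"),
    encode_frame(0x0, 0, 3, b"chunk of /b"),
    encode_frame(0x0, END_STREAM, 1, b"all of /a"),
    encode_frame(0x0, END_STREAM, 3, b"rest of /b"),
])
for ftype, flags, sid, payload in decode_frames(wire):
    print(sid, ftype, flags, payload)
```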
2.6 Defense at L4 + L7
Encryption (TLS), authentication (mutual TLS, OIDC tokens), filtering (firewalls, WAFs).
→ Pattern: defense-in-depth (first encounter)
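Filtering is the easiest of these defenses to model. The sketch below is a toy first-match evaluator in the spirit of an nftables input chain with `policy drop`; it is not nftables (no real conntrack, sets, or NAT), and the rules are a hypothetical bastion config allowing reply traffic, SSH, and the Proxmox web UI on port 8006. The point of default-deny is that forgotten cases fail closed: a port nobody wrote a rule for is dropped, while flipping the policy to accept quietly exposes it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Packet:
    proto: str                  # "tcp" or "udp"
    dport: int
    established: bool = False   # stands in for conntrack's "ct state established"


@dataclass(frozen=True)
class Rule:
    verdict: str                          # "accept" or "drop"
    proto: Optional[str] = None           # None = any protocol
    dport: Optional[int] = None           # None = any port
    established: Optional[bool] = None    # None = any connection state

    def matches(self, p: Packet) -> bool:
        return ((self.proto is None or self.proto == p.proto)
                and (self.dport is None or self.dport == p.dport)
                and (self.established is None or self.established == p.established))


def evaluate(rules: List[Rule], packet: Packet, policy: str = "drop") -> str:
    """First matching rule wins; if nothing matches, the chain's policy decides."""
    for rule in rules:
        if rule.matches(packet):
            return rule.verdict
    return policy


# Hypothetical bastion input chain: replies to our own traffic, SSH, Proxmox web UI (8006).
BASTION = [
    Rule("accept", established=True),
    Rule("accept", proto="tcp", dport=22),
    Rule("accept", proto="tcp", dport=8006),
]

print(evaluate(BASTION, Packet("tcp", 22)))                     # accept
print(evaluate(BASTION, Packet("tcp", 3306)))                   # drop: nobody wrote a rule
print(evaluate(BASTION, Packet("tcp", 3306), policy="accept"))  # accept: "just allow all"
```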
Investigate:
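- What does a TLS handshake actually do? Capture one; identify ClientHello, server cert, key exchange. (In TLS 1.3 everything after ServerHello, the certificate included, is encrypted; force TLS 1.2, e.g. with `openssl s_client -tls1_2`, or give Wireshark an `SSLKEYLOGFILE` to see it.)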
- What’s the homelab firewall (nftables on bastion + Proxmox), and why default-deny?
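A captured TLS stream is a sequence of records, each with a 5-byte header (content type, legacy version, length); plaintext handshake records carry messages with a 1-byte type and a 3-byte length. The sketch below labels one direction of a connection using the type codes from RFC 5246 and RFC 8446. It assumes the bytes come from a single direction, does not reassemble handshake messages split across records, and treats everything after ChangeCipherSpec as opaque, which is where TLS 1.2’s Finished and TLS 1.3’s encrypted handshake live. The `record` and `msg` helpers only exist to build demo input.

```python
import struct

CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert", 22: "handshake", 23: "application_data"}
HANDSHAKE_TYPES = {
    1: "ClientHello", 2: "ServerHello", 4: "NewSessionTicket", 8: "EncryptedExtensions",
    11: "Certificate", 12: "ServerKeyExchange", 13: "CertificateRequest",
    14: "ServerHelloDone", 15: "CertificateVerify", 16: "ClientKeyExchange", 20: "Finished",
}


def label_records(stream: bytes) -> list:
    """Label TLS records from ONE direction of a connection."""
    labels, offset, encrypted = [], 0, False
    while offset + 5 <= len(stream):
        # Record header: 1-byte content type, 2-byte legacy version, 2-byte length.
        ctype, _version, length = struct.unpack("!BHH", stream[offset:offset + 5])
        body = stream[offset + 5:offset + 5 + length]
        offset += 5 + length
        if ctype == 22 and not encrypted:
            # Plaintext handshake: 1-byte message type + 3-byte length, possibly several per record.
            pos = 0
            while pos + 4 <= len(body):
                mtype, mlen = body[pos], int.from_bytes(body[pos + 1:pos + 4], "big")
                labels.append(HANDSHAKE_TYPES.get(mtype, f"handshake type {mtype}"))
                pos += 4 + mlen
        elif ctype == 23 or encrypted:
            labels.append("encrypted record")
        else:
            labels.append(CONTENT_TYPES.get(ctype, f"record type {ctype}"))
        if ctype == 20:
            # TLS 1.2: this sender switches keys here. TLS 1.3: CCS is a compatibility
            # dummy, and the sender's later records are type 23 anyway.
            encrypted = True
    return labels


def record(ctype: int, body: bytes) -> bytes:            # demo helper: add a record header
    return struct.pack("!BHH", ctype, 0x0303, len(body)) + body


def msg(mtype: int, body: bytes = b"\x00" * 4) -> bytes:  # demo helper: add a handshake header
    return bytes([mtype]) + len(body).to_bytes(3, "big") + body


tls12_server = record(22, msg(2) + msg(11, b"\x00" * 10) + msg(14, b""))
tls13_server = record(22, msg(2)) + record(20, b"\x01") + record(23, b"\x8f" * 40)
print(label_records(tls12_server))   # ['ServerHello', 'Certificate', 'ServerHelloDone']
print(label_records(tls13_server))   # ['ServerHello', 'change_cipher_spec', 'encrypted record']
```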
3. TRADE-OFFS
| Decision | Option A | Option B | Option C / trade-off |
|---|---|---|---|
| L4 protocol | TCP (reliable, ordered) | UDP (fast, no delivery guarantees) | QUIC (TCP-like reliability over UDP) |
| HTTP version | HTTP/1.1 (universal) | HTTP/2 (multiplexed) | HTTP/3 (over QUIC) |
| LB layer | L4 (TCP) | L7 (HTTP) | L4: blind, fast; L7: content-aware, more CPU |
| TLS termination | At edge (LB) | At service (app) | Mixed (e.g. terminate at edge, re-encrypt to service) |
| Network model | NAT’d home | Bridged + DHCP reservations | Reservations keep addresses stable, so the lab is reproducible |
Each row is a real architectural decision you’ll face by Year 2. The L4-vs-L7 row in particular shows up again in Phase 7 — a Kubernetes Service is L4, an Ingress/Gateway is L7, and choosing which to expose is exactly this trade-off scaled to a cluster.
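The L4-vs-L7 row comes down to what each balancer can see. A minimal sketch with a made-up backend pool and routing table: the L4 picker only has the connection’s 5-tuple and hashes it (SHA-256 rather than Python’s `hash()`, which is randomized per process), while the L7 picker parses the request line and `Host` header. The L7 picker only works on plaintext, which is why it is coupled to the TLS-termination row.

```python
import hashlib
from typing import List, Tuple

BACKENDS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]   # hypothetical L4 pool

# Hypothetical L7 routes: (Host header, path prefix, pool). First match wins,
# so more specific prefixes are listed first.
L7_ROUTES = [
    ("api.lab.example", "/v2/", ["10.0.1.21", "10.0.1.22"]),
    ("api.lab.example", "/", ["10.0.1.11"]),
    ("grafana.lab.example", "/", ["10.0.2.11"]),
]


def l4_pick(five_tuple: Tuple[str, int, str, int, str]) -> str:
    """L4: only addresses, ports and protocol are visible; hash them so a flow sticks."""
    key = "|".join(map(str, five_tuple)).encode()
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return BACKENDS[digest % len(BACKENDS)]


def l7_pick(raw_request: bytes) -> List[str]:
    """L7: parse the request line and Host header, then route on content."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    _method, path, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    host = headers.get("host", "").split(":")[0]     # drop an explicit port
    for route_host, prefix, pool in L7_ROUTES:
        if host == route_host and path.startswith(prefix):
            return pool
    return []                                        # no matching route


flow = ("192.168.1.50", 51514, "192.168.1.10", 443, "tcp")
print(l4_pick(flow))   # same backend for every packet of this flow
print(l7_pick(b"GET /v2/users HTTP/1.1\r\nHost: api.lab.example\r\n\r\n"))
```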
4. TOOLS (as of 2025-10)
- `ip(8)`, `ss(8)`: modern interface + socket inspection
- `tcpdump`, `wireshark`: packet capture
- `dig`, `mtr`, `traceroute`: DNS + path
- `curl -v`, `httpie`: HTTP introspection
- `iperf3`, `qperf`: throughput
- `nmap`: surface mapping
- `iptables`/`nftables`: packet filtering
- `wireguard`/`tailscale`: modern VPN/mesh
5. MASTERY
5.1 Reading list
| Required | Why |
|---|---|
| “Computer Networking: A Top-Down Approach” (Kurose & Ross) Ch. 1-5 | The textbook |
| Beej’s Guide to Network Programming | Sockets at the API level |
| RFC 9293 (TCP; obsoletes the classic RFC 793): sections you can stomach | The actual spec |
| Recommended | Why |
|---|---|
| “High Performance Browser Networking” (Grigorik) | L7 + TLS depth |
5.2 Operational depth checklist
[ ] Capture + decode a full HTTP request with tcpdump (Ethernet → HTTP)
[ ] Capture a TLS handshake; identify ClientHello, cert, key exchange, Finished
[ ] Trace `dig +trace example.com` end-to-end; identify each step
[ ] Use mtr to find the slow hop on a path; explain what causes it
[ ] Configure nftables on the bastion: default-deny + explicit allow for SSH + Proxmox web UI
[ ] Set up Tailscale across ThinkPad + Mac + bastion; verify zero-trust mesh
[ ] Run iperf3 between two homelab VMs; tune kernel buffers; observe throughput change
[ ] Build an HTTP/1.1 client in C with bare BSD sockets (no libcurl): `connect`, `write`, `read`
[ ] Diagnose "this connection is hung": is it SYN-SENT, ESTABLISHED-no-data, TIME_WAIT exhaustion, or DNS?
[ ] Configure rate-limiting on nginx; observe 429s under load (`limit_req` rejects with 503 unless you set `limit_req_status 429`)
The bare-sockets HTTP client is the exercise most people skip and most regret skipping. Writing connect/write/read by hand makes HTTP stop being a black box: you’ve literally typed the bytes on the wire. That muscle memory is what lets you read a tcpdump capture later without flinching.
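The HTTP/1.1 client exercise asks for C, but the sequence is the same in any language. This Python sketch sticks to socket calls that map closely onto the C ones: `create_connection` (name lookup plus `connect`), `sendall` (a `send` loop), and `recv` until it returns empty bytes at EOF. It sends `Connection: close` so EOF marks the end of the response, and it deliberately skips TLS, chunked transfer decoding, and redirects; treat it as a reference for checking your C version’s bytes, not as a client to reuse.

```python
import socket
from typing import Dict, Tuple


def http_get(host: str, port: int = 80, path: str = "/") -> Tuple[int, Dict[str, str], bytes]:
    """HTTP/1.1 GET over a plain TCP socket: connect, write the request bytes, read to EOF."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"          # ask the server to close, so EOF ends the response
        "\r\n"                           # blank line: end of headers, no body on a GET
    ).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:  # getaddrinfo + connect
        sock.sendall(request)                                        # loops send() until done
        chunks = []
        while True:
            data = sock.recv(4096)                                   # b"" means peer closed
            if not data:
                break
            chunks.append(data)
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split(" ", 2)[1])       # "HTTP/1.1 200 OK" -> 200
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body
```

Point it at a local test server (for example `python3 -m http.server 8000`, then `http_get("127.0.0.1", 8000)`) while tcpdump runs on the loopback interface, and compare the captured bytes with the request string above.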
6. COMPARE: nginx vs Caddy vs HAProxy
Pick two. Configure each as a reverse proxy in front of a small backend. Compare:
- Config style (declarative vs imperative; YAML vs DSL vs config-as-code)
- Defaults (TLS, HTTP/2, security headers)
- Operational ergonomics (reload, hot-config, zero-downtime restart)
Write 300 words: which would you reach for and why?
This is your first real practice with the pattern-first reflex: all three are L7 reverse proxies implementing the same job (terminate TLS, route by host/path, balance to upstreams). The differences live in trade-offs, not capability. By Phase 7 you’ll meet a fourth implementation of the same idea (the Kubernetes Ingress controller — which is often nginx or Traefik repackaged with a control loop on top).
7. OPERATE
- 3+ runbooks in `ops-handbook/runbooks/networking/` (DNS broken, TLS cert expired, “the internet is slow”, firewall debug)
- Weekly log
8. CONTRIBUTE
Networking-adjacent OSS to consider: iperf3, mtr, tcpdump docs, BIND or PowerDNS docs, nftables wiki.
Validation criteria
[ ] All 10 operational depth checks
[ ] nginx vs Caddy vs HAProxy comparison written up
[ ] 3+ networking runbooks; 1+ postmortem if you broke the homelab network
[ ] 7+ weekly log entries
[ ] Pattern entries deepened:
  - routing-and-addressing → OUTLINE
  - layering-and-abstraction → reinforced (already OUTLINE from Phase 1)
[ ] Exit Test passed
Exit Test
Time: 3 hours.
- Build (60 min): set up a clean reverse-proxy config for `triage.local` (placeholder for Phase 7) with TLS, HTTP/2, default-deny except 80/443.
- Debug (90 min): a scenario from the Phase 2 catalog (DNS resolution broken, TLS cert misconfigured, or asymmetric routing).
- Articulate (30 min): write 600 words on “Why is TCP slow on bad Wi-Fi? Walk through the layers.”
Anti-patterns
| Anti-pattern | Why |
|---|---|
| “Just allow all on the firewall” | Defeats the whole point; default-deny is a one-time cost |
| Skipping tcpdump because “it’s hard to read” | The hard thing is the actual skill; build the muscle |
| Treating DNS as black magic | DNS is just hierarchical key-value with caching; demystify it |
Patterns touched this phase
- routing-and-addressing — first deepening to OUTLINE
- layering-and-abstraction — reinforced
- defense-in-depth — first touch (STUB)
→ Next: Phase 3: Databases