Containers: Namespaces + cgroups + UnionFS
Sixth phase. Build a container from scratch using Linux primitives. Docker as the canonical implementation. The magic dissolves; what’s left is just clever process isolation. ~6 weeks, ~70 hrs.
Phase 6 is the phase where the abstractions you’ve been building toward all year — process isolation from Phase 1, layering from Phase 1 + Phase 2 — collapse into a single artifact you can build with unshare(1). There is no magic in containers. There is just clever composition of Linux primitives you’ve already met. The phase exists to make that statement true for you.
The pattern frame is unusually clean here: a container is a process under privilege-separation (namespaces + capabilities + seccomp), running on a layered filesystem (OverlayFS), bounded by resource virtualization (cgroups v2). All three ingredients are concepts you internalized 10+ weeks ago in Phase 1; what’s new is the composition. By Phase 6 end, “Docker” should feel like a convenient frontend over kernel features, not a mysterious daemon.
This is also the phase that unlocks Phase 7. Kubernetes assumes you know what a container is. Without Phase 6 you’re driving K8s blind.
Prerequisites
Why this phase exists
Phase 7’s K8s is built on containers. Year 4’s llm-gateway runs in containers. mlship (Year 5 capstone) auto-builds containers. All Year 3 data tools (Spark, Trino, Iceberg) ship as containers. If containers are magic, every higher-level system is partial magic.
The principle is privilege-separation revisited — containers are a process’s view of the system, scoped down via kernel primitives.
1. PROBLEM
You want to package and run software in a way that’s:
- Reproducible — same image, same behavior, anywhere
- Isolated — one container’s mistakes don’t break another
- Lightweight — faster than VMs (no separate kernel)
- Distributable — pull from a registry, run anywhere
Linux containers solve this with three building blocks: namespaces (isolation), cgroups (resource limits), UnionFS (layered filesystems).
2. PRINCIPLES
2.1 Namespaces: the “what can this process see?” boundary
Linux has eight namespace types: PID, mount, net, user, UTS, IPC, cgroup, time. Each restricts what a process sees of that resource.
→ Pattern: privilege-separation (revisited)
Investigate:
- Use `unshare -p -m -n -f /bin/bash` to enter a namespaced shell; observe `ps`, `ip link`, `mount`
- Read `man 7 namespaces`; map each type to “what would break if it weren’t there?”
- Why is the PID namespace particularly important?
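Before unsharing anything, it can help to see which namespaces a process already belongs to. The sketch below is a minimal illustration, assuming a Linux host with `/proc` mounted; the helper names `list_namespaces` and `shared_namespaces` are made up for this example. It reads the symlinks under `/proc/<pid>/ns`, whose targets identify each namespace.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_type: identity} for a process.

    Returns an empty dict where /proc/<pid>/ns is unavailable (non-Linux hosts).
    """
    ns_dir = f"/proc/{pid}/ns"
    if not os.path.isdir(ns_dir):
        return {}
    result = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            # Each entry is a symlink whose target names the namespace type and inode.
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Reading another process's namespaces may need extra privileges.
            continue
    return result


def shared_namespaces(pid_a, pid_b):
    """Namespace types in which two processes have the same identity."""
    a, b = list_namespaces(pid_a), list_namespaces(pid_b)
    return sorted(name for name in a if a[name] == b.get(name))


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:20} {ident}")
```

Run it once on the host and once inside the `unshare` shell: the namespace types you unshared should show a different identity, and everything else should match.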
2.2 cgroups: the “how much can this process use?” boundary
cgroups v2 (unified hierarchy) controls CPU, memory, IO, PIDs.
Investigate:
- Create a cgroup manually under `/sys/fs/cgroup/`; pin a process; cap memory
- What happens when memory cap is exceeded? (OOM kill within cgroup, kernel intact)
- What’s the difference between cgroups v1 and v2?
This is the same /sys/fs/cgroup/ you played with in Phase 1 when bounding a single process. Phase 6’s contribution isn’t a new primitive — it’s the composition with namespaces and an overlay rootfs that turns “bounded process” into “container.”
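To make the cgroup v2 interface concrete, here is a rough sketch of how runtime-style limits such as `--cpus=0.5 --memory=100m` could map onto the files `cpu.max`, `memory.max`, and `pids.max`. The function names are hypothetical, and the 100 ms CPU period is an assumption (it is the usual default, but check your runtime). Writing to a real cgroup directory requires root and the relevant controllers enabled in the parent's `cgroup.subtree_control`.

```python
import os

CPU_PERIOD_US = 100_000  # assumed 100 ms scheduling period


def parse_size(text):
    """Convert '100m' / '1g' / '512k' / '4096' to bytes (binary multiples)."""
    units = {"k": 1024, "m": 1024**2, "g": 1024**3}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def cgroup_v2_limits(cpus, memory, pids=None):
    """Map runtime-style limits onto cgroup v2 interface file contents."""
    quota_us = int(cpus * CPU_PERIOD_US)
    limits = {
        # cpu.max holds "<quota> <period>", both in microseconds.
        "cpu.max": f"{quota_us} {CPU_PERIOD_US}",
        # memory.max is a hard cap in bytes; if usage cannot be reclaimed
        # below it, the OOM killer acts inside the cgroup.
        "memory.max": str(parse_size(memory)),
    }
    if pids is not None:
        limits["pids.max"] = str(pids)
    return limits


def write_limits(cgroup_dir, limits):
    """Write each limit into its file under cgroup_dir."""
    for filename, value in limits.items():
        with open(os.path.join(cgroup_dir, filename), "w") as fh:
            fh.write(value)
```

After writing the limits, pinning a process is one more write: its PID goes into the `cgroup.procs` file of the same directory.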
2.3 UnionFS: the layered filesystem
A container image is a stack of read-only layers + one writable layer on top. OverlayFS is the canonical Linux implementation.
→ Pattern: layering-and-abstraction (reinforced)
Investigate:
- Use `mount -t overlay` to create your own overlay manually; understand lowerdir/upperdir/workdir
- Why is the writable layer copy-on-write? What’s the cost?
- What’s a Docker layer in image manifest terms? Read a manifest with `crane manifest`.
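The lookup and copy-on-write rules are easier to reason about with a toy model in hand. The sketch below is not OverlayFS; it is an in-memory illustration (the class `OverlayModel` and the `WHITEOUT` marker are invented for this example) of three behaviors: reads fall through from the upper layer to the lower layers, the first write copies the file up, and deletes are recorded as whiteouts so the lower layers never change.

```python
WHITEOUT = object()  # marker standing in for an overlay whiteout entry


class OverlayModel:
    """Toy model of overlay semantics: each layer is a dict of path -> content."""

    def __init__(self, lowers):
        self.lowers = lowers  # read-only layers, topmost first (like lowerdir=a:b)
        self.upper = {}       # the single writable layer (upperdir)

    def read(self, path):
        if path in self.upper:
            if self.upper[path] is WHITEOUT:
                raise FileNotFoundError(path)
            return self.upper[path]
        for layer in self.lowers:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path, data):
        # Copy-up: the first modification copies the whole file into upper,
        # which is where the copy-on-write cost is paid.
        try:
            current = self.read(path)
        except FileNotFoundError:
            current = ""
        self.upper[path] = current + data

    def delete(self, path):
        self.read(path)              # raises if the file is not visible
        self.upper[path] = WHITEOUT  # lower layers stay untouched
```

The cost question in the list above shows up in `append`: even a one-byte change to a large file in a lower layer means copying the entire file into the writable layer first.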
2.4 Capabilities: fine-grained root
root was historically all-or-nothing. Capabilities split root into ~40 specific permissions (CAP_NET_BIND_SERVICE, CAP_SYS_ADMIN, etc.). Containers should drop everything not needed.
Investigate:
- `setcap cap_net_bind_service+ep ./mybinary`: let a non-root binary bind port 80
- `docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE`: minimal-cap container
- Why is `--privileged` almost always wrong?
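Capability sets are exposed as hex bitmasks (the `CapEff` line in `/proc/<pid>/status`), which makes them hard to read at a glance. The following sketch decodes such a mask into names, assuming the bit numbering from `linux/capability.h`; the helper names `decode_caps` and `effective_caps` are made up here. Tools like `getpcaps` or `capsh --decode` do the same job; this is only to show there is nothing hidden behind the number.

```python
# Capability names indexed by bit position, per linux/capability.h.
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(mask):
    """Decode a capability bitmask (int, or hex string as in /proc/<pid>/status)."""
    if isinstance(mask, str):
        mask = int(mask, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]


def effective_caps(pid="self"):
    """Read and decode CapEff for a process; returns [] where it is unavailable."""
    try:
        with open(f"/proc/{pid}/status") as fh:
            for line in fh:
                if line.startswith("CapEff:"):
                    return decode_caps(line.split()[1])
    except OSError:
        pass
    return []
```

A container started with `--cap-drop=ALL --cap-add=NET_BIND_SERVICE` should decode to a single entry, since `CAP_NET_BIND_SERVICE` is bit 10 (mask `0x400`). Compare that with what the same call reports in a container started with default flags, then with `--privileged`.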
2.5 seccomp: syscall filtering
seccomp-bpf restricts which syscalls a process can make. Docker’s default profile is an allowlist that leaves roughly 44 of the 300+ syscalls blocked.
Investigate:
- Read the default Docker seccomp profile (it’s JSON)
- Run a container with `--security-opt seccomp=unconfined`; observe which syscalls become available
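Because the profile is plain JSON, you can query it rather than read it top to bottom. The sketch below uses a hypothetical, heavily trimmed profile in the shape Docker’s profiles use (`defaultAction` plus a `syscalls` list of `names`/`action` rules). It is a deliberate simplification: it takes the first rule that names the syscall and ignores the `args`, `includes`, and `excludes` conditions that real profiles carry. `EXAMPLE_PROFILE`, `action_for`, and `blocked` are names invented for this example.

```python
import json

# Hypothetical, trimmed-down profile; the real default profile is far longer.
EXAMPLE_PROFILE = json.loads("""
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["read", "write", "openat", "close", "exit_group"],
     "action": "SCMP_ACT_ALLOW"},
    {"names": ["ptrace"], "action": "SCMP_ACT_ERRNO"}
  ]
}
""")


def action_for(profile, syscall):
    """Return the action of the first rule naming the syscall, else the default."""
    for rule in profile.get("syscalls", []):
        if syscall in rule.get("names", []):
            return rule["action"]
    return profile["defaultAction"]


def blocked(profile, syscalls):
    """Subset of the given syscalls that the profile would not allow."""
    return [s for s in syscalls if action_for(profile, s) != "SCMP_ACT_ALLOW"]
```

Loading the real default profile with `json.load` and passing it to `blocked` along with a list of syscalls you care about gives a first approximation of what `seccomp=unconfined` would open up.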
2.6 Image building: Dockerfile and beyond
A Dockerfile is the imperative recipe; the image is the declarative result. Multi-stage builds + distroless minimize attack surface.
Investigate:
- Multi-stage Go build: builder image → distroless runtime; observe size delta
- Why does `COPY` order matter for cache? Layer caching mental model.
- Compare Dockerfile vs `buildah` / `kaniko` / `nixpkgs`: different paths to the same OCI image
The “imperative recipe → declarative result” framing also appears in Phase 7 at a different level — Helm charts are imperative templates that produce declarative manifests, just like Dockerfiles produce immutable images. Same shape, different artifact.
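One way to hold the layer caching mental model is as a hash chain: each step’s cache key depends on its parent’s key, the instruction text, and, for `COPY`, the content being copied. The sketch below models only that idea. Real builders compute their keys differently and in more detail, and the functions `cache_keys` and `first_invalidated` are hypothetical names for this illustration.

```python
import hashlib


def _digest(*parts):
    h = hashlib.sha256()
    for part in parts:
        h.update(part.encode())
        h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()


def cache_keys(instructions, file_contents):
    """Chain one cache key per step: H(parent key, instruction, copied content).

    instructions: list of Dockerfile-like strings.
    file_contents: {source path: content} for files that COPY steps refer to.
    """
    keys, parent = [], ""
    for line in instructions:
        content = ""
        if line.startswith("COPY "):
            src = line.split()[1]
            content = file_contents.get(src, "")
        parent = _digest(parent, line, content)
        keys.append(parent)
    return keys


def first_invalidated(old_keys, new_keys):
    """Index of the first step whose cache key changed, or None if all match."""
    for i, (old, new) in enumerate(zip(old_keys, new_keys)):
        if old != new:
            return i
    return None
```

With steps ordered as copy `go.mod`, download modules, copy `main.go`, build, a change to `main.go` invalidates only the last two steps, while a change to `go.mod` invalidates everything after the base image. That is the reason to copy rarely-changing inputs first.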
3. TRADE-OFFS
| Decision | Option A | Option B | Option C | Cost |
|---|---|---|---|---|
| Runtime | docker | podman (rootless-friendly) | containerd direct | |
| Builder | Dockerfile | Buildah | Kaniko | |
| Base image | distroless (Google) | scratch | alpine | |
| User in container | root | non-root with capabilities | | non-root: defense in depth, harder configs |
The Alpine row hides a real bug class — musl-libc is almost glibc-compatible, but the differences (DNS resolution behavior, thread-local storage, dynamic linking) bite at 3am in ways that are notoriously hard to debug. For Go binaries, distroless or scratch are usually the right call. For Python, paying the size cost of a Debian-slim base is often worth not chasing musl-libc compatibility issues.
4. TOOLS (as of 2025-10)
- `docker` 25+ or `podman` 5+
- `unshare`, `nsenter` (util-linux): for the from-scratch exercises
- `buildah`, `skopeo`: image manipulation
- `crane`: registry interaction without Docker
- `dive`: image-layer inspection
- `trivy`: vulnerability scanning (warm-up for Year 2 supply chain)
- distroless base images (gcr.io/distroless)
5. MASTERY
5.1 Reading list
| Required | Why |
|---|---|
| “Container Security” (Liz Rice) | The principles + the pitfalls |
| `man 7 namespaces`, `man 7 capabilities` | The actual contracts |
| Docker docs (Build, Storage, Networking) | The implementation |

| Recommended | Why |
|---|---|
| “Kubernetes the Hard Way” (Hightower): read it now, you’ll do it Phase 7 | Bridge to Phase 7 |
5.2 Operational depth checklist
- [ ] Build a container from scratch: `unshare -p -m -n -f -U /bin/bash`, mount overlay, run a process. No Docker.
- [ ] Multi-stage Go build for `pulse`: builder + distroless runtime; observe size (~10MB final)
- [ ] Run a container with `--cap-drop=ALL` + only what's needed; verify with `getpcaps` from inside
- [ ] Configure cgroups v2 manually for a `docker run` with `--cpus=0.5 --memory=100m`; force OOM
- [ ] Read a Docker image manifest via `crane manifest`; identify layer SHAs
- [ ] Use `dive` on `pulse` image; identify wasted space; reduce
- [ ] Run `trivy image` on `pulse`; address any HIGH/CRITICAL CVEs
- [ ] Set up a local registry with `registry:2`; push/pull `pulse`
- [ ] Containerize `triage`'s Postgres + Redis dependencies (foreshadow [Phase 7](/program/year-1/phase-7/))
- [ ] Read Linux kernel source for one namespace type (e.g., PID: `kernel/pid_namespace.c`); 1 hour

The “build a container from scratch” item is the load-bearing exercise of the entire phase. If you skip it and rely on Docker the whole way through, the abstraction never dissolves and Phase 7’s K8s stays partially mysterious. Spend an entire afternoon on it. Watch `ps` from inside the namespace and from outside; reconcile the two views.
5.3 Containerize the Year 1 services
By phase end, you have container images for:
- `pulse` (you ship this anyway)
- `triage` (Phase 7 will deploy this)
- `rxp`, `konfig` (CLIs containerized for CI use)
These all live in Dockerfiles in their respective project repos. Multi-stage. Distroless or scratch where possible. Trivy-clean. By the end of Phase 6 you have everything K8s would need to deploy in Phase 7 — only the orchestration is missing.
6. COMPARE: Docker vs Podman (rootless)
Run the same Dockerfile under Docker (root-daemon) and Podman (rootless). Compare:
- Setup complexity
- Permission model
- Network behavior
- CI/CD ergonomics
400 words.
The rootless Podman exercise is also a foreshadowing of Phase 7 and Year 2 supply-chain hygiene — don’t run privileged daemons you don’t need. Podman’s daemonless, rootless model is closer to the security posture you want for production K8s nodes than Docker’s root daemon is.
7. OPERATE
- 3+ runbooks (`container-build-failed`, `container-runs-locally-fails-in-prod`, `image-too-large`)
- 1+ ADR (e.g., “Why distroless over alpine for Go services”)
- Weekly log
8. CONTRIBUTE
Container-adjacent OSS — buildah, podman, crane, dive, trivy, distroless. Lots of “good first issue” tickets.
Validation criteria
- [ ] All 10 operational depth checks
- [ ] Container-from-scratch exercise documented
- [ ] Docker vs Podman comparison written up
- [ ] All Year 1 projects containerized (pulse, triage, rxp, konfig)
- [ ] 3+ runbooks; 1+ ADR; 6+ weekly log entries
- [ ] Pattern entries:
  - privilege-separation → reinforced (now extends to namespace + capability scope)
  - layering-and-abstraction → reinforced (UnionFS layers as concrete example)
- [ ] Exit Test passed

Exit Test
Time: 2 hours.
- Build (60 min): given a Go binary, write a multi-stage Dockerfile producing a < 20MB distroless image with non-root user, cap-drop ALL, healthcheck. Run it locally with explicit cgroup limits.
- Articulate (60 min), 600 words: “Walk a `docker run` from CLI to running process. Cover: image pull, layer extraction, namespace creation, cgroup setup, exec.”
The articulation prompt is the exact composition the phase teaches: image (UnionFS) + namespaces + cgroups + exec. If you can describe each step crisply in your own words, the abstraction has dissolved. If any step still feels like “and then Docker does some stuff”, go back and redo the from-scratch exercise.
Anti-patterns
| Anti-pattern | Why |
|---|---|
| `docker run --privileged` “to make it work” | Defeats the entire isolation model |
| Single-stage Dockerfile with build deps in final image | Bloated + larger attack surface |
| `latest` tag in production references | Unreproducible; pin to digests |
| Running as root inside the container | One escape and you’re root on the host (unless user namespaces remap container root to an unprivileged host UID) |
| Skipping `trivy` because “it’s just dev” | Dev images become prod images |
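The “pin to digests” advice is easy to check mechanically. As a rough sketch (the function names are hypothetical, and the parsing is deliberately simpler than the full image reference grammar), a reference counts as pinned when it ends in `@sha256:` followed by 64 hex characters, and as floating when it has no digest and either no tag or the `latest` tag.

```python
import re

_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref):
    """True if the reference ends in an explicit sha256 digest."""
    return bool(_DIGEST_RE.search(image_ref))


def floating_tag(image_ref):
    """True for references with no digest and no tag, or with the 'latest' tag."""
    if is_digest_pinned(image_ref):
        return False
    last = image_ref.rsplit("/", 1)[-1]  # skip a registry host:port prefix
    tag = last.split(":", 1)[1] if ":" in last else "latest"
    return tag == "latest"
```

A check like this could run in CI over deployment manifests. Keep in mind that a version tag that passes `floating_tag` can still be re-pushed; only the digest form is immutable.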
Patterns touched this phase
- privilege-separation — reinforced (now scoped per-process via namespaces + capabilities)
- layering-and-abstraction — reinforced (OverlayFS as canonical layered filesystem)
- immutable-infrastructure — first touch (containers as immutable unit of deployment; deepens Year 2)
→ Next: Phase 7: Kubernetes + GitOps