5-YEAR PROGRAM · YEAR 1 · PHASE 1
UPCOMING

OS Foundations: Linux as Implementation

First phase of ROOT. Sets the pattern-first frame every later phase follows. ~8 weeks, ~100 hrs.

This is where the program actually starts — not at Kubernetes, not at the cloud, not at AI infrastructure. At the kernel. Every higher tier of basecamp eventually resolves to processes, file descriptors, page tables, and syscalls. If those primitives are fuzzy, every later abstraction is partial magic.

Phase 1 is also where you internalize the pattern-first scaffold — PROBLEM → PRINCIPLES → TRADE-OFFS → TOOLS → MASTERY → COMPARE → OPERATE → CONTRIBUTE — by living it for 8 weeks. Every subsequent phase in the Year 1 plan follows the same shape, so the discipline you build here pays interest for the next 60 months.

The bet is the same one the Master Plan makes: tools change, patterns don’t. Linux is the implementation you’ll know best. FreeBSD is the comparison that proves you understand the category, not just one tool.


Prerequisites

  • Working homelab (Proxmox + bastion VM) — see homelab/hardware for the build
  • A second small VM available for OS comparison work (FreeBSD VM in Proxmox is fine)
  • 12 hrs/week budget reserved
  • ops-handbook repo initialized — every runbook, incident, weekly log lands there from Phase 1 forward
  • You’ve read the Master Plan and the Year 1 overview
  • You accept the contract: you are not learning Linux. You are learning what an operating system is, with Linux as one implementation. Anything you learn here should transfer to FreeBSD, macOS, illumos, or whatever replaces Linux in 15 years.

Why this phase exists

Everything in computing runs on top of an operating system. K8s schedules processes. A database is a process. A container is a process with a restricted view of the system (its own filesystem root, process tree, and network stack). eBPF runs in the kernel. Your AI model serves predictions from a process. If you don’t deeply understand what a process is, what a filesystem actually does, how memory is laid out, and how the kernel mediates everything, you’ll plateau as an operator. You’ll learn tools forever and never learn engineering.

Staff/Principal engineers debug any layer of the stack because they understand what’s underneath the abstraction. This phase is where that habit starts: every abstraction has an implementation. Implementations have trade-offs. Trade-offs come from physics, history, and incentives. The pattern survives the tool.


The pattern-first frame (every phase uses this)

PROBLEM      What category of human need exists?
PRINCIPLES   The timeless patterns any solution must implement
TRADE-OFFS   The decisions every implementation makes (and why)
TOOLS        Current implementations (time-stamped; they age)
MASTERY      Pick one tool, go to operational depth
COMPARE      Re-implement the same problem in a second tool
             (this is the proof that the pattern transferred)
OPERATE      Run it in your homelab, take real incidents
CONTRIBUTE   Ship one fix upstream

This is not a recipe. It’s a learning instrument. Each phase doc gives you the framing; you do the investigation. If a phase doc reads like a copy-paste tutorial, the doc is wrong — flag it.


1. PROBLEM

You have hardware: CPU, RAM, I/O devices (disks, network cards, keyboard). You want to run multiple programs on it, give each program the illusion that it has the whole machine, prevent any program from crashing or spying on the others, and abstract away the messy details of every specific disk model and NIC model. You also want all of this to survive a crash and reboot cleanly.

That is the problem an operating system solves.

It is not a problem about Linux. Linux is one implementation. Windows NT is another. macOS/XNU is another (BSD + Mach hybrid). FreeBSD is another. seL4 is another (with formal verification). Unikernels (MirageOS, Unikraft) are a different shape of the same problem.

Throughout this phase, when you’re learning a Linux concept, ask: what problem is this solving, and how does another OS solve the same problem?


2. PRINCIPLES

2.1 Resource virtualization

Every process believes it has the whole machine — its own CPU, its own memory, its own file descriptors. Virtualization makes this illusion convincing while the actual hardware is shared.

→ Pattern: resource-virtualization

Investigate:

  • What is a virtual address space and why does each process get its own? Starter hints: cat /proc/self/maps, pmap(1), Kerrisk TLPI §6.3-6.4
  • What happens when a process accesses a memory address — physically, end to end? Starter hints: MMU, TLB, page-table walk; any OS textbook
  • What goes wrong when virtualization breaks (segfault, OOM kill, swap thrashing)? Starter hints: dmesg after triggering OOM in a cgroup; vmstat 1
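A minimal sketch for the first question, assuming Linux and a C compiler: it parses /proc/self/maps (format documented in proc(5)) to report which region, and with which permissions, contains a given address. `find_mapping` is a hypothetical helper name, not a library call.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Find the /proc/self/maps region that contains addr (Linux-specific).
 * On success, copies the 4-char permission string (e.g. "rw-p") into perms
 * and the pathname column (e.g. "[stack]", "[heap]", a file path, or ""
 * for anonymous memory) into name, then returns 1. Returns 0 otherwise. */
int find_mapping(const void *addr, char perms[5], char *name, size_t name_len) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL) return 0;

    uintptr_t target = (uintptr_t)addr;
    char line[4096];
    int found = 0;
    while (!found && fgets(line, sizeof line, maps) != NULL) {
        unsigned long start, end;
        char p[5] = {0};
        int path_off = 0;
        /* Each line: start-end perms offset dev inode [pathname] */
        if (sscanf(line, "%lx-%lx %4s %*s %*s %*s %n",
                   &start, &end, p, &path_off) < 3)
            continue;
        if (target >= start && target < end) {
            memcpy(perms, p, 5);
            snprintf(name, name_len, "%s", line + path_off);
            name[strcspn(name, "\n")] = '\0';   /* drop trailing newline */
            found = 1;
        }
    }
    fclose(maps);
    return found;
}
```

Calling it on a local variable, a `malloc` result, and a function address should land in the stack, the heap (or an anonymous mmap region for large allocations), and an executable text mapping respectively; compare the answers with `pmap` output for the same process.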

2.2 Privilege separation

The kernel runs in privileged mode; processes run in user mode. Every dangerous operation (touching hardware, obtaining memory from the OS, networking, file I/O) requires the process to ask the kernel.

→ Pattern: privilege-separation — revisited and deepened in Phase 6 (Containers) when namespaces and capabilities become the unit of scope.

Investigate:

  • What is a system call, mechanically? Starter hints: strace -c on a small program; Kerrisk TLPI Ch. 3
  • Why does the privilege boundary exist — what attack would be possible without it? Starter hints: x86 ring 0 vs ring 3; Meltdown/Spectre as cautionary tales
  • What’s the cost of crossing the boundary, and why is it a perf concern at scale? Starter hints: strace -c shows syscall counts + time; benchmark write(2) to /dev/null vs an in-memory counter
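One way to run the suggested benchmark, sketched under the assumption of Linux with glibc: time many 1-byte write(2) calls to /dev/null against the same number of increments of a user-space counter. The function names are illustrative. On x86-64 Linux, `clock_gettime` is normally served from the vDSO, so the timer itself should not add a kernel crossing.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Average cost of one write(2) of a single byte to /dev/null.
 * Every iteration crosses user -> kernel -> user. Returns -1.0 on error. */
double ns_per_write(long iterations) {
    int fd = open("/dev/null", O_WRONLY);
    if (fd < 0) return -1.0;
    char byte = 'x';
    long long start = now_ns();
    for (long i = 0; i < iterations; i++) {
        if (write(fd, &byte, 1) != 1) break;
    }
    long long elapsed = now_ns() - start;
    close(fd);
    return (double)elapsed / iterations;
}

/* Same loop shape, but the work never leaves user space.
 * volatile forces a real load and store each iteration, so the
 * compiler cannot collapse the loop into a single addition. */
double ns_per_increment(long iterations) {
    volatile long counter = 0;
    long long start = now_ns();
    for (long i = 0; i < iterations; i++) counter++;
    long long elapsed = now_ns() - start;
    return (double)elapsed / iterations;
}
```

The ratio between the two averages, not either absolute number, is the interesting result; it varies with CPU, kernel version, and speculative-execution mitigations.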

2.3 Mediation

The kernel mediates every interaction between processes and hardware. Filesystems, networking, timers, signals — all go through the kernel.

→ Pattern: mediation — reinforced in Phase 7 when K8s Services become the mediator between clients and pods.

Investigate:

  • Why isn’t there a “let processes write to disk directly” shortcut? Starter hints: page cache, write barriers, fsync semantics
  • What’s the cost of mediation? When do you reduce it (io_uring amortizes syscalls over shared ring buffers) or bypass the kernel entirely (DPDK)? Starter hints: user-space networking; io_uring design papers
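The fsync hint can be made concrete with a small sketch, assuming a POSIX system: `write(2)` returning success only means the kernel accepted the bytes (typically into the page cache); `fsync(2)` asks it to push them to stable storage before returning. `durable_write` is a hypothetical helper. For a newly created file, full crash safety also needs an fsync of the parent directory, which this sketch leaves out.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write len bytes to path, then wait until the kernel reports them durable.
 * Returns 0 on success, -1 on error (errno preserved). */
int durable_write(const char *path, const void *data, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;

    const char *p = data;
    while (len > 0) {                      /* write(2) may be partial */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    /* Up to here the data may live only in the page cache. */
    if (fsync(fd) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```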

2.4 Layering and abstraction

The OS is layered: hardware → kernel → libc → application. Each layer hides complexity from the layer above and exposes a contract.

→ Pattern: layering-and-abstraction — the pattern recurs in Phase 2 (TCP/IP layers), Phase 6 (OverlayFS layers), and Phase 7 (the K8s API layered over etcd).

Investigate:

  • Trace a printf from your C program to bytes on the screen. How many layers? Starter hints: libc → syscall → VFS → tty driver → framebuffer
  • What happens when a layer’s contract is wrong (libc bug, kernel regression)? Starter hints: glibc CVEs as examples; kernel ABI stability rules
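A sketch of one boundary on the printf path, the libc layer sitting above the syscall layer, assuming glibc or musl on Linux: a `FILE*` on a pipe is fully buffered, so `fprintf` alone issues no `write(2)`. The demo reads the pipe before and after `fflush`. `stdio_buffer_demo` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Reports how many bytes the read end of a pipe can see after fprintf()
 * (before_flush) and after fflush() (after_flush).
 * Returns 0 on success, -1 on setup failure. */
int stdio_buffer_demo(ssize_t *before_flush, ssize_t *after_flush) {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    /* Non-blocking read end: an empty pipe returns -1/EAGAIN instead of hanging. */
    fcntl(fds[0], F_SETFL, O_NONBLOCK);
    FILE *out = fdopen(fds[1], "w");
    if (out == NULL) { close(fds[0]); close(fds[1]); return -1; }

    char buf[64];
    fprintf(out, "hello");               /* lands in libc's user-space buffer */
    ssize_t n = read(fds[0], buf, sizeof buf);
    *before_flush = (n < 0 && errno == EAGAIN) ? 0 : n;

    fflush(out);                         /* libc now issues write(2) */
    *after_flush = read(fds[0], buf, sizeof buf);

    fclose(out);                         /* also closes fds[1] */
    close(fds[0]);
    return 0;
}
```

Running a program that calls it under `strace -e trace=write` should show the single write(2) appear only at the flush.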

2.5 The process as the unit of execution

A process is the OS’s unit of accountability — its own address space, file descriptors, signal handlers, scheduling priority. Threads share an address space; processes don’t.

Investigate:

  • What’s in /proc/PID/? Walk through every file for one of your processes.
  • What does fork(2) actually do? Why is vfork a thing? Why is clone(2) more general?
  • What does the scheduler decide when there are 1000 runnable processes and 4 cores?
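A minimal fork/exec/wait sketch for the second question, assuming a POSIX system: `run_and_wait` (a hypothetical helper) forks, has the child call `execvp(3)`, and has the parent collect the exit status with `waitpid(2)`. Exit status 127 for a failed exec follows the shell convention.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the child's exit status, or -1 if fork/wait failed or the
 * child was killed by a signal. */
int run_and_wait(char *const argv[]) {
    pid_t pid = fork();                  /* duplicate the calling process */
    if (pid < 0) return -1;
    if (pid == 0) {
        /* Child: between fork and exec is where a shell rewires fds
         * (redirections, pipes) before the new program starts. */
        execvp(argv[0], argv);           /* replace the program image */
        _exit(127);                      /* only reached if exec failed */
    }
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Running it under `strace -f` shows the clone-family syscall glibc uses to implement `fork()`, which is a useful way into the `clone(2)` question.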

2.6 The filesystem as a namespace

Files are an abstraction over blocks on disk. Directories are an abstraction over names. The VFS layer makes ext4, XFS, ZFS, NFS, FUSE, and tmpfs all look the same to applications.

Investigate:

  • What’s an inode and what’s a dentry?
  • Why does /proc look like a filesystem when there’s no disk behind it?
  • What’s the kernel’s path lookup algorithm (path → inode)?
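A sketch of the name/inode split, assuming a POSIX filesystem that supports hard links (tmpfs and ext4 do): create one file, give it a second name with `link(2)`, and `stat(2)` both names. `names_share_inode` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates dir/a, hard-links it as dir/b, and stats both names.
 * Returns 1 if both names resolve to the same inode with a link count
 * of 2, 0 if not, -1 on error. Removes both names before returning. */
int names_share_inode(const char *dir) {
    char a[4096], b[4096];
    snprintf(a, sizeof a, "%s/a", dir);
    snprintf(b, sizeof b, "%s/b", dir);

    int fd = open(a, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0) return -1;
    close(fd);

    int result = -1;
    struct stat sa, sb;
    if (link(a, b) == 0) {               /* second directory entry, same inode */
        if (stat(a, &sa) == 0 && stat(b, &sb) == 0)
            result = (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino &&
                      sa.st_nlink == 2);
        unlink(b);
    }
    unlink(a);                           /* inode is freed when nlink hits 0 */
    return result;
}
```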

3. TRADE-OFFS

Decision | Option A | Option B | Option C / cost
Kernel design | Monolithic (Linux) | Microkernel (seL4, Mach) | Hybrid (XNU)
Process model | fork + exec (Unix) | spawn (Windows) | Both work; fork + exec is cheap for small processes like shells and lets them rewire fds between the two calls; spawn avoids duplicating a large parent such as an IDE
Scheduling | CFS (Linux; replaced by EEVDF in 6.6) | ULE (FreeBSD) | Round-robin
Filesystem | ext4 (default) | XFS (large files) | ZFS (data integrity + COW)
Init system | systemd | OpenRC, runit, s6 | systemd: feature-rich, controversial. Alternatives: minimal, niche

The decisions in this table aren’t aesthetic preferences — they’re forced choices each kernel team made to prioritize one axis (throughput, latency, integrity, simplicity) at the cost of another. Reading any one row carefully and being able to articulate why the trade-off exists is the actual learning outcome.
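For contrast with the fork + exec row, a sketch of the spawn style on a POSIX system: `posix_spawnp(3)` creates the child and loads the program in one call, so there is no window for arbitrary code between fork and exec (fd changes go through `posix_spawn_file_actions_*` instead). This is POSIX spawn, not the Windows `CreateProcess` API, and `spawn_and_wait` is a hypothetical name. On Linux, recent glibc implements it with a vfork-style `clone(2)`, so the parent’s address space is not copied.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Create and load the child in one call, then wait for it.
 * Returns the child's exit status, or -1 on spawn/wait failure or signal. */
int spawn_and_wait(char *const argv[]) {
    pid_t pid;
    /* NULL file actions and attributes: inherit fds and defaults. */
    int err = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (err != 0) return -1;             /* returns an error number, not -1 */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```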


4. TOOLS (as of 2025-10)

Distributions

  • Ubuntu 24.04 LTS — homelab default; well-documented
  • Debian 12 — stability-first; smaller surface
  • Alpine — for containers; musl libc compare
  • NixOS — declarative; the OS as code (try at end of phase)

FreeBSD

  • FreeBSD 14 — the compare target

Investigation tools

  • strace, ltrace — system call + library tracing; lsof, pmap, tcpdump for open files, memory maps, and packets
  • bpftrace, bcc-tools — eBPF for live kernel observability (taste; deepen Year 3)
  • perf, flamegraph — profiling
  • procfs/sysfs direct read — /proc/PID/*, /sys/*
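A sketch of a direct procfs read, assuming Linux: /proc/self/status is plain `Key:<tab>value` text, so a line scan is enough. `proc_status_field` is a hypothetical helper; the field names are the ones listed in proc(5).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Read one numeric field from /proc/self/status. key is the label before
 * the colon, e.g. "Pid", "Threads", or "VmRSS" (VmRSS is in kB).
 * Returns the value, or -1 if the file or field is missing. */
long proc_status_field(const char *key) {
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL) return -1;
    char line[512];
    long value = -1;
    size_t klen = strlen(key);
    while (fgets(line, sizeof line, f) != NULL) {
        /* Match "key:" exactly, so "Pid" does not match "PPid" or "TracerPid". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            if (sscanf(line + klen + 1, "%ld", &value) != 1) value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}
```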

Reading

  • “The Linux Programming Interface” (Kerrisk) — the definitive reference
  • “How Linux Works” (Brian Ward, 3rd ed.) — readable orientation
  • “Operating Systems: Three Easy Pieces” — the textbook (free online)

5. MASTERY: Linux at depth

5.1 Reading list

Required | Why
TLPI Ch. 1-7 (introduction, processes, memory, files) | The principle layer
“How Linux Works” (Ward) Ch. 1-8 | Orientation + practical
OSTEP Ch. 4-9 (processes + scheduling) | Theory layer

Recommended | Why
TLPI Ch. 13-17 (file I/O depth) | When you hit a Phase 3 storage question
man 7 capabilities | Setup for Phase 6 containers

5.2 Operational depth checklist

[ ] Walk through every file in /proc/self/ — know what each represents
[ ] strace -c on `ls`, `cat`, a small Python script — count syscalls; explain top 5
[ ] Trigger and diagnose: a segfault (write a tiny C program), an OOM kill (cgroup-bounded loop), swap thrashing (allocate and touch ~2× RAM)
[ ] Build a process tree with fork/exec in C; observe with `pstree` and `/proc`
[ ] Read /proc/PID/maps for a running process; identify text, heap, stack, mmap regions
[ ] Use `bpftrace -e 'tracepoint:syscalls:sys_enter_openat { @[comm] = count(); } interval:s:30 { exit(); }'` to count opens by process for 30 seconds
[ ] Set up cgroups v2 manually (no systemd-run); pin a process to 1 CPU and 100MB RAM
[ ] Mount a filesystem (ext4, XFS, tmpfs); inspect with `dumpe2fs`/`xfs_info`/`mount`
[ ] Write a small init system in shell (PID 1 fundamentals — reaping zombies, signal handling)
[ ] Boot a Linux VM with custom kernel cmdline; observe `dmesg` and trace boot sequence

The cgroups exercise is the one that pays the most interest later: cgroups v2 is the exact same primitive that Phase 6 uses to bound containers and that Phase 7 Kubernetes uses to enforce pod resource requests and limits. The PID-1 init exercise pays interest in Phase 6 too: a container’s entrypoint IS PID 1 inside its namespace, and a PID 1 that never reaps its children leaking zombies is a real production failure mode.
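A sketch of both mechanisms in C, assuming Linux with cgroup v2 mounted at /sys/fs/cgroup, root privileges, and the `cpu` and `memory` controllers enabled in the parent’s `cgroup.subtree_control`. `cgroup_confine_self` writes the same interface files you would write by hand (`cpu.max`, `memory.max`, `cgroup.procs`); `reap_zombies` is the non-blocking `waitpid(2)` loop a PID 1 needs. Both names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write one value into a cgroup v2 interface file such as dir/memory.max.
 * The kernel validates the value when stdio flushes it at fclose(). */
static int write_cgroup_file(const char *dir, const char *file, const char *value) {
    char path[4096];
    snprintf(path, sizeof path, "%s/%s", dir, file);
    FILE *f = fopen(path, "w");
    if (f == NULL) return -1;
    int ok = fputs(value, f) >= 0;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Create cgroup_dir (e.g. "/sys/fs/cgroup/phase1"), limit it to one CPU's
 * worth of time and 100 MiB of memory, then move the calling process in.
 * Returns 0 on success, -1 on the first failure. */
int cgroup_confine_self(const char *cgroup_dir) {
    if (mkdir(cgroup_dir, 0755) != 0 && errno != EEXIST) return -1;
    /* cpu.max is "<quota> <period>" in microseconds: 100ms per 100ms = 1 CPU. */
    if (write_cgroup_file(cgroup_dir, "cpu.max", "100000 100000") != 0) return -1;
    /* memory.max in bytes: 100 MiB. */
    if (write_cgroup_file(cgroup_dir, "memory.max", "104857600") != 0) return -1;
    char pid[32];
    snprintf(pid, sizeof pid, "%d", (int)getpid());
    return write_cgroup_file(cgroup_dir, "cgroup.procs", pid);
}

/* What PID 1 must do (typically on SIGCHLD): collect every child that has
 * already exited, without blocking, so none linger as zombies.
 * Returns the number of children reaped in this pass. */
int reap_zombies(void) {
    int reaped = 0;
    for (;;) {
        pid_t pid = waitpid(-1, NULL, WNOHANG);
        if (pid > 0) { reaped++; continue; }
        if (pid < 0 && errno == EINTR) continue;
        break;  /* 0: children exist but none exited; -1/ECHILD: no children */
    }
    return reaped;
}
```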

5.3 ops-handbook starts here

ops-handbook/runbooks/linux/ gets its first 3-5 entries this phase:

  • “Diagnose a process eating CPU” (top, perf top, strace)
  • “Diagnose disk-full or inode-exhaustion”
  • “Diagnose memory pressure” (vmstat, /proc/meminfo, OOM scoring)
  • “Recover a system that won’t boot” (recovery shell, fsck, init=/bin/bash)

Each runbook follows meta/runbook-template.md. Test each by handing it to Claude in “play the runner” mode. The Year 1 overview explains why ops-handbook is the load-bearing artifact of the whole program — Phase 1 is where it stops being empty.


6. COMPARE: FreeBSD VM

Spin up a FreeBSD 14 VM in Proxmox. Re-do 3 of the operational checklist items there:

  • Process inspection (ps, procstat instead of /proc)
  • System-call tracing (truss instead of strace)
  • Filesystem mounting (UFS or ZFS instead of ext4)

Write a 400-word reflection in ops-handbook/: what’s the same? what’s different? what’s the underlying principle?

This is the phase’s primary proof-of-pattern-transfer. If you can’t articulate “fork-exec is the same; the syscalls differ” you haven’t internalized the pattern yet.


7. OPERATE

  • 3-5 runbooks in ops-handbook/runbooks/linux/
  • 1+ ADR if you make a real decision (e.g., “Why ext4 over XFS for the homelab data disk”)
  • Weekly log every Sunday — by phase end you should have ~8 entries

8. CONTRIBUTE

This phase is the warm-up for upstream contribution. Year 1 deadline for first merged PR is end of Phase 7, but you should attempt one this phase.

Approachable targets:

  • Linux kernel docs — typo or clarification in Documentation/
  • man-pages project — examples or clarifications
  • util-linux — small bug or doc fix
  • procps-ng (ps, top family)

Keep notes in ops-handbook/contributions/.


Validation criteria

[ ] All 10 operational depth checks
[ ] FreeBSD compare exercise written up
[ ] 3-5 Linux runbooks in ops-handbook
[ ] 1+ ADR (or an explicit "no decision this phase warranted an ADR" note)
[ ] 8+ weekly log entries
[ ] Pattern entries deepened STUB → OUTLINE:
- resource-virtualization
- privilege-separation
- mediation
- layering-and-abstraction
[ ] Exit Test passed

Exit Test

Time: 3 hours.

Part 1: Diagnose (90 min)

A scenario from the root-exam Phase 1 catalog (e.g., “this VM’s load average is 30 with 0% CPU usage”: tasks stuck in uninterruptible disk I/O wait; or “this process won’t die on SIGTERM”: the signal is blocked or ignored, or the process is in uninterruptible sleep). Find root cause + write a runbook covering the diagnosis path.
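A sketch of a first diagnosis step for the load-average scenario, assuming Linux: scan /proc for processes in state `D` (uninterruptible sleep), which Linux counts in the load average even though they use no CPU. `proc_state` and `count_uninterruptible` are hypothetical helpers; `ps -eo pid,stat,wchan,comm` gives the same view from the shell.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Scheduler state letter (R, S, D, Z, T, ...) from /proc/<pid>/stat.
 * The state follows the LAST ')' because the command name in parentheses
 * may itself contain spaces or ')'. Returns '?' if unreadable. */
char proc_state(pid_t pid) {
    char path[64], buf[1024];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL) return '?';
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    char *close_paren = strrchr(buf, ')');
    if (close_paren == NULL || close_paren[1] != ' ' || close_paren[2] == '\0')
        return '?';
    return close_paren[2];
}

/* Count processes in 'D'. Threads live under /proc/<pid>/task/ and are
 * not scanned here. Returns -1 if /proc cannot be opened. */
int count_uninterruptible(void) {
    DIR *proc = opendir("/proc");
    if (proc == NULL) return -1;
    int count = 0;
    struct dirent *e;
    while ((e = readdir(proc)) != NULL) {
        if (!isdigit((unsigned char)e->d_name[0])) continue;  /* PIDs only */
        if (proc_state((pid_t)atoi(e->d_name)) == 'D') count++;
    }
    closedir(proc);
    return count;
}
```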

Part 2: Articulate (90 min)

~1000 words: “Walk a read(fd, buf, 1024) syscall from the C function call to data in buf. Cover: user→kernel transition, VFS lookup, page cache, block layer, device driver, return path. Cite the patterns at each layer.”

The articulation answer should reference at least three of the four patterns this phase deepens — that’s the proof that the principle layer landed, not just the trivia layer.
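The Part 2 walkthrough ends at the return path, and from user space that contract has three outcomes worth handling: a short read, 0 at end of file, and -1 with `errno` set (EINTR meaning a signal interrupted the call before any data arrived). A minimal sketch of a caller that handles all three, assuming POSIX; `read_full` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling read(2) until len bytes, EOF, or a real error.
 * Returns bytes read (fewer than len only at EOF), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len) {
    char *p = buf;
    size_t total = 0;
    while (total < len) {
        ssize_t n = read(fd, p + total, len - total);
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted before data: retry */
            return -1;
        }
        if (n == 0) break;                  /* EOF */
        total += (size_t)n;                 /* short read: ask for the rest */
    }
    return (ssize_t)total;
}
```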


Anti-patterns

Anti-pattern | Why
“I’ll learn Linux by following a tutorial” | You’ll know commands; you won’t know what they do
Skipping the FreeBSD compare | Without it, “OS” and “Linux” stay confused in your head
Memorizing /proc/* paths instead of understanding them | The path is just a file; the abstraction is the point
Treating strace output as noise | It’s the truth; when you can read it, you can debug anything

→ Next: Phase 2: Networking