OS Foundations: Linux as Implementation
First phase of ROOT. Sets the pattern-first frame every later phase follows. ~8 weeks, ~100 hrs.
This is where the program actually starts — not at Kubernetes, not at the cloud, not at AI infrastructure. At the kernel. Every higher tier of basecamp eventually resolves to processes, file descriptors, page tables, and syscalls. If those primitives are fuzzy, every later abstraction is partial magic.
Phase 1 is also where you internalize the pattern-first scaffold — PROBLEM → PRINCIPLES → TRADE-OFFS → TOOLS → MASTERY → COMPARE → OPERATE → CONTRIBUTE — by living it for 8 weeks. Every subsequent phase in the Year 1 plan follows the same shape, so the discipline you build here pays interest for the next 60 months.
The bet is the same one the Master Plan makes: tools change, patterns don’t. Linux is the implementation you’ll know best. FreeBSD is the comparison that proves you understand the category, not just one tool.
Prerequisites
- Working homelab (Proxmox + bastion VM) — see homelab/hardware for the build
- A second small VM available for OS comparison work (FreeBSD VM in Proxmox is fine)
- 12 hrs/week budget reserved
- `ops-handbook` repo initialized: every runbook, incident, and weekly log lands there from Phase 1 forward
- You’ve read the Master Plan and the Year 1 overview
- You accept the contract: you are not learning Linux. You are learning what an operating system is, with Linux as one implementation. Anything you learn here should transfer to FreeBSD, macOS, illumos, or whatever replaces Linux in 15 years.
Why this phase exists
Everything in computing runs on top of an operating system. K8s schedules processes. A database is a process. A container is a process with a smaller view of the filesystem. eBPF runs in the kernel. Your AI model serves predictions from a process. If you don’t deeply understand what a process is, what a filesystem actually does, how memory is laid out, and how the kernel mediates everything — you’ll plateau as an operator. You’ll learn tools forever and never learn engineering.
Staff/Principal engineers debug any layer of the stack because they understand what’s underneath the abstraction. This phase is where that habit starts: every abstraction has an implementation. Implementations have trade-offs. Trade-offs come from physics, history, and incentives. The pattern survives the tool.
The pattern-first frame (every phase uses this)
- PROBLEM: What category of human need exists?
- PRINCIPLES: The timeless patterns any solution must implement
- TRADE-OFFS: The decisions every implementation makes (and why)
- TOOLS: Current implementations (time-stamped; they age)
- MASTERY: Pick one tool, go to operational depth
- COMPARE: Re-implement the same problem in a second tool (this is the proof that the pattern transferred)
- OPERATE: Run it in your homelab, take real incidents
- CONTRIBUTE: Ship one fix upstream

This is not a recipe. It’s a learning instrument. Each phase doc gives you the framing; you do the investigation. If a phase doc reads like a copy-paste tutorial, the doc is wrong: flag it.
1. PROBLEM
You have hardware: CPU, RAM, I/O devices (disks, network cards, keyboard). You want to run multiple programs on it, give each program the illusion that it has the whole machine, prevent any program from crashing or spying on the others, and abstract away the messy details of every specific disk model and NIC model. You also want all of this to survive a crash and reboot cleanly.
That is the problem an operating system solves.
It is not a problem about Linux. Linux is one implementation. Windows NT is another. macOS/XNU is another (BSD + Mach hybrid). FreeBSD is another. seL4 is another (with formal verification). Unikernels (MirageOS, Unikraft) are a different shape of the same problem.
Throughout this phase, when you’re learning a Linux concept, ask: what problem is this solving, and how does another OS solve the same problem?
2. PRINCIPLES
2.1 Resource virtualization
Every process believes it has the whole machine — its own CPU, its own memory, its own file descriptors. Virtualization makes this illusion convincing while the actual hardware is shared.
→ Pattern: resource-virtualization
Investigate:
- What is a virtual address space and why does each process get its own? Starter hints: `cat /proc/self/maps`, `pmap(1)`, Kerrisk TLPI Ch. 6.5
- What happens when a process accesses a memory address, physically, end to end? Starter hints: MMU, TLB, page-table walk; any OS textbook
- What goes wrong when virtualization breaks (segfault, OOM kill, swap thrashing)? Starter hints: `dmesg` after triggering OOM in a cgroup; `vmstat 1`
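As one way into the first question, the sketch below parses the text format of `/proc/PID/maps` and totals mapped size per backing path. The function names `parse_maps_line` and `summarize` are hypothetical, and the demo assumes a Linux host with procfs mounted; treat it as a starting point for your own investigation, not a reference tool.

```python
import os


def parse_maps_line(line):
    """Split one /proc/PID/maps line into its fields."""
    # Line format: start-end perms offset dev inode [pathname]
    fields = line.split(None, 5)
    start_s, end_s = fields[0].split("-")
    start, end = int(start_s, 16), int(end_s, 16)
    return {
        "start": start,
        "end": end,
        "size_kib": (end - start) // 1024,
        "perms": fields[1],
        # Regions with no pathname are anonymous memory.
        "path": fields[5].strip() if len(fields) > 5 else "[anon]",
    }


def summarize(maps_path="/proc/self/maps"):
    """Total mapped KiB per backing path for one process."""
    totals = {}
    with open(maps_path) as f:
        for line in f:
            region = parse_maps_line(line)
            totals[region["path"]] = totals.get(region["path"], 0) + region["size_kib"]
    return totals


if __name__ == "__main__":
    if os.path.exists("/proc/self/maps"):  # Linux only
        largest = sorted(summarize().items(), key=lambda kv: -kv[1])[:10]
        for path, kib in largest:
            print(f"{kib:>10} KiB  {path}")
```

Every address in the output is virtual: two processes can print the same range and still be touching different physical pages.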
2.2 Privilege separation
The kernel runs in privileged mode; processes run in user mode. Every dangerous operation (touching hardware, allocating memory, networking, file I/O) requires the process to ask the kernel.
→ Pattern: privilege-separation — revisited and deepened in Phase 6 (Containers) when namespaces and capabilities become the unit of scope.
Investigate:
- What is a system call, mechanically? Starter hints: `strace -c` on a small program; Kerrisk TLPI Ch. 3
- Why does the privilege boundary exist, and what attack would be possible without it? Starter hints: x86 ring 0 vs ring 3; Meltdown/Spectre as cautionary tales
- What’s the cost of crossing the boundary, and why is it a perf concern at scale? Starter hints: `strace -c` shows syscall counts + time; benchmark `write(2)` to `/dev/null` vs an in-memory counter
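The benchmark in the last hint can be sketched roughly as follows. The function names and the iteration count are assumptions chosen for illustration, and Python's interpreter overhead inflates both numbers, so read the ratio rather than the absolute values; a C version would show the gap more sharply.

```python
import os
import time


def time_syscalls(n=100_000):
    """Seconds spent issuing n one-byte write(2) calls to /dev/null."""
    fd = os.open(os.devnull, os.O_WRONLY)
    try:
        start = time.perf_counter()
        for _ in range(n):
            os.write(fd, b"x")  # one user-to-kernel transition per iteration
        return time.perf_counter() - start
    finally:
        os.close(fd)


def time_counter(n=100_000):
    """Seconds spent on n in-memory increments (no kernel involvement)."""
    count = 0
    start = time.perf_counter()
    for _ in range(n):
        count += 1
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 100_000
    sys_s, mem_s = time_syscalls(n), time_counter(n)
    print(f"write(2): {sys_s / n * 1e9:8.0f} ns/op")
    print(f"counter:  {mem_s / n * 1e9:8.0f} ns/op")
```

Running the script under `strace -c` should show the `write` count matching `n`, which ties the timing back to the boundary crossings.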
2.3 Mediation
The kernel mediates every interaction between processes and hardware. Filesystems, networking, timers, signals — all go through the kernel.
→ Pattern: mediation — reinforced in Phase 7 when K8s Services become the userspace mediator between clients and pods.
Investigate:
- Why isn’t there a “let processes write to disk directly” shortcut? Starter hints: page cache, write barriers, fsync semantics
- What’s the cost of mediation? When do you bypass it (`io_uring`, `DPDK`)? Starter hints: user-space networking; `io_uring` design papers
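To make the fsync hint concrete, here is a minimal sketch of a common durable-write idiom: write to a temporary file, `fsync` it, rename it into place, then `fsync` the directory. The helper name `durable_write` is hypothetical, and the sketch assumes a POSIX filesystem where rename within one directory is atomic; it does not claim to cover every storage stack.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace path with data so that a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        view = memoryview(data)
        while len(view) > 0:
            written = os.write(fd, view)  # lands in the page cache, not yet on disk
            view = view[written:]
        os.fsync(fd)  # ask the kernel to flush this file to stable storage
    finally:
        os.close(fd)
    os.replace(tmp_path, path)  # atomic rename within one filesystem
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # persist the directory entry for the new name
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "state.txt")
    durable_write(target, b"hello\n")
    print(open(target, "rb").read())
```

Each step is a request to the kernel, which is the point of the section: the process never touches the disk, it only asks.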
2.4 Layering and abstraction
The OS is layered: hardware → kernel → libc → application. Each layer hides complexity from the layer above and exposes a contract.
→ Pattern: layering-and-abstraction — the pattern recurs in Phase 2 (TCP/IP layers), Phase 6 (OverlayFS layers), and Phase 7 (the K8s API layered over etcd).
Investigate:
- Trace a `printf` from your C program to bytes on the screen. How many layers? Starter hints: libc → syscall → VFS → tty driver → framebuffer
- What happens when a layer’s contract is wrong (libc bug, kernel regression)? Starter hints: glibc CVEs as examples; kernel ABI stability rules
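One layer boundary in that trace is easy to observe directly: the userspace buffer that sits between a `printf`-style call and the `write(2)` syscall. The sketch below uses Python's buffered file object as a stand-in for stdio (an assumption; the C library's buffering rules differ in detail) and a pipe to watch when bytes actually reach the kernel. The function name is hypothetical.

```python
import os


def buffered_vs_flushed():
    """Show that a userspace buffer holds bytes until flush issues write(2)."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)  # an empty pipe raises instead of hanging
    stream = os.fdopen(write_fd, "wb", buffering=4096)  # userspace buffer, like stdio

    stream.write(b"hello\n")  # stays in the userspace buffer
    try:
        before = os.read(read_fd, 1024)
    except BlockingIOError:
        before = b""  # nothing has crossed into the kernel yet

    stream.flush()  # now the write(2) syscall happens
    after = os.read(read_fd, 1024)

    stream.close()
    os.close(read_fd)
    return before, after


if __name__ == "__main__":
    print(buffered_vs_flushed())
```

The contract of the upper layer ("your bytes will be written") is honored only at flush time, which is why a crash between `write` and `flush` loses output.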
2.5 The process as the unit of execution
A process is the OS’s unit of accountability — its own address space, file descriptors, signal handlers, scheduling priority. Threads share an address space; processes don’t.
Investigate:
- What’s in `/proc/PID/`? Walk through every file for one of your processes.
- What does `fork(2)` actually do? Why is `vfork` a thing? Why is `clone(2)` more general?
- What does the scheduler decide when there are 1000 runnable processes and 4 cores?
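The fork question can be explored with a small sketch of the classic fork, exec, wait sequence. It uses Python's thin wrappers over the same syscalls; the helper name `spawn_and_wait` and the exit code 127 for a failed exec are illustrative choices (127 follows shell convention), not something this phase prescribes.

```python
import os
import sys


def spawn_and_wait(argv):
    """fork a child, exec argv in it, and return the child's exit code."""
    pid = os.fork()  # duplicate this process; returns 0 in the child
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # replace the child's image; returns only on failure
        finally:
            os._exit(127)  # exec failed: leave without running parent cleanup
    _, status = os.waitpid(pid, 0)  # parent blocks until the child terminates
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # killed by a signal: report it as a negative number


if __name__ == "__main__":
    code = spawn_and_wait([sys.executable, "-c", "import sys; sys.exit(7)"])
    print("child exited with", code)
```

Running this under `strace -f` shows which syscall the C library actually issues for `fork` on your system, which is a useful lead into the `clone(2)` question.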
2.6 The filesystem as a namespace
Files are an abstraction over blocks on disk. Directories are an abstraction over names. The VFS layer makes ext4, XFS, ZFS, NFS, FUSE, and tmpfs all look the same to applications.
Investigate:
- What’s an inode and what’s a dentry?
- Why does `/proc` look like a filesystem when there’s no disk behind it?
- What’s the kernel’s path lookup algorithm (path → inode)?
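A quick way to see that names and inodes are separate things is to give one file two names. The sketch below is a minimal illustration under the assumption that the temporary directory's filesystem supports hard links; the function name is hypothetical.

```python
import os
import tempfile


def link_demo():
    """Create one file with two names and report what stat(2) sees."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first-name")
        second = os.path.join(d, "second-name")
        with open(first, "w") as f:
            f.write("same bytes\n")
        os.link(first, second)  # new directory entry, same inode
        a, b = os.stat(first), os.stat(second)
        os.unlink(first)  # removes a name, not the data
        with open(second) as f:
            survived = f.read()
        return a.st_ino == b.st_ino, a.st_nlink, survived


if __name__ == "__main__":
    same_inode, link_count, content = link_demo()
    print(same_inode, link_count, repr(content))
```

The data survives the `unlink` because the inode's link count only dropped from 2 to 1; the directory entries were just names pointing at it.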
3. TRADE-OFFS
| Decision | Option A | Option B | Option C / notes |
|---|---|---|---|
| Kernel design | Monolithic (Linux) | Microkernel (seL4, Mach) | Hybrid (XNU) |
| Process model | fork + exec (Unix) | spawn (Windows) | Both work; fork is cheaper for shells; spawn is cleaner for IDEs |
| Scheduling | CFS (Linux; replaced by EEVDF as the default in kernel 6.6) | BSD scheduler | Round-robin |
| Filesystem | ext4 (default) | XFS (large files) | ZFS (data integrity + COW) |
| Init system | systemd | OpenRC, runit, s6 | systemd: feature-rich, controversial. Alternatives: minimal, niche |
The decisions in this table aren’t aesthetic preferences — they’re forced choices each kernel team made to prioritize one axis (throughput, latency, integrity, simplicity) at the cost of another. Reading any one row carefully and being able to articulate why the trade-off exists is the actual learning outcome.
4. TOOLS (as of 2025-10)
Distributions
- Ubuntu 24.04 LTS — homelab default; well-documented
- Debian 12 — stability-first; smaller surface
- Alpine — for containers; musl libc compare
- NixOS — declarative; the OS as code (try at end of phase)
FreeBSD
- FreeBSD 14 — the compare target
Investigation tools
- `strace`, `ltrace`, `lsof`, `pmap`, `tcpdump`: system call + library tracing
- `bpftrace`, `bcc-tools`: eBPF for live kernel observability (taste; deepen Year 3)
- `perf`, `flamegraph`: profiling
- `procfs`/`sysfs` direct read: `/proc/PID/*`, `/sys/*`
Reading
- “The Linux Programming Interface” (Kerrisk) — the definitive reference
- “How Linux Works” (Brian Ward, 3rd ed.) — readable orientation
- “Operating Systems: Three Easy Pieces” — the textbook (free online)
5. MASTERY: Linux at depth
5.1 Reading list
| Required | Why |
|---|---|
| TLPI Ch. 1-7 (introduction, processes, memory, files) | The principle layer |
| “How Linux Works” (Ward) Ch. 1-8 | Orientation + practical |
| OSTEP Ch. 4-9 (processes + scheduling) | Theory layer |

| Recommended | Why |
|---|---|
| TLPI Ch. 13-17 (file I/O depth) | When you hit a Phase 3 storage question |
| `man 7 capabilities` | Setup for Phase 6 containers |
5.2 Operational depth checklist
- [ ] Walk through every file in `/proc/self/`; know what each represents
- [ ] `strace -c` on `ls`, `cat`, a small Python script; count syscalls; explain top 5
- [ ] Trigger and diagnose: a segfault (write a tiny C program), an OOM kill (cgroup-bounded loop), swap thrashing (allocate 2× RAM)
- [ ] Build a process tree with fork/exec in C; observe with `pstree` and `/proc`
- [ ] Read `/proc/PID/maps` for a running process; identify text, heap, stack, mmap regions
- [ ] Use `bpftrace -e 'tracepoint:syscalls:sys_enter_openat { @[comm] = count(); }'` to count opens by process for 30 seconds
- [ ] Set up cgroups v2 manually (no systemd-run); pin a process to 1 CPU and 100MB RAM
- [ ] Mount a filesystem (ext4, XFS, tmpfs); inspect with `dumpe2fs`/`xfs_info`/`mount`
- [ ] Write a small init system in shell (PID 1 fundamentals: reaping zombies, signal handling)
- [ ] Boot a Linux VM with custom kernel cmdline; observe `dmesg` and trace boot sequence

The cgroups exercise is the one that pays the most interest later: cgroups v2 is the exact same primitive that Phase 6 uses to bound containers and that Phase 7 Kubernetes uses to enforce pod resource requests. The PID-1 init exercise pays interest in Phase 6 too: a container’s entrypoint IS PID 1 inside its namespace, and zombie reaping is a real production failure mode.
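For the manual cgroups v2 exercise, the sequence of writes can be sketched as data before you run anything as root. The group name `phase1-demo`, the helper names, and the reading of "100MB" as 100 MiB are all assumptions; the sketch also assumes the unified hierarchy is mounted at `/sys/fs/cgroup` and that the `cpuset` and `memory` controllers are available there, which varies by distribution and by what systemd has already configured.

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup"  # assumes the unified cgroup v2 hierarchy is mounted here


def cgroup_plan(name, cpu, mem_bytes, pid):
    """Ordered (file, value) writes that confine pid to one CPU and a memory cap."""
    group = os.path.join(CGROUP_ROOT, name)
    return [
        # Let children of the root cgroup use the cpuset and memory controllers.
        (os.path.join(CGROUP_ROOT, "cgroup.subtree_control"), "+cpuset +memory"),
        (os.path.join(group, "cpuset.cpus"), str(cpu)),
        (os.path.join(group, "memory.max"), str(mem_bytes)),
        # Moving the process comes last so the limits are already in place.
        (os.path.join(group, "cgroup.procs"), str(pid)),
    ]


def apply_plan(name, plan):
    """Create the group directory and perform the writes. Needs root."""
    os.makedirs(os.path.join(CGROUP_ROOT, name), exist_ok=True)
    for path, value in plan:
        with open(path, "w") as f:
            f.write(value)


if __name__ == "__main__":
    # Dry run: print the plan for this process instead of applying it.
    plan = cgroup_plan("phase1-demo", 0, 100 * 1024 * 1024, os.getpid())
    for path, value in plan:
        print(f"echo {value!r} > {path}")
```

Keeping the plan separate from `apply_plan` lets you review every write first, and the same list doubles as the skeleton of a runbook entry.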
5.3 ops-handbook starts here
ops-handbook/runbooks/linux/ gets its first 3-5 entries this phase:
- “Diagnose a process eating CPU” (`top`, `perf top`, `strace`)
- “Diagnose disk-full or inode-exhaustion”
- “Diagnose memory pressure” (`vmstat`, `/proc/meminfo`, OOM scoring)
- “Recover a system that won’t boot” (recovery shell, fsck, `init=/bin/bash`)
Each runbook follows meta/runbook-template.md. Test each by handing it to Claude in “play the runner” mode. The Year 1 overview explains why ops-handbook is the load-bearing artifact of the whole program — Phase 1 is where it stops being empty.
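The disk-full versus inode-exhaustion distinction is worth seeing in code before writing that runbook: a filesystem can refuse new files with plenty of free blocks left. The sketch below reads both counters through `statvfs`; the function name and the returned keys are hypothetical.

```python
import os


def fs_usage(path="/"):
    """Percent of blocks and inodes in use on the filesystem holding path."""
    st = os.statvfs(path)

    def pct_used(total, free):
        return 0.0 if total == 0 else 100.0 * (total - free) / total

    return {
        "blocks_used_pct": pct_used(st.f_blocks, st.f_bfree),
        # Some filesystems report 0 total inodes because they allocate them on demand.
        "inodes_used_pct": pct_used(st.f_files, st.f_ffree),
    }


if __name__ == "__main__":
    for key, value in fs_usage("/").items():
        print(f"{key}: {value:.1f}%")
```

These are the same numbers `df -h` and `df -i` report, so the runbook can cite either the commands or the underlying call.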
6. COMPARE: FreeBSD VM
Spin up a FreeBSD 14 VM in Proxmox. Re-do 3 of the operational checklist items there:
- Process inspection (`ps`, `procstat` instead of `/proc`)
- System-call tracing (`truss` instead of `strace`)
- Filesystem mounting (UFS or ZFS instead of ext4)
Write a 400-word reflection in ops-handbook/: what’s the same? what’s different? what’s the underlying principle?
This is the phase’s primary proof-of-pattern-transfer. If you can’t articulate “fork-exec is the same; the syscalls differ” you haven’t internalized the pattern yet.
7. OPERATE
- 3-5 runbooks in `ops-handbook/runbooks/linux/`
- 1+ ADR if you make a real decision (e.g., “Why ext4 over XFS for the homelab data disk”)
- Weekly log every Sunday — by phase end you should have ~8 entries
8. CONTRIBUTE
This phase is the warm-up for upstream contribution. Year 1 deadline for first merged PR is end of Phase 7, but you should attempt one this phase.
Approachable targets:
- Linux kernel docs: typo or clarification in `Documentation/`
- `man-pages` project: examples or clarifications
- `util-linux`: small bug or doc fix
- `procps-ng`: `ps`, `top` family
Keep notes in ops-handbook/contributions/.
Validation criteria
- [ ] All 10 operational depth checks
- [ ] FreeBSD compare exercise written up
- [ ] 3-5 Linux runbooks in ops-handbook
- [ ] 1+ ADR (or explicit "no decisions warranted ADR this phase")
- [ ] 8+ weekly log entries
- [ ] Pattern entries deepened STUB → OUTLINE:
  - resource-virtualization
  - privilege-separation
  - mediation
  - layering-and-abstraction
- [ ] Exit Test passed

Exit Test
Time: 3 hours.
Part 1: Diagnose (90 min)
A scenario from the root-exam Phase 1 catalog (e.g., “this VM’s load average is 30 with 0% CPU usage” — disk I/O wait; or “this process won’t die on SIGTERM” — uninterruptible sleep). Find root cause + write a runbook covering the diagnosis path.
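Both example scenarios involve uninterruptible sleep, which shows up as state `D` in `/proc/PID/stat`. The sketch below lists such processes; the function names are hypothetical, it assumes a Linux procfs, and it only inspects each process's main thread, so a real diagnosis would also check `/proc/PID/task/`.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/PID/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible(proc="/proc"):
    """List processes currently in state D (uninterruptible sleep)."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited or is hidden from us
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    if os.path.isdir("/proc"):  # Linux only
        for pid, comm in uninterruptible():
            print(pid, comm)
```

On Linux, tasks in state `D` count toward the load average even though they use no CPU, which is why the two numbers in the first scenario can diverge.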
Part 2: Articulate (90 min)
~1000 words: “Walk a read(fd, buf, 1024) syscall from the C function call to data in buf. Cover: user→kernel transition, VFS lookup, page cache, block layer, device driver, return path. Cite the patterns at each layer.”
The articulation answer should reference at least three of the four patterns this phase deepens — that’s the proof that the principle layer landed, not just the trivia layer.
Anti-patterns
| Anti-pattern | Why |
|---|---|
| “I’ll learn Linux by following a tutorial” | You’ll know commands; you won’t know what they do |
| Skipping the FreeBSD compare | Without it, “OS” and “Linux” stay confused in your head |
| Memorizing `/proc/*` paths instead of understanding them | The path is just a file; the abstraction is the point |
| Treating `strace` output as noise | It’s the truth: when you can read it, you can debug anything |
Patterns touched this phase
- resource-virtualization — first deepening to OUTLINE
- privilege-separation — first deepening to OUTLINE
- mediation — first deepening to OUTLINE
- layering-and-abstraction — first deepening to OUTLINE
→ Next: Phase 2: Networking