5-YEAR PROGRAM · YEAR 4
UPCOMING

ML & AI Infrastructure

Months 37-48. Add the ML/AI layer on top of Year 3’s data layer. By year-end, basecamp serves models, runs notebooks-as-a-service, hosts services/llm-gateway/ (the Y4 flagship), and dogfoods notes-rag over your own 4 years of weekly logs. Exit ramp: ML Platform Engineer / AI Infrastructure Engineer.


Where Year 4 sits in the program

Year 4 is the inflection point. Until now, ROOT has been a generic platform program: Linux, networking, Kubernetes, IaC, lakehouse, observability. Year 3 ended with a public basecamp running a credible data platform. Year 4 is where ROOT stops being a platform program and starts being an AI platform program. The brand pillar, “Building an AI Platform in Public”, is finally earned, not just declared.

The Year 3 → Year 4 transition is small in surface area (you’re still operating Kubernetes, still committing to basecamp) but large in identity. Y3’s deliverable was a data engineer’s platform. Y4’s deliverable is an ML platform engineer’s platform: training, registries, serving, features, GPUs, LLMs, RAG. The same nine tiers, with Tiers 5-7 of the basecamp plan lit up for the first time.

This is also the year services/llm-gateway/ grows up. It starts in P21 as a stub that proxies one backend and ends Year 4 as a substantial Go service with RAG, streaming, multi-model routing, drift detection, and auto-rollback, moving real (homelab-scale) traffic. Everything else in Year 4 serves one goal: getting that gateway production-shaped, then connecting it to the rest of basecamp.


What you’ll know at the end of Year 4

  • MLOps lifecycle as a pattern — train → register → deploy → monitor → retrain, abstractly. MLflow + KServe + Ray are implementations of the model-lifecycle pattern, not the pattern itself.
  • Inference-shape fluency — online vs batch vs streaming inference, and the latency/cost/freshness trade-offs each shape commits to.
  • Feature stores in practice — Feast on top of Iceberg; the offline/online dual-store pattern; train/serve skew as a measurable production failure mode, not a textbook concept.
  • Pipeline orchestration for ML — Kubeflow Pipelines composing the train→register→deploy flow; Katib for hyperparameter search.
  • LLM infrastructure end-to-end — vLLM serving with quantization, RAG ingestion + retrieval + generation as three separable subsystems, vector DBs (pgvector, Qdrant), fine-tuning workflows.
  • GPU scheduling discipline — multi-tenant resource sharing, spot economics, the “pay $20 to learn the pattern” mindset that keeps cloud spend honest.
  • mlship v0/v0.5 — usable enough to deploy your own models and power the train→deploy composition recipe. The polished v2 OSS launch is a Year 5 capstone deliverable, not a Y4 outcome.

You’ll be deployable as ML Platform Engineer / AI Infrastructure Engineer. Year 4 flagship: services/llm-gateway/ inside basecamp.


Phase map

| Phase | Title | Approx. weeks | Approx. hours | Pattern depth focus |
|---|---|---|---|---|
| 20 | MLOps Foundations | 8 | 95 | model-lifecycle (first touch) |
| 21 | ML Serving + mlship v0 | 8 | 100 | inference-shapes, model-lifecycle (deepens) |
| 22 | Feature Store: Feast | 7 | 80 | feature-store, train-serve-skew |
| 23 | Kubeflow Operations | 8 | 100 | (composition recipe lands here) |
| 24 | LLM Infrastructure + RAG + llm-gateway v1 | 8 | 100 | rag-as-pattern |
| 25 | GPU Infrastructure (capstone) | 8 | 95 | all Y4 patterns reach DEEP |
| | Year 4 Final Exam | 2 | 24 | |
| Total | | ~49 weeks | ~594 hrs | ~5 patterns deepened |

12 hrs/week × 52 weeks = 624 hrs. Year 4 fits with ~30 hrs slack. Year 4 deepens fewer patterns than Y3 (~5 vs ~12) but the patterns are denser — model-lifecycle alone touches everything from data validation to drift detection to canary rollouts. See the Master Plan’s pattern depth ladder for what STUB → OUTLINE → DEEP means in practice.
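The hour budget above is simple enough to check mechanically. The following is a minimal Go sketch. It uses the phase map's approximate numbers and assumes a flat 12 hrs/week over 52 weeks. The `phase` type and `budget` helper are hypothetical and are not part of basecamp or platform-ctl.

```go
package main

import "fmt"

// phase holds one row of the Year 4 phase map (approximate weeks and hours).
type phase struct {
	name  string
	weeks int
	hours int
}

// yearFour copies the phase map's approximate figures.
var yearFour = []phase{
	{"P20 MLOps Foundations", 8, 95},
	{"P21 ML Serving + mlship v0", 8, 100},
	{"P22 Feature Store: Feast", 7, 80},
	{"P23 Kubeflow Operations", 8, 100},
	{"P24 LLM Infrastructure + RAG + llm-gateway v1", 8, 100},
	{"P25 GPU Infrastructure (capstone)", 8, 95},
	{"Year 4 Final Exam", 2, 24},
}

// budget sums weeks and hours and returns the slack left in a
// 52-week year at hoursPerWeek.
func budget(phases []phase, hoursPerWeek int) (weeks, hours, slack int) {
	for _, p := range phases {
		weeks += p.weeks
		hours += p.hours
	}
	return weeks, hours, hoursPerWeek*52 - hours
}

func main() {
	w, h, s := budget(yearFour, 12)
	fmt.Printf("%d weeks, %d hrs, %d hrs slack\n", w, h, s) // prints "49 weeks, 594 hrs, 30 hrs slack"
}
```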


What ships during Year 4

| Project | Phase | Status by Y4 end |
|---|---|---|
| services/llm-gateway/ (inside basecamp) | P21 → P24 → P25 | Year 4 flagship. v1.5 with RAG + streaming + multi-model + drift detection + auto-rollback. Public via basecamp’s repo. |
| mlship | P21 (v0), P25 (v0.5) | Private. Usable enough for the train→deploy composition recipe. The polished v2 launch is the Y5 P30 capstone. |
| notes-rag personal service | P24 | Private. Vector-indexes 4 years of your weekly logs; queryable via llm-gateway. The cinematic Y4 moment. |
| terralabs | continuous | Adds GPU node group modules + Kubeflow bootstrap |
| platform-ctl | continuous | Adds model deploy / list / rollback (wraps mlship under the hood) |

llm-gateway is the year’s serious code: a substantial Go service carrying real (homelab-scale) traffic by Y4 end. It’s the artifact that proves you can build production AI infrastructure, not just operate someone else’s.


Why llm-gateway is staged across 3 phases

Building a production-grade LLM gateway in 8 weeks is unrealistic. Year 4 stages it deliberately:

P21 (ML Serving) ─→ llm-gateway v0:
    minimal Go service, /v1/chat/completions endpoint,
    routes to ONE backend (vLLM stub or OpenAI).
    REST only, no streaming, no RAG.
P24 (LLM Infra) ─→ llm-gateway v1:
    RAG pipeline (vector search → context injection),
    streaming SSE, multi-model routing (small-vs-large),
    per-user rate limit, cost+latency tracking,
    OpenAI-compatible API surface.
P25 (Capstone) ─→ llm-gateway v1.5 (production-shaped):
    drift detection (KS-test on input embeddings),
    auto-rollback on drift alert,
    quantization-aware deployment.

Same gateway, three growth steps. Each phase ships something useful that the next phase builds on. By Y4 end, the codebase is a living example of the rag-as-pattern entry — not a sample, not a tutorial, but running infrastructure.
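The P25 step in the staging plan calls for a KS test on input embeddings. The two-sample Kolmogorov-Smirnov statistic is univariate. One common approach, assumed here, is to test each embedding dimension separately against a reference window. The threshold uses the standard large-sample critical value c(α)·sqrt((n+m)/(n·m)), where c(α) = sqrt(-ln(α/2)/2). `ksStatistic`, `driftedDims`, and the per-dimension strategy are illustrative and do not describe llm-gateway's actual design.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// ksStatistic returns the two-sample Kolmogorov-Smirnov statistic D, the largest
// gap between the empirical CDFs of a (reference window) and b (live window).
// It returns 0 if either sample is empty.
func ksStatistic(a, b []float64) float64 {
	x := append([]float64(nil), a...)
	y := append([]float64(nil), b...)
	sort.Float64s(x)
	sort.Float64s(y)
	i, j, d := 0, 0, 0.0
	for i < len(x) && j < len(y) {
		// Step to the next distinct value and consume its ties in both samples.
		v := math.Min(x[i], y[j])
		for i < len(x) && x[i] == v {
			i++
		}
		for j < len(y) && y[j] == v {
			j++
		}
		gap := math.Abs(float64(i)/float64(len(x)) - float64(j)/float64(len(y)))
		if gap > d {
			d = gap
		}
	}
	return d
}

// driftedDims tests each embedding dimension separately and returns the indices
// whose D exceeds the large-sample critical value c(alpha)*sqrt((n+m)/(n*m)).
func driftedDims(ref, live [][]float64, alpha float64) []int {
	if len(ref) == 0 || len(live) == 0 {
		return nil
	}
	n, m := float64(len(ref)), float64(len(live))
	crit := math.Sqrt(-math.Log(alpha/2)/2) * math.Sqrt((n+m)/(n*m))
	var drifted []int
	for dim := range ref[0] {
		a := make([]float64, len(ref))
		for k, vec := range ref {
			a[k] = vec[dim]
		}
		b := make([]float64, len(live))
		for k, vec := range live {
			b[k] = vec[dim]
		}
		if ksStatistic(a, b) > crit {
			drifted = append(drifted, dim)
		}
	}
	return drifted
}

func main() {
	// Synthetic 2-d "embeddings": dimension 1 shifts by +0.5 in the live window.
	var ref, live [][]float64
	for k := 0; k < 50; k++ {
		x := float64(k) / 50
		ref = append(ref, []float64{x, x})
		live = append(live, []float64{x, x + 0.5})
	}
	fmt.Println("drifted dimensions:", driftedDims(ref, live, 0.05))
}
```

Testing hundreds of dimensions at the same α will flag some of them by chance. A production check would likely add a multiple-comparison correction or test a lower-dimensional projection instead.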


Studio composition recipes that land this year (3 of 5)

Recipe 1: Train → register → deploy in one flow (lands P21 + P22 + P23):

JupyterHub notebook
→ Ray cluster (distributed training)
→ MLflow (model registry)
→ KServe (online serving)
→ mlship v0 (one-command deploy)

Recipe 2: Personal RAG over your weekly logs (lands P24):

Notebook (chunk 4 years of weekly logs from Iceberg via Spark)
→ embeddings (sentence-transformers)
→ pgvector (Tier 7)
→ llm-gateway v1 (RAG endpoint)
→ notes-rag UI

Recipe 3: AI-assisted on-call (foreshadowed in P25; full version in Y5 P28):

PagerDuty / Prometheus alert
→ llm-gateway v1.5 (with drift-detection + cost tracking)
→ returns triage hypothesis
→ posted to Slack

These recipes live as runnable code in basecamp/examples/. Each recipe becomes one demo video and one blog post. Recipe 2 is the cinematic Y4 moment: by P24 you have four full years of weekly logs to embed, a corpus no demo dataset can fake.


Patterns deepened in Year 4

By Y4 end, these reach DEEP (see the ml-and-ai pattern category and the broader agents category that Year 5 builds on):

  • ml-and-ai/model-lifecycle (P20-P21 — train/register/deploy/monitor/retrain as pattern)
  • ml-and-ai/inference-shapes (P21 — online vs batch vs streaming inference)
  • ml-and-ai/feature-store (P22 — Feast on top of Iceberg)
  • ml-and-ai/train-serve-skew (P22 + P25 — drift detection makes this concrete)
  • ml-and-ai/rag-as-pattern (P24 — RAG ingest + retrieve + generate as 3 sub-systems)

That’s only 5, but each is a major pattern. The category was nearly empty going into Y4; Y4 fills it. The agents category stays at OUTLINE through Y4 — those entries (agent-loop, tool-use-as-capability, prompt-as-program) reach DEEP in Year 5 when services/aiops/ operates the platform.


Cloud requirements

Year 4 cloud spend: $20-50
Phase 25 (GPU Infrastructure):
├── Cloud spot GPU (e.g. g5.xlarge or a T4 instance) for ~3-5 hours of work
├── Strict destroy-at-end-of-session discipline
└── Budget alert at $20
Otherwise: $0. Homelab CPU is enough for all training (small models),
and vLLM with quantization on a 64GB host serves Phi-3-mini comfortably.
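You can sanity-check the “serves Phi-3-mini comfortably” claim with weight-memory arithmetic. Phi-3-mini has about 3.8B parameters, and the weights alone take params × bits / 8 bytes. This sketch ignores the KV cache, activations, and runtime overhead, all of which add to the totals. `weightGB` is a hypothetical helper.

```go
package main

import "fmt"

// weightGB estimates memory for model weights alone, in decimal gigabytes:
// params * bitsPerWeight / 8 bytes. KV cache, activations, and runtime
// overhead are deliberately left out.
func weightGB(params float64, bitsPerWeight int) float64 {
	return params * float64(bitsPerWeight) / 8 / 1e9
}

func main() {
	const phi3MiniParams = 3.8e9 // Phi-3-mini is roughly 3.8B parameters
	for _, bits := range []int{16, 8, 4} {
		fmt.Printf("%2d-bit weights: ~%.1f GB\n", bits, weightGB(phi3MiniParams, bits))
	}
}
```

The program prints roughly 7.6, 3.8, and 1.9 GB. Even unquantized 16-bit weights use a small slice of a 64GB host, so on CPU, quantization is likely more about speed and headroom than about fitting the model at all.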

The point of P25 is GPU-pattern fluency, not GPU mastery. You’ll never have a GPU fleet at home; you’ll demonstrate enough operational depth to interview convincingly. Homelab specs and the optional Month-49 second-node upgrade are documented in homelab/hardware.


Reading order

  1. This index
  2. Phase 20: MLOps Foundations — sets the lifecycle frame everything else hangs off
  3. Phase 21 → Phase 22 → Phase 23 → Phase 24 → Phase 25, in order. Each builds on the last; llm-gateway accumulates across P21, P24, and P25
  4. final-exam.md, about 2 weeks before the end of Phase 25

Year 4 reading spine: “Designing Machine Learning Systems” (Chip Huyen, 2022) Ch. 1-10, plus “AI Engineering” (Chip Huyen, 2024) for the LLM half. Pace: one chapter every 4 weeks alongside the phase work, the same cadence as DDIA across Year 3.


Year 4 graduation

You can:
- Operate the ML platform end-to-end (train → register → serve → monitor → retrain)
- Deploy LLM infrastructure (vLLM + RAG + vector DB) production-shaped
- Manage GPU resources + cost
- Detect + respond to model drift
- Ship OSS in the ML space (llm-gateway in basecamp)
- Run personal RAG over your own writing — the homelab as your second brain
Exit ramp: ML Platform Engineer / AI Infrastructure Engineer
Confidence: ~40 patterns DEEP, ML platform operational, llm-gateway in production,
mlship v0.5 ready for Y5 capstone polish
