ML & AI Infrastructure

Months 37-48. Add the ML/AI layer on top of Year 3’s data layer. By year-end, basecamp serves models, runs notebooks-as-a-service, hosts services/llm-gateway/ (the Y4 flagship), and dogfoods notes-rag over your own 4 years of weekly logs. Exit ramp: ML Platform Engineer / AI Infrastructure Engineer

Where Year 4 sits in the program

Year 4 is the inflection. Up to here, ROOT has been a generic platform program — Linux, networking, Kubernetes, IaC, lakehouse, observability. Year 3 ended with a public basecamp running a credible data platform. Year 4 is where ROOT stops being a platform program and starts being an AI platform program. The brand pillar — “Building an AI Platform in Public” — is finally earned, not just declared.

The Year 3 → Year 4 transition is small in surface area (you’re still operating Kubernetes, still committing to basecamp) but large in identity. Y3’s deliverable was a data engineer’s platform. Y4’s deliverable is an ML platform engineer’s platform: training, registries, serving, features, GPUs, LLMs, RAG. The same nine tiers, with Tiers 5-7 of the basecamp plan lit up for the first time.

This is also the year services/llm-gateway/ grows up. Started as a stub that proxies one backend in P21, it ends Year 4 as a substantial Go service with RAG, streaming, multi-model routing, drift detection, and auto-rollback — moving real (homelab-scale) traffic. Everything else in Year 4 is in service of getting that gateway production-shaped, then connecting it into the rest of basecamp.

What you’ll know at the end of Year 4

MLOps lifecycle as a pattern — train → register → deploy → monitor → retrain, abstractly. MLflow + KServe + Ray are implementations of the model-lifecycle pattern, not the pattern itself.
Inference-shape fluency — online vs batch vs streaming inference, and the latency/cost/freshness trade-offs each shape commits to.
Feature stores in practice — Feast on top of Iceberg; the offline/online dual-store pattern; train/serve skew as a measurable production failure mode, not a textbook concept.
Pipeline orchestration for ML — Kubeflow Pipelines composing the train→register→deploy flow; Katib for hyperparameter search.
LLM infrastructure end-to-end — vLLM serving with quantization, RAG ingestion + retrieval + generation as three separable subsystems, vector DBs (pgvector, Qdrant), fine-tuning workflows.
GPU scheduling discipline — multi-tenant resource sharing, spot economics, the “pay $20 to learn the pattern” mindset that keeps cloud spend honest.
mlship v0/v0.5 — usable enough to deploy your own models and power the train→deploy composition recipe. The polished v2 OSS launch is a Year 5 capstone deliverable, not a Y4 outcome.

You’ll be deployable as ML Platform Engineer / AI Infrastructure Engineer. Year 4 flagship: services/llm-gateway/ inside basecamp.

Phase map

Phase	Title	Approx. weeks	Approx. hours	Pattern depth focus
20	MLOps Foundations	8	95	model-lifecycle (first touch)
21	ML Serving + `mlship` v0	8	100	inference-shapes, model-lifecycle (deepens)
22	Feature Store: Feast	7	80	feature-store, train-serve-skew
23	Kubeflow Operations	8	100	(composition recipe lands here)
24	LLM Infrastructure + RAG + `llm-gateway` v1	8	100	rag-as-pattern
25	GPU Infrastructure (capstone)	8	95	all Y4 patterns reach DEEP
	Year 4 Final Exam	2	24	—
Total		~49 weeks	~594 hrs	~5 patterns deepened

12 hrs/week × 52 weeks = 624 hrs. Year 4 fits with ~30 hrs slack. Year 4 deepens fewer patterns than Y3 (~5 vs ~12) but the patterns are denser — model-lifecycle alone touches everything from data validation to drift detection to canary rollouts. See the Master Plan’s pattern depth ladder for what STUB → OUTLINE → DEEP means in practice.

What ships during Year 4

Project	Phase	Status by Y4 end
`services/llm-gateway/` (inside basecamp)	P21 → P24 → P25	Year 4 flagship. v1 with RAG + streaming + multi-model + drift detection + auto-rollback. Public via basecamp’s repo.
`mlship`	P21 (v0), P25 (v0.5)	Private. Usable enough for the train→deploy composition recipe. Polished v2 launch is Y5 P30 capstone.
`notes-rag` personal service	P24	Private. Vector-indexes 4 years of your weekly logs; queryable via llm-gateway. The cinematic Y4 moment.
`terralabs`	continuous	Adds GPU node group modules + Kubeflow bootstrap
`platform-ctl`	continuous	Adds `model deploy / list / rollback` (wraps mlship under the hood)

llm-gateway is the year’s serious code — substantial Go service, real (homelab-scale) traffic by Y4 end. It’s the artifact that proves you can build production AI infrastructure, not just operate someone else’s.

Why `llm-gateway` is staged across 3 phases

Building a production-grade LLM gateway in 8 weeks is unrealistic. Year 4 stages it deliberately:

P21 (ML Serving) ─→ llm-gateway v0:
                    minimal Go service, /v1/chat/completions endpoint,
                    routes to ONE backend (vLLM stub or OpenAI).
                    REST only, no streaming, no RAG.

P24 (LLM Infra) ─→ llm-gateway v1:
                    RAG pipeline (vector search → context injection),
                    streaming SSE, multi-model routing (small-vs-large),
                    per-user rate limit, cost+latency tracking,
                    OpenAI-compatible API surface.

P25 (Capstone)  ─→ llm-gateway v1.5 (production-shaped):
                    drift detection (KS-test on input embeddings),
                    auto-rollback on drift alert,
                    quantization-aware deployment.

Same gateway, three growth steps. Each phase ships something useful that the next phase builds on. By Y4 end, the codebase is a living example of the rag-as-pattern entry — not a sample, not a tutorial, but running infrastructure.

Studio composition recipes that land this year (3 of 5)

Recipe 1: Train → register → deploy in one flow (lands P21 + P22 + P23):

JupyterHub notebook
  → Ray cluster (distributed training)
  → MLflow (model registry)
  → KServe (online serving)
  → mlship v0 (one-command deploy)

Recipe 2: Personal RAG over your weekly logs (lands P24):

Notebook (chunk 4 years of weekly logs from Iceberg via Spark)
  → embeddings (sentence-transformers)
  → pgvector (Tier 7)
  → llm-gateway v1 (RAG endpoint)
  → notes-rag UI

Recipe 3: AI-assisted on-call (foreshadowed P25; full version Y5 P28):

PagerDuty / Prometheus alert
  → llm-gateway v1.5 (with drift-detection + cost tracking)
  → returns triage hypothesis
  → posted to Slack

These recipes live as runnable code in basecamp/examples/. Each = one demo video, one blog post. Recipe 2 is the cinematic Y4 moment: by P24 you have four full years of weekly logs to embed — a corpus no demo dataset can fake.

Patterns deepened in Year 4

By Y4 end, these reach DEEP (see the ml-and-ai pattern category and the broader agents category that Year 5 builds on):

ml-and-ai/model-lifecycle (P20-P21 — train/register/deploy/monitor/retrain as pattern)
ml-and-ai/inference-shapes (P21 — online vs batch vs streaming inference)
ml-and-ai/feature-store (P22 — Feast on top of Iceberg)
ml-and-ai/train-serve-skew (P22 + P25 — drift detection makes this concrete)
ml-and-ai/rag-as-pattern (P24 — RAG ingest + retrieve + generate as 3 sub-systems)

That’s only 5, but each is a major pattern. The category was nearly empty going into Y4; Y4 fills it. The agents category stays at OUTLINE through Y4 — those entries (agent-loop, tool-use-as-capability, prompt-as-program) reach DEEP in Year 5 when services/aiops/ operates the platform.

Cloud requirements

Year 4 cloud spend: $20-50

  Phase 25 (GPU Infrastructure):
    └── Cloud spot GPU (g5.xlarge or T4) for ~3-5 hours of work
    └── Strict destroy-at-end-of-session discipline
    └── Budget alert at $20

  Otherwise: $0 — homelab CPU is enough for all training (small models),
             vLLM with quantization on a 64GB host serves Phi-3-mini comfortably.

The point of P25 is GPU-pattern fluency, not GPU mastery. You’ll never have a GPU fleet at home; you’ll demonstrate enough operational depth to interview convincingly. Homelab specs and the optional Month-49 second-node upgrade are documented in homelab/hardware.

Reading order

This index
Phase 20: MLOps Foundations — sets the lifecycle frame everything else hangs off
Phase 21 → Phase 22 → Phase 23 → Phase 24 → Phase 25 in order — each builds on the last; llm-gateway accumulates across P21, P24, P25
final-exam.md ~2 weeks before end of Phase 25

Year 4 reading spine: “Designing Machine Learning Systems” (Chip Huyen, 2022) Ch. 1-10, plus “AI Engineering” (Chip Huyen, 2024) for the LLM half. Pace 1 chapter / 4 weeks alongside the phase work — same cadence as DDIA across Year 3.

Year 4 graduation

You can:
- Operate the ML platform end-to-end (train → register → serve → monitor → retrain)
- Deploy LLM infrastructure (vLLM + RAG + vector DB) production-shaped
- Manage GPU resources + cost
- Detect + respond to model drift
- Ship OSS in the ML space (llm-gateway in basecamp)
- Run personal RAG over your own writing — the homelab as your second brain

Exit ramp: ML Platform Engineer / AI Infrastructure Engineer
Confidence: ~40 patterns DEEP, ML platform operational, llm-gateway in production,
            mlship v0.5 ready for Y5 capstone polish

→ Year 5: AI Platform + Capstone